R data.table (<= 1.9.4) join behaviour


I have been using R and data.table for some time, and I still have issues with joins. I asked this question, which resulted in a satisfactory explanation, but I still do not get the logic. Let's consider a few examples:

    library("data.table")
    x <- data.table(chiave = c("a", "a", "a", "b", "b"), valore1 = 1:5)
    y <- data.table(chiave = c("a", "b", "c", "d"), valore2 = 1:4)

    x
    #    chiave valore1
    # 1:      a       1
    # 2:      a       2
    # 3:      a       3
    # 4:      b       4
    # 5:      b       5

    y
    #    chiave valore2
    # 1:      a       1
    # 2:      b       2
    # 3:      c       3
    # 4:      d       4

When I join them, I get an error:

    setkey(x, chiave)
    x[y]
    # Error in vecseq(f__, len__, if (allow.cartesian || notjoin) NULL else as.integer(max(nrow(x),  :
    #   Join results in 7 rows; more than 5 = max(nrow(x),nrow(i)). Check for
    #   duplicate key values in i, each of which join to the same group in x over
    #   and over again. If that's ok, try including `j` and dropping `by`
    #   (by-without-by) so that j runs for each group to avoid the large
    #   allocation. If you are sure you wish to proceed, rerun with
    #   allow.cartesian=TRUE. Otherwise, please search for this error message in
    #   the FAQ, Wiki, Stack Overflow and datatable-help for advice.

So:

    x[y, allow.cartesian = TRUE]
    #    chiave valore1 valore2
    # 1:      a       1       1
    # 2:      a       2       1
    # 3:      a       3       1
    # 4:      b       4       2
    # 5:      b       5       2
    # 6:      c      NA       3
    # 7:      d      NA       4

Please note that x has duplicate keys and i doesn't. If I change y to:

    y <- data.table(chiave = c("b", "c", "d"), valore2 = 1:3)
    y
    #    chiave valore2
    # 1:      b       1
    # 2:      c       2
    # 3:      d       3

The join is done with no error message and no need for allow.cartesian, but logically the situation is the same: x has duplicate keys and i doesn't.

    x[y]
    #    chiave valore1 valore2
    # 1:      b       4       1
    # 2:      b       5       1
    # 3:      c      NA       2
    # 4:      d      NA       3
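The difference between these two cases is consistent with the row-count check quoted in the error message: the join is refused when the result would have more rows than max(nrow(x), nrow(i)). The base R sketch below only reproduces that arithmetic for the two examples; `join_rows` is a hypothetical helper written for illustration, not a data.table function, and it assumes the default behaviour where an unmatched key in i still yields one NA row.

```r
# Hypothetical helper (not part of data.table): number of rows x[i] would
# return when unmatched keys in i produce a single NA row each.
join_rows <- function(x_keys, i_keys) {
  # For every key in i, count how many rows of x carry the same key.
  matches <- vapply(i_keys, function(k) sum(x_keys == k), integer(1))
  # An unmatched key still contributes one row.
  sum(pmax(matches, 1L))
}

x_keys <- c("a", "a", "a", "b", "b")

# First example: i has keys a, b, c, d
y1_keys <- c("a", "b", "c", "d")
rows1  <- join_rows(x_keys, y1_keys)            # 3 + 2 + 1 + 1
limit1 <- max(length(x_keys), length(y1_keys))  # max(5, 4)

# Second example: i has keys b, c, d
y2_keys <- c("b", "c", "d")
rows2  <- join_rows(x_keys, y2_keys)            # 2 + 1 + 1
limit2 <- max(length(x_keys), length(y2_keys))  # max(5, 3)

cat("example 1:", rows1, "rows, limit", limit1, "\n")
cat("example 2:", rows2, "rows, limit", limit2, "\n")
```

Under this reading the first join is flagged because 7 > 5 and the second is not because 4 <= 5, even though in both cases only x has duplicate keys.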

On the other hand:

    x <- data.table(chiave = c("a", "a", "a", "a", "a", "a", "b", "b"), valore1 = 1:8)
    y <- data.table(chiave = c("b", "b", "d"), valore2 = 1:3)

    x
    #    chiave valore1
    # 1:      a       1
    # 2:      a       2
    # 3:      a       3
    # 4:      a       4
    # 5:      a       5
    # 6:      a       6
    # 7:      b       7
    # 8:      b       8

    y
    #    chiave valore2
    # 1:      b       1
    # 2:      b       2
    # 3:      d       3

Here I have duplicate keys in both x and i, but the join (and a cartesian product) is done, with no error message and no need for allow.cartesian:

    setkey(x, chiave)
    x[y]
    #    chiave valore1 valore2
    # 1:      b       7       1
    # 2:      b       8       1
    # 3:      b       7       2
    # 4:      b       8       2
    # 5:      d      NA       3

From my point of view, I need to be warned if and only if I have duplicate keys in both x and i (not merely if the resulting table has more rows than max(nrow(x),nrow(i))), and only in that case do I see the need for allow.cartesian (so not in the first two examples).
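The rule proposed here can be written down as a short base R sketch. `needs_cartesian_warning` is a hypothetical name used only for illustration; it is not how data.table decides, and it takes the simplest reading of the rule (any duplicated key in x together with any duplicated key in i).

```r
# Hypothetical check (not data.table's behaviour): warn only when BOTH
# x and i contain duplicated key values.
needs_cartesian_warning <- function(x_keys, i_keys) {
  anyDuplicated(x_keys) > 0L && anyDuplicated(i_keys) > 0L
}

# The three examples from the question, reduced to their key columns.
ex1 <- needs_cartesian_warning(c("a", "a", "a", "b", "b"), c("a", "b", "c", "d"))
ex2 <- needs_cartesian_warning(c("a", "a", "a", "b", "b"), c("b", "c", "d"))
ex3 <- needs_cartesian_warning(c(rep("a", 6), "b", "b"), c("b", "b", "d"))

print(c(ex1 = ex1, ex2 = ex2, ex3 = ex3))
```

With this rule only the third example would ask for allow.cartesian, which is the opposite of what 1.9.4 does with the first and third examples.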

Just to keep this answered: this behaviour of allow.cartesian has been fixed in the current development version, v1.9.5, and is available on CRAN as v1.9.6. Odd versions are devel, and even ones are stable. From NEWS:

  1. allow.cartesian is ignored during joins when:

    • i has no duplicates and mult="all". Closes #742. Thanks to @nigmastar for the report.
    • assigning by reference, i.e., j has :=. Closes #800. Thanks to @matthieugomez for the report.

    In both these cases (and during a not-join, which was already fixed in 1.9.4), allow.cartesian can be safely ignored.
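As a quick way to see the two NEWS cases in action, the sketch below reruns the first example from the question. It assumes data.table 1.9.6 or later is installed; on 1.9.4 the first join is the one that stops with the error shown earlier.

```r
library(data.table)  # assumption: version 1.9.6 or later

x <- data.table(chiave = c("a", "a", "a", "b", "b"), valore1 = 1:5)
y <- data.table(chiave = c("a", "b", "c", "d"), valore2 = 1:4)
setkey(x, chiave)

# Case 1: i (here y) has no duplicate keys and mult defaults to "all",
# so the join should run without allow.cartesian = TRUE.
res <- x[y]
print(res)

# Case 2: assigning by reference (j uses :=) during the join.
# i.valore2 refers to the valore2 column of y.
x[y, valore2 := i.valore2]
print(x)
```

The first call returns the same seven rows that previously required allow.cartesian = TRUE, and the second adds valore2 to x in place for the matching keys only.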

