I am trying to merge two data.tables efficiently, subject to a condition.
What I do now is a full cross join (behaviour I want to keep), except that one condition must hold for a subset of the columns.
Cross-join function (taken from another question):
library(data.table)
CJ.table.1 <- function(X,Y)
setkey(X[,c(k=1,.SD)],k)[Y[,c(k=1,.SD)],allow.cartesian=TRUE][,k:=NULL]
set.seed(1)
# generate data
x = data.table(t=rep(1:10,2), z=sample(1:10,20,replace=TRUE))
x2 = data.table(tprime=rep(1:10,2), zprime=sample(1:10,20,replace=TRUE))
joined = CJ.table.1(x,x2)
> joined
t z tprime zprime
1: 1 3 1 10
2: 2 4 1 10
3: 3 6 1 10
4: 4 10 1 10
5: 5 3 1 10
---
396: 6 5 10 5
397: 7 8 10 5
398: 8 10 10 5
399: 9 4 10 5
400: 10 8 10 5
Then I want to keep only the rows where t increases by exactly 1, i.e. tprime == t + 1.
setcolorder(joined, c("t", "tprime", "z",'zprime'))
joined=joined[tprime==t+1]
The final desired output is then:
> joined
t tprime z zprime
1: 1 2 3 3
2: 1 2 3 3
3: 2 3 4 7
4: 2 3 2 7
5: 3 4 6 2
6: 3 4 7 2
7: 4 5 10 3
8: 4 5 4 3
9: 5 6 3 4
10: 5 6 8 4
11: 6 7 9 1
12: 6 7 5 1
13: 7 8 10 4
14: 7 8 8 4
15: 8 9 7 9
16: 8 9 10 9
17: 9 10 7 4
18: 9 10 4 4
19: 1 2 3 6
20: 1 2 3 6
21: 2 3 4 5
22: 2 3 2 5
23: 3 4 6 2
24: 3 4 7 2
25: 4 5 10 9
26: 4 5 4 9
27: 5 6 3 7
28: 5 6 8 7
29: 6 7 9 8
30: 6 7 5 8
31: 7 8 10 2
32: 7 8 8 2
33: 8 9 7 8
34: 8 9 10 8
35: 9 10 7 5
36: 9 10 4 5
t tprime z zprime
I want to apply the condition BEFORE the cross join because my actual data is huge, so it is inefficient to generate the full result first and THEN prune it down.
The reason I can't just do a merge is that I need to cross join the other rows as well.
x[, k:=t+1][x2[, k:=tprime], on=.(k), nomatch=0L][, .(t, tprime, z, zprime)]
– Bramble
Use allow.cartesian=TRUE whenever you get that error message. To add more conditions, use something along the lines of x[, c('k', 'y1') := .(t+1, z)][x2[, c('k','y2') := .(tprime, myfun(zprime))], on=.(k=k, y1<y2)][, .(t, tprime, z, zprime)]
– Bramble
on=c("y1<y2", "y3<y4"). You can see the help in ?data.table
– Bramble
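The suggestion in the comments can be checked against the cross-join-then-filter approach with a small sketch like the one below. It rests on a few assumptions: the helper same_rows and the names xk, x2k, y1, y2, full, baseline, fast are made up for illustration; the extra condition z < zprime simply stands in for whatever second condition is needed; and the baseline uses a constant dummy column instead of CJ.table.1 only to keep the example self-contained. copy() is used so that x and x2 are not modified by reference.

```r
library(data.table)

set.seed(1)
x  <- data.table(t = rep(1:10, 2), z = sample(1:10, 20, replace = TRUE))
x2 <- data.table(tprime = rep(1:10, 2), zprime = sample(1:10, 20, replace = TRUE))

# Baseline: full cross join through a constant dummy column, then filter.
full <- copy(x)[, dummy := 1L][copy(x2)[, dummy := 1L],
                               on = .(dummy), allow.cartesian = TRUE]
full[, dummy := NULL]
baseline <- full[tprime == t + 1L, .(t, tprime, z, zprime)]

# Alternative: push the condition into the join by matching t + 1 to tprime.
xk  <- copy(x)[, k := t + 1L]
x2k <- copy(x2)[, k := tprime]
fast <- xk[x2k, on = .(k), nomatch = 0L, allow.cartesian = TRUE][
  , .(t, tprime, z, zprime)]

# Extra non-equi condition (here z < zprime, purely illustrative).
# The join is done on copies y1/y2 so the original z and zprime columns
# keep their own values in the result.
xk[, y1 := z]
x2k[, y2 := zprime]
fast2 <- xk[x2k, on = .(k, y1 < y2), nomatch = 0L, allow.cartesian = TRUE][
  , .(t, tprime, z, zprime)]
baseline2 <- baseline[z < zprime]

# Hypothetical helper: compare two tables regardless of row order.
same_rows <- function(a, b) {
  a <- copy(a); b <- copy(b)
  setorderv(a, names(a)); setorderv(b, names(b))
  isTRUE(all.equal(a, b, check.attributes = FALSE))
}

print(nrow(fast))                   # number of rows kept by the join
print(same_rows(baseline, fast))    # TRUE if join and filter agree
print(same_rows(baseline2, fast2))  # same check with the extra condition
```

With this approach the 400-row intermediate table is never built for the real data; only the rows with matching k are produced, and the rows within each k group are still fully crossed with each other.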