I am trying to use a data.table within a function, and I am trying to understand why my code is failing. I have a data.table as follows:
DT <- data.table(my_name=c("A","B","C","D","E","F"),my_id=c(2,2,3,3,4,4))
> DT
my_name my_id
1: A 2
2: B 2
3: C 3
4: D 3
5: E 4
6: F 4
I am trying to create all pairs of "my_name" with different values of "my_id", which for DT would be:
Var1 Var2
A C
A D
A E
A F
B C
B D
B E
B F
C E
C F
D E
D F
I have a function to return all pairs of "my_name" for a given pair of values of "my_id" which works as expected.
get_pairs <- function(id1,id2,tdt) {
return(expand.grid(tdt[my_id==id1,my_name],tdt[my_id==id2,my_name]))
}
> get_pairs(2,3,DT)
Var1 Var2
1 A C
2 B C
3 A D
4 B D
Now, I want to execute this function for all pairs of ids, which I try to do by finding all pairs of ids and then using mapply with the get_pairs function.
> combn(unique(DT$my_id),2)
[,1] [,2] [,3]
[1,] 2 2 3
[2,] 3 4 4
tid1 <- combn(unique(DT$my_id),2)[1,]
tid2 <- combn(unique(DT$my_id),2)[2,]
mapply(get_pairs, tid1, tid2, DT)
Error in expand.grid(tdt[my_id == id1, my_name], tdt[my_id == id2, my_name]) :
object 'my_id' not found
Again, if I try to do the same thing without an mapply, it works.
get_pairs3(tid1[1],tid2[1],DT)
Var1 Var2
1 A C
2 B C
3 A D
4 B D
Why does this function fail only when used within an mapply? I think this has something to do with the scope of data.table names, but I'm not sure.
Alternatively, is there a different/more efficient way to accomplish this task? I have a large data.table with a third id "sample" and I need to get all of these pairs for each sample (e.g. operating on DT[sample=="sample_id",] ). I am new to the data.table package, and I may not be using it in the most efficient way.
mapply
, it works if you putDT
directly into the function and not as parameter (although it doesn't solve the "why is it not working" part...) – Liberalizeid
always have exactly twonames
? – Poetry