How does one do a full join using data.table?
Asked Answered
R

3

32

In the data.table FAQ, the nomatch = NA parameter is said to be akin to an outer join. However, I haven't been able to get data.table to do a full outer join – only right outer joins.

For example:

a <- data.table("dog" = c(8:12), "cat" = c(15:19))

   dog cat
1:   8  15
2:   9  16
3:  10  17
4:  11  18
5:  12  19

b <- data.table("dog" = 1:10, "bullfrog" = 11:20)

    dog bullfrog
 1:   1       11
 2:   2       12
 3:   3       13
 4:   4       14
 5:   5       15
 6:   6       16
 7:   7       17
 8:   8       18
 9:   9       19
10:  10       20

setkey(a, dog)
setkey(b, dog)

a[b, nomatch = NA]

    dog cat bullfrog
 1:   1  NA       11
 2:   2  NA       12
 3:   3  NA       13
 4:   4  NA       14
 5:   5  NA       15
 6:   6  NA       16
 7:   7  NA       17
 8:   8  15       18
 9:   9  16       19
10:  10  17       20

So, nomatch = NA produces a right outer join (which is the default). What if I need a full join? For example:

merge(a, b, by = "dog", all = TRUE) 
# Or with plyr:
join(a, b, by = "dog", type = "full")

    dog cat bullfrog
 1:   1  NA       11
 2:   2  NA       12
 3:   3  NA       13
 4:   4  NA       14
 5:   5  NA       15
 6:   6  NA       16
 7:   7  NA       17
 8:   8  15       18
 9:   9  16       19
10:  10  17       20
11:  11  18       NA
12:  12  19       NA

Is that possible with data.table?

Ratal answered 2/3, 2013 at 4:34 Comment(2)
For joins with data.table see last answer of [this post][1] [1]: #14076565Cheapen
For all sorts of joins with data.table see last answer of [this post][1] [1]: #14076565Cheapen
A
38

You actually have it right there. Use merge.data.table which is exactly what you are doing when you call

merge(a, b, by = "dog", all = TRUE)

since a is a data.table, merge(a, b, ...) calls merge.data.table(a, b, ...)

Adenoid answered 2/3, 2013 at 5:6 Comment(1)
Is there any way we can achieve the same result with the A[B] syntax?Carrollcarronade
S
13
x= data.table(a=1:5,b=11:15)
y= data.table(a=c(1:4,6),c=c(101:104,106))

setkey(x,a)
setkey(y,a)

unique_keys <- unique(c(x[,a], y[,a]))
y[x[.(unique_keys), on="a"]  ] # Full Outer Join
Sold answered 24/10, 2017 at 7:31 Comment(1)
Brilliant! To make it easier to understand, I'd like to break the last two lines to three: 1) unique_keys <- data.table(a = unique(c(x[,a], y[,a]))) 2) xx <- x[unique_keys] 3) y[xx]Maurizia
P
0

A different approach to getting a full join would be to read the full join as a right join plus an anti join:

rbind(
  fill = TRUE,
  a[b, on = ...],
  a[!b, on = ...]
)
Piss answered 20/2, 2023 at 22:17 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.