Inconsistent data.table merge behavior when merging three tables

I've been searching this morning to figure out whether the failure below is expected, but haven't found anything. Could anyone point me to a related discussion? Otherwise, I might submit it as an issue. I appreciate it.

library(data.table)

x <- data.table( a = 1:3 )
y <- data.table( a = 2:4 )
z <- data.table( a = 3:5 )

# works
merge( x , y )
# works
merge( y , z )

# fails
merge( x , merge( y , z ) )
# Error in merge.data.table(x, merge(y, z)) :
#   A non-empty vector of column names for `by` is required.

# works
merge( merge( x , y ) , z )
Ogrady asked 21/10/2020 at 13:17
Looks like this error comes from data.table, as it works with data.frame. - Too

This is a clear bug. Please report it. Luckily, it should be easy to fix.

merge.data.table contains this code:

if (is.null(by)) 
  by = intersect(key(x), key(y))
if (is.null(by)) 
  by = key(x)
if (is.null(by)) 
  by = intersect(names(x), names(y))
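Because every step of that chain is guarded by is.null(by), supplying by yourself skips it entirely. A minimal workaround sketch, assuming you are on a data.table version affected by this (the name res is just for illustration):

```r
library(data.table)

x <- data.table(a = 1:3)
y <- data.table(a = 2:4)
z <- data.table(a = 3:5)

# With `by` given explicitly, none of the is.null(by) fallbacks run,
# so the keyed result of merge(y, z) no longer matters.
res <- merge(x, merge(y, z), by = "a")
res
```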

Now, the issue is that the second argument, merge(y, z), is keyed (because merge.data.table sets a key on its result):

x <- data.table( a = 1:3 )
y <- merge(data.table( a = 2:4 ), data.table( a = 3:5 ))
haskey(y)
#[1] TRUE

Then,

intersect(key(x), key(y))
#character(0)

Thus, by is character(0) rather than NULL, so neither of the remaining is.null(by) checks is TRUE (we would want the third one to apply here), and merge stops with the error above.
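One way to make the third fallback apply is to test for an empty result rather than for NULL. The sketch below is a hypothetical standalone helper (resolve_by_length is not data.table's code), with the keys and column names passed in as plain vectors:

```r
# Hypothetical helper: the same fallback chain as merge.data.table, but
# checking length(by) == 0L so that character(0) also falls through.
resolve_by_length <- function(key_x, key_y, names_x, names_y) {
  by <- NULL                                  # `by` not supplied by the user
  if (length(by) == 0L) by <- intersect(key_x, key_y)
  if (length(by) == 0L) by <- key_x
  if (length(by) == 0L) by <- intersect(names_x, names_y)
  by
}

is.null(character(0))        # FALSE: why the original checks are skipped
length(character(0)) == 0L   # TRUE

# x unkeyed (key NULL), second table keyed by "a", both have column "a"
resolve_by_length(NULL, "a", "a", "a")
#> [1] "a"
```

Since NULL and character(0) both have length 0, this version reaches intersect(names(x), names(y)) whichever of the two intersect() returns.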

This doesn't happen in your last case, where x is keyed and the second table is not, because intersect() does not treat NULL symmetrically:

intersect("foo", NULL)
#NULL
intersect(NULL, "foo")
#character(0)
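Putting the two observations together, here is a base-R sketch that replays both calls. intersect_old is a hypothetical stand-in written to reproduce the intersect() results shown above (newer base R may return NULL in both cases), and resolve_by is a hypothetical helper mirroring the quoted is.null() chain:

```r
# Hypothetical stand-in reproducing the results above:
# intersect_old(NULL, "foo") is character(0), intersect_old("foo", NULL) is NULL.
intersect_old <- function(x, y) {
  y <- as.vector(y)
  unique(y[match(as.vector(x), y, 0L)])
}

# Hypothetical helper mirroring the is.null()-based chain in merge.data.table
resolve_by <- function(key_x, key_y, names_x, names_y) {
  by <- NULL                                    # `by` not supplied
  if (is.null(by)) by <- intersect_old(key_x, key_y)
  if (is.null(by)) by <- key_x
  if (is.null(by)) by <- intersect_old(names_x, names_y)
  by
}

# merge(x, merge(y, z)): x unkeyed, second table keyed by "a"
resolve_by(NULL, "a", "a", "a")
#> character(0)

# merge(merge(x, y), z): first table keyed by "a", z unkeyed
resolve_by("a", NULL, "a", "a")
#> [1] "a"
```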
Cirri answered 21/10/2020 at 13:45
Thank you! github.com/Rdatatable/data.table/issues/4772 - Ogrady

This was fixed without fanfare in data.table 1.14.4 (May 2022). The intersect() behavior that Roland points out changed in base R: intersect(NULL, "foo") now returns NULL. That change precipitated the fix Roland suggested, made without realizing its connection to this issue.

merge( x , merge( y , z ) )
# Key: <a>
#        a
#    <int>
# 1:     3
Constituency answered 7/9 at 6:38

© 2022 - 2024 — McMap. All rights reserved.