I am trying to merge (join) multiple data tables (obtained with fread from 5 csv files) to form a single data table. I get an error when I try to merge 5 data tables, but works fine when I merge only 4. MWE below:
# example data
DT1 <- data.table(x = letters[1:6], y = 10:15)
DT2 <- data.table(x = letters[1:6], y = 11:16)
DT3 <- data.table(x = letters[1:6], y = 12:17)
DT4 <- data.table(x = letters[1:6], y = 13:18)
DT5 <- data.table(x = letters[1:6], y = 14:19)
# this gives an error
Reduce(function(...) merge(..., all = TRUE, by = "x"), list(DT1, DT2, DT3, DT4, DT5))
Error in merge.data.table(..., all = TRUE, by = "x") : x has some duplicated column name(s): y.x,y.y. Please remove or rename the duplicate(s) and try again.
# whereas this works fine
Reduce(function(...) merge(..., all = TRUE, by = "x"), list(DT1, DT2, DT3, DT4))
x y.x y.y y.x y.y
1: a 10 11 12 13
2: b 11 12 13 14
3: c 12 13 14 15
4: d 13 14 15 16
5: e 14 15 16 17
6: f 15 16 17 18
I have a workaround, where, if I change the 2nd column name for DT1:
setnames(DT1, "y", "new_y")
# this works now
Reduce(function(...) merge(..., all = TRUE, by = "x"), list(DT1, DT2, DT3, DT4, DT5))
Why does this happen, and is there any way to merge an arbitrary number of data tables with the same column names without changing any of the column names?
y.y
s, etc. – BemaReduce
function can't give a new column name. In other words, you run out of combinations ofx
andy
(there are only 4 possible combinations so the error occurs when working with a 5th column). – Beauxsetnames
before/during the join, not after – Mayersx
vector contain repeated values? Is it identical in allDT
s? – Bemax
is exactly the same in all DTs. There are no repeated values. – Basidiumx
is exactly the same for all of them, you shouldn't be merging – Mayerscbind
whenx
is exactly the same. I started using merge because I thought it would take care of any missing rows in some of the DTs, so yes, to clarify,x
might not be the same in all DTs, but definitely has no repeated values. Sorry for the confusion! – Basidiumx
, sorry for mis-stating the case – Basidiumy.1 y.2 y.3 y.4 y.5
? And given you know that only columny
is duplicated. So we don't need to auto-detect duplicate columns, and auto-rename/number them. – Creatordata.table::merge()
has an argumentsuffixes = c(".x", ".y")
, so you can avoid the rename step, just pass the corresponding number suffix already. – Creator