Merge multiple data frames - Error in match.names(clabs, names(xi)) : names do not match previous names

I'm getting some really bizarre stuff while trying to merge multiple data frames. Help!

I need to merge a bunch of data frames by the columns 'RID' and 'VISCODE'. Here is an example of what it looks like:

d1 = data.frame(ID = sample(1:100, 5), RID = c(2, 5, 7, 9, 12),
            VISCODE = rep('bl', 5),
            value1 = rep(16, 5))

d2 = data.frame(ID = sample(1:100, 9), RID = c(2, 2, 2, 5, 5, 5, 7, 7, 7),
            VISCODE = rep(c('bl', 'm06', 'm12'), 3),
            value2 = rep(100, 9))

d3 = data.frame(ID = sample(1:100, 9), RID = c(2, 2, 2, 5, 5, 5, 9, 9, 9),
            VISCODE = rep(c('bl', 'm06', 'm12'), 3),
            value3 = rep("a", 9),
            values3.5 = rep("c", 9))

d4 = data.frame(ID = sample(1:100, 9), RID = c(2, 2, 5, 5, 5, 7, 7, 7, 9),
            VISCODE = c(c('bl', 'm12'), rep(c('bl', 'm06', 'm12'), 2), 'bl'),
            value4 = rep("b", 9))

dataList = list(d1, d2, d3, d4)

I looked at the answers to the question titled "Merge several data.frames into one data.frame with a loop." I used the Reduce approach suggested there, as well as a loop I wrote:

try1 = mymerge(dataList)

try2 <- Reduce(function(x, y) merge(x, y, all = TRUE,
                                    by = c("RID", "VISCODE")), dataList, accumulate = FALSE)

where dataList is a list of data frames and mymerge is:

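mymerge = function(dataList){
  # Start from the first data frame and fold the others in one at a time
  mdat = dataList[[1]]

  # seq_along(...)[-1] is empty for a one-element list, unlike 2:L
  for(i in seq_along(dataList)[-1]){
    mdat = merge(mdat, dataList[[i]], by = c("RID", "VISCODE"), all = TRUE)
  }

  mdat
}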

For my test data and subsets of my real data, both of these work fine and produce exactly the same results. However, when I use larger subsets of my data, they both break down and give me the following error: Error in match.names(clabs, names(xi)) : names do not match previous names.

The really weird thing is that using this works:

  dataList = list(demog[1:50,],
            neurobat[1:50,],
            apoe[1:50,],
            mmse[1:50,],
            faq[1:47, ])

And using this fails:

  dataList = list(demog[1:50,],
            neurobat[1:50,],
            apoe[1:50,],
            mmse[1:50,],
            faq[1:48, ])

As far as I can tell, there is nothing special about row 48 of faq. Likewise, using this works:

dataList = list(demog[1:50,],
            neurobat[1:50,],
            apoe[1:50,],
            mmse[1:50,],
            pdx[1:47, ])

And using this fails:

dataList = list(demog[1:50,],
            neurobat[1:50,],
            apoe[1:50,],
            mmse[1:50,],
            pdx[1:48, ])

Row 48 in faq and row 48 in pdx have the same values for RID and VISCODE, the same value for EXAMDATE (something I'm not matching on) and different values for ID (another thing I'm not matching on). Besides the matching RID and VISCODE, I don't see anything special about them. They don't share any other variable names. This same scenario occurs elsewhere in the data without problems.
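
One way to test whether a row like this is special is to check whether its RID/VISCODE pair already appears in the data frames merged before it. With all = TRUE, rows that have no partner are handled separately from matched rows, so the first unmatched row is a plausible place for a failure to surface. The sketch below uses made-up stand-ins (earlier, later) rather than the real demog or faq data, and keyOf is a hypothetical helper:

```r
keys <- c("RID", "VISCODE")

# Made-up stand-ins: 'earlier' plays the role of the frames merged so far,
# 'later' plays the role of the frame being added (for example faq[1:48, ])
earlier <- data.frame(RID = c(2, 5, 7), VISCODE = c("bl", "bl", "bl"))
later   <- data.frame(RID = c(2, 5, 16), VISCODE = c("bl", "bl", "m12"),
                      FAQTOTAL = c(0, 0, 0))

# Hypothetical helper: collapse the two key columns into one comparable string
keyOf <- function(df) paste(df$RID, df$VISCODE, sep = "|")

# Rows of 'later' whose key pair never occurs in 'earlier'
unmatched <- later[!(keyOf(later) %in% keyOf(earlier)), ]
print(unmatched)
```

If the first unmatched row in the real data turns out to be row 48, that would suggest the failure is tied to unmatched rows rather than to the values in that row.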

To add icing on the complication cake, this doesn't even work:

dataList = list(demog[1:50,],
            neurobat[1:50,],
            apoe[1:50,],
            mmse[1:50,],
            faq[1:48, 2:3])

where columns 2 and 3 are "RID" and "VISCODE".

48 isn't even the magic number because this works:

 dataList = list(demog[1:500,],
            neurobat[1:500,],
            apoe[1:500,],
            mmse[1:457,])

while using mmse[1:458, ] fails.

I can't seem to come up with test data that causes the problem. Has anyone had this problem before? Any better ideas on how to merge?
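
A possible workaround, assuming the trouble comes from non-key columns that share a name across the data frames (ID appears in every one of them here): merge adds .x/.y suffixes when names collide, and across three or more merges those suffixed names can repeat. The sketch below gives each shared non-key column a unique name before merging. The frames f1 to f4 are made-up, and tagShared is a hypothetical helper, not part of base R:

```r
keys <- c("RID", "VISCODE")

# Made-up frames that all carry an ID column besides the keys
f1 <- data.frame(ID = 1:2, RID = c(2, 5), VISCODE = c("bl", "bl"),  value1 = c(16, 16))
f2 <- data.frame(ID = 3:4, RID = c(2, 5), VISCODE = c("bl", "m06"), value2 = c(100, 100))
f3 <- data.frame(ID = 5:6, RID = c(2, 7), VISCODE = c("bl", "bl"),  value3 = c("a", "a"))
f4 <- data.frame(ID = 7:8, RID = c(2, 9), VISCODE = c("bl", "bl"),  value4 = c("b", "b"))
frames <- list(f1, f2, f3, f4)

# Non-key column names that occur in more than one frame
allNames <- unlist(lapply(frames, names))
shared <- setdiff(unique(allNames[duplicated(allNames)]), keys)

# Hypothetical helper: append the frame's position to each shared column name
tagShared <- function(df, tag, shared) {
  hit <- names(df) %in% shared
  names(df)[hit] <- paste(names(df)[hit], tag, sep = "_")
  df
}

tagged <- lapply(seq_along(frames), function(i) tagShared(frames[[i]], i, shared))

# Same full outer merge as before, now without colliding column names
merged <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), tagged)
print(names(merged))
```

Checking anyDuplicated(names(merged)) after each step of the real merge would show whether duplicated names are actually involved.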

Culp answered 24/3, 2010 at 0:10 Comment(7)
Can you actually show rows 47 and 48 of the problem data frames? I'm guessing this has something to do with different data types... - Colburn
faq[48,] ID RID VISCODE EXAMDATE FAQSOURCE FAQFINAN FAQFORM 48 1230 16 m12 04/12/2006 2 0 0 FAQSHOP FAQGAME FAQBEVG FAQMEAL FAQEVENT FAQTV FAQREM 48 0 0 0 0 0 0 0 FAQTRAVL FAQTOTAL 48 0 0 - Culp
oh wow, that looks terrible. Let me try again - Culp
faq[48,] ID RID VISCODE EXAMDATE FAQSOURCE FAQFINAN FAQFORM 48 1230 16 m12 04/12/2006 2 0 0 FAQSHOP FAQGAME FAQBEVG FAQMEAL FAQEVENT FAQTV FAQREM 48 0 0 0 0 0 0 0 FAQTRAVL FAQTOTAL 48 0 0 - Culp
still bad. Not sure how best to do it. The values in faq[48,] are primarily 0's. The values in pdx[48,] are primarily -4's. They both have a date in date format and a character string for VISCODE. - Culp
You could check this thread: stat.ethz.ch/pipermail/r-help/2009-February/188975.html. - Symons
-1: the information provided is not sufficient to reproduce the problem. - Messidor

Not sure I can help, unfortunately, but I thought I would post since I found this question while searching for help on this error. What I effectively had was:

a <- cbind(b,c)
d <- merge(a,e)

And I got that same error. Using a <- data.frame(b,c) fixed the problem, but I can't work out why.

object.size(a)  # 1248124200 bytes

object.size(c)  # 1248124032 bytes

So something is different. All classes are the same, str() reveals nothing. I'm stumped.

Hopefully that aids someone else in the know.
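
One difference that may be relevant, assuming b and c are data frames: cbind on data frames builds its result with check.names = FALSE, so duplicated column names are kept, while data.frame(b, c) makes the names unique by default. The objects below (b, c2, viaCbind, viaDataFrame) are made-up for illustration and are not the data from this answer:

```r
# Two small data frames that both contain a column named x
b  <- data.frame(key = 1:3, x = c(10, 20, 30))
c2 <- data.frame(x = c(7, 8, 9))

viaCbind     <- cbind(b, c2)       # keeps the duplicated name
viaDataFrame <- data.frame(b, c2)  # renames the second x

print(names(viaCbind))      # "key" "x" "x"
print(names(viaDataFrame))  # "key" "x" "x.1"

# all.equal describes the differences instead of returning TRUE
print(all.equal(viaCbind, viaDataFrame))
```

If the real a had duplicated column names after cbind, that could explain why a later merge complained about names while the data.frame version did not.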

Bushbuck answered 21/2, 2011 at 3:53 Comment(1)
Try the all.equal function. It prints detailed information about the differences. - Symons
