R - merge() returns NA´s in ALL columns although all.x=T

Asked 18/10, 2013 at 14:21 Answered 9/9, 2020 at 14:6

I am new here and have searched the forum for my problem, but did not find a solution. I have two data frames which I want to merge on a common key field.

          merge(x
               ,y
               ,by.x="a"
               ,by.y="b"
               ,all.x=T
               ,sort=F
               )

Since my x-Dataframe has more rows than my y-Dataframe, I want to keep all rows from x with NA for the column from y but with all values in the columns from x. This code is only giving me extra rows for the unmatched cases with NA in ALL columns (columns from x and y). I would be really grateful if someone could help me out? Where is my mistake?

Example:

a = data.frame(c(111,222,333,444),c(1,5,3,8))
b = data.frame(c(111,222),c(0.1,0.4))
colnames(a)=c("code","value")
colnames(b)=c("code","value")
c = merge(a
          ,b
          ,by="code"
          ,all.x=T)

In this example it is working properly. In my data I obtain NA in all columns in row 3&4.

I hope you can understand my lousy example?!

Thank you! Jessica ;)

Eligibility answered 18/10, 2013 at 14:21 Comment(3)

Welcome to SO. Please provide a reproducible example, so we can find your problem more easily. – Plast 18/10, 2013 at 14:27

This can happen if you have a missing value (NA) in the column you are matching on (code in your example dataset) in the x dataset. Is that the case? – Washout 18/10, 2013 at 15:59

No, I just checked that. No NA in either matching column... – Eligibility 18/10, 2013 at 16:20

Just set all=TRUE.

# Create your data
x<-data.frame(val1=c(2,8,6,3),a=c('h','k','b','e'))
y<-data.frame(val2=c(4,1),b=c('h','e'))
# Outer join
merge(x,y,by.x='a',by.y='b',all=TRUE)
#   a val1 val2
# 1 b    6   NA
# 2 e    3    1
# 3 h    2    4
# 4 k    8   NA

Guerdon answered 18/10, 2013 at 15:43 Comment(3)

Unfourtunately, with my data I would get NA for the whole rows 1 and 4 in your example?! – Eligibility 18/10, 2013 at 15:53

@nograpes, can you please explain why this works? It worked for me, but I'd like to know the reason behind it. Thanks. – Kyanite 9/9, 2015 at 16:12

If you are proposing a left outer join or right outer join all=TRUE is no option. – Maggiemaggio 31/3, 2021 at 15:23

So curiously if you re-run the exact same code from @nograpes but you have the same column name in x and y, then you get NAs like you had in your example. See below

x <- data.frame(val1=c(2,8,6,3), a=c('h','k','b','e'))
y <- data.frame(val2=c(4,1),     a=c('h','e'))
merge(x, y, by = 'a', all=TRUE)

Result from the console is

a  val1 val2
b    6   NA
e    3    1
h    2    4
k    8   NA

Batista answered 9/9, 2020 at 14:6 Comment(0)

Recommended topics

Hot tags