How to merge two dataframes using multiple columns as key?

D

2

5

Say I have the following dataframes:

DF1 <- data.frame("A" = rep(c("A","B"), 18),
                  "B" = rep(c("C","D","E"), 12),
                  "NUM"= rep(rnorm(36,10,1)),
                  "TEST" = rep(NA,36))

DF2 <- data.frame("A" = rep("A",6),
                  "B" = rep(c("C","D"),6),
                  "VAL" = rep(c(1,3),3))

*Note: Each unique combination of variables A and B in DF2 should have a unique VAL.

For each row, I would like to replace the NA in TEST with the corresponding value of VAL in DF1 if the values in columns A and A match and the values in columns B and B match for that row. Otherwise, I'd leave TEST as NA. How would I do this without looping through each combination using match?

Ideally, an answer would scale to two data frames with many columns to match upon.

Domingadomingo answered 23/3, 2015 at 14:22 Comment(6)

I think in your DF2 there are multiple rows that could match up to DF1 1st and 2nd columns. So, I am not sure which VAL should be taken. Also, based on the example, the dimensions of the two datasets are not the same. – Hailee 23/3, 2015 at 14:26

Sorry, each unique combination in DF2 should have a unique VAL. I need to figure out a way to encode that in the example. – Domingadomingo 23/3, 2015 at 14:29

In that case, there would be only two rows for DF2 i.e. DF2 <- data.frame(A=rep('A',2), B=c('C', 'D'), VAL=rnorm(2)) and merge will get the result merge(DF1, DF2, all=TRUE) – Hailee 23/3, 2015 at 14:30

@goldisfine: That doesn't clarify things much. We need to know which VAL should be chosen for a given combination of A and B since there are multiple. – Hydrocellulose 23/3, 2015 at 14:31

Now there is a unique value for each combination in DF2 so I think that there is only one possible option for VAL for each row in DF1. I guess DF2 could be reduced to only the unique combinations of the predictors. – Domingadomingo 23/3, 2015 at 14:34

In Python pandas, you could just simply set a compound key using variables A,B. Then this is a simple left-join/merge. – Scleroprotein 24/3, 2015 at 3:7

C

6

As Akrun mentioned in comments, your lookup table (DF2) needs to be reduced to just its unique A/B combinations. For your current dataframe, this isn't a problem, but you will need additional rules if there are multiple possible values for the same combination. From there, the solution is easy:

DF2.u <- unique(DF2)
DF3 <- merge(DF1, DF2.u, all = T)

Note that this will produce a new dataframe with an empty TEST column (all values NA), and a VAL column assigned from DF2. To do exactly what you wanted (replace TEST with VAL where possible), here is some slightly clunkier code:

DF1$TEST <- merge(DF1, DF2.u, all = T)$VAL

EDIT: in response to your question, you can boil down DF2 if necessary quite simple:

DF2$C <- c(1:12) #now unique() won't work
DF2.u <- unique(DF2[1:3])

 A B VAL
1 A C   1
2 A D   3

Corkwood answered 23/3, 2015 at 14:53 Comment(3)

In my actual example, I have columns in DF2 which are numeric, and so if I passed the entire DF then I would have a unique DF that was too large. Can I subset DF2 to rows which are only unique in the factor variables? – Domingadomingo 23/3, 2015 at 14:56

Of course, the actual indices you take will vary based on what your extra columns are and how they are arranged – Corkwood 23/3, 2015 at 15:2

In Python pandas, you could just simply set a compound key using variables A,B. Then this is a simple left-join/merge. – Scleroprotein 24/3, 2015 at 3:6

I

11

# this is your DF1    
DF1 <- data.frame("A" = rep(c("A","B"), 18),
                      "B" = rep(c("C","D","E"), 12),
                      "NUM"= rep(rnorm(36,10,1)),
                      "TEST" = rep(NA,36))

#this is a DF2 i created, with unique A, B, VAL
DF2 <- data.frame("A" = rep(c("A","B"),3),
                  "B" = rep(c("C","D","E"),2),
                  "VAL" = rep(1:6))

# and this is the answer of what i assume you want      
tmp <- merge(DF1,DF2, by=c("A","B"), all.x=TRUE, all.y=FALSE)
DF1[4] <- tmp[5]

Indefatigable answered 23/3, 2015 at 14:52 Comment(0)

C

6