Have nomatch return value as-is using match function in R
I have a much larger existing dataframe. For this smaller example, I would like to replace some of the values of state in df1 with newstate from df2, matching on the column "first". My issue is that NA is returned for the rows whose names have no match in the new dataframe (df2).

Existing dataframe:

state = c("CA","WA","OR","AZ")
first = c("Jim","Mick","Paul","Ron")
df1 <- data.frame(first, state)

      first state
    1   Jim    CA
    2  Mick    WA
    3  Paul    OR
    4   Ron    AZ

New dataframe to match against the existing dataframe:

state = c("CA","WA")
newstate = c("TX", "LA")
first =c("Jim","Mick")
df2 <- data.frame(first, state, newstate)

  first state newstate
1   Jim    CA       TX
2  Mick    WA       LA

I tried to use match, but it returns NA for "state" wherever a "first" value in the original dataframe (df1) has no match in df2.

df1$state <- df2$newstate[match(df1$first, df2$first)]

  first state
1   Jim    TX
2  Mick    LA
3  Paul  <NA>
4   Ron  <NA>

Is there a way to ignore nomatch, or have nomatch return the existing value as-is? This is the desired result: Jim's and Mick's states are updated, while Paul's and Ron's states do not change.

      first state
    1   Jim    TX
    2  Mick    LA
    3  Paul    OR
    4   Ron    AZ
Viewpoint answered 4/10, 2014 at 3:16 Comment(0)
11

Is this what you want? By the way, unless you really want to work with factors, use stringsAsFactors = FALSE in your data.frame call. Notice the use of nomatch = 0 in the match call.

> state = c("CA","WA","OR","AZ")
> first = c("Jim","Mick","Paul","Ron")
> df1 <- data.frame(first, state, stringsAsFactors = FALSE)
> state = c("CA","WA")
> newstate = c("TX", "LA")
> first =c("Jim","Mick")
> df2 <- data.frame(first, state, newstate, stringsAsFactors = FALSE)
> df1
  first state
1   Jim    CA
2  Mick    WA
3  Paul    OR
4   Ron    AZ
> df2
  first state newstate
1   Jim    CA       TX
2  Mick    WA       LA
> 
> # create an index for the matches
> indx <- match(df1$first, df2$first, nomatch = 0)
> df1$state[indx != 0] <- df2$newstate[indx]
> df1
  first state
1   Jim    TX
2  Mick    LA
3  Paul    OR
4   Ron    AZ
Vasiliki answered 4/10, 2014 at 23:8 Comment(1)
Your code works. [Thank you.] But can you please explain why you have indx != 0 on the left side of the assignment and indx on the right side? df1$state[indx != 0] <- df2$newstate[indx] - Cerulean
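On the question in the comment above, here is a minimal sketch of why the two sides line up. The vector names (first_old, first_new, lhs, rhs) are illustrative and not part of the original answer. A zero in an index vector selects nothing in R, so indexing with indx silently drops the non-matches and yields exactly as many values as there are TRUE entries in indx != 0.

```r
# Illustrative stand-ins for df1$first, df2$first and df2$newstate
first_old <- c("Jim", "Mick", "Paul", "Ron")
first_new <- c("Jim", "Mick")
newstate  <- c("TX", "LA")

# nomatch = 0 puts 0 (instead of NA) where a name has no match
indx <- match(first_old, first_new, nomatch = 0)

lhs <- indx != 0        # logical mask: which rows of df1 get updated
rhs <- newstate[indx]   # zero indices are dropped, leaving only the matches

indx   # 1 2 0 0
lhs    # TRUE TRUE FALSE FALSE
rhs    # "TX" "LA"
```

The left side picks the rows to overwrite and the right side supplies one replacement value per picked row, in the same order.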
3

I think you will get better behavior with character vectors than with factors.

> state = c("CA","WA","OR","AZ")
> first = c("Jim","Mick","Paul","Ron")
> df1 <- data.frame(first, state, stringsAsFactors=FALSE)
> state = c("CA","WA")
> newstate = c("TX", "LA")
> first =c("Jim","Mick")
> df2 <- data.frame(first, state, newstate, stringsAsFactors=FALSE)
> df1[ match(df2$first, df1$first ), "state"] <- df2$newstate
> df1
  first state
1   Jim    TX
2  Mick    LA
3  Paul    OR
4   Ron    AZ
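A short sketch of why factors can get in the way here, using hypothetical vectors f and s that are not part of the answer: assigning a value that is not among a factor's levels produces NA (with a warning), whereas a character vector accepts any string.

```r
# Factor: "TX" is not an existing level, so the element becomes NA.
# R would normally warn "invalid factor level, NA generated";
# the warning is suppressed here only to keep the output quiet.
f <- factor(c("CA", "WA"))
suppressWarnings(f[1] <- "TX")
f

# Character vector: the assignment simply works
s <- c("CA", "WA")
s[1] <- "TX"
s
```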
Gravimetric answered 4/10, 2014 at 4:14 Comment(1)
I was able to reproduce your answer. I then converted my original data to character and checked the formats with str(). They appear to be identical in structure. When I try it on my larger, original dataset, I get this: "Error in [<-.data.frame(*tmp*, match(df2$first, df1$first), : missing values are not allowed in subscripted assignments of data frames" - Viewpoint
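One likely cause of the error in the comment above is that df2 contains a name that df1 does not: match then returns NA for that position, and data frame assignment rejects NA subscripts. Below is a minimal sketch of one way around it, assuming a hypothetical extra row "Zed" in df2 to trigger the situation. The names idx and keep are illustrative.

```r
df1 <- data.frame(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"),
                  stringsAsFactors = FALSE)

# "Zed" is a made-up name that does not occur in df1
df2 <- data.frame(first = c("Jim", "Mick", "Zed"),
                  newstate = c("TX", "LA", "NY"),
                  stringsAsFactors = FALSE)

idx  <- match(df2$first, df1$first)  # NA for "Zed"
keep <- !is.na(idx)                  # rows of df2 that have a match in df1

# Use only the matched positions on both sides of the assignment
df1[idx[keep], "state"] <- df2$newstate[keep]
df1
```

Filtering both sides with the same mask keeps the replacement values aligned with the rows they belong to.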
2
library(data.table)
DT1 <- as.data.table(df1)
DT2 <- as.data.table(df2)


setkey(DT1, first, state)
setkey(DT2, first, state)

DT1[DT2]
#    first state newstate
# 1:   Jim    CA       TX
# 2:  Mick    WA       LA

Note that [.data.table also has a nomatch argument, i.e.:

DT2[DT1, nomatch=0]
#    first state newstate
# 1:   Jim    CA       TX
# 2:  Mick    WA       LA

DT2[DT1, nomatch=NA]
#    first state newstate
# 1:   Jim    CA       TX
# 2:  Mick    WA       LA
# 3:  Paul    OR       NA
# 4:   Ron    AZ       NA
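The joins above return a new table rather than the updated df1 the question asks for. As a hedged sketch, an update join can be used to overwrite state in place for the matched rows only. This assumes a data.table version that supports the on= argument, and the tables are rebuilt here so the example is self-contained.

```r
library(data.table)

DT1 <- data.table(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"))
DT2 <- data.table(first = c("Jim", "Mick"),
                  state = c("CA", "WA"),
                  newstate = c("TX", "LA"))

# For rows of DT1 whose `first` matches a row in DT2, set state to DT2's
# newstate (the i. prefix refers to columns of DT2). Unmatched rows are
# left untouched, and DT1 is modified by reference.
DT1[DT2, on = "first", state := i.newstate]
DT1
```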

Quasijudicial answered 4/10, 2014 at 3:23 Comment(0)
