Using dplyr's rename() including variable names not in data set
I am trying to transition some plyr code to dplyr, and I am stuck on the new behavior of rename() in dplyr. I'd like to be able to reuse a single rename() expression for a set of datasets with overlapping but not identical original names. For example,

sample1 <- data.frame(A=1:10, B=letters[1:10])

sample2 <- data.frame(B=11:20, C=letters[11:20])

And then,

 rename(sample1, var1 = A, var2 = B, var3 = C)

I would like the result to be that variable A is renamed var1, and B is renamed var2, not adding a var3 in this case. Instead, I get

Error: Unknown variables: C.

In contrast, the plyr syntax would let me use

rename(sample1, c("A" = "var1", "B" = "var2", "C" = "var3"))
rename(sample2, c("A" = "var1", "B" = "var2", "C" = "var3"))

and not throw an error. Is there a way to get the same result in dplyr without getting the Unknown variables error?

Kenwee answered 25/2, 2015 at 1:14 Comment(1)
You could reference the rename function specifically from plyr: plyr::rename(sample1, c("A" = "var1", "B" = "var2", "C" = "var3")) - Ursulaursulette

Completely ignoring your actual request on how to do this with dplyr, I would like to suggest a different approach using a lookup table:

sample1 <- data.frame(A=1:10, B=letters[1:10])
sample2 <- data.frame(B=11:20, C=letters[11:20])

rename_map <- c("A"="var1",
                "B"="var2",
                "C"="var3")

names(sample1) <- rename_map[names(sample1)]
str(sample1)

names(sample2) <- rename_map[names(sample2)]
str(sample2)

Fundamentally the algorithm is simple:

  1. Build a lookup table of current variable names to desired names
  2. Using the names() function, index the map by the current column names and assign the looked-up new names back to the data frame.
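
The two steps above can be sketched as a small helper applied to both data frames. This is only an illustration: apply_rename_map() is a hypothetical name, not part of base R or dplyr, and the sketch assumes every column name has an entry in the map (a name without an entry would come back as NA).

```r
sample1 <- data.frame(A = 1:10, B = letters[1:10])
sample2 <- data.frame(B = 11:20, C = letters[11:20])

# Step 1: lookup table, current name -> desired name
rename_map <- c(A = "var1", B = "var2", C = "var3")

# Step 2: index the map by the current names and assign the result back.
# apply_rename_map() is a hypothetical helper name used only for this sketch.
apply_rename_map <- function(df, map) {
  names(df) <- unname(map[names(df)])
  df
}

# The same map is reused for every data frame in the list
renamed <- lapply(list(sample1, sample2), apply_rename_map, map = rename_map)
names(renamed[[1]])
names(renamed[[2]])
```

Keeping the map in one place means each dataset only has to be passed through the helper, whichever subset of the original names it happens to contain.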

EDIT: As per Hadley's suggestion, I used a named vector instead of a list, which makes life much easier. I always forget about named vectors :(

Fitzwater answered 25/2, 2015 at 2:4 Comment(2)
You could make this rather simpler by using a named character vector rather than a named list - Detrusion
If you only want to name a subset of columns, this will set all other existing column names to NA. - Karynkaryo
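# No need to use rename()

oldnames <- unique(c(names(sample1), names(sample2)))
newnames <- c("var1", "var2", "var3")
name_df <- data.frame(oldnames, newnames, stringsAsFactors = FALSE)
mydata <- list(sample1, sample2)  # combine the two datasets into a list

# Look up each column's new name with match(), so the result
# does not depend on the order of the columns in each data frame
finaldata <- lapply(mydata, function(i) {
  colnames(i) <- name_df$newnames[match(colnames(i), name_df$oldnames)]
  i
})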
> finaldata
[[1]]
   var1 var2
1     1    a
2     2    b
3     3    c
4     4    d
5     5    e
6     6    f
7     7    g
8     8    h
9     9    i
10   10    j

[[2]]
   var2 var3
1    11    k
2    12    l
3    13    m
4    14    n
5    15    o
6    16    p
7    17    q
8    18    r
9    19    s
10   20    t
Grizzly answered 25/2, 2015 at 2:3 Comment(0)

I’ve used @earino’s answer before myself, but discovered that it can be unsafe. If column names of the data frame are missing in the (names of the) named vector, those column names are silently replaced with NA and that is certainly not what you want.

d1 <- data.frame(A = 1:10, B = letters[1:10], stringsAsFactors = FALSE)

rename_vec <- c("B" = "var2", "C" = "var3")

names(d1) <- rename_vec[names(d1)]
str(d1)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ NA  : int  1 2 3 4 5 6 7 8 9 10
#>  $ var2: chr  "a" "b" "c" "d" ...

The same can happen if you accidentally run names(d1) <- rename_vec[names(d1)] twice, because on the second run none of the colnames(d1) are in names(rename_vec).

names(d1) <- rename_vec[names(d1)]
str(d1)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ NA: int  1 2 3 4 5 6 7 8 9 10
#>  $ NA: chr  "a" "b" "c" "d" ...

We just need to select those columns that are in the data frame and in the rename vector.

d2 <- data.frame(B1 = 1:10, B = letters[1:10], stringsAsFactors = FALSE)

sel <- is.element(colnames(d2), names(rename_vec))
names(d2)[sel] <- rename_vec[names(d2)][sel]
str(d2)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ B1  : int  1 2 3 4 5 6 7 8 9 10
#>  $ var2: chr  "a" "b" "c" "d" ...
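
The same selection can be wrapped in a function so that it is reusable and harmless to call repeatedly. A minimal sketch, assuming the hypothetical helper name rename_known(); it uses match() and leaves any column without an entry in the map untouched.

```r
# rename_known() is a hypothetical helper, not part of base R or dplyr.
rename_known <- function(df, map) {
  idx <- match(names(df), names(map))  # position of each column name in the map, NA if absent
  hit <- !is.na(idx)                   # TRUE for columns that have a new name
  names(df)[hit] <- unname(map[idx[hit]])
  df
}

rename_vec <- c("B" = "var2", "C" = "var3")
d3 <- data.frame(B1 = 1:10, B = letters[1:10], stringsAsFactors = FALSE)

once  <- rename_known(d3, rename_vec)
twice <- rename_known(once, rename_vec)  # nothing left to rename on the second call

names(once)
names(twice)
```

Running it twice only stays a no-op as long as none of the new names is also one of the old names in the map.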

UPDATE: I initially had a solution here that involved string replacement, which turned out to be unsafe as well, because it allowed for partial matching. This one is better, I think.

Philine answered 7/12, 2018 at 11:30 Comment(2)
You could also stick with original solution but for safety add something like the following above: stopifnot(all(names(sample1) %in% names(rename_vec))) - Enidenigma
@snoram I found a better solution than my original one, I think, which essentially is just a small adjustment to earino's. - Philine

With dplyr, we can use a named vector with the new names as names and the old names as values, then unquote only those elements of name_vec whose values match column names in your dataset. rename() supports unquoting character vectors, so there is no need to convert them to symbols with sym() beforehand:

library(dplyr)

name_vec <- c(var1 = "A", var2 = "B", var3 = "C")

sample1 %>%
  rename(!!name_vec[name_vec %in% names(.)])

sample2 %>%
  rename(!!name_vec[name_vec %in% names(.)])
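
In more recent dplyr releases, the same selection can be expressed with the tidyselect helper any_of(), which ignores entries of the vector that are not columns of the data. A sketch, assuming a dplyr version in which rename() accepts selection helpers (1.0 or later, as far as I know):

```r
library(dplyr)

sample1 <- data.frame(A = 1:10, B = letters[1:10])
sample2 <- data.frame(B = 11:20, C = letters[11:20])

# New names are the names of the vector, old names are the values
name_vec <- c(var1 = "A", var2 = "B", var3 = "C")

# any_of() skips old names that are absent from the data frame
res1 <- sample1 %>% rename(any_of(name_vec))
res2 <- sample2 %>% rename(any_of(name_vec))

names(res1)
names(res2)
```

This avoids the manual %in% filter, at the cost of depending on a newer dplyr.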

Also, with setNames:

name_vec <- c(A = "var1", B = "var2", C = "var3")

sample1 %>%
  setNames(name_vec[names(.)])

sample2 %>%
  setNames(name_vec[names(.)])

Output:

   var1 var2
1     1    a
2     2    b
3     3    c
4     4    d
5     5    e
6     6    f
7     7    g
8     8    h
9     9    i
10   10    j

   var2 var3
1    11    k
2    12    l
3    13    m
4    14    n
5    15    o
6    16    p
7    17    q
8    18    r
9    19    s
10   20    t
Gridiron answered 7/12, 2018 at 16:53 Comment(0)
