Prefix all columns resulting from left_join() with original table names
Asked Answered
B

3

9

I would like to add a prefix to all columns that result from a left join.

left_join() has the possibility to add a suffix when names are identical between the two tables being joined. But, it does not have an option to always add this suffix even when they are not identically named. And it does not have an option to instead add a prefix.

library(dplyr)
library(nycflights13)
flights2 <- flights %>% select(year:day, hour, origin, dest, tailnum, carrier)
airports2 <- airports

result <- flights2 %>% left_join(airports2, c("dest" = "faa")) %>% head()

The result:

Source: local data frame [6 x 14]

year month   day  hour origin  dest tailnum carrier                            name
(int) (int) (int) (dbl)  (chr) (chr)   (chr)   (chr)                           (chr)
1  2013     1     1     5    EWR   IAH  N14228      UA    George Bush Intercontinental
2  2013     1     1     5    LGA   IAH  N24211      UA    George Bush Intercontinental
3  2013     1     1     5    JFK   MIA  N619AA      AA                      Miami Intl
4  2013     1     1     5    JFK   BQN  N804JB      B6                              NA
5  2013     1     1     5    LGA   ATL  N668DN      DL Hartsfield Jackson Atlanta Intl
6  2013     1     1     5    EWR   ORD  N39463      UA              Chicago Ohare Intl
Variables not shown: lat (dbl), lon (dbl), alt (int), tz (dbl), dst (chr)

Here, it is not possible to know, only from the join result, from which original table each column came.

The purpose of adding this prefix is so that column names be reliably calculated from table names and the column names of data loaded in from a relational database. The database structure loaded into and stored in R and naming conventions for the relational database will be used, for example, to identify primary and foreign keys. These will then be used to setup the joins and to later retrieve data from the join results.

I've found a similar question for mySQL, but not for R:

In a join, how to prefix all column names with the table it came from

Batey answered 24/10, 2016 at 20:1 Comment(0)
B
8

A straightforward way to achieve this would be to add the prefixes to the original tables before performing the join:

# add prefix before joining:
names(flights2) <- paste0("flights2.", names(flights2) )
names(airports2) <- paste0("airports2.", names(airports2) )

# in join, use names with prefixes
result <- flights2 %>% left_join(airports2, c("flights2.dest" = "airports2.faa") ) %>% head()

The result:

Source: local data frame [6 x 14]

flights2.year flights2.month flights2.day flights2.hour flights2.origin flights2.dest
(int)          (int)        (int)         (dbl)           (chr)         (chr)
1          2013              1            1             5             EWR           IAH
2          2013              1            1             5             LGA           IAH
3          2013              1            1             5             JFK           MIA
4          2013              1            1             5             JFK           BQN
5          2013              1            1             5             LGA           ATL
6          2013              1            1             5             EWR           ORD
Variables not shown: flights2.tailnum (chr), flights2.carrier (chr), airports2.name (chr),
airports2.lat (dbl), airports2.lon (dbl), airports2.alt (int), airports2.tz (dbl),
airports2.dst (chr)

Now, the joined data frame can be easily referred to in this manner: tableName.columnName

Batey answered 24/10, 2016 at 20:1 Comment(6)
flights2::year is not syntactically valid name. I would use flights2_ or flights2.Carnes
The relational database (FileMaker) uses this naming convention and so I wanted to preserve the names even if they are not ideal for R. I also have column names that start with underscores. Can you give an example of where it would cause a problem or require reformatting? I am using [[ for indexing.Batey
As we are using dplyr, try this example: mtcars$col1 <- seq(nrow(mtcars)); colnames(mtcars)[12] <- "xx:yy"; mtcars %>% filter(xx::yy > 5)Carnes
I could switch the :: to a . without a problem but it would be nice to understand more about why I would do this.Batey
OK, I'm convinced! I'm switching to . now. Thanks!Batey
You could also simplify the renaming portion using %>% : #33741857Ethnarch
D
4

Similar answer but for suffix and in one flow:

[edit: changes for "funs() is soft deprecated as of dplyr 0.8.0" /edit]

library(dplyr)
(band_members
%>% rename_all( list(~paste0(., ".left")))  # < dpylr 0.8.0: %>% rename_all( funs(paste0(., ".left")))
%>% left_join(band_instruments, by = c("name.left"="name"))
%>% rename_at( .vars = vars(-ends_with(".left")),list(~paste0(., ".right")))# < dpylr 0.8.0: %>% rename_at( .vars = vars(-ends_with(".left")),funs(paste0(., ".right"))
)

or

(band_members
  %>% rename_all( list(~paste0(., ".left")))  # < dpylr 0.8.0: %>% rename_all( funs(paste0(., ".left"))) 
  %>% left_join(
  band_instruments %>% rename_all( list(~paste0(., ".right"))), # < dpylr 0.8.0: band_instruments%>% rename_all( funs(paste0(., ".right")))
  by = c("name.left"="name.right")
  )
)  

Both give:

    #  A tibble: 3 x 3
  name.left band.left plays.right
  <chr>     <chr>     <chr>      
1 Mick      Stones    <NA>       
2 John      Beatles   guitar     
3 Paul      Beatles   bass 

edit: with datas from dplyr doc

library(tidyverse)

band_members <- tribble(
  ~ name,    ~ band,
  "Mick",  "Stones",
  "John", "Beatles",
  "Paul", "Beatles"
)

band_instruments <- tribble(
  ~ name,   ~ plays,
  "John",  "guitar",
  "Paul",    "bass",
  "Keith", "guitar"
)
Dorothy answered 19/1, 2019 at 0:19 Comment(0)
B
1

Updated based on dplyr 1.0.10

If you only have a few columns from the join tables you can manage this directly via the select() function.

# oid_, in_fid, near_fid, near_dist, near_rank
dfArcNearBy %>% 
  left_join(
    (dfArcBuilds %>% select(inFidZip5 = user_zip5, inFidUuId = user_uuid, oid_)),
    by = c("in_fid" = "oid_")
  ) %>% 
  left_join(
    (dfArcBuilds %>% select(nearFidZip5 = user_zip5, nearFidUuId = user_uuid, oid_)),
    by = c("near_fid" = "oid_")
  ) %>% 
  {.}

If you want to prefix or suffix based on a pattern you can use rename_with() and gsub().

# oid_, in_fid, near_fid, near_dist, near_rank
dfArcNearBy %>% 
  left_join(
    (dfArcBuilds %>% rename_with(.fn = ~ gsub("user_", "inFid_", .x, fixed = TRUE), .col = everything())),
    by = c("in_fid" = "oid_")
  ) %>% 
  {.}

If you want to prefix or suffix everything you can use rename_with() and paste0().

# oid_, in_fid, near_fid, near_dist, near_rank
dfArcNearBy %>% 
  left_join(
    (dfArcBuilds %>% rename_with(.fn = ~ paste0("{PREFIX}",.x,"{SUFFIX}"))),
    by = c("in_fid" = "inFid_oid_")
  ) %>%
  {.}
Baran answered 13/9, 2022 at 22:14 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.