How can I merge all dataframes nested in another dataframe and produce all combinations in one result data.frame?

Asked 15/8, 2023 at 7:9 Answered 16/8, 2023 at 4:6

Solved r dataframe merge data.table flatten

I have data like sampledata reproduced using the code below:

###producing sample data
sampledatain1 <- data.frame(in_1=c(1,2,3,4,5),in_2=c("A","B","C","D","E"),stringsAsFactors = F)
sampledatain2 <- data.frame(in_1=c(6,7,8,9,10),in_2=c("F","G","H","I","J"),stringsAsFactors = F)
sampledatain3 <- data.frame(in_1=c(99,98,97,96,95),in_2=c("Z","Y","X","W","V"),stringsAsFactors = F)
sampledata <- data.frame(row = 1:3,
                    colmerge1 = c("MA","MB","MC"), 
                    colmerge2 = -99:-97,
                    stringsAsFactors = FALSE)
sampledata$df <- list(sampledatain1,sampledatain2,sampledatain3)

I want to flatten all the nested dataframes and combine them so the result looks like desired_data:

###reproducing desired data 
library(data.table)
sampledatain1$row <- 1
sampledatain1$colmerge1 <- "MA"
sampledatain1$colmerge2 <- -99

sampledatain2$row <- 2
sampledatain2$colmerge1 <- "MB"
sampledatain2$colmerge2 <- -98

sampledatain3$row <- 3
sampledatain3$colmerge1 <- "MC"
sampledatain3$colmerge2 <- -97

desired_data <- rbindlist(list(sampledatain1,sampledatain2,sampledatain3))

How can I do this using a function to produce desired_data from sampledata?

Hypochondriac answered 15/8, 2023 at 7:9 Comment(0)

The thing you're looking for is unnest() (unnest_longer() then unpack()/unnest_wider() works too):

> tidyr::unnest(sampledata, df)
# A tibble: 15 × 5
     row colmerge1 colmerge2  in_1 in_2 
   <int> <chr>         <int> <dbl> <chr>
 1     1 MA              -99     1 A    
 2     1 MA              -99     2 B    
 3     1 MA              -99     3 C    
 4     1 MA              -99     4 D    
 5     1 MA              -99     5 E    
 6     2 MB              -98     6 F    
 7     2 MB              -98     7 G    
 8     2 MB              -98     8 H    
 9     2 MB              -98     9 I    
10     2 MB              -98    10 J    
11     3 MC              -97    99 Z    
12     3 MC              -97    98 Y    
13     3 MC              -97    97 X    
14     3 MC              -97    96 W    
15     3 MC              -97    95 V

To convert it to a data.table, you can use as.data.table()

For more information on unnest(), read the documentation.

Scone answered 15/8, 2023 at 7:13 Comment(1)

that df$ syntax when tibbles are displayed always trips me up 😑 it's so easy to think you have more columns than you actually do – Scone 15/8, 2023 at 7:51

If you want to work with data.table, you can directly use

> setDT(sampledata)[, as.data.table(df), by = row:colmerge2]
    row colmerge1 colmerge2 in_1 in_2
 1:   1        MA       -99    1    A
 2:   1        MA       -99    2    B
 3:   1        MA       -99    3    C
 4:   1        MA       -99    4    D
 5:   1        MA       -99    5    E
 6:   2        MB       -98    6    F
 7:   2        MB       -98    7    G
 8:   2        MB       -98    8    H
 9:   2        MB       -98    9    I
10:   2        MB       -98   10    J
11:   3        MC       -97   99    Z
12:   3        MC       -97   98    Y
13:   3        MC       -97   97    X
14:   3        MC       -97   96    W
15:   3        MC       -97   95    V

Diseuse answered 15/8, 2023 at 9:44 Comment(0)

Another data.table solution:

setDT(sampledata)[, rbindlist(df), by=row:colmerge2]

#       row colmerge1 colmerge2  in_1   in_2
#  1:     1        MA       -99     1      A
#  2:     1        MA       -99     2      B
#  3:     1        MA       -99     3      C
#  4:     1        MA       -99     4      D
#  5:     1        MA       -99     5      E
#  6:     2        MB       -98     6      F
#  7:     2        MB       -98     7      G
#  8:     2        MB       -98     8      H
#  9:     2        MB       -98     9      I
# 10:     2        MB       -98    10      J
# 11:     3        MC       -97    99      Z
# 12:     3        MC       -97    98      Y
# 13:     3        MC       -97    97      X
# 14:     3        MC       -97    96      W
# 15:     3        MC       -97    95      V

Lightyear answered 16/8, 2023 at 4:6 Comment(0)

Here is a way. have an auxiliary function take care of creating the new columns in the input data, call the function in a mapply loop and then rbind the resulting list.

library(data.table)

f_aux <- function(x, row, data) {
  x$row <- data[row, 1]
  x$colmerge1 <- data[row, 2]
  x$colmerge2 <- data[row, 3]
  x
}

sd_list <- list(sampledatain1,sampledatain2,sampledatain3)
i_row <- seq_len(nrow(sampledataframe))

mapply(f_aux, sd_list, i_row, MoreArgs = list(data = sampledataframe), SIMPLIFY = FALSE) |> rbindlist()
#>     in_1 in_2 row colmerge1 colmerge2
#>  1:    1    A   1        MA       -99
#>  2:    2    B   1        MA       -99
#>  3:    3    C   1        MA       -99
#>  4:    4    D   1        MA       -99
#>  5:    5    E   1        MA       -99
#>  6:    6    F   2        MB       -98
#>  7:    7    G   2        MB       -98
#>  8:    8    H   2        MB       -98
#>  9:    9    I   2        MB       -98
#> 10:   10    J   2        MB       -98
#> 11:   99    Z   3        MC       -97
#> 12:   98    Y   3        MC       -97
#> 13:   97    X   3        MC       -97
#> 14:   96    W   3        MC       -97
#> 15:   95    V   3        MC       -97

^{Created on 2023-08-15 with reprex v2.0.2}

Repro answered 15/8, 2023 at 7:29 Comment(0)

Recommended topics

Hot tags