Cross copying in R
S

3

5

I need help with the following problem. I have a dataset that looks like this:

from   to
A      B
A      B
C      D
C      D

I want to get the following dataset:

from   to
A      B
A      C
C      A
C      D

Basically, after group_by(from), I want to "cross-copy" values at the boundary between consecutive groups: the "to" value in the last row of one group should be replaced by the "from" value of the next group, and the "to" value in the first row of that next group should be replaced by the "from" value of the previous group, and so on for every pair of adjacent groups. I tried complete(), but it did not help.

Salesin answered 20/2, 2023 at 17:28 Comment(1)
Sorry, but your question is not clear. Please clarify how the to column should be created. Maybe a longer example could be more helpful.Piecework
H
2

Here is a tidyverse solution:

library(tidyverse) 

dat <- tibble(
  from = c("A", "A", "C", "C"), 
  to = c("B", "B", "D", "D")
)
sol <- dat %>% 
  mutate(
    fst = lag(from), # lag `from` for first values
    lst = lead(from) # lead `from` for last values
  ) %>% 
  group_by(from) %>% 
  transmute(
    to = case_when( 
      row_number() == 1 & !is.na(fst) ~ fst, # first row of a group that has a preceding row: take the lagged `from`
      row_number() == n() & !is.na(lst) ~ lst, # last row of a group that has a following row: take the lead `from`
      TRUE ~ to # otherwise keep `to`
    )
  ) %>% 
  ungroup()

sol
#> # A tibble: 4 × 2
#>   from  to   
#>   <chr> <chr>
#> 1 A     B    
#> 2 A     C    
#> 3 C     A    
#> 4 C     D
Haley answered 20/2, 2023 at 19:49 Comment(3)
Nice, but badly loses the "code golf" competition here :-)Ng
I agree, it loses on compactness. It's probably also less efficient at scale. I upvoted the other solutions. But this is not code golf, it's SO. If we want people to not just copy and paste code they don't understand, we need to provide answers they can understand. And these days dplyr is what most people understand.Haley
no argument -- and I'm not going near the flame wars about "tidyverse vs. the rest of R"Ng
A
5

Get the indices of the values you want to change (no need to group), and replace them by the reversed values:

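library(dplyr)
df <- data.frame(from = c("A", "A", "C", "C"), to = c("B", "B", "D", "D"))
# rows whose `from` differs from the previous or the next row (group boundaries)
idx <- which(lag(df$from) != df$from | lead(df$from) != df$from)
# swap within each boundary pair: every row takes the `from` of its partner
df[idx, "to"] <- df$from[c(matrix(idx, nrow = 2)[2:1, ])]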

output

  from to
1    A  B
2    A  C
3    C  A
4    C  D
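
The index trick above can be traced step by step. The sketch below is an illustration in base R, not the answerer's code: it assumes the question's four-row data and swaps dplyr's lag() and lead() for manually shifted vectors (prev_from, next_from, pairs and swapped are names made up here).

```r
# Question's example data, assumed to be a plain data.frame of strings
df <- data.frame(from = c("A", "A", "C", "C"),
                 to   = c("B", "B", "D", "D"),
                 stringsAsFactors = FALSE)

# Base R stand-ins for dplyr::lag() and dplyr::lead()
prev_from <- c(NA, head(df$from, -1))
next_from <- c(tail(df$from, -1), NA)

# Rows that sit on a group boundary; which() drops the NA comparisons
idx <- which(prev_from != df$from | next_from != df$from)

# One column per boundary: the row above it and the row below it
pairs <- matrix(idx, nrow = 2)

# Reversing the rows of `pairs` swaps the two partners of each boundary
swapped <- c(pairs[2:1, ])

df[idx, "to"] <- df$from[swapped]
df
```

The pairing relies on idx having an even length with boundary rows arriving as adjacent pairs, which holds when every group has at least two rows.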
Alexia answered 20/2, 2023 at 17:41 Comment(0)
T
4

Using base R

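df1 <- data.frame(from = c("A", "A", "C", "C"), to = c("B", "B", "D", "D"))
lst1 <- split(df1$to, df1$from)
# the \(x) lambda shorthand requires R >= 4.1
df1$to <- unlist(Map(\(x, nm, i) {x[i] <- nm; x},
     lst1, rev(names(lst1)), length(lst1):1))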

output

df1
  from to
1    A  B
2    A  C
3    C  A
4    C  D
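
The Map() call above passes the positions length(lst1):1, which line up with the two-group example; with more groups those positions no longer point at the first and last row of each group. As a hedged sketch, assuming that rows of the same group are contiguous and that the intended rule is "the last row of a group takes the next group's from, the first row takes the previous group's from", a base R helper could look like this (cross_copy is a hypothetical name, not part of any package):

```r
# Hypothetical helper: cross-copy `from` values across adjacent group boundaries.
# Assumes `df` has character columns `from` and `to`, at least one row,
# and that rows belonging to the same group are contiguous.
cross_copy <- function(df) {
  prev_from <- c(NA, head(df$from, -1))   # `from` of the row above
  next_from <- c(tail(df$from, -1), NA)   # `from` of the row below

  # first row of a group that has a group before it
  is_first <- !is.na(prev_from) & prev_from != df$from
  # last row of a group that has a group after it
  is_last <- !is.na(next_from) & next_from != df$from

  df$to[is_last] <- next_from[is_last]
  df$to[is_first] <- prev_from[is_first]  # applied second, so it wins on ties
  df
}

# Three groups instead of two (made-up extension of the question's data)
df3 <- data.frame(from = c("A", "A", "C", "C", "E", "E"),
                  to   = c("B", "B", "D", "D", "F", "F"),
                  stringsAsFactors = FALSE)
res <- cross_copy(df3)
res$to
# "B" "C" "A" "E" "C" "F"
```

If a group has a single row and neighbours on both sides, that row takes the previous group's from, because the is_first assignment runs last.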
Tsuda answered 20/2, 2023 at 17:54 Comment(0)
