reshaping k columns to 2 columns representing sequential pairs of the values of the k variables

Asked 3/7, 2017 at 22:58 Answered 4/7, 2017 at 0:9

I have a data frame like this:

id y1 y2 y3 y4  
--+--+--+--+--
a |12|13|14|  
b |12|18|  |
c |13|  |  |
d |13|14|15|16

I want to reshape in such a way that I end with two columns. The above example would then become:

id from to  
--+----+--- 
a |12  |13  
a |13  |14  
a |14  |
b |12  |18
b |18  |  
c |13  |
d |13  |14  
d |14  |15  
d |15  |16

Each id has a 'from' and a 'to' per pair of year values.
Does anybody know of an easy way to do this? I tried using reshape2. I also looked at Combine Multiple Columns Into Tidy Data but I think my case is different.

Porosity answered 3/7, 2017 at 22:58 Comment(0)

You can use lapply to loop over the pairs of columns and rbind to union them:

do.call(rbind,
        lapply(2:(length(df)-1), 
               function(x) setNames(df[!is.na(df[,x]),c(1,x,x+1)], 
                                    c("id", "from", "to"))))
   id from to
1   a   12 13
2   b   12 18
3   c   13 NA
4   d   13 14
11  a   13 14
21  b   18 NA
41  d   14 15
12  a   14 NA
42  d   15 16

Basidiospore answered 3/7, 2017 at 23:24 Comment(0)

A solution uses dplyr and tidyr. dt2 is the final output.

# Create example data frame
dt <- data.frame(id = c("a", "b", "c", "d"),
                 y1 = c(12, 12, 13, 13),
                 y2 = c(13, 18, NA, 14),
                 y3 = c(14, NA, NA, 15),
                 y4 = c(NA, NA, NA, 16),
                 stringsAsFactors = FALSE)

# Load packages
library(dplyr)
library(tidyr)

# Process the data
dt2 <- dt %>%
  gather(STEP, from, -id) %>%
  drop_na(from) %>%
  arrange(id, STEP) %>%
  group_by(id) %>%
  mutate(to = lead(from)) %>%
  select(-STEP)

Diggs answered 3/7, 2017 at 23:32 Comment(5)

Neat, though producing an extra row – Basidiospore 3/7, 2017 at 23:49

@Basidiospore Thanks for your comment. I know the last row is not in OP's desired output, but I feel it should be there. At least I don't find a good logic to exclude the last row because the last row of id a, b, and c are also there. – Diggs 3/7, 2017 at 23:54

It shouldn't, as its to value is generated by lead. A rearrangement:

df %>% gather(var, from, -id) %>%      arrange(id, var) %>%      group_by(id) %>%     mutate(to = lead(from)) %>%      #slice(-n()) %>%      filter(!is.na(from), var != 'y4') %>% select(-var)

– Stabler 4/7, 2017 at 0:4

If id=="b" has an 18-NA row, I can't see how id=="d" shouldn't have a 16-NA row. – Forsythe 4/7, 2017 at 0:7

@Stabler Thanks for your suggestion. Let's wait for OP's clarification on this. Currently, I agree with @thelatemail. If 18-NA is in id b, 16-NA should also be in id d. – Diggs 4/7, 2017 at 0:13

In base R, stack and shift everything back one row in each group. Using @ycw's example data, dt:

tmp <- na.omit(cbind(dt[1], stack(dt[-1])[-2]))
names(tmp)[2] <- "from"
tmp$to <- with(tmp, ave(from, id, FUN=function(x) c(tail(x,-1),NA) ))
tmp[order(tmp$id),]

#   id from to
#1   a   12 13
#5   a   13 14
#9   a   14 NA
#2   b   12 18
#6   b   18 NA
#3   c   13 NA
#4   d   13 14
#8   d   14 15
#12  d   15 16
#16  d   16 NA

In the world of data.table, the same logic applies. melt, then shift by= id:

library(data.table)
dt <- as.data.table(dt)

melt(dt, id.vars="id", value.name="from")[
  !is.na(from),-"variable"][, to := shift(from,1,type="lead"), by=id
][order(id)]

Forsythe answered 4/7, 2017 at 0:9 Comment(0)

Recommended topics

Hot tags