How to pivot in two directions at once?
Asked Answered
C

3

5

I have example data like below, where for a unit, I have multiple treatments, and multiple measurements for each treatment in before and after periods. I want to do a before-after comparison, so I gather, and spread as shown below to achieve the desired output.

My question is: can this be done with one pivot_ command? I have been trying to work out if a properly constructed spec can achieve this, but have not succeeded yet. One such attempt is below.

I think I would accept either a way to make it work, or a clear explanation of how spec or pivoting in general works that explains why this is not possible. From the pivoting vignette I think I understand that:

  • .name contains unique column names from the input table when pivoting longer
  • .value contains new column names that you want to have in the output when pivoting longer

However, I do not know what the additional columns in spec mean or when they are needed. I hoped that my spec would understand that "before" values of period should go into a column named before, but apparently it doesn't work like that.

library(tidyverse) # tidyr 0.8.99.9000
tbl <- tibble(
  obsv_unit = rep(1:2, each = 4),
  treatment = rep(c("A", "B"), each = 2, times = 2),
  period = rep(c("before", "after"), times = 4),
  measure1 = 1:8,
  measure2 = 11:18
)
tbl
#> # A tibble: 8 x 5
#>   obsv_unit treatment period measure1 measure2
#>       <int> <chr>     <chr>     <int>    <int>
#> 1         1 A         before        1       11
#> 2         1 A         after         2       12
#> 3         1 B         before        3       13
#> 4         1 B         after         4       14
#> 5         2 A         before        5       15
#> 6         2 A         after         6       16
#> 7         2 B         before        7       17
#> 8         2 B         after         8       18

tbl %>%
  gather("measure", "value", starts_with("measure")) %>%
  spread(period, value)
#> # A tibble: 8 x 5
#>   obsv_unit treatment measure  after before
#>       <int> <chr>     <chr>    <int>  <int>
#> 1         1 A         measure1     2      1
#> 2         1 A         measure2    12     11
#> 3         1 B         measure1     4      3
#> 4         1 B         measure2    14     13
#> 5         2 A         measure1     6      5
#> 6         2 A         measure2    16     15
#> 7         2 B         measure1     8      7
#> 8         2 B         measure2    18     17

spec <- tibble(
  `.name` = c("measure1", "measure2", "measure1", "measure2"),
  `.value` = c("before", "before", "after", "after"),
  period = c("before", "before", "after", "after")
)

tbl %>% pivot_wider(spec = spec)
#> Expected a vector, not NULL
tbl %>% pivot_longer(spec = spec)
#> Failed to create output due to bad names.
#> Choose another strategy with `names_repair`
Casebound answered 8/8, 2019 at 21:15 Comment(5)
There's an answer to this using reshape2::recast https://mcmap.net/q/2034242/-pivot-longer-multiple-columns-while-pivot-wider-others. i.e. recast(tbl, measure.var = c("measure1", "measure2"), obsv_unit + treatment + variable ~ period)Susumu
@Susumu thanks for your comment! I looked up reshape2, and I saw that it's been superceded by tidyr. Do you know if tidyr still has a similar/equivalent functionality, or has it been left with recast2?Amitosis
@Amitosis To my knowledge you'll need two separate pivoting operations in tidyr (I need to stress to my very limited knowledge) for this kind of question. Again, to my knowledge, tidyr and reshape2 are two separate packages, where the former one is still in active development and the latter one is "retired" (at least shown on their GitHub page), so I'm not sure if reshape2 is "superseded" by tidyr (though the author of reshape2 recommends tidyr)Susumu
Besides elegance and code-golf, is there a particular reason you're trying to combine both pivots into a single step? If your real data is huge, there are other things you might consider (perhaps data.table), though in reality there's only so much "memory-efficiency" one can do when reshaping data (in either direction). (FYI, speaking of superseding, gather and spread have been superseded for several years now, in favor of the pivot_* functions. I made the transition a couple of years ago, and they are much more powerful with very little learning required.)Sello
@Sello I can't speak for OP, but for me - overwhelming amounts of laziness hahaha :-PAmitosis
D
3

The purpose of the spec is to handle situations with irregular patterns in column names (in wide format) and values (in long format). It is only useful for overcoming difficulties where a regular expression + glue cannot specify the before and after. If you look at the examples in the pivoting vignette, it doesn't accomplish anything that couldn't be done with rename or mutate and a single pivot.

spec is never proposed as a method to pivot in two directions at once.

In conclusion, you need to use both pivot_longer and pivot_wider to accomplish this transformation.

Dethrone answered 18/8, 2023 at 3:48 Comment(0)
B
3

This might not be satisfying to you but you can do this in one operation with reframe() :

tbl %>% 
  reframe(
    .by = c(obsv_unit, treatment), 
    measure = c("measure1", 
    "measure2"), 
    after = c(measure1[period == "after"], measure2[period == 
    "after"]), 
    before = c(measure1[period == "before"], measure2[period == 
    "before"]),
  )
#> # A tibble: 8 × 5
#>   obsv_unit treatment measure  after before
#>       <int> <chr>     <chr>    <int>  <int>
#> 1         1 A         measure1     2      1
#> 2         1 A         measure2    12     11
#> 3         1 B         measure1     4      3
#> 4         1 B         measure2    14     13
#> 5         2 A         measure1     6      5
#> 6         2 A         measure2    16     15
#> 7         2 B         measure1     8      7
#> 8         2 B         measure2    18     17

Pivoting can be seen as a special case of aggregation.

You can also do the following for the same effect, and you might extract an helper as a spec builder :

tbl %>% 
  reframe(
    .by = c(obsv_unit, treatment),
    rownames_to_column(as.data.frame(t(column_to_rownames(across(everything()), "period"))), "measure")
  )
Bruiser answered 21/8, 2023 at 5:0 Comment(1)
This was my favourite answer, but I thought (reading through the question again) because the question was asking about pivot_*(), I couldn't award the bounty to someone who didn't use pivot_longer(). Though kudos to you @moodymudskipper! :-) reframe is underutilised, and I think didn't exist when this question was createdAmitosis
A
1

In my understanding, it is not possible. If you look at the desired data transformation, the data basically stays in the wide format, just wide within another variable.

The pivot_ functions can either create a long data format or a wide data format. The pivoting vignette states that "the two operations [pivot_longer and pivot_wider] are symmetric". This means that you need an intermediary data transformation in the long format. To this long format, both the transformation to the original wide format as well as the new wide format are symmetric.

Transformation to the intermediary long format:

tbl %>% 
  pivot_longer(
    cols = -c(obsv_unit, treatment, period),
    names_to = "measure",
    values_to = "treatment_value"
  )

# A tibble: 16 × 5
   obsv_unit treatment period measure  treatment_value
       <int> <chr>     <chr>  <chr>              <int>
 1         1 A         before measure1               1
 2         1 A         before measure2              11
 3         1 A         after  measure1               2
 4         1 A         after  measure2              12
 5         1 B         before measure1               3
 6         1 B         before measure2              13
 7         1 B         after  measure1               4
 8         1 B         after  measure2              14
 9         2 A         before measure1               5
10         2 A         before measure2              15
11         2 A         after  measure1               6
12         2 A         after  measure2              16
13         2 B         before measure1               7
14         2 B         before measure2              17
15         2 B         after  measure1               8
16         2 B         after  measure2              18

Transformation to the new wide format:

tbl %>% 
  pivot_longer(
    cols = -c(obsv_unit, treatment, period),
    names_to = "measure",
    values_to = "treatment_value"
  ) %>% 
  pivot_wider(
    names_from = period,
    values_from = treatment_value
  )

# A tibble: 8 × 5
  obsv_unit treatment measure  before after
      <int> <chr>     <chr>     <int> <int>
1         1 A         measure1      1     2
2         1 A         measure2     11    12
3         1 B         measure1      3     4
4         1 B         measure2     13    14
5         2 A         measure1      5     6
6         2 A         measure2     15    16
7         2 B         measure1      7     8
8         2 B         measure2     17    18

The additional column in the spec is the variable that is present in the long format, but not in the wide format of the data.

In the case of the intermediary long format, it is measure:

# A tibble: 2 × 3
  .name    .value          measure 
  <chr>    <chr>           <chr>   
1 measure1 treatment_value measure1
2 measure2 treatment_value measure2

This column is the value of the names_to argument in pivot_longer and of the names_from argument in pivot_wider (if you want to reverse the pivot_longer transformation).

Antecedency answered 16/8, 2023 at 10:0 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.