I have example data like below, where for a unit, I have multiple treatments, and multiple measurements for each treatment in before and after periods. I want to do a before-after comparison, so I gather
, and spread
as shown below to achieve the desired output.
My question is: can this be done with one pivot_
command? I have been trying to work out if a properly constructed spec
can achieve this, but have not succeeded yet. One such attempt is below.
I think I would accept either a way to make it work, or a clear explanation of how spec
or pivoting in general works that explains why this is not possible. From the pivoting vignette I think I understand that:
.name
contains unique column names from the input table when pivoting longer.value
contains new column names that you want to have in the output when pivoting longer
However, I do not know what the additional columns in spec
mean or when they are needed. I hoped that my spec
would understand that "before"
values of period
should go into a column named before
, but apparently it doesn't work like that.
library(tidyverse) # tidyr 0.8.99.9000
tbl <- tibble(
obsv_unit = rep(1:2, each = 4),
treatment = rep(c("A", "B"), each = 2, times = 2),
period = rep(c("before", "after"), times = 4),
measure1 = 1:8,
measure2 = 11:18
)
tbl
#> # A tibble: 8 x 5
#> obsv_unit treatment period measure1 measure2
#> <int> <chr> <chr> <int> <int>
#> 1 1 A before 1 11
#> 2 1 A after 2 12
#> 3 1 B before 3 13
#> 4 1 B after 4 14
#> 5 2 A before 5 15
#> 6 2 A after 6 16
#> 7 2 B before 7 17
#> 8 2 B after 8 18
tbl %>%
gather("measure", "value", starts_with("measure")) %>%
spread(period, value)
#> # A tibble: 8 x 5
#> obsv_unit treatment measure after before
#> <int> <chr> <chr> <int> <int>
#> 1 1 A measure1 2 1
#> 2 1 A measure2 12 11
#> 3 1 B measure1 4 3
#> 4 1 B measure2 14 13
#> 5 2 A measure1 6 5
#> 6 2 A measure2 16 15
#> 7 2 B measure1 8 7
#> 8 2 B measure2 18 17
spec <- tibble(
`.name` = c("measure1", "measure2", "measure1", "measure2"),
`.value` = c("before", "before", "after", "after"),
period = c("before", "before", "after", "after")
)
tbl %>% pivot_wider(spec = spec)
#> Expected a vector, not NULL
tbl %>% pivot_longer(spec = spec)
#> Failed to create output due to bad names.
#> Choose another strategy with `names_repair`
reshape2::recast
https://mcmap.net/q/2034242/-pivot-longer-multiple-columns-while-pivot-wider-others. i.e.recast(tbl, measure.var = c("measure1", "measure2"), obsv_unit + treatment + variable ~ period)
– Susumutidyr
(I need to stress to my very limited knowledge) for this kind of question. Again, to my knowledge,tidyr
andreshape2
are two separate packages, where the former one is still in active development and the latter one is "retired" (at least shown on their GitHub page), so I'm not sure ifreshape2
is "superseded" bytidyr
(though the author ofreshape2
recommendstidyr
) – Susumudata.table
), though in reality there's only so much "memory-efficiency" one can do when reshaping data (in either direction). (FYI, speaking of superseding,gather
andspread
have been superseded for several years now, in favor of thepivot_*
functions. I made the transition a couple of years ago, and they are much more powerful with very little learning required.) – Sello