Another approach (and IMO the recommended approach) using dplyr
would be to first reshape or melt your data into a tidy data format before summarizing the values from each wave.
In detail, this process would involve:
- Reshape your data to long format (
tidyr::gather
)
- Identify which variables belong to each "wave"
- Summarize values for each wave
- Reshape your data back to wide format (
tidyr::spread
)
In your example, this would look like the following:
library(tidyverse)
mat <- matrix(runif(1000, 1, 10), ncol = 100)
df <- data.frame(mat)
dim(df)
df %>%
dplyr::mutate(id = dplyr::row_number()) %>%
# reshape to "tidy data" or long format
tidyr::gather(varname, value, -id) %>%
# identify which variables belong to which "wave"
dplyr::mutate(varnum = as.integer(stringr::str_extract(varname, pattern = '\\d+')),
wave = floor((varnum-1)/10)+1) %>%
# summarize your value for each wave
dplyr::group_by(id, wave) %>%
dplyr::summarise(avg = sum(value)/n()) %>%
# reshape back to "wide" format
tidyr::spread(wave, avg, sep='_') %>%
dplyr::ungroup()
With the following output:
# A tibble: 10 x 11
id wave_1 wave_2 wave_3 wave_4 wave_5 wave_6 wave_7 wave_8 wave_9 wave_10
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 6.24 4.49 5.85 5.43 5.98 6.04 4.83 6.92 5.43 5.52
2 2 5.16 6.82 5.76 6.66 6.21 5.41 4.58 5.06 5.81 6.93
3 3 7.23 6.28 5.40 5.70 5.13 6.27 5.55 5.84 6.74 5.94
4 4 5.27 4.79 4.39 6.85 5.31 6.01 6.15 3.31 5.73 5.63
5 5 6.48 5.16 5.20 4.71 5.87 4.44 6.40 5.00 5.90 3.78
6 6 4.18 4.64 5.49 5.47 5.75 6.35 4.34 5.66 5.34 6.57
7 7 4.97 4.09 6.17 5.78 5.87 6.47 4.96 4.39 5.99 5.35
8 8 5.50 7.21 5.43 5.15 4.56 5.00 4.86 5.72 6.41 5.65
9 9 5.27 5.71 5.23 5.44 5.12 5.40 5.38 6.05 5.41 5.30
10 10 5.95 4.58 6.52 5.46 7.63 5.56 5.82 7.03 5.68 5.38
This could be joined back to your original data to match the example you gave (which used mutate
) as follows:
df %>%
dplyr::mutate(id = dplyr::row_number()) %>%
tidyr::gather(varname, value, -id) %>%
dplyr::mutate(varnum = as.integer(stringr::str_extract(varname, pattern = '\\d+')),
wave = floor((varnum-1)/10)+1) %>%
dplyr::group_by(id, wave) %>%
dplyr::summarise(avg = sum(value)/n()) %>%
tidyr::spread(wave, avg, sep='_') %>%
dplyr::ungroup() %>%
dplyr::right_join(df %>% # <-- join back to original data
dplyr::mutate(id = dplyr::row_number()),
by = 'id')
One nice aspect to this approach is that you can inspect your data to confirm that you are correctly assigning variables to "wave"s.
df %>%
dplyr::mutate(id = dplyr::row_number()) %>%
tidyr::gather(varname, value, -id) %>%
dplyr::mutate(varnum = as.integer(stringr::str_extract(varname, pattern = '\\d+')),
wave = floor((varnum-1)/10)+1) %>%
dplyr::distinct(varname, varnum, wave) %>%
head()
which produces:
varname varnum wave
1 X1 1 1
2 X2 2 1
3 X3 3 1
4 X4 4 1
5 X5 5 1
6 X6 6 1