Grouped moving average in r

Asked 7/2, 2016 at 2:26 Answered 8/5, 2016 at 22:24

I'm trying to calculate a moving average in r over a particular field BUT I need this moving average to be grouped by two or more other fields. The purpose of this new average is for predictive analysis so I need it to be trailing as well. Any variables that do not have enough values to be averaged (such as student J) would ideally give either NA or its original Score value.

I've been trying rollapply and data.table and am having no luck!

I've provided the table of data and two moving averages (AVG2 with k=2 and AVG3 with k=3) to show exactly what I'm after. The moving average is on Score and the variables to group over are school, Student and area. Please help!

   no   school  Student area    Score **AVG2**  **AVG3**
   1    I       S       A       5      NA       NA
   2    B       S       A       2      NA       NA
   3    B       S       A       7      NA       NA
   4    B       O       A       3      NA       NA
   5    B       O       B       9      NA       NA
   6    I       O       A       6      NA       NA
   7    I       O       B       3      NA       NA
   8    I       S       A       7      NA       NA
   9    I       O       A       1      NA       NA
   10   B       S       A       7      4.5      NA
   11   I       S       A       3      NA       NA
   12   I       O       A       8      3.5      NA
   13   B       S       A       3      7        5.33
   14   I       O       A       4      4.5      5
   15   B       O       A       1      NA       NA
   16   I       S       A       9      5        5
   17   B       S       A       4      5        5.67
   18   B       O       A       6      2        NA
   19   I       S       A       3      6        6.33
   20   I       O       B       8      NA       NA
   21   B       S       A       3      3.5      4.67
   22   I       O       A       4      6        4.33
   23   B       O       A       1      3.5      3.33
   24   I       S       A       9      6        5
   25   B       S       A       4      3.5      3.33
   26   B       O       A       6      3.5      2.67
   27   I       J       A       6      NA       NA

here is the code to recreate the initial table in r:

school <- c('I','B','B','B','B','I','I','I','I','B','I','I','B','I','B','I','B','B','I','I','B','I','B','I','B','B','I')
Student <- c('S','S','S','O','O','O','O','S','O','S','S','O','S','O','O','S','S','O','S','O','S','O','O','S','S','O','J')
area <- c('A','A','A','A','B','A','B','A','A','A','A','A','A','A','A','A','A','A','A','B','A','A','A','A','A','A','A')
Score <- c(5,2,7,3,9,6,3,7,1,7,3,8,3,4,1,9,4,6,3,8,3,4,1,9,4,6,6)
data.frame(school, Student, area,  Score)

Lupine answered 7/2, 2016 at 2:26 Comment(3)

Can you provide a sample R code, of how you have calculated AVG2 And AVG3? – Epigraphy 7/2, 2016 at 2:39

First, instead of providing a picture of your data please provide your data in a format that we can easily read into R, e.g. posting the output of dput(dat) or at least head(dat) in your question. Second, please share the desired output of your grouped moving average for this data. Third, please share the code you've tried so far. – Gorden 7/2, 2016 at 2:49

I used excel to manually calculate AVG2 and AVG3 since I've had no luck with r. The desired output is the displayed AVG2 and AVG3 – Lupine 7/2, 2016 at 7:50

Here is a rollapply solution. Note that it appears that you want the average of the prior two or three rows in the same group, i.e. excluding the data on the current row.

library(zoo)

roll <- function(x, n) { 
   if (length(x) <= n) NA 
   else rollapply(x, list(-seq(n)), mean, fill = NA)
}
transform(DF, AVG2 = ave(Score, school, Student, FUN = function(x) roll(x, 2)),
              AVG3 = ave(Score, school, Student, FUN = function(x) roll(x, 3)))

giving:

   school Student Score AVG2     AVG3
1       I       S     5   NA       NA
2       B       S     2   NA       NA
3       B       S     7   NA       NA
4       B       O     3   NA       NA
5       B       O     9   NA       NA
6       I       O     6   NA       NA
7       I       O     3   NA       NA
8       I       S     7   NA       NA
9       I       O     1  4.5       NA
10      B       S     7  4.5       NA
11      I       S     3  6.0       NA
12      I       O     8  2.0 3.333333
13      B       S     3  7.0 5.333333
14      I       O     4  4.5 4.000000
15      B       O     1  6.0       NA
16      I       S     9  5.0 5.000000
17      B       S     4  5.0 5.666667
18      B       O     6  5.0 4.333333
19      I       S     3  6.0 6.333333
20      I       O     8  6.0 4.333333
21      B       S     3  3.5 4.666667
22      I       O     4  6.0 6.666667
23      B       O     1  3.5 5.333333
24      I       S     9  6.0 5.000000
25      B       S     4  3.5 3.333333
26      B       O     6  3.5 2.666667
27      I       J     6   NA       NA

Update: Fixed roll.

Transgression answered 9/2, 2016 at 5:34 Comment(8)

How can I change this if I have another column to group by? I tried this method on a different dataset with more grouping fields and was getting the error: Error in seq.default(start.at, NROW(data), by = by) : wrong sign in 'by' argument – Lupine 9/2, 2016 at 20:47

Say there is a third grouping field, 'area' for the above example. So the moving average will now need to be grouped over school, Student and also area. How would the code change to account for 3 groups rather than 2? – Lupine 9/2, 2016 at 23:41

Thanks for the response. Unfortunately when I do as you suggest, I get the following error: Error in seq.default(start.at, NROW(data), by = by) : wrong sign in 'by' argument – Lupine 10/2, 2016 at 7:20

I've edited the initial question to now include the extra 'area' field to be grouped over alongside new values for AVG2 and AVG3. If you could show me how you can do it I'd very much appreciate it! – Lupine 10/2, 2016 at 12:39

Try: AVG2 = ave(Score, school, Student, area, FUN = function(x) roll(x, 2)). This is erroring for me now. n = 2 doesn't work, n = 3 does, n = 4 does, n = 5 doesn't – Lupine 10/2, 2016 at 12:54

But your AVG2 isn't giving the same result as what I need. It has n = 3, rather than n = 2. Your AVG2 and AVG3 are exactly the same – Lupine 10/2, 2016 at 13:0

OK. Have fixed roll. < should have been <= . – Transgression 10/2, 2016 at 13:6

Thank you so much! You are a hero – Lupine 10/2, 2016 at 13:9

You can try solving the problem using dplyr and TTR but for student J from school I it is not possible to calculate a moving average as there's only one measurement.

AVG2 caluculated with stats:filter gives the result you wanted to have, but I also added AVG2b calculated with TTR::SMA to show a simple moving average calculation, where the current measurement is also taken into account.

library(dplyr)
library(TTR)

df <- data.frame(school, Student, Score)
df$AVG2 <- NA
df$AVG2b <- NA
df[!(df$school=="I" & df$Student=="J"),] <- df[!(df$school=="I" & df$Student=="J"),] %>% 
  group_by(school, Student) %>% 
  mutate(AVG2 = stats::filter(Score, c(0, 0.5, 0.5), sides = 1 ), AVG2b = SMA(Score, n= 2)) 

    > df
   school Student Score AVG2 AVG2b
1       I       S     5   NA    NA
2       B       S     2   NA    NA
3       B       S     7   NA   4.5
4       B       O     3   NA    NA
5       B       O     9   NA   6.0
6       I       O     6   NA    NA
7       I       O     3   NA   4.5
8       I       S     7   NA   6.0
9       I       O     1  4.5   2.0
10      B       S     7  4.5   7.0
...

Me answered 7/2, 2016 at 12:19 Comment(8)

My mistake, the manually calculated values were incorrect. I've amended them, do your results match now? – Lupine 7/2, 2016 at 12:34

No, but you can also check yourself, or is the code not working for you? If your are using a simple moving average I think the value for the third row for AVG2 should be 4.5 and for the 5th row 6.0 and so on – Me 7/2, 2016 at 12:39

Im not sure this gives the trailing moving average that I'm after, the first 8 rows should be NA since there hasn't been a third value for both groups until row 9 (school I and student O). – Lupine 7/2, 2016 at 12:54

I think now I understood how you are calculating the AVG2: For the third measurement of a group you take the previous 2 and take the mean from them. Moving average takes also the current measurement into account, if you don't want this you can use stats::filter(Score, c(0, 0.5, 0.5), sides = 1) instead of the SMA function... – Me 7/2, 2016 at 17:27

Is there any way I can do this and get r to just output student J's actual score since he only has 1 value. Maybe so that the first two values of any two group combo outputs as the actual score until the third appears and the averaging is applied? Appreciate your help so far. – Lupine 7/2, 2016 at 20:50

What is the code to do this for a moving average other than 2? A moving window of 3 or 4 for example? – Lupine 8/2, 2016 at 10:17

I edited my answer a little bit so that all the original information is kept in df and then you can set the value for the group which is too small to the score (subsetting and replace). For a moving average with a larger window, you should modify the filter in stats::filter accordingly, see ?stats::filter for more help. – Me 8/2, 2016 at 23:18

Thanks for that. Do you have a more generic way to handle this when you are averaging over a group that is too small? The dataset I gave was a subset of a much large dataset that will have multiple grouping combinations that are too small. If there was a way to avoid the "Error: 'filter' is longer than time series" and just have 'NA's it would make my life so much easier!! – Lupine 9/2, 2016 at 4:33

Here is a rollapply solution. Note that it appears that you want the average of the prior two or three rows in the same group, i.e. excluding the data on the current row.

library(zoo)

roll <- function(x, n) { 
   if (length(x) <= n) NA 
   else rollapply(x, list(-seq(n)), mean, fill = NA)
}
transform(DF, AVG2 = ave(Score, school, Student, FUN = function(x) roll(x, 2)),
              AVG3 = ave(Score, school, Student, FUN = function(x) roll(x, 3)))

giving:

   school Student Score AVG2     AVG3
1       I       S     5   NA       NA
2       B       S     2   NA       NA
3       B       S     7   NA       NA
4       B       O     3   NA       NA
5       B       O     9   NA       NA
6       I       O     6   NA       NA
7       I       O     3   NA       NA
8       I       S     7   NA       NA
9       I       O     1  4.5       NA
10      B       S     7  4.5       NA
11      I       S     3  6.0       NA
12      I       O     8  2.0 3.333333
13      B       S     3  7.0 5.333333
14      I       O     4  4.5 4.000000
15      B       O     1  6.0       NA
16      I       S     9  5.0 5.000000
17      B       S     4  5.0 5.666667
18      B       O     6  5.0 4.333333
19      I       S     3  6.0 6.333333
20      I       O     8  6.0 4.333333
21      B       S     3  3.5 4.666667
22      I       O     4  6.0 6.666667
23      B       O     1  3.5 5.333333
24      I       S     9  6.0 5.000000
25      B       S     4  3.5 3.333333
26      B       O     6  3.5 2.666667
27      I       J     6   NA       NA

Update: Fixed roll.

Transgression answered 9/2, 2016 at 5:34 Comment(8)

Try: AVG2 = ave(Score, school, Student, area, FUN = function(x) roll(x, 2)). This is erroring for me now. n = 2 doesn't work, n = 3 does, n = 4 does, n = 5 doesn't – Lupine 10/2, 2016 at 12:54

But your AVG2 isn't giving the same result as what I need. It has n = 3, rather than n = 2. Your AVG2 and AVG3 are exactly the same – Lupine 10/2, 2016 at 13:0

OK. Have fixed roll. < should have been <= . – Transgression 10/2, 2016 at 13:6

Thank you so much! You are a hero – Lupine 10/2, 2016 at 13:9

Here is AVG2 calculation with data.table, which is faster compared to other approaches:

library(data.table)
dt <- data.table(df)
setkey(dt, school, Student, area)
dt[, c("start", "len") := .(ifelse(.I + 1 > .I[.N], 0, .I +1), pmax(pmin(1, .I[.N] - .I -1), 0)), by = .(school, Student, area)][
    , AVG2 := mean(dt$Score[start:(start+len)]), by = 1:nrow(dt)]
res$AVG2[res$len == 0] <- NA

Cocker answered 8/5, 2016 at 22:24 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags