Applying a rolling mean by group in R

I'm an R newbie and I'm having a lot of trouble doing something that is probably very simple. I have a big dataset split up into groups by country code, and I want to take a 3-month rolling average of a price index, by country, and then put it into a new column that matches up to the appropriate month. I've been trying to use rollmean like this with no success (code and error messages below):

> leader$last3<-tapply(leader, leader$ccode, 
    function(x) rollmean(leader$GI_delta, 3, na.pad=T))
Error in tapply(leader, leader$ccode, function(x) rollmean(leader$GI_delta,  : 
  arguments must have same length

> leader$last3<-ddply(leader, .(ccode), 
    rollmean(GI_delta, 3, na.pad=T))

Error in llply(.data = .data, .fun = .fun, ..., .progress = .progress,  : 
  .fun is not a function.

Any help would be much appreciated!

Albarran answered 10/3, 2012 at 6:42

If you want to make a new column, try ave. It resembles tapply, but it returns a vector of the same length as its first argument, so the result can be assigned straight to a column. In my experience it is also a lot faster than ddply:

library(zoo)
# fill = NA replaces na.pad = TRUE, which newer zoo versions deprecate
leader$last3 <- ave(leader$GI_delta, leader$ccode,
                    FUN = function(x) rollmean(x, k = 3, fill = NA))
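
To see what that does, here is a minimal sketch on made-up data (the two country codes and the numbers are hypothetical, chosen so the means are easy to check by hand). It uses fill = NA in place of na.pad, which newer zoo versions mark as deprecated, and it keeps rollmean's default centered window:

```r
library(zoo)

# Hypothetical toy data: two countries, five months each
leader <- data.frame(
  ccode    = rep(c("A", "B"), each = 5),
  GI_delta = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)

# Centered 3-month mean computed separately within each country;
# fill = NA pads both ends so each group keeps its original length
leader$last3 <- ave(leader$GI_delta, leader$ccode,
                    FUN = function(x) rollmean(x, k = 3, fill = NA))

leader$last3
# NA  2  3  4 NA NA 20 30 40 NA
```

The NAs at rows 5 and 6 show that the window never runs across the boundary between countries.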
Lastex answered 10/3, 2012 at 12:12

In your first attempt, the function never uses its x argument, so it always returns the same thing: a rolling mean over the whole column, which has the wrong length for any single group. In addition, the first argument of tapply should be a vector, not a data.frame. Lastly, tapply returns a list of vectors here, so you cannot put the result directly into a data.frame.

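library(zoo)
# Sample data: 3 country codes with n observations each
n <- 10
leader <- data.frame(
  ccode = rep(LETTERS[1:3], each = n),
  GI_delta = rnorm(3 * n)
)
# Pass the vector (not the data.frame) and use x inside the function;
# fill = NA replaces the deprecated na.pad = TRUE
tapply(
  leader$GI_delta,
  leader$ccode,
  function(x) rollmean(x, 3, fill = NA)
)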
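
If you do want to stay with tapply, one possible way to get its list back into a column is base R's unsplit, which reassembles the pieces in the original row order. This is only a sketch on hypothetical data; the rows are deliberately interleaved to show that the order is preserved:

```r
library(zoo)

# Hypothetical toy data with the two groups interleaved
leader <- data.frame(
  ccode    = c("A", "B", "A", "B", "A", "B"),
  GI_delta = c(1, 10, 2, 20, 3, 30)
)

# tapply gives one vector of rolling means per country code
rolled <- tapply(leader$GI_delta, leader$ccode,
                 function(x) rollmean(x, 3, fill = NA))

# unsplit puts each group's values back at that group's row positions
leader$last3 <- unsplit(rolled, leader$ccode)

leader$last3
# NA NA  2 20 NA NA
```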

In your second example, the third argument of ddply should be a function, not an expression. If you want to use an expression, you can pass summarize or transform as the function (summarize returns a 1-row data.frame for each value of ccode, while transform keeps the number of rows unchanged) and supply the expressions as further arguments.

library(plyr)
# transform is the function; the last3 expression is passed on to it
ddply(
  leader, "ccode",
  transform,
  last3 = rollmean(GI_delta, 3, align = "right", fill = NA)
)
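
Note the align = "right" here: it makes each value the mean of the current month and the two before it, which is probably what "last 3 months" means, whereas rollmean's default is a centered window. A small sketch of the difference on a made-up vector (rollmeanr is zoo's right-aligned shorthand):

```r
library(zoo)

x <- c(1, 2, 3, 4, 5)  # hypothetical monthly values

# Default: centered window (previous, current and next month)
centered <- rollmean(x, 3, fill = NA)
centered
# NA  2  3  4 NA

# Right-aligned: current month and the two before it
trailing <- rollmean(x, 3, fill = NA, align = "right")
trailing
# NA NA  2  3  4

# rollmeanr is the same thing with align = "right" built in
trailing2 <- rollmeanr(x, 3, fill = NA)
```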
Pomade answered 10/3, 2012 at 7:3
