How to compute rolling covariance more efficiently
Asked Answered
P

4

11

I am trying to compute a rolling covariance between a set of data (each column of my x variable) and one other (y variable) in R. I thought I could use one of the apply functions, but can not find how to roll two set of inputs at the same time. Here is what I tried :

 set.seed(1)
 x<-matrix(rnorm(500),nrow=100,ncol=5)
 y<-rnorm(100)
 rollapply(x,width=5,FUN= function(x) {cov(x,y)})
 z<-cbind(x,y)
 rollapply(z,width=5, FUN=function(x){cov(z,z[,6])})

But none is doing what I would like. One solution I found is to use a for loop, but wondering if I can be more efficient in R than :

dResult<-matrix(nrow=96,ncol=5)
for(iLine in 1:96){
    for(iCol in 1:5){
        dResult[iLine,iCol]=cov(x[iLine:(iLine+4),iCol],y[iLine:(iLine+4)])
    }
}

which gives me the expected result :

head(dResult)


           [,1]       [,2]        [,3]        [,4]        [,5]
[1,]  0.32056460 0.05281386 -1.13283586 -0.01741274 -0.01464430
[2,] -0.03246014 0.78631603 -0.34309778  0.29919297 -0.22243572
[3,] -0.16239479 0.56372428 -0.27476604  0.39007645  0.05461355
[4,] -0.56764687 0.09847672  0.11204244  0.78044096 -0.01980684
[5,] -0.43081539 0.01904417  0.01282632  0.35550327  0.31062580
[6,] -0.28890607 0.03967327  0.58307743  0.15055881  0.60704533
Poundfoolish answered 28/7, 2016 at 19:21 Comment(1)
Nice job on a thorough first post.Nonprofessional
A
8
set.seed(1)
x<-as.data.frame(matrix(rnorm(500),nrow=100,ncol=5))
y<-rnorm(100)

library(zoo)

covResult = sapply(x,function(alpha) {

cov_value = rollapply(cbind(alpha,y),width=5,FUN = function(beta) cov(beta[,1],beta[,2]),by.column=FALSE,align="right") 

return(cov_value)

})

head(covResult)
#              V1         V2          V3          V4          V5
#[1,]  0.32056460 0.05281386 -1.13283586 -0.01741274 -0.01464430
#[2,] -0.03246014 0.78631603 -0.34309778  0.29919297 -0.22243572
#[3,] -0.16239479 0.56372428 -0.27476604  0.39007645  0.05461355
#[4,] -0.56764687 0.09847672  0.11204244  0.78044096 -0.01980684
#[5,] -0.43081539 0.01904417  0.01282632  0.35550327  0.31062580
#[6,] -0.28890607 0.03967327  0.58307743  0.15055881  0.60704533

Also check out:

library(PerformanceAnalytics)
?chart.rollingCorrelation
Abloom answered 28/7, 2016 at 20:18 Comment(2)
This is exactly what I was trying to do in my second attempt with the z variable, but my command of the apply functions is still too limited. Thanks a lot !Poundfoolish
If I was right in my comment to @Stibu , I guess this one should be also be faster than the for loop I didPoundfoolish
P
8

This is a solution with rollapply() and sapply():

sapply(1:5, function(j) rollapply(1:100, 5, function(i) cov(x[i, j], y[i])))

I think that it is more readable and more R-ish than the solution with for-loops, but I checked with microbenchmark and it seems to be slower.

Pancreas answered 28/7, 2016 at 20:14 Comment(2)
Oh I see my error (my R is still busy) well done! +1Ventriloquize
It is working indeed, thanks. I guess we are not gaining time as you are somehow creating two loops on indices here as well, not really applying the cov function to the x and y data directly, as you could maybe with mapplyPoundfoolish
A
8
set.seed(1)
x<-as.data.frame(matrix(rnorm(500),nrow=100,ncol=5))
y<-rnorm(100)

library(zoo)

covResult = sapply(x,function(alpha) {

cov_value = rollapply(cbind(alpha,y),width=5,FUN = function(beta) cov(beta[,1],beta[,2]),by.column=FALSE,align="right") 

return(cov_value)

})

head(covResult)
#              V1         V2          V3          V4          V5
#[1,]  0.32056460 0.05281386 -1.13283586 -0.01741274 -0.01464430
#[2,] -0.03246014 0.78631603 -0.34309778  0.29919297 -0.22243572
#[3,] -0.16239479 0.56372428 -0.27476604  0.39007645  0.05461355
#[4,] -0.56764687 0.09847672  0.11204244  0.78044096 -0.01980684
#[5,] -0.43081539 0.01904417  0.01282632  0.35550327  0.31062580
#[6,] -0.28890607 0.03967327  0.58307743  0.15055881  0.60704533

Also check out:

library(PerformanceAnalytics)
?chart.rollingCorrelation
Abloom answered 28/7, 2016 at 20:18 Comment(2)
This is exactly what I was trying to do in my second attempt with the z variable, but my command of the apply functions is still too limited. Thanks a lot !Poundfoolish
If I was right in my comment to @Stibu , I guess this one should be also be faster than the for loop I didPoundfoolish
G
3

If you need something faster, and you do not need any of the non-default arguments to cov, you can use TTR::runCov. Note that it pads with leading NA by default.

The speed difference will matter more on larger data. Here's an example of how to use it:

cov_joshua <- function() {
  apply(x, 2, function(x, y) TTR::runCov(x, y, 5), y = y)
}

And here's a comparison to the currently accepted answer using the small dataset the OP provided:

cov_osssan <- function() {
  f <- function(b) cov(b[,1], b[,2])
  apply(x, 2, function(a) {
    rollapplyr(cbind(a,y), width=5, FUN = f, by.column=FALSE)
  })
}
require(zoo)  # for cov_osssan
require(microbenchmark)
set.seed(1)
nr <- 100
nc <- 5
x <- matrix(rnorm(nc*nr),nrow=nr,ncol=nc)
y <- rnorm(nr)
microbenchmark(cov_osssan(), cov_joshua())
# Unit: milliseconds
#          expr       min        lq    median       uq      max neval
#  cov_osssan() 22.881253 24.569906 25.625623 27.44348 32.81344   100
#  cov_joshua()  5.841422  6.170189  6.706466  7.47609 31.24717   100
all.equal(cov_osssan(), cov_joshua()[-(1:4),])  # rm leading NA
# [1] TRUE

Now, using a larger data set:

system.time(cov_joshua())
#    user  system elapsed 
#   2.117   0.032   2.158 
system.time(cov_osssan())
# ^C
# Timing stopped at: 144.957 0.36 145.491 

I got tired of waiting (after ~2.5 minutes) for cov_osssan to complete.

Glean answered 23/11, 2016 at 13:29 Comment(0)
V
1

Right now I'm running some long simulations, so cant use R, but reckon this should work it. The outer apply by columns will take the column, pass it to rollapply, where it will be used to do the rolling window covariance with y. Hopefully :D

apply(x,2,function(x) rollapply(x,width=5,function(z) cov(x,y)))
Ventriloquize answered 28/7, 2016 at 20:8 Comment(1)
This does not work. You are always calculating the covariance of the complete vectors x and y inside rollaply(). Therefore, each column contains the same value repeated 96 times.Pancreas

© 2022 - 2024 — McMap. All rights reserved.