Applying a rolling window regression to an XTS series in R

I have an xts object of 1033 daily returns for 5 currency pairs on which I want to run a rolling window regression, but rollapply is not working with my function, which uses lm(). Here is my data:

> head(fxr)
                 USDZAR        USDEUR       USDGBP        USDCHF        USDCAD
2007-10-18 -0.005028709 -0.0064079963 -0.003878743 -0.0099537170 -0.0006153215
2007-10-19 -0.001544470  0.0014275520 -0.001842564  0.0023058211 -0.0111410271
2007-10-22  0.010878027  0.0086642116  0.010599365  0.0051899551  0.0173792230
2007-10-23 -0.022783987 -0.0075236355 -0.010804304 -0.0041668499 -0.0144788687
2007-10-24 -0.006561223  0.0008545792  0.001024275 -0.0004261666  0.0049525483
2007-10-25 -0.014788901 -0.0048523001 -0.001434280 -0.0050425302 -0.0046422944

> tail(fxr)
                 USDZAR       USDEUR       USDGBP       USDCHF        USDCAD
2012-02-10  0.018619309  0.007548205  0.005526184  0.006348533  0.0067151342
2012-02-13 -0.006449463 -0.001055966 -0.002206810 -0.001638002 -0.0016995755
2012-02-14  0.006320364  0.006843933  0.006605875  0.005992935  0.0007001751
2012-02-15 -0.001666872  0.004319096 -0.001568874  0.003686840 -0.0015009759
2012-02-16  0.006419616 -0.003401364 -0.005194817 -0.002709588 -0.0019044761
2012-02-17 -0.004339687 -0.003675992 -0.003319899 -0.003043481  0.0000000000

I can easily run an lm on it for the whole data set to model USDZAR against the other pairs:

> lm(USDZAR ~ ., data = fxr)$coefficients
  (Intercept)        USDEUR        USDGBP        USDCHF        USDCAD 
-1.309268e-05  5.575627e-01  1.664283e-01 -1.657206e-01  6.350490e-01 

However, I want to run the regression over a rolling 62-day window to see how these coefficients evolve over time, so I created a function dolm that does this:

> dolm
function(x) {
  return(lm(USDZAR ~ ., data = x)$coefficients)
}

However when I run rollapply on this I get the following:

> rollapply(fxr, 62, FUN = dolm)
Error in terms.formula(formula, data = data) : 
  '.' in formula and no 'data' argument

This happens even though dolm(fxr) on its own works fine:

> dolm(fxr)
  (Intercept)        USDEUR        USDGBP        USDCHF        USDCAD 
-1.309268e-05  5.575627e-01  1.664283e-01 -1.657206e-01  6.350490e-01 

What's going on here? It seems to work fine if dolm is a simpler function, for example mean:

> dolm <- edit(dolm)
> dolm
function(x) {
  return(mean(x))
}
> rollapply(fxr, 62, FUN = dolm)
                  USDZAR        USDEUR        USDGBP        USDCHF        USDCAD
2007-11-29 -1.766901e-04 -6.899297e-04  6.252596e-04 -1.155952e-03  7.021468e-04
2007-11-30 -1.266130e-04 -6.512204e-04  7.067767e-04 -1.098413e-03  7.247315e-04
2007-12-03  8.949942e-05 -6.406932e-04  6.637066e-04 -1.154806e-03  8.727564e-04
2007-12-04  2.042046e-04 -5.758493e-04  5.497422e-04 -1.116308e-03  7.124593e-04
2007-12-05  7.343586e-04 -4.899982e-04  6.161819e-04 -1.057904e-03  9.915495e-04

Any help much appreciated. Essentially what I want is to get the weightings for the regression of USDZAR ~ USDEUR + USDGBP + USDCHF + USDCAD over a rolling 62-day window.

Bethanie answered 19/2, 2012 at 16:48 Comment(0)

There are several problems here:

  • rollapply passes a matrix but lm requires a data.frame.
  • rollapply applies the function to each column separately unless we specify by.column=FALSE.
  • you may or may not want the result to be right-aligned with the dates, but if you do, use rollapplyr.

1) Incorporating the above we have:

dolm <- function(x) coef(lm(USDZAR ~ ., data = as.data.frame(x)))
rollapplyr(fxr, 62, dolm, by.column = FALSE)
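
As a rough check of the three points above, here is a small sketch on simulated data. The object names (m, z, roll) and the column names Y, A, B are made up for illustration and are not the asker's fxr series. It also hints at why lm() works on the whole xts object: as far as I can tell, model.frame() converts classed objects with as.data.frame(), whereas the window handed to the function is a plain matrix.

```r
library(zoo)

# Simulated stand-in for fxr (hypothetical data)
set.seed(1)
n <- 30
dates <- as.Date("2020-01-01") + seq_len(n) - 1
m <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("Y", "A", "B")))
z <- zoo(m, order.by = dates)

# A plain matrix is rejected by lm(); a data.frame is accepted
matrix_attempt <- tryCatch(lm(Y ~ ., data = m), error = function(e) e)
inherits(matrix_attempt, "error")

dolm <- function(x) coef(lm(Y ~ ., data = as.data.frame(x)))

# by.column = FALSE hands each 10-row window to dolm as one matrix
roll <- rollapplyr(z, 10, dolm, by.column = FALSE)
dim(roll)   # 21 windows, 3 coefficients each

# Right alignment: the last row belongs to the window ending on the last date
last_direct <- dolm(m[21:30, ])
all.equal(as.numeric(coredata(roll)[nrow(roll), ]), as.numeric(last_direct))
```

With right alignment each row of the result is dated by the last observation in its window, so the first coefficient row appears on the 10th date here (the 62nd in the original data).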

2) An alternative to lm in dolm above is lm.fit, which works directly with matrices and is also faster:

dolm <- function(x) coef(lm.fit(cbind(Intercept = 1, x[,-1]), x[,1]))
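
To see what the Intercept = 1 column is doing, here is a minimal base R sketch on simulated data (the names X, y, coef_lm, coef_fit and coef_no_int are illustrative only). It compares lm, which adds the intercept itself, with lm.fit, which only fits the columns it is given.

```r
set.seed(42)
X <- matrix(rnorm(60), 20, 3, dimnames = list(NULL, c("a", "b", "c")))
y <- drop(1 + X %*% c(0.5, -1, 2)) + rnorm(20, sd = 0.1)

# lm() adds the intercept column on its own
coef_lm <- coef(lm(y ~ ., data = data.frame(y = y, X)))

# lm.fit() needs the column of ones supplied explicitly
coef_fit <- coef(lm.fit(cbind(Intercept = 1, X), y))

# Without that column the model is fitted through the origin
coef_no_int <- coef(lm.fit(X, y))

all.equal(unname(coef_lm), unname(coef_fit))
length(coef_no_int)   # 3 coefficients, no intercept
```

The two fits with an intercept should agree apart from the coefficient names; the third has one coefficient fewer.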
Coprolalia answered 19/2, 2012 at 17:24 Comment(7)
Awesome, thanks. Yes, I just worked it out too after much playing around. Silly me: by.column = FALSE, of course! Thanks very much. I was just reading your zoo doc, by the way. Great stuff. I guess where rollapply is a bit confusing is that while lm() works on the whole xts, it does not work on the parts of it that rollapply() hands to the function. One could reasonably have expected rollapply to pass on another xts that would still work under lm(), or am I missing something? Mea culpa on by.column = FALSE though. No excuses for that... - Bethanie
What is missed is that rollapply is not part of xts; it is part of zoo, and the call is dispatching to rollapply.zoo. - Coprolalia
Thank you for clarifying this. Yet:

> fxr <- zoo(fxr)
> class(fxr)
[1] "zoo"
> rollapply(fxr, 62, function(x) coef(lm(USDZAR ~ x, data = x)), by.column = FALSE)
Error in model.frame.default(formula = USDZAR ~ x, data = x, drop.unused.levels = TRUE) : 
  'data' must be a data.frame, not a matrix or an array

So we still have this problem. I understand that R has plenty of this kind of issue around, but still: what we have here is that lm works on the entire zoo object but does not work on the rollapply subsets of it. - Bethanie
This is simply user error. There is no reason to think that lm works with anything other than what it is documented to work with. Also, lm is not generic in the data argument (maybe you feel it should have been), so there is no reason to think that particular packages can extend it. There do exist two packages, dyn and dynlm, that each allow you to do linear regression with zoo objects but not matrices (dyn also allows a number of other types of regression). If you do want to use matrices, then lm.fit does that (as mentioned in my response). - Coprolalia
Thank you. lm.fit would appear to be perfectly consistent in its behaviour, so I will use that. - Bethanie
An educational answer, thank you. Can I ask where Intercept = 1 came from in your call to lm.fit()? (I see lm() calls model.matrix(), and model.matrix() appears to always put 1 in that first column. But is there any time I'd ever want to put something other than 1 there? If that deserves its own Stack Overflow question, let me know.) - Wilburn
lm includes the intercept by default, but lm.fit is a lower-level routine which does not. If you wanted to change the units of the intercept, you could use something other than 1 in lm.fit. - Coprolalia

New answer

G. Grothendieck's answer is correct, but you can do it faster with the rollRegres package, as the following example shows (in the benchmark below, the roll_regres.fit function is more than 100 times faster):

# simulate data
set.seed(101)
n <- 1000
wdth = 100
X <- matrix(rnorm(10 * n), n, 10)
y <- drop(X %*% runif(10)) + rnorm(n)
Z <- cbind(y, X)

# assign other function
dolm <- function(x)
  coef(lm.fit(x[, -1], x[, 1]))

# show that they yield the same
library(zoo)
library(rollRegres)
all.equal(
  rollapply(Z, wdth, FUN = dolm,
            by.column = FALSE,  align = "right", fill = NA_real_),
  roll_regres.fit(X, y, wdth)$coefs,
  check.attributes = FALSE)
#R [1] TRUE

# benchmark
library(compiler)
dolm <- cmpfun(dolm)

microbenchmark::microbenchmark(
  newnew = roll_regres.fit(X, y, wdth),
  prev   = rollapply(Z, wdth, FUN = dolm,
                     by.column = FALSE,  align = "right", fill = NA_real_),
  times = 10)
#R Unit: microseconds
#R expr        min         lq       mean     median         uq        max neval
#R newnew    884.938    950.914   1026.134   1025.581   1057.581   1242.075    10
#R   prev 111057.822 111903.649 118867.761 116857.726 122087.160 141362.229    10

You can also use the roll_regres function from the package if you want to use an R formula instead.

Old answer

A third option would be to update the R matrix in a QR decomposition, as done in the code below. You can speed this up by doing it in C++, but then you will need the dchud and dchdd subroutines from LINPACK (or another function to update R).
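
Before the full implementation, the idea can be sketched in base R alone. This is only an illustration under stated assumptions: the names w1, w2, G2, R2 and so on are made up, and chol() is used as a simple stand-in for the LINPACK update and downdate routines, so it recomputes the factor rather than updating it in place. The point is that R from the QR decomposition satisfies R'R = X'X, sliding the window changes X'X and X'y by one entering and one leaving row, and the coefficients then follow from two triangular solves.

```r
set.seed(3)
n <- 60; width <- 50; p <- 3
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(n)

w1 <- 1:width        # first window
w2 <- w1 + 1         # window after sliding by one row

# First window: R from the QR decomposition satisfies R'R = X'X
R1    <- qr.R(qr(X[w1, ]))
XtY1  <- drop(crossprod(X[w1, ], y[w1]))
z     <- backsolve(R1, XtY1, transpose = TRUE)  # solve R'z = X'y
beta1 <- backsolve(R1, z)                       # solve R beta = z

# Slide the window: add the entering row, remove the leaving row
x_new <- X[width + 1, ]
x_old <- X[1, ]
G2    <- crossprod(R1) + tcrossprod(x_new) - tcrossprod(x_old)
XtY2  <- XtY1 + y[width + 1] * x_new - y[1] * x_old
R2    <- chol(G2)   # stand-in for the dchud/dchdd update and downdate
beta2 <- backsolve(R2, backsolve(R2, XtY2, transpose = TRUE))

# Compare with a direct least-squares fit on each window
all.equal(beta1, unname(coef(lm.fit(X[w1, ], y[w1]))))
all.equal(beta2, unname(coef(lm.fit(X[w2, ], y[w2]))))
```

Both comparisons should agree up to numerical tolerance. The answer's own code follows and replaces the chol() step with the rank-one update and downdate of R.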

library(SamplerCompare) # for LINPACK `chdd` and `chud`
roll_coef <- function(X, y, width){
  n <- nrow(X)
  p <- ncol(X)
  out <- matrix(NA_real_, n, p)

  is_first <- TRUE
  i <- width 
  while(i <= n){
    if(is_first){
      is_first <- FALSE
      qr. <- qr(X[1:width, ])
      R <- qr.R(qr.)

      # Use X^T for the rest
      X <- t(X)

      XtY <- drop(tcrossprod(y[1:width], X[, 1:width]))
    } else {
      x_new <- X[, i]
      x_old <- X[, i - width]

      # update R 
      R <- .Fortran(
        "dchud", R, p, p, x_new, 0., 0L, 0L, 
        0., 0., numeric(p), numeric(p), 
        PACKAGE = "SamplerCompare")[[1]]

      # downdate R
      R <- .Fortran(
        "dchdd", R, p, p, x_old, 0., 0L, 0L, 
        0., 0., numeric(p), numeric(p), integer(1),
        PACKAGE = "SamplerCompare")[[1]]

      # update XtY
      XtY <- XtY + y[i] * x_new - y[i - width] * x_old
    }

    coef.    <- .Internal(backsolve(R, XtY, p, TRUE, TRUE))
    out[i, ] <- .Internal(backsolve(R, coef., p, TRUE, FALSE))

    i <- i + 1
  }

  out
}

# simulate data
set.seed(101)
n <- 1000
wdth = 100
X <- matrix(rnorm(10 * n), n, 10)
y <- drop(X %*% runif(10)) + rnorm(n)
Z <- cbind(y, X)

# assign other function
dolm <- function(x) 
  coef(lm.fit(x[, -1], x[, 1]))

# show that they yield the same
library(zoo)
all.equal(
  rollapply(Z, wdth, FUN = dolm,  
            by.column = FALSE,  align = "right", fill = NA_real_),
  roll_coef(X, y, wdth), 
  check.attributes = FALSE)
#R> [1] TRUE

# benchmark
library(compiler)
roll_coef <- cmpfun(roll_coef)
dolm <- cmpfun(dolm)
microbenchmark::microbenchmark(
  new =  roll_coef(X, y, wdth),
  prev = rollapply(Z, wdth, FUN = dolm,  
                   by.column = FALSE,  align = "right", fill = NA_real_), 
  times = 10)
#R> Unit: milliseconds
#R>  expr        min         lq       mean     median         uq       max neval cld
#R>   new   8.631319   9.010579   9.808525   9.659665   9.973741  11.87083    10  a 
#R>  prev 118.257128 121.734860 124.489826 122.882318 127.195410 135.21280    10   b

The solution above requires that you form the model.matrix and model.response first, but this is just three calls (one extra to model.frame) prior to the call to roll_coef.
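
Those three calls might look roughly like this in base R. The data frame df and its columns are hypothetical; the resulting X and y are what a function such as roll_coef would take as input.

```r
set.seed(7)
df <- data.frame(y = rnorm(20), a = rnorm(20), b = rnorm(20))

mf <- model.frame(y ~ a + b, data = df)    # evaluate the formula on the data
X  <- model.matrix(attr(mf, "terms"), mf)  # design matrix, with "(Intercept)"
y  <- model.response(mf)                   # response vector

colnames(X)
# Sanity check against lm() on the full sample
all.equal(coef(lm.fit(X, y)), coef(lm(y ~ a + b, data = df)))
```

Because model.matrix adds the intercept column, there is no need to bind a column of ones by hand in this case.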

Com answered 18/2, 2018 at 23:1 Comment(0)
