the rolling regression in R using roll apply
Asked Answered
L

2

8

My imported data contains 7 variables: Y and X1, X2, X3, X4, X5, X6. I tried applying the rollapply function in zoo in order to run a rolling regression within an in-sample with a window of 262 obs. (work days in a year).

         date             Y            X1            X2
1     10/1/07 -0.0080321720  4.690734e-03  3.333770e-03
2     10/2/07  0.0000000000 -2.818413e-03  5.418223e-03
3     10/3/07  0.0023158650 -4.178744e-03 -3.821100e-04
4     10/4/07 -0.0057491710 -5.071030e-03 -8.321550e-04
5     10/5/07  0.0073570500  3.065045e-03  5.179574e-03
6     10/8/07  0.0127708010 -7.278513e-03  1.145395e-03
7     10/9/07  0.0032661980  9.692267e-03  6.514035e-03
8    10/10/07  0.0013824430  1.161780e-04  2.676416e-03
9    10/11/07  0.0026607550  1.113179e-02  8.825719e-03
10   10/12/07 -0.0046362600 -2.453561e-03 -6.584070e-03
11   10/15/07 -0.0023757680 -7.829081e-03 -3.070540e-03
12   10/16/07 -0.0128673660 -4.619378e-03 -8.972126e-03
13   10/17/07  0.0016049760  1.276695e-03  5.349316e-03
14   10/18/07 -0.0044198970 -9.018499e-03 -1.215895e-02
15   10/19/07 -0.0011080330 -5.328661e-03 -7.131916e-03
16   10/22/07 -0.0024217970 -2.019539e-02 -2.021072e-02
17   10/23/07  0.0031270520  1.668604e-02  2.236130e-02
18   10/24/07 -0.0040367400 -1.061433e-02 -5.735703e-03
19   10/25/07  0.0001011170  1.346312e-02  1.036109e-02
20   10/26/07  0.0003032910  3.766526e-03  2.903628e-03
21   10/29/07  0.0004042450  1.416406e-02  2.527754e-03
22   10/30/07 -0.0012132240 -1.387166e-03 -8.202236e-03
23   10/31/07  0.0057497510  9.593904e-03  1.433401e-02
24    11/1/07 -0.0032238590 -1.648975e-02 -1.029199e-02
25    11/2/07 -0.0031330560 -7.737784e-03 -7.559498e-03
26    11/5/07 -0.0001012300 -7.877763e-03 -8.500554e-03
27    11/6/07 -0.0004050220  7.407770e-03  2.536320e-03
28    11/7/07 -0.0031444970 -5.904219e-03 -8.026064e-03
29    11/8/07 -0.0045822590 -3.712574e-03 -6.395584e-03
30    11/9/07  0.0016316540 -1.432552e-02 -1.741458e-02
31   11/12/07 -0.0019378860 -3.926583e-03 -4.543370e-03
32   11/13/07  0.0011223920 -1.952799e-03 -2.622112e-03
33   11/14/07  0.0008154940  8.687550e-06  1.085682e-03
34   11/15/07  0.0015272620 -1.549745e-02 -1.556172e-02
35   11/16/07 -0.0001017450 -5.578556e-03 -1.432244e-02
36   11/19/07  0.0014234880 -2.206707e-02 -3.537936e-02
37   11/20/07 -0.0010165700  1.643937e-02  5.140822e-03
38   11/21/07 -0.0008140010 -1.715961e-02 -2.756704e-02
39   11/22/07 -0.0008146640 -2.108098e-03  7.455698e-03
40   11/23/07  0.0008146640  1.266776e-02  1.615338e-02
41   11/26/07  0.0008140010  5.539814e-03  2.854080e-03
42   11/27/07  0.0006100660 -8.561106e-03 -9.720505e-03
43   11/28/07 -0.0015258640  3.392103e-02  2.132374e-02
44   11/29/07 -0.0006109980  6.109848e-03  1.045556e-02
45   11/30/07  0.0004073730  9.214342e-03  1.133690e-02
46    12/3/07 -0.0002036660 -7.006415e-03 -6.079820e-04
47    12/4/07  0.0002036660 -1.187605e-02 -2.554853e-02
48    12/5/07  0.0007125040  1.362121e-02  9.525618e-03
49    12/6/07 -0.0034655010  7.917348e-03  5.252105e-03
50    12/7/07  0.0018361730 -1.026832e-02  1.216898e-02
51   12/10/07  0.0013240310  3.347302e-03  1.143687e-02
52   12/11/07  0.0005087760 -3.433720e-03  2.373558e-03
53   12/12/07  0.0024385300  5.507930e-04  3.191504e-03
54   12/13/07 -0.0115336820 -1.793698e-02 -2.149447e-02
55   12/14/07 -0.0010271160 -2.307745e-03 -1.038483e-03
56   12/17/07 -0.0033969870 -1.822079e-02 -2.920662e-02
57   12/18/07  0.0000000000 -1.873297e-03 -7.061215e-03
58   12/19/07 -0.0004125410 -3.372400e-06 -7.879850e-03
59   12/20/07  0.0008249120 -6.227957e-03 -1.752460e-04
60   12/21/07 -0.0020635580  1.734991e-02  1.348190e-02
61   12/24/07  0.0003098050  0.000000e+00  0.000000e+00
62   12/25/07  0.0000000000  0.000000e+00  0.000000e+00
63   12/26/07  0.0001032470  0.000000e+00  0.000000e+00
64   12/27/07  0.0006192590  5.006783e-03  5.274480e-03
65   12/28/07 -0.0005160230  6.428153e-03  8.557260e-03
66   12/31/07  0.0000000000  0.000000e+00  0.000000e+00
67     1/1/08  0.0002064410  0.000000e+00  0.000000e+00
68     1/2/08 -0.0009293200 -6.023384e-03 -3.104400e-03
69     1/3/08  0.0027853730 -2.302511e-03 -2.759650e-03
70     1/4/08  0.0018526150 -2.149450e-02 -2.645257e-02
71     1/7/08 -0.0005142710 -4.445206e-03 -2.117698e-

1596          <NA>         <NA>          <NA>             

the last line for some reason doesn't show the values,even though there are in the original excel file(X3,X4,X5,X6 are missing,since the columns are stacked on top of each other,I copied the batch from the top for the example.

My code is:

rollapply(ts, 262, lm(
          Y~X1+X2+X3+X4+X5+X6+0, subset=1:floor(length(x)/2)), 
          align="right")

The error message I get is:

Error in eval(expr, envir, enclos) : object 'Y' not found

I really wonder why it can not find the Y variable, since it is displayed in the time series dataset with the appropriate heading.

Lauralauraceous answered 18/4, 2014 at 21:49 Comment(2)
The third argument of rollapply is supposed to be a function, but you're calling lm, referring to an undefined value x, and not passing a data= argument.Bogor
The question does not state precisely what output is needed but assuming that you want to store the coefficients for each regression there is an example of this in the examples section of the rollapply help page in the zoo package: rdocumentation.org/packages/zoo/versions/1.8-3/topics/rollapplyUnexacting
W
12

It is not really clear what your data actually is (use dput(example_data) to give reproducible examples).

But the lm call in your example is simply doing the same regression over and over again (your x is not changing) and as josilber points out, it is supposed to be a function. Here is an example where all the data is in the data.frame allRegData and it has at least two columns, one named y and another named x:

require(zoo)
rollapply(zoo(allRegData),
          width=262,
          FUN = function(Z) 
          { 
             t = lm(formula=y~x, data = as.data.frame(Z), na.rm=T); 
             return(t$coef) 
          },
          by.column=FALSE, align="right") 
Workbook answered 18/4, 2014 at 22:5 Comment(2)
thanks a lot for your help,the code worked, but I got a lot of NAs in the middle of the output,I used XLConnect package to export the output to an excel file with: writeWorksheetToFile("results of rolling window in R.xlsx",data=results,sheet="rollapply",header=TRUE) I get the following error message: Error: OutOfMemoryError (Java): Java heap space Could that be solved somehow?Lauralauraceous
The NA could be caused by a not invertible model matrix (i.e. determinant zero) -> do your X1 and X2 show enough variance across observations, are there no duplicates, are there NAs ? Write your output to a csv file with write.table or write.csv to exclude any issue with you XLConnect package. If you have another question: pose it in a new question.Workbook
K
7

I guess your questions are

  1. that you want to apply rolling regression on 262 width window of data for roughly 6 years yielding 1572 which is close to your 1596 observations with six covariates.
  2. figure out how to solve your problem with rollapply.
  3. an issue with loading in a data set from Excel.

It seems hard to help you with 3. since you do not provide the data set or the R code you use. Starting with 1., then you can use the rollRegres package I have made. What you would do is something like this

#####
# simulate data
width  <- 262L
n_yr   <- 6L
n_covs <- 6L
set.seed(23847996)
X <- matrix(rnorm(n_yr * width), ncol = n_covs, dimnames = list(NULL, paste0("X", 1:n_covs)))
df <- data.frame(Y = rnorm(n_yr * width), X)
head(df)
#R         Y     X1     X2     X3      X4     X5      X6
#R 1 -1.1478  0.516 -1.381  0.776 -0.0992  1.254 -0.8444
#R 2 -0.0542  1.328  1.411 -1.206  0.2560  0.975  0.9534
#R 3 -0.8350 -1.402  1.190  0.591 -1.5928  0.330  1.0806
#R 4 -0.5902  0.937 -0.182  0.193  0.1575  0.217 -0.2613
#R 5  1.7891 -0.608 -1.090  0.180 -1.1765 -0.992 -0.8831
#R 6  2.0108  0.259 -0.129  0.261  1.6694 -1.822  0.0616

#####
# estimate coefs
library(rollRegres)
fits <- roll_regres(Y ~ X1 + X2 + X3 + X4 + X5 + X6, df, width = width)
tail(fits$coefs) # estimated coefs
#R      (Intercept)     X1      X2      X3     X4      X5      X6
#R 1567    0.006511 0.0557 -0.0243 0.00907 0.0234 -0.0370 -0.0553
#R 1568    0.005816 0.0569 -0.0233 0.00798 0.0248 -0.0370 -0.0560
#R 1569    0.006406 0.0566 -0.0231 0.00814 0.0243 -0.0385 -0.0555
#R 1570    0.004898 0.0519 -0.0213 0.00773 0.0323 -0.0391 -0.0588
#R 1571    0.002922 0.0532 -0.0211 0.00809 0.0307 -0.0377 -0.0609
#R 1572    0.000771 0.0538 -0.0262 0.00580 0.0309 -0.0363 -0.0658

Now, regarding 2. then you can do something like what Hans Roggeman shows but a version that works with multiple regression as you request

library(zoo)
c2 <- rollapply(
  df, width = width, function(z){
    coef(lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6, as.data.frame(z)))
  }, by.column = FALSE, fill = NA_real_, align = "right")
all.equal(fits$coefs, c2, check.attributes = FALSE) # gives the same
#R [1] TRUE

It is much slower though

microbenchmark::microbenchmark(
  rollRegrs = roll_regres(Y ~ X1 + X2 + X3 + X4 + X5 + X6, df, width = width),
  rollapply = rollapply(
    df, width = width, function(z){
      coef(lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6, as.data.frame(z)))
    }, by.column = FALSE, fill = NA_real_, align = "right"), times = 25)
#R Unit: milliseconds
#R       expr     min      lq    mean  median      uq     max neval
#R  rollRegrs    1.73    1.98    2.31    2.37    2.64    3.01    25
#R  rollapply 1726.74 1798.37 1881.25 1834.00 1964.62 2178.58    25

The rollapply version can be faster if you use lm.fit but it it still slower than roll_regres.

Kelp answered 22/9, 2018 at 14:48 Comment(2)
There is any chance you can look at this?Manysided
Hi Benjamin is it possible to calculate the t and p value for every regression that is made with roll_regres? This would help me so much.Unbeatable

© 2022 - 2024 — McMap. All rights reserved.