I want to estimate rolling value-at-risk (VaR) for a dataset of about 22.5 million observations, so I want to use sparklyr to speed up the computation. Here is what I did (using a sample dataset):
library(PerformanceAnalytics)
library(reshape2)
library(dplyr)
data(managers)
data <- zerofill(managers)
data<-as.data.frame(data)
class(data)
data$date=row.names(data)
lmanagers<-melt(data, id.vars=c('date'))
Now I estimate VaR using the dplyr and PerformanceAnalytics packages:
library(zoo) # for rollapply()
var <- lmanagers %>% group_by(variable) %>% arrange(variable, date) %>%
  # align belongs to rollapply(), not VaR(): each window ends at the current row
  mutate(var = rollapply(value, 10, FUN = function(x) VaR(x, p = .95, method = "modified"), align = "right", partial = TRUE))
This works fine. Now I do this to make use of sparklyr:
library(sparklyr)
sc <- spark_connect(master = "local")
lmanagers_sp <- copy_to(sc,lmanagers)
src_tbls(sc)
var_sp <- lmanagers_sp %>% group_by(variable) %>% arrange(variable, date) %>%
  mutate(var = rollapply(value, 10, FUN = function(x) VaR(x, p = .95, method = "modified"), align = "right", partial = TRUE)) %>%
  collect()
But this gives the following error:
Error: Unknown input type: pairlist
Can anyone please tell me where the error is and what the correct code would be? Any other way to estimate rolling VaR faster would also be appreciated.
data$date=row.names(data) gives you a vector of character, not of Date? What happens if you do data$date <- as.Date(row.names(data))
– Pummel
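To make the point of the comment above concrete, here is a minimal base R sketch of the difference between the two assignments. The small data frame `df` and its column names are hypothetical stand-ins for `as.data.frame(zerofill(managers))`, assuming only that the dates are stored in the row names as yyyy-mm-dd strings.

```r
# Hypothetical stand-in for the managers data frame: dates live in the row names
df <- data.frame(
  value = c(0.01, -0.02, 0.03),
  row.names = c("1996-01-31", "1996-02-29", "1996-03-31")
)

# What the question does: copy the row names as plain text
df$date_chr <- row.names(df)

# What the comment suggests: parse the row names into Date objects
df$date <- as.Date(row.names(df))

class(df$date_chr)  # "character"
class(df$date)      # "Date"
```

Since strings in yyyy-mm-dd form sort the same way as text and as dates, the conversion mainly matters for date arithmetic and for how the column's type is carried over by copy_to(). Whether it has any effect on the pairlist error would need to be checked separately.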