I am looking for a fast way to do row-wise calculations in which the value of a cell depends on values in the previous row of other columns, preferring vectorization over looping through individual rows (this is a follow-up to an earlier question).
Say I have the following dataset dt and a constant (the loaded libraries are data.table, dplyr and purrr):
library(data.table)
library(dplyr)
library(purrr)
dt <- structure(list(var1 = c(-92186.7470607738, -19163.5035325072,
-18178.8396858014, -9844.67882723287, -16494.7802822178, -17088.0576319257
), var2 = c(-3.12, NA, NA, NA, NA, NA), var3 = c(1, NA, NA, NA,
NA, NA)), class = c("data.table", "data.frame"), row.names = c(NA,
-6L))
constant <- 608383
print(dt)
var1 var2 var3
1: -92186.747 -3.12 1
2: -19163.504 NA NA
3: -18178.840 NA NA
4: -9844.679 NA NA
5: -16494.780 NA NA
6: -17088.058 NA NA
The fast, vectorized equivalent of
for(i in 2:nrow(dt)){
  prev <- dt[(i-1),]                         # previous row
  dt[i, var2 := prev$var2 - var1/constant]   # var1 is row i's own value
}
would be
dt %>%
  # .x is the previous var2, .y the current var1
  mutate(var2 = accumulate(var1[-1], .init = var2[1], ~ .x - .y / constant))
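For this single-column case the recurrence can also be unrolled: var2[i] = var2[1] - (var1[2] + ... + var1[i]) / constant, which is just a cumulative sum, so no iteration is needed at all. A minimal sketch in base R (the names var2_start, var2_loop and var2_cumsum are illustrative), checking the closed form against the loop:

```r
# Same input values as dt above, as plain vectors
var1 <- c(-92186.7470607738, -19163.5035325072, -18178.8396858014,
          -9844.67882723287, -16494.7802822178, -17088.0576319257)
constant <- 608383
var2_start <- -3.12

# Loop version: var2[i] = var2[i - 1] - var1[i] / constant
var2_loop <- c(var2_start, rep(NA_real_, length(var1) - 1))
for (i in 2:length(var1)) {
  var2_loop[i] <- var2_loop[i - 1] - var1[i] / constant
}

# Closed form: the recurrence unrolls to a running sum of var1[2..i];
# the leading 0 leaves the first row at its starting value
var2_cumsum <- var2_start - cumsum(c(0, var1[-1])) / constant

all.equal(var2_loop, var2_cumsum)
```

On the data.table itself this should correspond to dt[, var2 := var2[1] - cumsum(c(0, var1[-1])) / constant]. It relies on the update being a plain running sum, so it does not carry over directly to recurrences that mix in other columns.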
But what if I want to include more columns in the calculation? In this example that is var3, but the real dataset has more than 10 columns, so the solution should scale to that. An example for loop that produces the desired output:
for(i in 2:nrow(dt)){
  prev <- dt[(i-1),]  # previous row, all columns
  dt[i, var2 := prev$var2 + prev$var3 - var1/constant]
  # var2 here is the value just assigned to row i above
  dt[i, var3 := prev$var1 + 0.1 * var2/constant]
}
print(dt)
var1 var2 var3
1: -92186.747 -3.120000e+00 1.00
2: -19163.504 -2.088501e+00 -92186.75
3: -18178.840 -9.218881e+04 -19163.52
4: -9844.679 -1.113523e+05 -18178.86
5: -16494.780 -1.295311e+05 -9844.70
6: -17088.058 -1.393758e+05 -16494.80
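One way to keep the loop's row-by-row logic while avoiding per-row dt[i, :=] calls is to treat each row as a named list and fold a step function over the rows with base R's Reduce(..., accumulate = TRUE); purrr::accumulate() with .init works the same way. The helper step_row below is a hypothetical name; it encodes the two update rules of the loop above, and with more than 10 columns you would add one line per column. This is a sketch under those assumptions: it still iterates in R, so it is not truly vectorized, but it avoids repeated data.table subsetting and assignment, which is typically what makes the row loop slow.

```r
# Same input values as dt above (a plain data.frame here)
dt <- data.frame(
  var1 = c(-92186.7470607738, -19163.5035325072, -18178.8396858014,
           -9844.67882723287, -16494.7802822178, -17088.0576319257),
  var2 = c(-3.12, NA, NA, NA, NA, NA),
  var3 = c(1, NA, NA, NA, NA, NA)
)
constant <- 608383

# Hypothetical helper: one step of the recurrence. Takes the completed
# previous row and the current row (both named lists) and returns the
# completed current row. Extra columns mean extra lines here.
step_row <- function(prev, cur) {
  cur$var2 <- prev$var2 + prev$var3 - cur$var1 / constant
  cur$var3 <- prev$var1 + 0.1 * cur$var2 / constant  # uses the new var2
  cur
}

# Split the table into a list of rows, each a named list of scalars
rows <- lapply(seq_len(nrow(dt)), function(i) lapply(dt, `[[`, i))

# Fold step_row over rows 2..n, starting from the complete first row;
# accumulate = TRUE keeps every intermediate row, not just the last one
filled <- Reduce(step_row, rows[-1], init = rows[[1]],
                 accumulate = TRUE, simplify = FALSE)

# Bind the list of rows back into a single data frame
result <- do.call(rbind, lapply(filled, as.data.frame))
rownames(result) <- NULL
print(result)
```

The result should match the desired output of the loop; only step_row needs to change when columns are added.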
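In this particular example both updates are linear with constant coefficients, so the system can be collapsed into a single recurrence that runs in compiled code. Substituting var3[i-1] = var1[i-2] + 0.1 * var2[i-1] / constant into the var2 rule gives, for i >= 3, var2[i] = (1 + 0.1 / constant) * var2[i-1] + var1[i-2] - var1[i] / constant. That is a first-order linear recurrence, which stats::filter(method = "recursive") evaluates directly; var3 then follows in one vectorized step. A sketch assuming at least three rows (stats:: is spelled out because loading dplyr masks filter):

```r
# Same input values as dt above, as plain vectors
var1 <- c(-92186.7470607738, -19163.5035325072, -18178.8396858014,
          -9844.67882723287, -16494.7802822178, -17088.0576319257)
constant <- 608383
n <- length(var1)  # assumes n >= 3

var2 <- numeric(n)
var3 <- numeric(n)
var2[1] <- -3.12
var3[1] <- 1

# Row 2 still uses the given starting value var3[1]
var2[2] <- var2[1] + var3[1] - var1[2] / constant

# Rows 3..n: var2[i] = a * var2[i - 1] + x[i], evaluated in C by filter()
a <- 1 + 0.1 / constant
x <- var1[1:(n - 2)] - var1[3:n] / constant
var2[3:n] <- as.numeric(
  stats::filter(x, filter = a, method = "recursive", init = var2[2])
)

# var3[i] = var1[i - 1] + 0.1 * var2[i] / constant, for all i >= 2 at once
var3[-1] <- var1[-n] + 0.1 * var2[-1] / constant

print(data.frame(var1, var2, var3))
```

This relies on the rules being linear with fixed coefficients; nonlinear or row-dependent update rules would still need an iterative approach.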