R data.table cumulative sum function
Asked Answered
A

1

0

I have created the following reproducible example:

library(data.table)
Col_1 <- 0.05
Col_2 <- c( "B", "A", "C", "B", "C", "A", "C", "B", "B", "A" )
Col_3 <- 1000
Col_4 <- ""
data <- data.frame( Col_1, Col_2, Col_3, Col_4 ) 
mydata.table <- as.data.table( data )[ , list( Col_1, Col_2, Col_3, Col_4 ) ]
Col1 <- "Col_1"; Col2 <- "Col_2"; Col3 <- "Col_3"; Col4 <- "Col_4"
mydata.table[, (Col4) := ifelse( get( Col2 ) == "A" , get( Col1 ) * get( Col3 ), "0" ) ]
mydata.table[ , (Col3) := cumsum( c( 1000, head( Col4, -1 )))]

My problem is that Col3 is not calculating the cumsum correctly and remains static at 1000. I have adapted my code from other answers on this site but need a little help please. I would like Col3 to start at 1000 then cumulatively add Col4 (lagging by one row above).

I would like the output to show the following:

Col_1 <- 0.05
Col_2 <- c( "B", "A", "C", "B", "C", "A", "C", "B", "B", "A")
Col_3 <- c( 1000.0, 1000.0, 1050.0, 1050.0, 1050.0, 1050.0, 1102.5, 1102.5, 1102.5, 1102.5 ) 
Col_4 <- c( 0, 50.0, 0, 0, 0, 52.5, 0, 0, 0, 55.1 )
good_data <- data.frame( Col_1, Col_2, Col_3, Col_4 )
gooddata.table <- as.data.table( good_data )[ , list( Col_1, Col_2, Col_3, Col_4 )]

Would this need to be calculated in a loop as each column relies on the result of another? Thank you.

UPDATE to example based on comments below and including new code thanks to @Frank 's answer:

library(data.table)
Col_1 <- 0.05
Col_2 <- c( "B", "A", "C", "B", "C", "A", "C", "B", "B", "A" )
Col_3 <- 1000
Col_4 <- 0
mydata.table <- data.table(Col_1, Col_2, Col_3, Col_4)
Col1 <- "Col_1"; Col2 <- "Col_2"; Col3 <- "Col_3"; Col4 <- "Col_4"
mydata.table[, (Col3) := Col_3*cumprod(1+Col_1*shift(Col_2=="A", type="lag", fill=FALSE))]
mydata.table[, (Col4) := ifelse( get( Col2 ) == "A" , get( Col1 ) * get( Col3 ), "0" ) ]
Anaphylaxis answered 17/11, 2015 at 15:7 Comment(6)
@Arun I voted to close because it is not very clear to me what OP wants (didn't downvote).Hunk
Hi all, thanks for your comments, I am relatively new to R so I will take it all on board. To clarify this is a simplified example of a large data set, the reason I am using get() is so I can use ifelse statements such as ifelse( get( Col2 ) > get( Col4 ) & get( Col6 ) < get( Col8 ), . If there's a better way please let me know.Anaphylaxis
@Anaphylaxis Fyi, to get your desired output try skipping creation of the intermediate object Col_4 and just doing mydata.table[, Col_3*cumprod(1+Col_1*shift(Col_2=="A", type="lag", fill=FALSE))] (I'm ignoring your issues with get covered by Arun; as well as your overwriting of Col_3.) To understand how this works, try ?cumprod and ?shift.Esdraelon
Thanks @Esdraelon , I'm getting this error message though: 'Error in shift(Col_2 == "A", type = "lag", fill = FALSE) : unused arguments (type = "lag", fill = FALSE)'Anaphylaxis
@Anaphylaxis Maybe you aren't using the latest version of data.table on CRAN (1.9.6), where shift was added? Otherwise, it could be that there is a conflict with another package using the same function name, so you could try data.table::shift in place of shiftEsdraelon
You were right @Esdraelon I've updated the package version and the code you provided works perfectly. Thanks very muchAnaphylaxis
E
6

To get your desired output, try skipping creation of the intermediate object Col_4 and just doing

mydata.table[, Col_3*cumprod(1 + Col_1*shift(Col_2 == "A", type = "lag", fill=FALSE))] 

To understand how this works, try ?cumprod and ?shift. You can also run it in pieces, e.g.,

mydata.table[, shift(Col_2 == "A", type = "lag", fill = FALSE)] 

(I'm ignoring your issues with get mentioned in comments; as well as your overwriting of Col_3.)

Esdraelon answered 17/11, 2015 at 18:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.