diff on data.table column
Asked Answered
I

2

5

I just want to take difference between rows in a data.table such that dif[n] = value[n] - value[n-1].

I currently try to achive that using zoo::rollapplyr:

library(data.table)

dt <- fread(text = "#
variable    value
xyz 3
xyz 7
xyz 5
abc 9
abc 10
abc 2
")

dt[, dif := zoo::rollapplyr(value, 2, function(x){r <- diff(x,lag = 1)}, align = "right"), by = list(variable)]

#> dt
#>    variable value dif
#> 1:      xyz     3   4
#> 2:      xyz     7  -2
#> 3:      xyz     5   4
#> 4:      abc     9   1
#> 5:      abc    10  -8
#> 6:      abc     2   1

But, compared to what I have, the results should be shifted by 1 position, and the first position for each "variable" should be NA, i.e. dif should be c(NA, 4, -2, NA, 1, -8). The first value for each "variable should be NA because there is no position n-1. Any idea how I can modify the function to accomplish this? Would really like to know how I can do this with rollapplyr for the sake of my own understanding.

EDIT 2024: data.table prevents the unwanted recycling of the generated dif values by failing with this error message:

Error in `[.data.table`(dt, , `:=`(dif, zoo::rollapply(value, 2, function(x) { : 
  Supplied 2 items to be assigned to group 1 of size 3 in column 'dif'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
Influential answered 10/5, 2016 at 14:21 Comment(6)
what would you want your last value to be? I see that value 5 of dif is -8 form (2 - 8), what would you want where the 1 is?Indogermanic
The more general question is here: https://mcmap.net/q/216097/-use-a-value-from-the-previous-row-in-an-r-data-table-calculation (with the same answer as mtoto's)Lyric
dif should be (NA, 4, -2, NA, 1, -8). The first value for each "variable should be NA because there is no position n-1.Influential
Fyi, this is why you should put your desired output into the question itself in the first place, so there's no room for misunderstanding it. It's not too late to edit it in now.Lyric
You can also do dt[,c(NA,diff(value)), by = variable]Zambrano
@skan, this would be a deepcopy of whole valueSinker
L
17

We could use shift():

dt[,diff := value - shift(value), by = variable]
> dt
   variable value diff
1:      xyz     3   NA
2:      xyz     7    4
3:      xyz     5   -2
4:      abc     9   NA
5:      abc    10    1
6:      abc     2   -8
Lutetium answered 10/5, 2016 at 14:32 Comment(2)
this just shifts the entire column down. The first value for abc, and xyz should be NA because there is no n-1 value at these positions. This is the behavior I get when I use all other functions with rollapplyr()Influential
Thanks. Much appreciated. This definitely works and will use it. I would still really like to understand why I can't use rollapplyr() though.Influential
E
2

Try this:

dt[, dif := zoo::rollapplyr(value, 2, function(x){diff(x,lag = 1)}, fill = NA), by = list(variable)]

> dt
   variable value dif
1:      xyz     3  NA
2:      xyz     7   4
3:      xyz     5  -2
4:      abc     9  NA
5:      abc    10   1
6:      abc     2  -8
Ezmeralda answered 16/5, 2016 at 21:56 Comment(1)
Since the call to rollapplyr changed it would be great to update the answer. From the rollapplyr documentation: na.pad deprecated. Use fill = NA instead of na.pad = TRUE.Score

© 2022 - 2024 — McMap. All rights reserved.