From a given row, how to select the previous 'n' rows in R?

I have a dummy variable like so:

df <- data.frame(year = seq(1990, 1997, 1),
                 x = c(1, 0, 0, 0, 1, 1, 0, 0))

year  x
1990  1
1991  0
1992  0
1993  0
1994  1
1995  1
1996  0
1997  0

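I want to create a dummy y equal to 1 if x is non-zero in the current year or in either of the two preceding years (a three-year window ending at the current year). The first two years, which lack a full window, should be NA. The expected result: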

year  x   y
1990  1  NA
1991  0  NA
1992  0   1
1993  0   0
1994  1   1
1995  1   1
1996  0   1
1997  0   1

How do I do this? A dplyr solution is preferred.

Prevailing answered 16/12, 2023 at 18:36 Comment(0)

If you know for sure you want 3 values, you can do:

library(dplyr)

df %>% mutate(y = sign((x > 0) + (lag(x) > 0) + (lag(x, 2) > 0)))
#>   year x  y
#> 1 1990 1 NA
#> 2 1991 0 NA
#> 3 1992 0  1
#> 4 1993 0  0
#> 5 1994 1  1
#> 6 1995 1  1
#> 7 1996 0  1
#> 8 1997 0  1
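
To see where the two leading NA values come from, it can help to lay the lagged columns side by side. A minimal sketch, assuming the same df as in the question; the column names lag1, lag2 and hits are illustrative only:

```r
library(dplyr)

df <- data.frame(year = seq(1990, 1997, 1),
                 x = c(1, 0, 0, 0, 1, 1, 0, 0))

# lag(x, k) shifts x down by k rows and pads the top with NA
steps <- df %>%
  mutate(lag1 = lag(x),
         lag2 = lag(x, 2),
         # an NA in any term makes the sum NA, so rows without a full
         # three-year window stay NA
         hits = (x > 0) + (lag1 > 0) + (lag2 > 0),
         y = sign(hits))
steps
```

Here hits counts how many of the three years are non-zero, and sign() collapses any positive count to 1.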

A more general solution, if you want to choose n, would be:

n <- 3

# .init = 0 starts the sum at zero, so lag 0 (the current row) is included
df %>% mutate(y = sign(purrr::reduce(seq(n) - 1, ~ .x + lag(x, .y), .init = 0)))
#>   year x  y
#> 1 1990 1 NA
#> 2 1991 0 NA
#> 3 1992 0  1
#> 4 1993 0  0
#> 5 1994 1  1
#> 6 1995 1  1
#> 7 1996 0  1
#> 8 1997 0  1
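
purrr::reduce() uses the first element of its input as the starting value unless .init is supplied, so it is worth checking the fold against the hand-written sum. A minimal sketch, assuming the same df and non-negative x; the helper name window_flag is hypothetical, .init = 0 is passed so that lag 0 (the current row) is part of the sum, and dplyr::lag is written out in full so that stats::lag is not picked up by accident:

```r
library(dplyr)
library(purrr)

df <- data.frame(year = seq(1990, 1997, 1),
                 x = c(1, 0, 0, 0, 1, 1, 0, 0))

# hypothetical helper: 1 if any of the last n values (current row included)
# is positive, NA where fewer than n values are available
window_flag <- function(x, n) {
  # folds to 0 + lag(x, 0) + lag(x, 1) + ... + lag(x, n - 1)
  sign(reduce(seq(n) - 1, ~ .x + dplyr::lag(x, .y), .init = 0))
}

out <- df %>% mutate(y3 = window_flag(x, 3),
                     y2 = window_flag(x, 2))
out
```

With n = 3 this should reproduce the y column shown above; with n = 2 only the first row is NA.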
Granduncle answered 16/12, 2023 at 19:0 Comment(3)
Your solution with n doesn't seem to work; I get a copy of the x column. Any ideas? - Prevailing
@CloftX Hmm, it works exactly as shown with n = 3. If I run my code in a fresh session with library(dplyr) and then input the above, it gives me this exact output. The output is as expected with different values of n too. Have you tried this in a fresh session? - Granduncle
I will try in a new session, thanks! - Prevailing

Here is a solution using zoo's rollapply():

library(dplyr)
library(zoo)

df %>%
  mutate(y = rollapply(x, width = 3, \(x) any(x > 0), align = "right", fill = NA)*1)

Or, more concisely (many thanks to @G. Grothendieck):

library(dplyr)
library(zoo)
df %>% mutate(y = + rollapplyr(x > 0, 3, any, fill = NA) )
  year x  y
1 1990 1 NA
2 1991 0 NA
3 1992 0  1
4 1993 0  0
5 1994 1  1
6 1995 1  1
7 1996 0  1
8 1997 0  1
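
The right alignment is what makes each window end at the current row; rollapply() centres the window by default. A short sketch contrasting the two, assuming the same x as in the question (the variable names right and centre are illustrative):

```r
library(zoo)

x <- c(1, 0, 0, 0, 1, 1, 0, 0)

# window ends at the current row: rows i-2, i-1, i
right <- +rollapply(x > 0, 3, any, align = "right", fill = NA)

# default alignment is "center": rows i-1, i, i+1
centre <- +rollapply(x > 0, 3, any, fill = NA)

rbind(right, centre)
```

With the centred window, the NA padding lands on the first and last rows instead of the first two, which is not what the question asks for.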
Teletype answered 16/12, 2023 at 19:19 Comment(1)
This can be written compactly as df %>% mutate(y = + rollapplyr(x > 0, 3, any, fill = NA)) - Vein

You mentioned dplyr is preferred (Allan Cameron's solution is a perfect dplyr approach), but for posterity and those who may not use dplyr, a base R solution could be to use vapply:

n <- 3
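# rows that have a full window of n values (assumes n >= 2)
rws <- seq_len(nrow(df))[-(1:(n-1))]

# for each such row i, flag whether the n values ending at row i sum to more than 0
df[rws, "y"] <- vapply(rws, \(i) +(sum(df$x[(i - n + 1):i]) > 0), 1)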

#   year x  y
# 1 1990 1 NA
# 2 1991 0 NA
# 3 1992 0  1
# 4 1993 0  0
# 5 1994 1  1
# 6 1995 1  1
# 7 1996 0  1
# 8 1997 0  1
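
For longer vectors, the row-by-row vapply() call can be swapped for a vectorised variant built on base R's embed(), which lays out each length-n window as a matrix row. This is a different technique from the one above, sketched under the assumption that n is at least 1 and no larger than the number of rows; the function name any_nonzero_window is hypothetical:

```r
df <- data.frame(year = seq(1990, 1997, 1),
                 x = c(1, 0, 0, 0, 1, 1, 0, 0))

# hypothetical helper, base R only
any_nonzero_window <- function(x, n) {
  # embed(x, n) has one row per complete window: x[t], x[t-1], ..., x[t-n+1]
  windows <- embed(x, n)
  # 1 if the window holds any non-zero value, 0 otherwise
  flag <- as.integer(rowSums(windows != 0) > 0)
  # the first n - 1 rows have no complete window, so they are NA
  c(rep(NA_integer_, n - 1), flag)
}

df$y <- any_nonzero_window(df$x, 3)
df
```

Testing for non-zero values rather than a positive sum also keeps the flag correct if x ever contains negative values that would cancel out.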
Arrearage answered 16/12, 2023 at 19:9 Comment(0)
