Change zero to ones in vector if surrounded by less than five consecutive zeros
Asked Answered
A

5

6

I have a vector of 0s and 1s and want to identify the indices where a string of 0s is surrounded by 1s. If the number of 0s between the 1s is lower or equal than 5, I want to change these zeros to 1s.

Here is an example:

> x <- c(0,0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1,1,1)

In positions 7,8, and 9, I have only three zeros, and thus these need to be changed to 1. The other zeros are more than 5, and thus need not to be changed.

The resulting vector should look like this:

> x_converted <- c(0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1)

I am doing this with a for loop and if else statement, but I am sure there must be a faster way to do this.

Thank you.

Auguste answered 3/2, 2022 at 15:17 Comment(0)
A
3

A possible solution with rle which does not change shorts sequences of zero's at the beginning or end of x:

# create the run length encoding
r <- rle(x)

# create an index of which zero's should be changed
i <- r$values == 0 & r$lengths < 5 & 
  c(tail(r$values, -1) == 1, FALSE) & 
  c(FALSE, head(r$values, -1) == 1)

# set the appropriate values to 1
r$values[i] <- 1

# use the inverse of rle to recreate the vector
inverse.rle(r)

which gives:

[1] 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1
Ascendant answered 3/2, 2022 at 15:36 Comment(2)
oh, I forgot there was an inverse.rle() function!Pearly
@BenBolker It's just a wrapper around rep with some tests added ;-)Ascendant
F
3

You can use rle() to get the runs. Then just change it based on the length of the run, excluding first run by looking at cumprod().

x_rle <- rle(x)

x_0 <- cumprod(x_rle$values == 0)
x_rev_0 <- rev(cumprod(rev(x_rle$values) == 0))

x_rle$values <- ifelse(
  x_rle$lengths > 5 | x_0 | x_rev_0,
  x_rle$values,
  1
)

rep(x_rle$values, x_rle$lengths)
#>  [1] 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1
Fayfayal answered 3/2, 2022 at 15:32 Comment(2)
This answer also fails at the tail. For example with this x <- c(0,0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1,1,0), this ifelse changes the last 0 to 1.Auguste
Edited to provide a solution that works with that edge case.Fayfayal
A
3

A possible solution with rle which does not change shorts sequences of zero's at the beginning or end of x:

# create the run length encoding
r <- rle(x)

# create an index of which zero's should be changed
i <- r$values == 0 & r$lengths < 5 & 
  c(tail(r$values, -1) == 1, FALSE) & 
  c(FALSE, head(r$values, -1) == 1)

# set the appropriate values to 1
r$values[i] <- 1

# use the inverse of rle to recreate the vector
inverse.rle(r)

which gives:

[1] 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1
Ascendant answered 3/2, 2022 at 15:36 Comment(2)
oh, I forgot there was an inverse.rle() function!Pearly
@BenBolker It's just a wrapper around rep with some tests added ;-)Ascendant
P
2

The rle() (run-length-encoding) function makes this pretty easy.

x <- c(0,0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1,1,1)
r <- rle(x)
## modify values appropriately
r$values[r$values==0 & r$lengths<=5] <- 1
## convert back to full vector
x_new <- rep(r$values, r$lengths)
## [1] 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1

However, this still needs a little bit of adjustment for literal edge cases — this has converted the initial run of 3 zeros to 1. Perhaps

n <- length(r$values)
rv_int <- r$values[2:(n-1)]
rl_int <- r$lengths[2:(n-1)]
rv_int[rv_int == 0 &
       rl_int <= 5] <- 1
x_new <- rep(c(r$values[1],  rv_int, r$values[n]),
             c(r$lengths[1], rl_int, r$lengths[n]))
Pearly answered 3/2, 2022 at 15:25 Comment(1)
Thank you for the answer. Is there a way to avoid to change the first three zeros, as these are not surrounded by 1s?Auguste
L
2

With data.table::rleid: rleid creates run-length type group id, which are used as grouping factor in ave. ave then performs a function over the groups defined by r.

r <- data.table::rleid(x)
# [1] 1 1 1 2 2 2 3 3 3 4 4 5 5 5 5 5 5 6 6 6 6
sub <- !r %in% c(1, max(r)) 

x[sub] <- ave(x[sub], r[sub], FUN = function(x) ifelse(length(x) <= 3 & x == 0, 1, x))
# [1] 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1
Lattonia answered 3/2, 2022 at 15:32 Comment(0)
A
0

A different approach, based on converting x to a string and then back to a numeric vector:

library(tidyverse)

x <- c(0,0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1,1,1)

x %>% str_c(collapse = "") %>% 
  str_replace_all("(?<=1)0{1,5}(?=1)", \(x) str_dup("1", nchar(x))) %>% 
  str_split("") %>% flatten %>% as.numeric

#>  [1] 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1

Or with purrr::walk and rle:

library(purrr)

x <- c(0,0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1,1,1)

z <- rle(x)

walk(1:(length(z$values)-3), 
  ~ if (all(z$values[.x:(.x+2)] == c(1,0,1)) & z$lengths[.x+1] <= 5)
     z$values[.x+1] <<- 1)

inverse.rle(z)

#>  [1] 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1
Amelina answered 3/2, 2022 at 17:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.