Generating a pseudorandom binary sequence where the same number does not occur more than twice in a row
Asked Answered
F

6

7

I want to be able to generate a (pseudo)random binary sequence (for example, a sequence of Rs and Ls like RLLRRLR) which is counterbalanced such that the same item does not occur more than twice in a row in the sequence and if I, for example, have a sequence of 20, I get 10 of each. Is there a function in R that can do something like this?

I tried to write a function like this myself. Here is the attempt:

RL_seq <- function(n_L = 10, n_R = 10, max_consec = 2, initial_seq = NULL) {
  while(n_L > 0 | n_R > 0){
    side <- sample(c("R", "L"), 1)
    
    if (side == "R" & n_R > 0 & length(grep("R", tail(initial_seq, max_consec))) != max_consec) {
      initial_seq <- append(initial_seq, side)
      n_R <- n_R - 1
    } else if (side == "L" & n_L > 0 & length(grep("L", tail(initial_seq, max_consec))) != max_consec) {
      initial_seq <- append(initial_seq, side)
      n_L <- n_L - 1
    }
  }
  print(initial_seq)
}

# The function does not stop with the following seed
set.seed(1)
RL_seq()

However, it's up to chance whether the code gets stuck or not. I was also hoping that I could change the rules for the sequence (for example, allowing for 3 consecutive Rs), but the code tends to breaks if I touch the arguments. At this point I would be happy if I could run it with the default arguments and not have it get stuck.

I have searched around but I cannot find an answer.

Fish answered 13/3, 2024 at 17:27 Comment(5)
The two answers we have submitted so far presume n_L and n_R are not much bigger than 20 or 30, since draws that match your max_consec criteria will become vanishingly unlikely for longer sequences. If you want to make larger sequences, this approach will stop making sense.Briefs
Strong agree with Jon here - with a longer sequence you could take a constructive approach: start by randomly picking LRLRLR... or RLRLRL and then randomly pick equal numbers of Ls and Rs to double.Dharna
I would think about a Markov chain generator on doublets/triplets (so that RR, LL are never followed by the third repeat), but I don't know how to enforce the balance.Hylophagous
Related paper and sequence.Vincenzovincible
A comment about your attempt: the line initial_seq <- append(initial_seq, side) is an anti-pattern called "growing an object in a loop". Every iteration through your while loop, initial_seq gets longer, and this is an expensive operation because it may run out of space wherever it is stored on your disk and need to be copied somewhere else with enough room. It's much more efficient to "pre-allocate" a vector by creating it already at it's final length and then filling in values as you go. Read more in The R InfernoDharna
B
3

Here, I first determine the mix of elements, then repeatedly shuffle until there's a draw without too many consecutive picks. I'm sure there's a more computationally efficient approach (e.g. where we don't overwrite seq every iteration), but this might suffice. For example, it takes ~0.002 sec for max_consec = 2, or a few seconds to find a draw with no repeats (max_consec = 1), even if that takes 100k+ draws to find.

Any approach (like the two suggested so far) which relies on random draws will perform poorly for larger sequences, since it will become vanishingly unlikely to happen upon a sequence with few enough repeated strings by chance.

RL_seq <- function(n_L = 10, n_R = 10, max_consec = 2) {
  # make a sequence that is all the Ls then all the Rs
  seq = c(rep("L", n_L), rep("R", n_R))
  consec = max_consec + 1 # we know the first attempt won't work
  while(consec > max_consec) {
    # overwrite the sequence with a shuffle of it
    seq = sample(seq, length(seq), replace = FALSE)
    # what is the maximum "run length encoding" (rle) length of the sequence?
    consec = max(rle(seq)[1]$lengths)
  }
  seq
}


RL_seq()
Briefs answered 13/3, 2024 at 18:7 Comment(2)
Well, for the special case of max_consec = 1 you'd be more computationally efficient with if(runif(1) < .5) rep(c("L", "R"), n_L) else rep(c("R", "L"), n_L) ;)Dharna
Excellent point, might be worth adding if that case is needed to be more efficient. I'm curious about what approaches would make sense as n_L/n_R get bigger like 40 or 50 or even 1000, such that just drawing at random is unlikely to find a working sequence in the first, say, million or billion draws.Briefs
M
3

OP/Simon noted that my approach was fast but overly restrictive; with forcing checks adversly limiting the range of possibilities at the beginnings of the strings; One of Gregors comments referenced the idea of doing half and then doubling; so my final approach leverages that sort of idea; the forced rebalances will only happen in the latter half of the string under constructions. I had a warning through in case it should fail; I leave the older solution beneath the newer one :

library(tidyverse)

RL_v2 <- function(desired_length, max_consec = 2) {
  stopifnot(desired_length %% 2==0)
  half_length <- desired_length / 2
  possible_tokens <- c("R", "L")
  # later will use the following tokens when building an explain list
  #  explain_tokens <- c("1","2","O","F")
  # 1 - we had to add 1/R to balance 
  # 2 - we had to add 2/L to balance
  # we had to choose opposite to previous to not make too long consecutive
  # we had a free choice 
  rlseq <- character(0)
  explain_seq <- character(0)
  while (length(rlseq) < desired_length) {
    
    current_tail <- tail(rlseq, max_consec)
    suppressWarnings(max_tail <- max(rle(current_tail)[1L]$lengths))
    if (max_tail < max_consec) {
      # we are free to choose either R or L ; 
      # must we choose one or the other to maintain 'parity'? 
      sumdiff <- (sum(rlseq==possible_tokens[[1]])-
                    sum(rlseq==possible_tokens[[2]]))
      if(sumdiff<0 & length(rlseq)>=half_length){
        # more 2 than 1 so must add 1 
        rlseq <- c(rlseq, possible_tokens[[1]])
        explain_seq <- c(explain_seq,"1")
      } else if(
        sumdiff>0  & length(rlseq)>=half_length
      ){
        # more 1 than 2 so must add 2
        rlseq <- c(rlseq, possible_tokens[[2]])
        explain_seq <- c(explain_seq,"2")
      } else {
        
        # a blessed free choice
        rlseq <- c(rlseq, sample(possible_tokens, size = 1L))
        explain_seq <- c(explain_seq,"F")
      }
      
    } else {
      # forced opposite
      rlseq <- c(rlseq, setdiff(possible_tokens,tail(rlseq,1L)))
      explain_seq <- c(explain_seq,"O")
    }
  }
  sumdiff <- (sum(rlseq==possible_tokens[[1]])-
                sum(rlseq==possible_tokens[[2]]))
  if(sumdiff!=0){
    warning(paste0("Wasnt able to balance\t:",sumdiff))
    }
  list(seq=rlseq,
       explain=explain_seq)
}

#is RRLRR inside 20 length sequences ? 
sapply(1:200,\(x){
  (res <- RL_v2(20,max_consec = 2))
  str_detect(paste0(res$seq,collapse=""),
             pattern = fixed("RRLRR"))
}) |> table()
# FALSE  TRUE 
# 104    96 

# about half the time 

#are we beginning with RRLL? 
sapply(1:200,\(x){
  (res <- RL_v2(20,max_consec = 2))
  str_detect(paste0(res$seq,collapse=""),
             pattern = "^RRLL")
}) |> table()
# about 10%

EDITED : My method, works constructively by checking the tail of what was chosen, and if we have freedom we use it, and if we dont we take the forced choice, it should run in linear time , making it scaleable. extended to check if the string is unbalanced, and rebalance as it goes. the return is a list of the result, and an explainer that shows when a free choice vs a forced choice was made under the rules .

RLx <- function(desired_length, max_consec = 2) {
  possible_tokens <- c("R", "L")
# later will use the following tokens when building an explain list
#  explain_tokens <- c("1","2","O","F")
                                      # 1 - we had to add 1/R to balance 
                                      # 2 - we had to add 2/L to balance
                                      # we had to choose opposite to previous to not make too long consecutive
                                      # we had a free choice 
  rlseq <- character(0)
  explain_seq <- character(0)
  while (length(rlseq) < desired_length) {
 
    current_tail <- tail(rlseq, max_consec)
    suppressWarnings(max_tail <- max(rle(current_tail)[1L]$lengths))
    if (max_tail < max_consec) {
      # we are free to choose either R or L ; 
      # must we choose one or the other to maintain 'parity'? 
      sumdiff <- (sum(rlseq==possible_tokens[[1]])-
                   sum(rlseq==possible_tokens[[2]]))
      if(sumdiff<0){
        # more 2 than 1 so must add 1 
        rlseq <- c(rlseq, possible_tokens[[1]])
        explain_seq <- c(explain_seq,"1")
      } else if(
        sumdiff>0
      ){
        # more 1 than 2 so must add 2
        rlseq <- c(rlseq, possible_tokens[[2]])
        explain_seq <- c(explain_seq,"2")
      } else {
        # a blessed free choice
      rlseq <- c(rlseq, sample(possible_tokens, size = 1L))
      explain_seq <- c(explain_seq,"F")
      }
    } else {
      # forced opposite
      rlseq <- c(rlseq, setdiff(possible_tokens,tail(rlseq,1L)))
      explain_seq <- c(explain_seq,"O")
    }
}
  list(seq=rlseq,
       explain=explain_seq)
}

(res <- RLx(300,max_consec = 2))
Microchemistry answered 13/3, 2024 at 19:23 Comment(4)
This seems to check only the consecutive constraint, but I think it ignores the constraint that there is an equal number of Ls and Rs in the final sequence?Dharna
Thanks for this comment; I will edit to incorporate this constraintMicrochemistry
I like this approach but it seems to me that it rebalances too aggressively. For instance, a sequence can never begin or end with RR or LL. You can also not get sequences like RRLRR in a 20 length sequence even though there is no constraint on this. I tried to change this with (length(rlseq) > (desired_length*0.9) & sumdiff < 0) | sumdiff < -1 instead of sumdiff<0and vice versa for sumdiff>0, which is to say that it only applies the strict constraint at the end. This allows for RR and LL beginnings and endings, as well as more free choices, but it does not fix the RRLRR issue.Fish
Hope this edit helpsMicrochemistry
C
2

I'd use an MCMC approach to this. Define a distribution on permutations which has the highest value for sequences that meet your condition, and lower values for each violation of the condition. Start with a random permutation of the symbols, then run Metropolis-Hastings or similar where the moves are swaps of randomly chosen pairs. It will eventually spend most of its time in sequences that you find acceptable.

The samples drawn won't be independent, but averages of functions of them will converge to the population mean.

Or if you don't care too much about the distribution, just run it from a random permutation until you get one value, then start over again. Those sequences will be independent, but might not be uniformly distributed over the acceptable permutations.

Here's some code to use this to generate 5 sequences:

RL_seq <- function(n_L = 10, n_R = 10, max_consec = 2, n = 1,
                   alpha = 0.1 ) {
  
  # alpha is a tuning parameter.  Smaller values make the chain
  # move faster, but it will also move to sequences you don't
  # want, so you might get slower output.
  
  score <- function(seq) {
    # Compute a penalty score for violating the rule
    # An acceptable solution has score 0
    lens <- rle(seq)$lengths
    lens <- lens - max_consec
    sum(lens[lens > 0])
  }
  X <- sample(c(rep("L", n_L), rep("R", n_R)))
  Xscore <- score(X)
  
  result <- matrix(NA, n, length(X))
  count <- 0
  while (count < n) {
    Y <- X
    # Randomly select a pair of entries that are different.
    repeat {
      i <- sample(length(X), 2)
      if (Y[i[1]] != Y[i[2]])
        break
    }
    # Swap them
    temp <- Y[i[1]]
    Y[i[1]] <- Y[i[2]]
    Y[i[2]] <- temp
    Yscore <- score(Y)

    # Always accept the proposal if it is at least as good, 
    # and sometimes if it is worse, using the Metropolis ratio
    
    if (Xscore >= Yscore || runif(1) < exp(alpha * (Xscore - Yscore))) {
      X <- Y
      Xscore <- Yscore
    } 
    if (Xscore == 0) {
      count <- count + 1
      result[count,] <- X
    }
  }
  result
}

set.seed(1)
RL_seq(10, 10, n = 5)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,] "R"  "L"  "R"  "R"  "L"  "R"  "L"  "L"  "R"  "R"   "L"   "L"   "R"   "L"  
#> [2,] "R"  "R"  "L"  "R"  "L"  "R"  "L"  "R"  "L"  "R"   "L"   "L"   "R"   "R"  
#> [3,] "R"  "R"  "L"  "L"  "R"  "L"  "R"  "R"  "L"  "R"   "R"   "L"   "L"   "R"  
#> [4,] "R"  "R"  "L"  "L"  "R"  "L"  "R"  "R"  "L"  "R"   "R"   "L"   "L"   "R"  
#> [5,] "R"  "R"  "L"  "L"  "R"  "R"  "L"  "R"  "L"  "R"   "R"   "L"   "L"   "R"  
#>      [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,] "R"   "R"   "L"   "L"   "R"   "L"  
#> [2,] "L"   "L"   "R"   "R"   "L"   "L"  
#> [3,] "L"   "R"   "L"   "L"   "R"   "L"  
#> [4,] "L"   "L"   "R"   "L"   "R"   "L"  
#> [5,] "L"   "L"   "R"   "L"   "R"   "L"

Created on 2024-03-13 with reprex v2.1.0

In this case, all 5 sequences are different, but they are quite similar.

Carranza answered 14/3, 2024 at 0:30 Comment(0)
D
1

In this approach, we generate n_try candidate sequences with equal numbers of 1s and 0s (you can change these to Rs and Ls at the end, if you like). We then filter out any sequences that don't meet the maximum consecutive value condition.

With this seed, generating 100 candidate sequences of length 10 gives 13 eligible results. (Consider each column in the result a sequence.)

set.seed(47)
n_try = 1e2
out_length = 10
n_l = out_length / 2
n_r = out_length / 2
max_in_a_row = 2
letter_space = c(rep(1, n_l), rep(0, n_r))
r = replicate(n_try, sample(letter_space, size = out_length, replace = FALSE))

r = r[, apply(r, MARGIN = 2, FUN = \(x) sum(diff(x) == 0) <= max_in_a_row)]
r
#       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
#  [1,]    1    0    0    0    0    1    0    0    0     1     0     0     0
#  [2,]    0    1    1    1    1    1    1    1    1     0     1     1     1
#  [3,]    0    0    1    0    0    0    0    0    0     1     0     1     0
#  [4,]    1    0    0    0    1    1    1    1    1     0     0     0     0
#  [5,]    0    1    1    1    0    0    0    0    1     0     1     1     1
#  [6,]    1    1    0    1    1    0    1    1    0     1     0     0     1
#  [7,]    0    0    0    0    0    1    1    0    1     0     1     1     0
#  [8,]    1    1    1    1    1    0    0    1    0     1     0     0     1
#  [9,]    0    0    0    0    1    1    1    1    1     1     1     0     0
# [10,]    1    1    1    1    0    0    0    0    0     0     1     1     1

For a 20-length result I bumped n_try up to 1e5, which resulted in 108 qualifying sequences in under a second.

Dharna answered 13/3, 2024 at 18:11 Comment(0)
E
1

Idea

Actually you can artificially construct a "pseudo" process using rmultinom, say, x_1 + x_2 + ... + x_k = n for both L and R categories, but with a constraint on x_i, e.g., x_i <= l, where k defines the minimum distance for two consecutive Ls separated by Rs, for example.

Once a set of {x_i} survives from the constrained sampling, you can following the following two steps:

  1. make {x_i} and {y_i} := sample({x_i}) for the distributions of consecutive Ls and Rs, respectively
  2. Interleave L and R along with there respective repetitions to construct the sequences, e.g., [rep('L',x_1), rep('R',y_1), rep('L',x_2), rep('R',y_2),..., rep('L',x_k), rep('R',y_k)]

Code

f <- function(Ntot, l = 2) {
   n <- Ntot / 2
   k <- ceiling(n / l) + 1
   repeat {
      d <- c(rmultinom(1, n, rep(1, k)))
      if (max(d) <= l && min(d) > 0) break
   }
   cnts <- c(rbind(d, sample(d)))
   rep(rep(sample(c("L", "R")), length.out = length(cnts)), cnts)
}

Example Output

For reproducibility, you can set set.seed(0) before running the following tests

> (out <- f(60, 3))
 [1] "L" "L" "L" "R" "R" "L" "L" "L" "R" "R" "R" "L" "L" "L" "R" "R" "R" "L" "L"
[20] "L" "R" "R" "R" "L" "L" "L" "R" "R" "R" "L" "L" "L" "R" "R" "R" "L" "L" "R"
[39] "R" "R" "L" "L" "R" "R" "L" "L" "L" "R" "R" "R" "L" "L" "R" "R" "L" "L" "L"
[58] "R" "R" "R"

> table(out)
out
 L  R
30 30

> max(rle(out)$lengths)
[1] 3

and

> (out2 <- f(82, 5))
 [1] "R" "R" "R" "L" "L" "L" "L" "L" "R" "R" "R" "R" "L" "L" "L" "L" "R" "R" "R"
[20] "L" "L" "L" "L" "R" "R" "R" "L" "L" "L" "L" "L" "R" "R" "R" "R" "R" "L" "L"
[39] "L" "R" "R" "R" "R" "R" "L" "L" "L" "R" "R" "R" "R" "R" "L" "L" "L" "L" "R"
[58] "R" "R" "R" "L" "L" "L" "L" "L" "R" "R" "R" "R" "R" "L" "L" "L" "L" "L" "R"
[77] "R" "R" "R" "L" "L" "L"

> table(out2)
out2
 L  R
41 41

> max(rle(out2)$lengths)
[1] 5
Egarton answered 14/3, 2024 at 0:26 Comment(0)
V
1

Below is a function, rbinvec2 that efficiently samples uniformly with replacement from the set of valid vectors of length m. It is able to generate 100K valid samples of a vector of length 101 in less than 1 second.

When I have time, I plan on updating the answer to include a derivation of the underlying PMF.

The function

library(RcppAlgos) # for `permuteGeneral`
library(Rfast) # for `colShuffle`

rbinvec2 <- function(n, m) {
  # handle small `m` values with `RcppAlgos`
  if (m == 1) {
    return(sample(0:1, n, 1))
  } else if (m == 2) {
    return(matrix(c(0:1, 1:0), 2, 2)[sample(1:2, n, 1),])
  } else if (m == 3) {
    return(permuteGeneral(0:1, 3, TRUE, NULL, 2, 7)[sample(6, n, 1),])
  }
  
  odd <- m%%2 == 1
  kk <- (m - sqrt(m^2%%8))/2 # maximum number of doubles
  k <- 0:kk
  mk <- m - k
  mk1 <- (mk + 1)%/%2
  mk <- mk%/%2
  m1 <- m - 1
  m2 <- m%/%2
  # calculate the exact (log) PMF of `k`, including the special cases when `m`
  # is odd
  p <- cbind(lchoose(mk1, k%/%2) + lchoose(mk, (k + 1)%/%2), -Inf)
  i <- seq(1, kk + 1, 2)
  if (odd) { # special cases
    p[-i, 1] <- p[-i, 1] + log(2)
    p[i, 2] <- lchoose(mk1[i], -1:(kk%/%2 - 1)) + lchoose(mk[i], 1:(kk%/%2 + 1))
  }
  # sample the PMF
  k <- array(c(rmultinom(1, n, exp(p - max(p[1,])))), dim(p))
  # k[-i,] <- rowSums(k[-i,])
  s <- rowSums(k)
  b <- vector("list", kk + 1) # initialize the list of compositions
  # special case: no doubles
  if (s[1]) b[[1]] <- rep(1, m*s[1])
  s[1] <- 0L
  
  if (odd) { # odd `m`
    for (i in which(s != 0L)) {
      b[[i]] <- if (i%%2L) {
        # even number of doubles
        j1 <- (m - i)/2 # 1 partition length
        j0 <- j1 + 1 # 0 partition length
        i2 <- (i - 1)/2
        d0 <- j0 - i2
        d1 <- j1 - i2
        c(
          if (k[i, 1]) { # balanced doubles--1 more 0 than 1
            rbind(
              # 0 composition
              c(colShuffle(matrix(rep.int(1:2, c(d0, i2)), j0, k[i, 1]))),
              # 1 composition
              c(rbind(
                colShuffle(matrix(rep.int(1:2, c(d1, i2)), j1, k[i, 1])),
                0L
              ))
            )
          } else integer(0)
          ,
          if (k[i, 2]) { # unbalanced doubles--1 more 1 than 0
            rbind( 
              # 0 composition
              c(colShuffle(matrix(rep.int(1:2, c(d0 + 1, i2 - 1)), j0, k[i, 2]))),
              # 1 composition
              c(rbind(
                colShuffle(matrix(rep.int(1:2, c(d1 - 1, i2 + 1)), j1, k[i, 2])),
                0L
              ))
            )
          } else integer(0)
        )
      } else {
        # odd number of doubles
        i2 <- i/2 - 1 # min number of (0,0)
        j <- m2 - i2 # even/odd composition lengths
        idx <- sample(1:2, s[i], 1) # extra 0, or extra 1?
        bb <- cbind(rep.int(1:2, c(j - i2, i2)), # extra 0
                    rep.int(1:2, c(j - i2 - 1, i2 + 1))) # extra 1
        c(
          rbind(
            c(colShuffle(matrix(bb[,idx], j))), # 0 compositions
            c(colShuffle(matrix(bb[,3 - idx], j))) # 1 compositions
          )
        )
      }
    }
  } else { # even `m`
    for (i in which(s != 0L)) {
      i2 <- (i - 1)%/%2 # number of (0,0)
      j <- m2 - i2 # 0 composition length
      b[[i]] <- c(
        if (i%%2L) {
          # even number of doubles--equivalent partition for 0 and 1
          matrix(
            colShuffle(matrix(rep.int(1:2, c(j - i2, i2)), j, 2*s[i])),
            2, j*s[i], 1
          )
        } else { # odd number of doubles--1 more double 1 than double 0
          rbind(
            # 0-composition
            c(colShuffle(matrix(rep.int(1:2, c(j - i2, i2)), j, s[i]))),
            # 1 composition
            c(rbind(
              colShuffle(
                matrix(rep.int(1:2, c(j - i2 - 2, i2 + 1)), j - 1, s[i])
              ), 0L
            ))
          )
        }
      )
    }
  }
  
  abs(
    matrix(
      # convert the compositions to 0's and 1's
      rep.int(rep(0:1, length.out = sum(lengths(b))), unlist(b)), n, m, 1
      # shuffle the n samples and perform a full 0 <-> 1 swap in random vectors
    )[sample(n),] - sample(0:1, n, 1)
  )
}

Example usage

rbinvec2(10, 8) # 10 samples of vector of length 8
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#>  [1,]    0    1    0    1    0    1    0    1
#>  [2,]    0    1    0    1    0    0    1    1
#>  [3,]    0    1    0    0    1    0    1    1
#>  [4,]    0    0    1    1    0    1    1    0
#>  [5,]    0    0    1    0    1    0    1    1
#>  [6,]    1    0    1    0    1    1    0    0
#>  [7,]    1    0    1    0    1    0    0    1
#>  [8,]    1    0    1    0    1    0    1    0
#>  [9,]    1    0    1    0    1    0    1    0
#> [10,]    1    0    0    1    0    0    1    1
rbinvec2(10, 9)# 10 samples of vector of length 9
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#>  [1,]    0    0    1    1    0    0    1    0    1
#>  [2,]    0    1    1    0    1    1    0    0    1
#>  [3,]    1    0    0    1    0    1    1    0    0
#>  [4,]    1    0    0    1    0    1    0    1    0
#>  [5,]    1    1    0    0    1    1    0    0    1
#>  [6,]    1    1    0    1    0    0    1    0    0
#>  [7,]    0    1    0    1    0    0    1    0    1
#>  [8,]    1    0    1    0    0    1    0    0    1
#>  [9,]    1    1    0    0    1    0    1    1    0
#> [10,]    0    0    1    0    0    1    1    0    1

Timing

system.time(rbinvec2(1e6, 10)) # 1M samples of vector of length 10
#>    user  system elapsed 
#>    1.52    0.12    1.75
system.time(rbinvec2(1e5, 101)) # 100K samples of vector of length 101
#>    user  system elapsed 
#>    0.69    0.09    0.82

Verify correct behavior

library(data.table) # for checks

fcheck1 <- function(m) {
  # returns the number of valid vectors of length `m`
  
  # generate all balanced permutations
  x <- if (m%%2) {
    rbind(
      permuteGeneral(0:1, m, TRUE, c((m + 1)%/%2, m%/%2)),
      permuteGeneral(0:1, m, TRUE, c(m%/%2, (m + 1)%/%2))
    )
  } else {
    permuteGeneral(0:1, m, TRUE, c(m%/%2, m%/%2))
  }
  # count the permutations with no more than two sequential repeated values
  sum(!rowSums(x[,-1:-2] == x[,c(-1, -m)] & x[,c(-1, -m)] == x[,-m:(1 - m)]))
}

fcheck2 <- function(n, m) {
  # return the unique samples along with their counts
  setorder(as.data.table(rbinvec2(n, m))[,.(.N), eval(paste0("V", 1:m))])[]
}

fcheck1(6) # the number of valid vectors of length 6
#> [1] 14
fcheck2(1e6, 6)
#>        V1    V2    V3    V4    V5    V6     N
#>     <int> <int> <int> <int> <int> <int> <int>
#>  1:     0     0     1     0     1     1 71195
#>  2:     0     0     1     1     0     1 71249
#>  3:     0     1     0     0     1     1 71237
#>  4:     0     1     0     1     0     1 71588
#>  5:     0     1     0     1     1     0 71469
#>  6:     0     1     1     0     0     1 71482
#>  7:     0     1     1     0     1     0 71242
#>  8:     1     0     0     1     0     1 71368
#>  9:     1     0     0     1     1     0 71638
#> 10:     1     0     1     0     0     1 71606
#> 11:     1     0     1     0     1     0 71645
#> 12:     1     0     1     1     0     0 71867
#> 13:     1     1     0     0     1     0 71107
#> 14:     1     1     0     1     0     0 71307
fcheck1(7)
#> [1] 36 # the number of valid vectors of length 7
fcheck2(1e6, 7)
#>        V1    V2    V3    V4    V5    V6    V7     N
#>     <int> <int> <int> <int> <int> <int> <int> <int>
#>  1:     0     0     1     0     0     1     1 27612
#>  2:     0     0     1     0     1     0     1 27728
#>  3:     0     0     1     0     1     1     0 27451
#>  4:     0     0     1     1     0     0     1 27707
#>  5:     0     0     1     1     0     1     0 27689
#>  6:     0     0     1     1     0     1     1 27460
#>  7:     0     1     0     0     1     0     1 27891
#>  8:     0     1     0     0     1     1     0 27562
#>  9:     0     1     0     1     0     0     1 27805
#> 10:     0     1     0     1     0     1     0 27947
#> 11:     0     1     0     1     0     1     1 27761
#> 12:     0     1     0     1     1     0     0 27893
#> 13:     0     1     0     1     1     0     1 28044
#> 14:     0     1     1     0     0     1     0 27899
#> 15:     0     1     1     0     0     1     1 27756
#> 16:     0     1     1     0     1     0     0 27477
#> 17:     0     1     1     0     1     0     1 27769
#> 18:     0     1     1     0     1     1     0 27756
#> 19:     1     0     0     1     0     0     1 27955
#> 20:     1     0     0     1     0     1     0 27726
#> 21:     1     0     0     1     0     1     1 27931
#> 22:     1     0     0     1     1     0     0 27896
#> 23:     1     0     0     1     1     0     1 27652
#> 24:     1     0     1     0     0     1     0 27720
#> 25:     1     0     1     0     0     1     1 28005
#> 26:     1     0     1     0     1     0     0 28072
#> 27:     1     0     1     0     1     0     1 27802
#> 28:     1     0     1     0     1     1     0 27579
#> 29:     1     0     1     1     0     0     1 27593
#> 30:     1     0     1     1     0     1     0 27846
#> 31:     1     1     0     0     1     0     0 27745
#> 32:     1     1     0     0     1     0     1 27760
#> 33:     1     1     0     0     1     1     0 27840
#> 34:     1     1     0     1     0     0     1 27704
#> 35:     1     1     0     1     0     1     0 27884
#> 36:     1     1     0     1     1     0     0 28083
#>        V1    V2    V3    V4    V5    V6    V7     N

The distribution of each unique vector is consistent with uniform sampling on the full set of valid vectors for both cases.

Vincenzovincible answered 22/3, 2024 at 20:2 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.