All other answers more or less follow a "conditional specification", where the starting index and run length of each NA chunk are simulated. However, because the non-overlapping condition must be satisfied, these chunks have to be determined one by one. Such dependence prohibits vectorization, so either a `for` loop or `lapply` / `sapply` must be used.
However, this problem is just another run-length problem. 12 non-overlapping NA chunks divide the whole sequence into 13 non-missing chunks (I assume this is what the OP wants, since a missing chunk occurring as the first or the last chunk is not interesting). So why not think of the following:
- generate the run lengths of the 12 missing chunks;
- generate the run lengths of the 13 non-missing chunks;
- interleave these two types of chunks.
The second step looks difficult, as the lengths of all chunks must sum to a fixed number. Well, the multinomial distribution is made for exactly this.
So here is a fully vectorized solution:
# run length of 12 missing chunks, with feasible length between 1 and 144
k <- sample.int(144, 12, TRUE)
# run length of 13 non-missing chunks, summing up to `10000 - sum(k)`
# equal probability is used as an example, you may try something else
m <- c(rmultinom(1, 10000 - sum(k), prob = rep.int(1, 13)))
# interleave `m` and `k`
n <- c(rbind(m[1:12], k), m[13])
# reference value: 1 for non-missing and NA for missing, and interleave them
ref <- c(rep.int(c(1, NA), 12), 1)
# an initial vector
vec <- rep.int(ref, n)
# missing index
miss <- is.na(vec)
We can verify that `sum(n)` is 10000. What's next? Feel free to fill in the non-missing entries with random integers, maybe?
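As a rough illustration of that check and of the fill-in step, here is a small sketch on a shorter sequence. The values `total`, `n_miss`, `max_run`, the seed, and the fill range of 1 to 100 are all assumptions chosen for readability, not part of the original problem.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

total   <- 50L  # hypothetical small sequence length
n_miss  <- 3L   # hypothetical number of NA chunks
max_run <- 5L   # hypothetical maximum NA run length

# same construction as above, with the small example values
k <- sample.int(max_run, n_miss, TRUE)
m <- c(rmultinom(1, total - sum(k), prob = rep.int(1, n_miss + 1L)))
n <- c(rbind(m[seq_len(n_miss)], k), m[n_miss + 1L])
vec <- rep.int(c(rep.int(c(1, NA), n_miss), 1), n)
miss <- is.na(vec)

# the run lengths must add up to the requested total
stopifnot(sum(n) == total, length(vec) == total)

# run-length encoding of the missingness pattern recovers the NA chunks;
# `na_runs` matches `k` unless an interior entry of `m` happens to be zero
r <- rle(miss)
na_runs <- r$lengths[r$values]

# fill non-missing entries with random integers; NA positions stay untouched
vec[!miss] <- sample.int(100L, sum(!miss), TRUE)
stopifnot(identical(is.na(vec), miss))
```

Comparing `na_runs` with `k` is a quick way to see whether any two NA chunks have merged.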
My initial answer may have been too short to follow, hence the expansion above.
It is straightforward to write a function implementing the above, with user input in place of the example parameter values 12, 144, 10000.
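One possible shape for such a function is sketched below. The function name `simulate_na_chunks` and its argument names are hypothetical, and the feasibility check is an added assumption (it simply guarantees that the NA chunks cannot use up more than the whole sequence).

```r
# Hypothetical wrapper around the steps above; names are illustrative only.
simulate_na_chunks <- function(total_len = 10000L, n_chunks = 12L, max_run = 144L,
                               prob = rep.int(1, n_chunks + 1L)) {
  # assumed sanity check: worst-case NA length must fit into the sequence
  stopifnot(n_chunks * max_run <= total_len,
            length(prob) == n_chunks + 1L)
  # run lengths of the missing chunks, each between 1 and `max_run`
  k <- sample.int(max_run, n_chunks, replace = TRUE)
  # run lengths of the non-missing chunks, summing to `total_len - sum(k)`
  m <- c(rmultinom(1, total_len - sum(k), prob = prob))
  # interleave non-missing and missing run lengths
  n <- c(rbind(m[seq_len(n_chunks)], k), m[n_chunks + 1L])
  # 1 marks a non-missing entry, NA a missing one
  ref <- c(rep.int(c(1, NA), n_chunks), 1)
  rep.int(ref, n)
}

# usage with the example parameter values from this answer
vec <- simulate_na_chunks(10000L, 12L, 144L)
miss <- is.na(vec)
```

The `prob` argument is exposed so that unequal chunk-length probabilities can be tried, as suggested in the code comments above.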
Note that the only potential problem with the multinomial is that, under some bad `prob`, it could generate some zeros, so some NA chunks would in fact join together. To get around this, a robust fix is as follows: replace every 0 with 1, and subtract the inflation caused by this change from `max(m)`.
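A minimal sketch of that zero fix follows. The helper name `fix_zero_runs` is hypothetical, and the final check is an added assumption: the adjustment only works when the largest run is long enough to absorb the inflation.

```r
# Hypothetical helper: make every run length at least 1 while keeping the sum.
fix_zero_runs <- function(m) {
  zero <- m == 0L
  inflation <- sum(zero)      # how much the total grows when 0 becomes 1
  m[zero] <- 1L
  i <- which.max(m)           # position of the (first) largest run
  # assumed guard: the largest run must stay positive after the subtraction
  stopifnot(m[i] - inflation >= 1L)
  m[i] <- m[i] - inflation
  m
}

# usage: two zeros are replaced by ones and the largest run shrinks by 2
fix_zero_runs(c(0L, 5L, 0L, 3L))
# returns 1 3 1 3
```

Applying this to `m` right after the `rmultinom` call leaves `sum(m)` unchanged, so the total length is still respected.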