Simulating a Random Walk

Asked 24/2, 2014 at 14:51 Answered 27/8, 2017 at 0:10

Xn can take values of -1 or 1 each with a probability of 0.5. And Sn= Sn-1 + Xn How can I compute the partial sum observed at time n given by Sn = X1 + X2 + : : : + Xn. I'm trying to simulate a random walk here. I did the following but I'm not exactly sure it's right:

rw <- function(n){
    x=numeric(n)
    xdir=c(TRUE, FALSE)
    step=c(1,-1)
    for (i in 2:n)
    if (sample(xdir,1)) {
        x[i]=x[i-1]+sample(step,1)
    } else {
        x[i]=x[i-1]
    }
    list(x=x)
}

Please Help!

Oak answered 24/2, 2014 at 14:51 Comment(1)

You may also check this article on R-bloggers. – Coppinger 19/1, 2017 at 17:12

This answer is just to explain why your code did not work. @jake-burkhead gave the way you should actually write the code.

In this code, you only make a step half of the time. This is because you are sampling from xdir to decide if you move or not. Instead, I would recommend you the following inside your loop:

for(i in 2:n){
  x[i] <- x[i - 1] + sample(step, 1)
}

The sample(step, 1) call decides if the walk moves 1 or -1.

To compute the partial sums, you can use cumsum() after you generate x. The result will be a vector of the partial sums at a given point in the walk.

Gill answered 24/2, 2014 at 14:59 Comment(3)

Thanks! But where would I include the "i" in my loop? – Oak 24/2, 2014 at 15:18

You're welcome. I was going to change i to step but was not consistent. I fixed the code. – Gill 24/2, 2014 at 15:19

It would be nice to add the definition of step = c(-1,1) to your code snippet. – Copalm 26/10, 2020 at 0:21

You can also do this really concisely and efficiently with cumsum

set.seed(1)

n <- 1000
x <- cumsum(sample(c(-1, 1), n, TRUE))

enter image description here

Hummingbird answered 24/2, 2014 at 15:1 Comment(3)

imo this is the way to do it -- in addition to being much faster due to vectorization (44x faster on a 1000000-length walk on my computer), it's a 1-liner. – Thurber 24/2, 2014 at 16:15

Why would you sample c(-1,1) instead of a normal distribution rnorm(10000,0,1)? I guess that would be a much better equivalent of white noise. – Maurita 13/10, 2019 at 19:8

@Maurita the question asked about a random walk where each step can be -1 or +1 – Hummingbird 14/10, 2019 at 16:59

This answer is just to explain why your code did not work. @jake-burkhead gave the way you should actually write the code.

In this code, you only make a step half of the time. This is because you are sampling from xdir to decide if you move or not. Instead, I would recommend you the following inside your loop:

for(i in 2:n){
  x[i] <- x[i - 1] + sample(step, 1)
}

The sample(step, 1) call decides if the walk moves 1 or -1.

To compute the partial sums, you can use cumsum() after you generate x. The result will be a vector of the partial sums at a given point in the walk.

Gill answered 24/2, 2014 at 14:59 Comment(3)

Thanks! But where would I include the "i" in my loop? – Oak 24/2, 2014 at 15:18

You're welcome. I was going to change i to step but was not consistent. I fixed the code. – Gill 24/2, 2014 at 15:19

It would be nice to add the definition of step = c(-1,1) to your code snippet. – Copalm 26/10, 2020 at 0:21

This post addresses timings of various base R methods for this calculation. This post is inspired by comments to this post and the comment of @josilber in the post to the fastest method posted by Jake Burkhead.

Below, a variety of methods are used to calculate the random walk. To accomplish this, each function pulls 1000 values of either 1 or -1 as defined in fnc below. The timing test uses microbenchmark with 1000 replications for each method.

fnc <- function(n) sample(c(1L, -1L), n, replace=TRUE)
library(microbenchmark)
microbenchmark(all=cumsum(fnc(1000L)),
      reduce=Reduce("+", fnc(1000L), accumulate=TRUE),
      laplyRpCln=cumsum(unlist(lapply(rep.int(1L, 1000L), fnc))),
      laplyRpAn=cumsum(unlist(lapply(rep.int(1L, 1000L), function(x) fnc(1L)))),
      laplySqAn=cumsum(unlist(lapply(seq_len(1000L), function(x) fnc(1L)))),
      saplyRpCln=cumsum(sapply(rep.int(1L, 1000L), fnc)),
      saplyRpAn=cumsum(sapply(rep.int(1L, 1000L), function(x) fnc(1L))),
      saplySqAn=cumsum(sapply(seq_len(1000L), function(x) fnc(1L))),
      vaplyRpCln=cumsum(vapply(rep.int(1L, 1000L), fnc, FUN.VALUE=0)),
      vaplyRpAn=cumsum(vapply(rep.int(1L, 1000L), function(x) fnc(1L), FUN.VALUE=0)),
      vaplySqAn=cumsum(vapply(seq_len(1000L), function(x) fnc(1L), FUN.VALUE=0)),
      replicate=cumsum(replicate(1000L, fnc(1L))),
      forPre={vals <- numeric(1000L); for(i in seq_along(vals)) vals[i] <- fnc(1L); cumsum(vals)},
      forNoPre={vals <- numeric(0L); for(i in seq_len(1000L)) vals <- c(vals, fnc(1L)); cumsum(vals)},
      times=1000)

Here,

"all" uses the suggestion of Jake Burkhead, cumsum and pulling the sample all at once.
"reduce" pulls the sample at once, but uses Reduce to perform the summation.
laplyRpCln uses lapply and unlist to return a vector and iterates through 1000 instances of 1, calling the function directly by name.
laplyRpAn differs in using an anonymous function.
laplySqAn uses an anonymous function and creates the iterating variable with seq rather than rep.
saplyRpCln, laplyRpAn, laplySqAn are the same as laplyRpCln, etc except that sapply is called instead of lapply/unlist.
vaplyRpCln, etc are the same as laplyRpCln, etc except that vapply is used in place of lapply/unlist.
replicate is a call to replicate, where the default is simplify=TRUE.
forPre uses a for loop that pre-allocates the vector and the fills it in.
forNoPre uses a for loop that creates an empty numeric(0) vector and then uses c to concatenate to that vector.

This returns

Unit: microseconds
       expr      min         lq        mean     median         uq      max neval     cld
        all   25.634    31.0705    85.66495    33.6890    35.3400 49240.30  1000 a      
     reduce  542.073   646.7720   780.13592   696.4775   750.2025 51685.44  1000  b     
 laplyRpCln 4349.384  5026.4015  6433.60754  5409.2485  7209.3405 58494.44  1000   c e  
  laplyRpAn 4600.200  5281.6190  6513.58733  5682.0570  7488.0865 55239.04  1000   c e  
  laplySqAn 4616.986  5251.4685  6514.09770  5634.9065  7488.1560 54263.04  1000   c e  
 saplyRpCln 4362.324  5080.3970  6325.66531  5506.5330  7294.6225 59075.02  1000   cd   
  saplyRpAn 4701.140  5386.1350  6781.95655  5786.6905  7587.8525 55429.02  1000     e  
  saplySqAn 4651.682  5342.5390  6551.35939  5735.0610  7525.4725 55148.32  1000   c e  
 vaplyRpCln 4366.322  5046.0625  6270.66501  5482.8565  7208.0680 63756.83  1000   c    
  vaplyRpAn 4657.256  5347.2190  6724.35226  5818.5225  7580.3695 64513.37  1000    de  
  vaplySqAn 4623.897  5325.6230  6475.97938  5769.8130  7541.3895 14614.67  1000   c e  
  replicate 4722.540  5395.1420  6653.90306  5777.3045  7638.8085 59376.89  1000   c e  
     forPre 5911.107  6823.3040  8172.41411  7226.7820  9038.9550 56119.11  1000      f 
   forNoPre 8431.855 10584.6535 11401.64190 10910.0480 11267.5605 58566.27  1000       g

Notice that the first method is clearly the fastest. This is followed by pulling the full sample at once and then using Reduce to perform the summation. Among the *apply functions, the "clean" versions, using the name of the function directly seem to have a minor performance improvement, and the lapply version seems to be on par with vapply, but given the range of values, this conclusion is not entirely straightforward. sapply appears to be the slowest, though the method of function call dominates the type of *apply function.

The two for loops performed the worst, and the pre-allocation for loop outperforming for loop growing with c.

Here, I'm running a patched version of 3.4.1 (patched roughly August 23, 2017) on openSuse 42.1.

Please let me know if you see any mistakes and I'll fix them as soon as I can. Thanks to Ben Bolker for prodding me to investigate the final function more, where I found a couple of bugs.

Elis answered 27/8, 2017 at 0:10 Comment(2)

why is forNoPre 3x faster than forPre? That seems really fishy. – Fromenty 27/8, 2017 at 1:2

@BenBolker Thank you, this prompted a second look. I had neglected to store the results of each additional draw (c(vals, fnc(1L)) instead of vals <- c(vals, fnc(1L)) in the forNoPre runs. And an even worse bug where I was trying to seq_along a vector of length 0... – Elis 27/8, 2017 at 2:33

Here one way of doing it.

GenerateRandomWalk <- function(k = 250,initial.value = 0) {
  # Add a bionomial at each step
  samples = rbinom(k,1,0.5)
  samples[samples==0] = -1
  initial.value + c(0, cumsum(samples))
}

Jiggered answered 24/2, 2014 at 21:38 Comment(0)

Recommended topics

Hot tags