Finding the stationary distribution of a Markov process given a transition probability matrix


There have been two threads related to this issue on Stack Overflow:

How can I obtain stationary distribution of a Markov Chain given a transition probability matrix describes what a transition probability matrix is, and demonstrates how a stationary distribution is reached by taking powers of this matrix;
How to find when a matrix converges with a loop uses an R loop to determine when the matrix power converges.

The above is straightforward, but very expensive. If we have a transition matrix of order n, then each iteration computes a matrix-matrix multiplication at a cost of O(n^3).
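For reference, here is a minimal sketch of the matrix-power approach described in the linked threads, so the cost per iteration is concrete. The helper name matpow_limit and its tol / maxit arguments are made up for this illustration; they are not taken from either thread.

```r
## Transition matrix from the question (each row sums to 1)
P <- structure(c(0, 0.1, 0, 0, 0, 0, 0, 0.1, 0.2, 0, 0, 0, 0, 0, 0.2,
0.3, 0, 0, 0.5, 0.4, 0.3, 0.5, 0.4, 0, 0, 0, 0, 0, 0.6, 0.4,
0.5, 0.4, 0.3, 0.2, 0, 0.6), .Dim = c(6L, 6L))

## Hypothetical helper: keep multiplying by P until successive powers agree
matpow_limit <- function (P, tol = 1e-10, maxit = 10000L) {
  Pk <- P
  for (k in seq_len(maxit)) {
    Pnext <- Pk %*% P                      # one O(n^3) matrix-matrix product
    if (max(abs(Pnext - Pk)) < tol) return(Pnext)
    Pk <- Pnext
  }
  stop("no convergence within 'maxit' iterations")
}

Plim <- matpow_limit(P)
Plim[1, ]   # each row of the limit approximates the stationary distribution
```

Every pass through the loop multiplies two n x n matrices, which is where the O(n^3) cost per iteration comes from.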

Is there a more efficient way to do this? One thing that occurs to me is to use Eigen decomposition. A Markov matrix is known to:

be diagonalizable in complex domain: A = E * D * E^{-1};
have a real Eigen value of 1, and other (complex) Eigen values with modulus at most 1 (strictly smaller than 1 when the chain is irreducible and aperiodic).

The stationary distribution is the Eigen vector associated with the Eigen value of 1, i.e., the first Eigen vector.

Well, the theory is nice, but I can't get it to work. Taking the matrix P in the first linked question:

P <- structure(c(0, 0.1, 0, 0, 0, 0, 0, 0.1, 0.2, 0, 0, 0, 0, 0, 0.2, 
0.3, 0, 0, 0.5, 0.4, 0.3, 0.5, 0.4, 0, 0, 0, 0, 0, 0.6, 0.4, 
0.5, 0.4, 0.3, 0.2, 0, 0.6), .Dim = c(6L, 6L))

If I do:

Re(eigen(P)$vectors[, 1])
# [1] 0.4082483 0.4082483 0.4082483 0.4082483 0.4082483 0.4082483

What's going on? According to previous questions, the stationary distribution is:

# [1] 0.002590673 0.025906737 0.116580322 0.310880848 0.272020713 0.272020708

Naamana answered 12/1, 2017 at 17:17 Comment(0)


Your vector y = Re(eigen(P)$vectors[, 1]) is not a distribution (since it doesn't add up to one) and solves Py = y, not x'P = x'. The solution from your linked Q&A does approximately solve the latter:

x = c(0.00259067357512953, 0.0259067357512953, 0.116580310880829, 
0.310880829015544, 0.272020725388601, 0.272020725388601)
all(abs(x %*% P - x) < 1e-10) # TRUE

By transposing P, you can use your eigenvalue approach:

x2 = Re(eigen(t(P))$vectors[, 1])
x2 <- x2/sum(x2) 
(function(x) all(abs(x %*% P - x) < 1e-10))(
  x2
) # TRUE

After rescaling so that it sums to one, x2 agrees with the stationary vector x from the linked Q&A up to numerical error.

Less answered 12/1, 2017 at 17:44 Comment(3)

Thanks for posting an answer. I was asked about this a few hours ago and would like to post it as a question & answer to share with people. But I decided to leave a time gap of 30 mins to see whether someone else earns the bonus. I think you forgot to normalize the Eigen vector. The Eigen vector has length / norm 1, but does not sum to 1. – Naamana 12/1, 2017 at 17:56

Thanks. Wish I'd known you were planning a self-answer, but it didn't take much of my time and your answer looks useful; I've starred it for later reference. Thanks for the Q&A – Less 12/1, 2017 at 18:6

Hm, thanks :) but I think yours is more useful and deserves the accept, though it's up to you. I just addressed the basic programming and mathematical definitions here, while your answer goes into more detail on both. – Less 12/1, 2017 at 18:13


Well, to use Eigen decomposition, we need to work with t(P).

The definition of a transition probability matrix differs between probability / statistics and linear algebra. In statistics all rows of P sum to 1, while in linear algebra, all columns of P sum to 1. So instead of eigen(P), we need eigen(t(P)):

e <- Re(eigen(t(P))$vectors[, 1])
e / sum(e)
# [1] 0.002590673 0.025906737 0.116580322 0.310880848 0.272020713 0.272020708
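To spell out why the transpose is needed, here is the relation written as equations. The notation is introduced only for this note and is not from the question: \pi is the stationary distribution as a row vector, and \mathbf{1} is the all-ones column vector.

```latex
\begin{align*}
\pi P = \pi
  &\iff (\pi P)^{\top} = \pi^{\top}
   \iff P^{\top} \pi^{\top} = \pi^{\top},
   \qquad \sum_{i} \pi_i = 1, \\
P \mathbf{1} &= \mathbf{1}
   \qquad \text{(because every row of } P \text{ sums to 1).}
\end{align*}
```

So the Eigen vector of P for the Eigen value 1 is a multiple of the all-ones vector, which is why eigen(P) returned 1 / sqrt(6) = 0.4082483 in every entry, while the corresponding Eigen vector of t(P) is a multiple of the stationary distribution.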

As we can see, we've only used the first Eigen vector, i.e., the Eigen vector of the largest Eigen value. Therefore, there is no need to compute all Eigen values / vectors using eigen. The power method can be used to find an Eigen vector of the largest Eigen value. Let's implement this in R:

stydis1 <- function (A) {
  n <- dim(A)[1L]
  ## checking (compare row sums to 1 with a tolerance, not exact equality)
  if (any(abs(.rowSums(A, n, n) - 1) > 1e-8)) 
    stop (" 'A' is not a Markov matrix")
  ## implement power method
  e <- runif (n)
  oldnorm <- sqrt(c(crossprod(e)))
  repeat {
    e <- crossprod(A, e)
    newnorm <- sqrt(c(crossprod(e)))
    if (abs(newnorm / oldnorm - 1) < 1e-8) break
    e <- e / newnorm
    oldnorm <- newnorm
    }
  ## rescale `e` so that it sums up to 1
  c(e / sum(e))
  }

stydis1 (P)
# [1] 0.002590673 0.025906737 0.116580322 0.310880848 0.272020713 0.272020708

And the result is correct.

In fact, we don't have to exploit Eigen decomposition. We can adjust the method used in your second linked question. Over there, we took matrix powers, which is expensive, as you noted; but why not re-cast it as a matrix-vector multiplication?

stydis2 <- function (A) {
  n <- dim(A)[1L]
  ## checking (compare row sums to 1 with a tolerance, not exact equality)
  if (any(abs(.rowSums(A, n, n) - 1) > 1e-8)) 
    stop (" 'A' is not a Markov matrix")
  ## direct computation
  b <- A[1, ]
  oldnorm <- sqrt(c(crossprod(b)))
  repeat {
    b <- crossprod(A, b)
    newnorm <- sqrt(c(crossprod(b)))
    if (abs(newnorm / oldnorm - 1) < 1e-8) break
    oldnorm <- newnorm
    }
  ## return stationary distribution
  c(b)
  }

stydis2 (P)
# [1] 0.002590673 0.025906737 0.116580322 0.310880848 0.272020713 0.272020708

We start from an arbitrary initial distribution, say A[1, ], and iteratively apply the transition matrix until the distribution converges. Again, the result is correct.
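As a cross-check that does not rely on iteration, the stationary distribution can also be obtained from one linear solve. This is a different technique from the two functions above, shown only as a sketch: the function name stydis_solve is made up here, and the approach assumes the chain is irreducible, so that the system has a unique solution.

```r
## Transition matrix from the question (each row sums to 1)
P <- structure(c(0, 0.1, 0, 0, 0, 0, 0, 0.1, 0.2, 0, 0, 0, 0, 0, 0.2,
0.3, 0, 0, 0.5, 0.4, 0.3, 0.5, 0.4, 0, 0, 0, 0, 0, 0.6, 0.4,
0.5, 0.4, 0.3, 0.2, 0, 0.6), .Dim = c(6L, 6L))

## Hypothetical helper: solve (t(A) - I) x = 0 together with sum(x) = 1
stydis_solve <- function (A) {
  n <- nrow(A)
  M <- t(A) - diag(n)          # stationarity equations, one per state
  M[n, ] <- 1                  # replace the last (redundant) equation by sum(x) = 1
  rhs <- c(rep(0, n - 1), 1)
  solve(M, rhs)                # a single dense linear solve
}

x <- stydis_solve(P)
x
all(abs(x %*% P - x) < 1e-10)  # checks x'P = x'
```

The stationarity equations are linearly dependent (they sum to zero), which is why one of them can be swapped for the normalization constraint.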

Naamana answered 12/1, 2017 at 17:47 Comment(1)

If I'm not wrong, this is the way to compute the stationary distribution of a discrete-time Markov chain. In the case of a continuous-time Markov chain, we can use LU decomposition, for instance. – Role 12/1, 2017 at 23:49


By the definition of the stationary probability vector, it is a left-eigenvector of the transition probability matrix with unit eigenvalue. We can find objects of this kind by computing the eigendecomposition of the matrix, identifying the unit eigenvalues and then computing the stationary probability vectors for each of these unit eigenvalues. Here is a function in R to do this.

stationary <- function(P) {
  
  #Get matrix information
  K     <- nrow(P)
  NAMES <- rownames(P)
  
  #Compute the eigendecomposition of t(P)
  #(its right eigenvectors are the left eigenvectors of P)
  EIGEN <- eigen(t(P))
  VALS  <- EIGEN$values
  LVECS <- EIGEN$vectors
  
  #Find the unit eigenvalue(s)
  RES <- zapsmall(Mod(VALS - as.complex(rep(1, K))))
  IND <- which(RES == 0)
  N   <- length(IND)
  
  #Find the stationary vector(s)
  OUT <- matrix(0, nrow = N, ncol = K)
  rownames(OUT) <- sprintf('Stationary[%s]', seq_len(N))
  colnames(OUT) <- NAMES
  for (i in seq_len(N)) { 
    SSS     <- Re(LVECS[, IND[i]])
    OUT[i,] <- SSS/sum(SSS) }
  
  #Give the output
  OUT }

(Note: The eigendecomposition computed by eigen is subject to some numerical error, so the computed eigenvalue may not be exactly equal to one. For this reason we zapsmall the modulus of the deviation from one to identify the unit eigenvalue(s). This will give us the correct answer so long as there is no true eigenvalue that is less than one, but so close to one that it also gets "zapped" to one.)
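To make the caveat in the note concrete, here is a small illustration with made-up eigenvalues (the vector vals is hypothetical and is not computed from P). It assumes the default setting getOption("digits") = 7, which zapsmall uses for rounding.

```r
## Made-up eigenvalues: an exact 1, a value just below 1, and a clearly smaller one
vals <- c(1, 1 - 1e-9, 0.5)

## Deviation from one, as in the function above
dev  <- Mod(vals - as.complex(rep(1, length(vals))))

## zapsmall rounds relative to the largest deviation (0.5 here),
## so a deviation of about 1e-9 is rounded to exactly zero
flag <- zapsmall(dev) == 0
flag
# [1]  TRUE  TRUE FALSE
```

The second value is not a unit eigenvalue, yet it is flagged as one. This is the situation the note warns about.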

Applying this function to your transition probability matrix correctly identifies the unique stationary probability vector in this case. There is a small amount of numerical error in the computation, but this should be manageable in most cases.

#Compute the stationary probability vector
S <- stationary(P)

#Show this vector and confirm stationarity
S
                     [,1]       [,2]      [,3]      [,4]      [,5]      [,6]
Stationary[1] 0.002590674 0.02590674 0.1165803 0.3108808 0.2720207 0.2720207

S %*% P
                     [,1]       [,2]      [,3]      [,4]      [,5]      [,6]
Stationary[1] 0.002590674 0.02590674 0.1165803 0.3108808 0.2720207 0.2720207

#Show error in computation
c(S %*% P - S)
[1]  4.336809e-17  2.775558e-17  1.110223e-16 -2.775558e-16  1.665335e-16 -5.551115e-17

Ryun answered 26/10, 2020 at 10:2 Comment(0)
