Markov chain stationary distributions with scipy.sparse?

Asked 23/1, 2014 at 12:56 Answered 17/2, 2015 at 10:24

Solved python scipy sparse-matrix markov-chains

I have a Markov chain given as a large sparse scipy matrix A. (I've constructed the matrix in scipy.sparse.dok_matrix format, but converting to other ones or constructing it as csc_matrix are fine.)

I'd like to know any stationary distribution p of this matrix, which is an eigenvector to the eigenvalue 1. All entries in this eigenvector should be positive and add up to 1, in order to represent a probability distribution.

This means I want any solution for the system (A-I) p = 0, p.sum()=1 (where I=scipy.sparse.eye(*A.shape) is the idententy matrix), but (A-I) will not be of full rank, and even the whole system may be under-determined. In addition, eigenvectors with negative entries can be generated, which cannot be normalized to be valid probability distributions. Preventing negative entries in p would be nice.

Using scipy.sparse.linalg.eigen.eigs is not a solution: It does not permit specifying the additive constraint. (Normalizing does not help if the eigenvectors contain negative entries.) Furthermore, it deviates quite a bit from the true results, and sometimes has problems converging, behaving worse than scipy.linalg.eig. (Also, I used shift-invert mode, which improves finding the types of eigenvalues I want, but not their quality. If I don't use it, it's even more overkill, because I am only interested in one particular eigenvalue, 1.)
Converting to dense matrices and using scipy.linalg.eig is not a solution: In addition to the negative entry problem, the matrix is too large.
Using scipy.sparse.spsolve is not an obvious solution: The matrix is either not square (when combining the additive constraint and the eigenvector condition) or not of full rank (when trying to specify them separately in some way), sometimes neither.

Is there a good way to numerically get a stationary state of a Markov chain given as sparse matrix using python? If there is a way to get an exhaustive list (and possibly also nearly stationary states), that's appreciated, but not necessary.

Remise answered 23/1, 2014 at 12:56 Comment(0)

Several articles with summaries of possible approaches can be found with Google scholar, here's one: http://www.ima.umn.edu/preprints/pp1992/932.pdf

What's done below is a combination of the suggestion by @Helge Dietert above on splitting to strongly connected components first, and approach #4 in the paper linked above.

import numpy as np
import time

# NB. Scipy >= 0.14.0 probably required
import scipy
from scipy.sparse.linalg import gmres, spsolve
from scipy.sparse import csgraph
from scipy import sparse 


def markov_stationary_components(P, tol=1e-12):
    """
    Split the chain first to connected components, and solve the
    stationary state for the smallest one
    """
    n = P.shape[0]

    # 0. Drop zero edges
    P = P.tocsr()
    P.eliminate_zeros()

    # 1. Separate to connected components
    n_components, labels = csgraph.connected_components(P, directed=True, connection='strong')

    # The labels also contain decaying components that need to be skipped
    index_sets = []
    for j in range(n_components):
        indices = np.flatnonzero(labels == j)
        other_indices = np.flatnonzero(labels != j)

        Px = P[indices,:][:,other_indices]
        if Px.max() == 0:
            index_sets.append(indices)
    n_components = len(index_sets)

    # 2. Pick the smallest one
    sizes = [indices.size for indices in index_sets]
    min_j = np.argmin(sizes)
    indices = index_sets[min_j]

    print("Solving for component {0}/{1} of size {2}".format(min_j, n_components, indices.size))

    # 3. Solve stationary state for it
    p = np.zeros(n)
    if indices.size == 1:
        # Simple case
        p[indices] = 1
    else:
        p[indices] = markov_stationary_one(P[indices,:][:,indices], tol=tol)

    return p


def markov_stationary_one(P, tol=1e-12, direct=False):
    """
    Solve stationary state of Markov chain by replacing the first
    equation by the normalization condition.
    """
    if P.shape == (1, 1):
        return np.array([1.0])

    n = P.shape[0]
    dP = P - sparse.eye(n)
    A = sparse.vstack([np.ones(n), dP.T[1:,:]])
    rhs = np.zeros((n,))
    rhs[0] = 1

    if direct:
        # Requires that the solution is unique
        return spsolve(A, rhs)
    else:
        # GMRES does not care whether the solution is unique or not, it
        # will pick the first one it finds in the Krylov subspace
        p, info = gmres(A, rhs, tol=tol)
        if info != 0:
            raise RuntimeError("gmres didn't converge")
        return p


def main():
    # Random transition matrix (connected)
    n = 100000
    np.random.seed(1234)
    P = sparse.rand(n, n, 1e-3) + sparse.eye(n)
    P = P + sparse.diags([1, 1], [-1, 1], shape=P.shape)

    # Disconnect several components
    P = P.tolil()
    P[:1000,1000:] = 0
    P[1000:,:1000] = 0

    P[10000:11000,:10000] = 0
    P[10000:11000,11000:] = 0
    P[:10000,10000:11000] = 0
    P[11000:,10000:11000] = 0

    # Normalize
    P = P.tocsr()
    P = P.multiply(sparse.csr_matrix(1/P.sum(1).A))

    print("*** Case 1")
    doit(P)

    print("*** Case 2")
    P = sparse.csr_matrix(np.array([[1.0, 0.0, 0.0, 0.0],
                                    [0.5, 0.5, 0.0, 0.0],
                                    [0.0, 0.0, 0.5, 0.5],
                                    [0.0, 0.0, 0.5, 0.5]]))
    doit(P)

def doit(P):
    assert isinstance(P, sparse.csr_matrix)
    assert np.isfinite(P.data).all()

    print("Construction finished!")

    def check_solution(method):
        print("\n\n-- {0}".format(method.__name__))
        start = time.time()
        p = method(P)
        print("time: {0}".format(time.time() - start))
        print("error: {0}".format(np.linalg.norm(P.T.dot(p) - p)))
        print("min(p)/max(p): {0}, {1}".format(p.min(), p.max()))
        print("sum(p): {0}".format(p.sum()))

    check_solution(markov_stationary_components)


if __name__ == "__main__":
    main()

EDIT: spotted a bug --- csgraph.connected_components returns also purely decaying components, which need to be filtered out.

Payee answered 17/2, 2015 at 10:24 Comment(3)

I've used some parts of your code to develop a Markov chain module on github.com/gvanderheide/discreteMarkovChain – Bipolar 31/3, 2016 at 14:58

the link appears to be broken - it might be useful to use the name, authors and year of the paper as the visible text of the link – Harald 12/9, 2022 at 15:52

ima.umn.edu/preprints/… (Iterative Methods for Finding the Stationary Vector for Markov Chains - Dianne P. O'Leary 1992) – Harald 12/9, 2022 at 15:53

This is solving a possibly underspecified Matrix equation, and can therefore be done with scipy.sparse.linalg.lsqr. I do not know how to make sure all entries are positive, but apart from that, it's very simple.

import scipy.sparse.linalg
states = A.shape[0]

# I assume that the rows of A sum to 1.
# Therefore, In order to use A as a left multiplication matrix,
# the transposition is necessary.
eigvalmat = (A - scipy.sparse.eye(states)).T
probability_distribution_constraint = scipy.ones((1, states))

lhs = scipy.sparse.vstack(
    (eigvalmat,
     probability_distribution_constraint))

B = numpy.zeros(states+1)
B[-1]=1

r = scipy.sparse.linalg.lsqr(lhs, B)
# r also contains metadata about the approximation process
p = r[0]

Remise answered 13/2, 2015 at 10:23 Comment(0)

Addressing the point that the stationary solutions are not unique and that a solution might not be non-negative.

This means that your Markov chain is not irreducible and that you could split the problem into irreducible Markov chains. For this you want to find the closed communicating classes of your Markov chain, which is essentially a study of the connected components of your transition graph (Wikipedia suggests some linear algorithms to find the strongly connected components). Moreover, you can through away all open communicating classes as every stationary state must vanish on these.

If you have your closed communicating classes C_1,...,C_n your problem is hopefully split into small simple pieces: The Markov chain on each closed class C_i is now irreducible, so that the restricted transition matrix M_i has exactly one eigenvector with eigenvalue 0 and this eigenvector has only positive components (see Perron-Frobenius Theorem). Thus we have exactly one stationary state x_i.

The stationary states of your full Markov chain are now all linear combinations of the x_i from your closed classes. In fact, these are all stationary states.

In order to find the stationary states x_i you can successively apply M_i and the iteration will converge to this state (this will also preserve your normalisation). In general, it is hard to tell the rate of convergence, but it gives you an easy way to improve the accuracy of your solution and to verify the solution.

Turnstone answered 13/2, 2015 at 10:45 Comment(0)

-1

Use power iteration (for example): http://www.google.com/search?q=power%20iteration%20markov%20chain

Or, you can use the shift-invert mode of scipy.sparse.linalg.eig (which is ARPACK) to look for eigenvalues close to 1. "Specifying" normalization is not necessary, as you can just normalize the values afterward.

Payee answered 23/1, 2014 at 14:6 Comment(1)

I used shift-invert mode, which improves finding the types of eigenvalues I want, but not their quality. Also, just normalizing eigenvectors does not work, because they might contain negative entries, which I have to solve away by finding the right linear combinations between eigenvectors – problematic if the eigenvectors are imprecise. – Remise 25/1, 2014 at 19:5

Recommended topics

Hot tags