Bayesian Probabilistic Matrix Factorization (BPMF) with PyMC3: PositiveDefiniteError using `NUTS`
I've implemented the Bayesian Probabilistic Matrix Factorization algorithm using pymc3 in Python. I also implemented its precursor, Probabilistic Matrix Factorization (PMF). See my previous question for a reference to the data used here.

I'm having trouble drawing MCMC samples using the NUTS sampler. I initialize the model parameters with the MAP estimate from PMF, and the hyperparameters with small Gaussian random draws centered at 0. However, I get a PositiveDefiniteError when setting up the step object for the sampler. I've verified that the MAP estimate from PMF is reasonable, so I suspect it has something to do with the way the hyperparameters are being initialized. Here is the PMF model:

import pymc3 as pm
import numpy as np
import pandas as pd
import theano
import scipy as sp
import logging

data = pd.read_csv('jester-dense-subset-100x20.csv')
n, m = data.shape
test_size = m // 10  # integer division so the slice indices stay ints
train_size = m - test_size

train = data.copy()
train.iloc[:, train_size:] = np.nan  # remove test set data
train[train.isnull()] = train.mean().mean()  # mean value imputation
train = train.values

test = data.copy()
test.iloc[:, :train_size] = np.nan  # remove train set data
test = test.values    

# Low precision reflects uncertainty; prevents overfitting
alpha_u = alpha_v = 1/np.var(train)
alpha = np.ones((n,m)) * 2  # fixed precision for likelihood function
dim = 10  # dimensionality

# Specify the model.
with pm.Model() as pmf:
    pmf_U = pm.MvNormal('U', mu=0, tau=alpha_u * np.eye(dim),
                        shape=(n, dim), testval=np.random.randn(n, dim)*.01)
    pmf_V = pm.MvNormal('V', mu=0, tau=alpha_v * np.eye(dim),
                        shape=(m, dim), testval=np.random.randn(m, dim)*.01)
    pmf_R = pm.Normal('R', mu=theano.tensor.dot(pmf_U, pmf_V.T),
                      tau=alpha, observed=train)

    # Find mode of posterior using optimization
    start = pm.find_MAP(fmin=sp.optimize.fmin_powell)

And here is BPMF:

n, m = data.shape
dim = 10  # dimensionality
beta_0 = 1  # scaling factor for lambdas; unclear on its use
alpha = np.ones((n,m)) * 2  # fixed precision for likelihood function

logging.info('building the BPMF model')
std = .05  # how much noise to use for model initialization
with pm.Model() as bpmf:
    # Specify user feature matrix
    lambda_u = pm.Wishart(
        'lambda_u', n=dim, V=np.eye(dim), shape=(dim, dim),
        testval=np.random.randn(dim, dim) * std)
    mu_u = pm.Normal(
        'mu_u', mu=0, tau=beta_0 * lambda_u, shape=dim,
        testval=np.random.randn(dim) * std)
    U = pm.MvNormal(
        'U', mu=mu_u, tau=lambda_u, shape=(n, dim),
        testval=np.random.randn(n, dim) * std)

    # Specify item feature matrix
    lambda_v = pm.Wishart(
        'lambda_v', n=dim, V=np.eye(dim), shape=(dim, dim),
        testval=np.random.randn(dim, dim) * std)
    mu_v = pm.Normal(
        'mu_v', mu=0, tau=beta_0 * lambda_v, shape=dim,
        testval=np.random.randn(dim) * std)
    V = pm.MvNormal(
        'V', mu=mu_v, tau=lambda_v, shape=(m, dim),
        testval=np.random.randn(m, dim) * std)

    # Specify rating likelihood function
    R = pm.Normal(
        'R', mu=theano.tensor.dot(U, V.T), tau=alpha,
        observed=train)

# `start` is the start dictionary obtained from running find_MAP for PMF.
for key in bpmf.test_point:
    if key not in start:
        start[key] = bpmf.test_point[key]

with bpmf:
    step = pm.NUTS(scaling=start)

At the last line, I get the following error:

PositiveDefiniteError: Scaling is not positive definite. Simple check failed. Diagonal contains negatives. Check indexes [   0    2   ...  2206  2207  ]

As I understand it, find_MAP can't be used with models that have hyperpriors like BPMF's. This is why I'm attempting to initialize with the MAP values from PMF, which uses fixed point estimates for the hyperparameters of U and V rather than parameterized hyperpriors.
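
One way I can narrow this down is to evaluate the Hessian at the start point directly, since (if I understand the scaling step correctly) NUTS derives its scaling from it. A minimal sketch using pm.find_hessian; the indexes flagged in the error should show up as negative diagonal entries here (this can be slow for a model with this many parameters):

import numpy as np

with bpmf:
    # The scaling NUTS builds from a point comes from the Hessian of the
    # log-posterior there; negative diagonal entries reproduce the error.
    H = pm.find_hessian(start)

print('negative diagonal entries:', np.where(np.diag(H) < 0)[0])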

Hasid answered 18/4, 2015 at 22:14 Comment(2)
Just to clarify, your PMF works fine, but your BPMF has the issue with the Wishart distribution, right? What did you end up going with? (3 years later!)Arianaariane
@Arianaariane I ended up going with twiecki's suggestion to use the LKJ correlation prior. See this gist for the final implementation.Hasid

Unfortunately, the Wishart distribution is non-functional in pymc3. I recently added a warning here: https://github.com/pymc-devs/pymc3/commit/642f63973ec9f807fb6e55a0fc4b31bdfa1f261e

See here for more discussions on this tricky distribution: https://github.com/pymc-devs/pymc3/issues/538

You could confirm that the Wishart is the source of the problem by fixing the covariance matrix to a constant. If that's the case, I'd try using the LKJ prior distribution: https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/LKJ_correlation.py
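
For reference, a rough sketch of the LKJ parameterization, loosely following the linked example. The names here are illustrative, and I'm assuming the current LKJCorr signature with shape parameter n and dimension p:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

dim = 10
n_elem = dim * (dim - 1) // 2

# Map the packed vector of upper-triangular correlations returned by
# LKJCorr back into a full dim x dim matrix.
tri_index = np.zeros((dim, dim), dtype=int)
tri_index[np.triu_indices(dim, k=1)] = np.arange(n_elem)
tri_index[np.triu_indices(dim, k=1)[::-1]] = np.arange(n_elem)

with pm.Model():
    sigma = pm.Lognormal('sigma', mu=np.zeros(dim), tau=np.ones(dim), shape=dim)
    corr_packed = pm.LKJCorr('corr_packed', n=1, p=dim)
    corr = tt.fill_diagonal(corr_packed[tri_index], 1.0)
    # Covariance = diag(sigma) * corr * diag(sigma); its inverse can serve
    # as the precision matrix in place of the Wishart-distributed lambdas.
    cov = tt.diag(sigma).dot(corr).dot(tt.diag(sigma))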

Saito answered 19/4, 2015 at 13:50 Comment(2)
Is this something that could be addressed with pymc3 by writing a customized Gibbs sampler?Hasid
Perhaps, but NUTS should do a much better job because of the heavy tails of the Wishart. If you wanted to get the Wishart to work (which would be a great contribution), I think we'd have to focus on transforming it to the N * (N + 1) / 2 unconstrained space and then projecting back. That's how Stan does it, and the "Transformations of Constrained Variables" chapter of the Stan manual should give some pointers. PyMC3 has pretty nifty support for adding transforms, so once we know exactly what to do it should be pretty easy to implement.Saito
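
To make the transform idea in the last comment concrete: sampling the N * (N + 1) / 2 free entries of a lower-triangular Cholesky factor (with a log-transformed diagonal for positivity) and rebuilding the matrix yields something positive definite by construction. A rough sketch with illustrative names, leaving out the Jacobian correction a proper transform would also need:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

N = 10
n_free = N * (N + 1) // 2  # unconstrained dimension of an N x N SPD matrix

with pm.Model():
    # Unconstrained draws for the lower triangle of a Cholesky factor.
    z = pm.Normal('z', mu=0, sd=1, shape=n_free)

    L = tt.zeros((N, N))
    L = tt.set_subtensor(L[np.tril_indices(N)], z)
    # Exponentiate the diagonal so it is strictly positive.
    L = tt.set_subtensor(L[np.diag_indices(N)], tt.exp(tt.diag(L)))

    # W = L L^T is positive definite by construction.
    W = pm.Deterministic('W', L.dot(L.T))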
