How to do simple survival analysis with pymc3 (Weibull distribution regression)?
Asked Answered
I

1

6

I'm new to using pymc3, I've read Bayesian Methods for Hackers and done my best to work through existing survival analysis tutorials in pymc3. However, I don't understand how to write/interpret the "survival function".

For this problem, I've generated some dummy data from a Weibull Distribution defined by NIST here:

n = 1000
alpha = 1 
gam = 0.5
mu = 0
noise = np.random.normal(0, 0.025, [n, 1])

x = np.random.rand(n, 1)*10
f_x = (gam/alpha)*(((gam-mu)/alpha)**(gam-1))*np.exp(-((x-mu)/alpha)**gam)
y = f_x + noise

Since I want to create a model with censored data and uncensored data like the pymc3 tutorial for Bayesian Parametrization, I implemented a cutoff and censored those data points:

cens = np.array([1 if k < 7.5 else 0 for k in x])

Then I began to establish my model with priors:

with pm.Model() as survival_model:

     alpha0 = pm.Normal('alpha0', mu=1, sigma = 1)
     gam0 = pm.Normal('gam0', mu=0.5, sigma = 1)
     mu0 = pm.Normal('mu0', mu=0.0, sigma = 1)
     noise0 = pm.Normal('noise0', mu=0.0, sigma = 0.05)

Now the trouble begins, I know I need to define a likelihood function that accounts for the censored values and take all the parameters as inputs to output the likelihood. I think for the censored values I have to find the equation to describe P(Y > y). Typically, I could use the CDF but in this case I've found using Matlab and Mathematica the indefinite integral does not exist. What should I do?

Insatiable answered 24/10, 2021 at 21:29 Comment(2)
Take a look at lifelines package github.com/CamDavidsonPilon/lifelines, maybe it has what you wantDenominate
I've perused the lifelines package previously. Unfortunately, it only uses 2 parameters for Weibull regression and is built for point estimation not Bayesian analysis.Insatiable
R
0

You can create a likelihood function in pymc3 with a custom distribution. In particular the pm.DensityDist class. I will use surpyval's pre-existing methods for the functions.

This requires the input to be the log-likelihood. For observed values the log-likelihood is the log of the density function. For censored observations the log-likelihood is actually just the negative of the cumulative hazard function of the distribution. Sum all these values and return it.

Full example below:

import pymc3 as pm
from surpyval import Weibull

# Create 100 random variables with alpha=50 and beta=2
rvs = Weibull.random(100, 50, 2)

# Set all values above 60 to be 60.. 
# i.e all above 60 are censored
rvs[rvs >= 60] = 60

# separate censored and observed
censored = rvs[rvs == 60]
failures = rvs[rvs < 60]

with pm.Model() as survival_model:
     alpha = pm.Normal('alpha', mu=100, sigma = 10)
     beta = pm.Gamma('beta', alpha=5, beta = 2)
     
     # Where the magic happens
     def logp(failures, censored):
          # likelihood for observed is the log of the density function
          failed_log_like = Weibull.log_df(failures, alpha, beta)
          # The log-likelihood for censored observations is the negative
          # of the cumulative hazard function.
          censored_log_like = -Weibull.Hf(censored, alpha, beta)
          # Return the sum of all of it
          return censored_log_like.sum() + failed_log_like.sum()

     weibull_neg_ll = pm.DensityDist('weibull_neg_ll', 
                                     logp,
                                     observed={'failures' : failures,
                                               'censored' : censored})
     start = pm.find_MAP()
     step = pm.NUTS()
     trace = pm.sample(10000, step, start, random_seed=123, progressbar=True)
     pm.traceplot(trace)

Rosinweed answered 9/7, 2022 at 7:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.