Probability of survival at particular time points using randomForestSRC

I'm using rfsrc to model a survival problem, like this:

library(OIsurv)
library(survival)
library(randomForestSRC)

data(burn)

fit <- rfsrc(Surv(T1, D1) ~ ., data = burn)

# predict on the training set
# (predict.rfsrc has no OOB or type arguments; the out-of-bag
# predictions are already stored in fit$predicted.oob)
pred <- predict(fit, burn)
pred$predicted

This gives me what I take to be the overall survival probability for each patient.

How do I get the survival probability for each person for different timepoints, say 0-5 months or 0-10 months?

Along answered 9/8, 2015 at 12:37 Comment(1)
I noticed that pred$predicted can be greater than 100, so it cannot be the "overall survival probability of all patients". Can anyone say what it represents for a survival model? - Anjanette

The documentation on this isn't immediately obvious if you aren't familiar with the package, but it is possible.

Load data

library(randomForestSRC)
library(dplyr)  # provides %>% and filter() used below

data(pbc, package = "randomForestSRC")

Create trial and test datasets

pbc.trial <- pbc %>% filter(!is.na(treatment))
pbc.test <- pbc %>% filter(is.na(treatment))

Build our model

rfsrc_pbc <- rfsrc(Surv(days, status) ~ .,
                   data = pbc.trial,
                   na.action = "na.impute")

Test our model

test.pred.rfsrc <- predict(rfsrc_pbc, 
                           pbc.test,
                           na.action="na.impute")

All of the good stuff is held within our prediction object. The $survival object is a matrix with one row per patient and one column per time of interest (time.interest). These times are chosen automatically, though you can constrain them using the ntime argument. Here the matrix is 106 x 122.

test.pred.rfsrc$survival

The $time.interest object is a numeric vector of the distinct times of interest (122 of them, the same as the number of columns in our matrix from $survival).

test.pred.rfsrc$time.interest

Let's say we want the predicted survival at 5 years. Since time is measured in days, we need to find the time of interest closest to 1825 days. Looking at $time.interest, element 83 is 1827 days, or roughly 5 years. Element 83 of $time.interest corresponds to column 83 of the $survival matrix, so to see the predicted probability of survival at 5 years we just look at column 83 of the matrix.

test.pred.rfsrc$survival[,83]

You could then do this for whichever timepoints you're interested in.
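Rather than reading the index off by eye, the lookup can be automated. The following is a minimal sketch, not part of randomForestSRC: nearest_time_index is a hypothetical helper, and toy_times and toy_surv are made-up stand-ins for $time.interest and $survival so that the snippet runs without fitting a forest.

```r
# Hypothetical helper (not a randomForestSRC function): for each requested
# time, return the index of the closest entry in the time grid.
nearest_time_index <- function(times, at) {
  vapply(at, function(t) which.min(abs(times - t)), integer(1))
}

# Toy stand-ins for test.pred.rfsrc$time.interest and test.pred.rfsrc$survival
# (invented numbers, two patients, four time points)
toy_times <- c(41, 400, 1827, 3000)
toy_surv  <- rbind(c(0.99, 0.90, 0.60, 0.30),
                   c(0.95, 0.70, 0.40, 0.10))

# Columns closest to 1 year and 5 years (time is in days)
idx <- nearest_time_index(toy_times, c(365, 1825))

# One row per patient, one column per requested time
surv_at <- toy_surv[, idx, drop = FALSE]
colnames(surv_at) <- paste0("day_", toy_times[idx])
print(surv_at)
```

With the real prediction object, the same call would presumably be nearest_time_index(test.pred.rfsrc$time.interest, c(365, 1825)), with the result used to subset the columns of test.pred.rfsrc$survival. Note that this picks the nearest grid time, as in the answer above. If you would rather treat the curve as a step function and take the last grid time at or before the requested time, base R's findInterval() is an alternative.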

Fluoroscope answered 10/8, 2015 at 5:25 Comment(5)
I think the charge that this is "really poorly documented" is unfair. See the last example on the ?predict.rfsrc help page, which uses the cumulative hazard function to generate a survival curve: exp(-pred.fit$chf) - Pronounced
@BondedDust, you're right. I've updated my post in response. - Fluoroscope
@Fluoroscope quick question on this topic. Do the values in the time.interest variable correspond to total time, or to the time an observation will survive from here on out ("here on out" being the time when the survival algorithm was run)? - Eaddy
I have problems using factors in the model. It doesn't run at all and seems very buggy. - Okajima
Great answer! Can you please take a look at my question if you have time? #66684792 Thank you. - Minette
