How to predict survival time in Cox's Regression Model in R?

I have modeled a problem using Cox regression and now want to predict the survival time for an individual. The model has a list of covariates on which the survival time depends. The fitted model tells us how to calculate P(T>t), which is the survival function (1 - CDF) for a given individual.

I want to predict something slightly different. Given values for the covariates used in the model, I want to predict the estimated number of days that the person would live. As I understand it, this is similar to sampling from the pdf. How can I do this using the survival package in R? Below is a summary of the fit using Cox's regression model.

Call:
coxph(formula = Surv(Time, death) ~ variable1 + variable2 + variable3 + 
variable4 + variable5 + variable6 + variable7 + variable8 + variable9, 
data = DataTest, method = "breslow")

n= 23756, number of events= 23756 

              coef exp(coef) se(coef)      z Pr(>|z|)    
variable1  0.02494   1.02526  0.02375  1.050  0.29354    
variable2 -0.20715   0.81290  0.02395 -8.650  < 2e-16 ***
variable3  0.12940   1.13814  0.02263  5.717 1.08e-08 ***
variable4  0.02469   1.02500  0.02289  1.079  0.28077    
variable5  0.13165   1.14070  0.02235  5.891 3.84e-09 ***
variable6  0.22286   1.24965  0.01534 14.526  < 2e-16 ***
variable7 -0.10513   0.90021  0.02035 -5.167 2.38e-07 ***
variable8 -0.12215   0.88501  0.02243 -5.447 5.13e-08 ***
variable9 -0.04930   0.95189  0.01827 -2.698  0.00697 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

          exp(coef) exp(-coef) lower .95 upper .95
variable1    1.0253     0.9754    0.9786    1.0741
variable2    0.8129     1.2302    0.7756    0.8520
variable3    1.1381     0.8786    1.0888    1.1898
variable4    1.0250     0.9756    0.9800    1.0720
variable5    1.1407     0.8767    1.0918    1.1918
variable6    1.2496     0.8002    1.2126    1.2878
variable7    0.9002     1.1109    0.8650    0.9368
variable8    0.8850     1.1299    0.8470    0.9248
variable9    0.9519     1.0505    0.9184    0.9866

Concordance= 0.543  (se = 0.002 )
Rsquare= 0.022   (max possible= 1 )
Likelihood ratio test= 516.5  on 9 df,   p=0
Wald test            = 503.1  on 9 df,   p=0
Score (logrank) test = 505.1  on 9 df,   p=0
Granlund answered 13/2, 2015 at 2:58 Comment(2)
This is really a question about statistics, not about programming. Try looking here: stats.stackexchange.com/questions/79362/… - Pecuniary
"Estimated number of days that the person would live" is just "life expectancy". It's not a sample from the survival function but rather the integral over time of the survival function. A lower bound, integrating up to the last death, can easily be calculated from the results of predict. Since you have complete observations (no censoring), this is not a problem for you. - Glutamine
10

Because survival data are usually censored, it is generally more useful to compute a median survival time than a mean expected survival time. You can recover the median survival time for each person in your data by running the following and printing the result:

# cox.ph.model is the fitted coxph object; the printed result lists a median per row of newdata
survfit(cox.ph.model, newdata = DataTest)
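
To pull the medians out as numbers rather than reading them off the printed output, something like the following should work. This is only a sketch: it uses the `lung` data set that ships with the survival package as a stand-in for DataTest, and `new_people` is a made-up data frame of covariate profiles.

```r
library(survival)

# Stand-in data: the `lung` data set shipped with survival replaces DataTest here.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Hypothetical covariate profiles to predict for (assumed values, not from the question).
new_people <- data.frame(age = c(50, 70), sex = c(1, 2))

# One predicted survival curve per row of newdata.
curves <- survfit(fit, newdata = new_people)

# The summary table has one row per profile; keep the median column.
tab <- summary(curves)$table
medians <- tab[, "median"]
print(medians)
```

The median comes back as NA for any profile whose predicted curve never drops below 0.5, which can happen with heavily censored data.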

Alika answered 13/2, 2015 at 4:15 Comment(4)
Thanks! That does give the median. Two points: 1. My data has no censoring. In that case, is there a better way? 2. The link that MrFlick suggested says that predicting survival times for particular individuals is not an appropriate use of this tool. I do not see why. Shouldn't that be one of the most important uses of such a tool? - Granlund
I agree with you. One of the most important use cases of the Cox model should be to build a survival curve for each person according to their covariates. It looks like at one time the survfit function allowed the parameter print.mean=T in order to retrieve the mean residual life by person. I am guessing that you will now need to estimate it by estimating the death function from the survival curve and integrating. There might be a function that does this in the emplik library. It looks to be a bit of a pain to get. However, I still think the median is the better choice here. - Alika
Okay. From what I have read, I think that estimating the survival time is probably not a good idea with Cox's proportional hazards model, since the baseline hazard is implicit. I think that if one is making the proportional hazards assumption, then it is probably better to assume a distribution (Weibull, exponential, etc.) and then estimate the survival time. - Granlund
The Cox PH model is more robust, as it doesn't make any assumptions about the distribution of the baseline hazard (it is a non-parametric estimate). I have actually found that the Cox PH model is much more robust than the AFT models. You can always use the strata call if you want multiple baseline hazards for different groups in your data. - Alika
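
The "integrate the survival curve" idea from the comments can be sketched by hand, since the predicted curve is a step function. This is an illustration under assumptions, not a vetted method: `lung` again stands in for DataTest, `new_people` is hypothetical, and the result is a mean restricted to the last observed time (a lower bound on life expectancy unless the curve has reached zero).

```r
library(survival)

# Stand-in fit and hypothetical covariate profiles.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
new_people <- data.frame(age = c(50, 70), sex = c(1, 2))
curves <- survfit(fit, newdata = new_people)

times <- curves$time
S <- as.matrix(curves$surv)   # rows: time points, columns: profiles

# S(t) is a step function, so its integral is a sum of rectangles:
# (width of each interval) x (survival probability at the start of that interval).
widths <- diff(c(0, times))
S_start <- rbind(1, S[-nrow(S), , drop = FALSE])   # S = 1 before the first time point
restricted_mean <- colSums(S_start * widths)
print(restricted_mean)
```

With no censoring, as in the question, the curve falls almost to zero by the last death, so the restricted mean should be close to the full mean.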
-2

I do not think you can estimate the survival time of a single observation using the Cox proportional hazards model. The model outputs hazard ratios and is well suited to understanding the effects of covariates on survival, as it does not make any assumptions about the baseline hazard function. If you want to estimate the survival time for a single observation, you are better off using parametric distributions such as the Weibull or exponential, which allow you to do that and which are supported by the survival package.
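
For what the parametric route might look like, here is a minimal sketch with survreg and a Weibull distribution. The `lung` data set and the `new_people` profiles are assumptions for illustration, not the asker's data, and the mean formula relies on survreg's parameterisation (Weibull scale exp(lp), shape 1/scale).

```r
library(survival)

# Weibull accelerated failure time model on stand-in data.
wfit <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")

# Hypothetical covariate profiles to predict for.
new_people <- data.frame(age = c(50, 70), sex = c(1, 2))

# Median survival time for each profile.
med_w <- predict(wfit, newdata = new_people, type = "quantile", p = 0.5)

# Mean survival time: exp(linear predictor) * gamma(1 + scale).
lp <- predict(wfit, newdata = new_people, type = "lp")
mean_w <- exp(lp) * gamma(1 + wfit$scale)

print(med_w)
print(mean_w)
```

Unlike the Cox fit, these predictions depend on the Weibull assumption being reasonable for the data.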

Thanks,

Nodus answered 2/1, 2017 at 13:36 Comment(1)
The first sentence is just wrong (since what was requested was not the survival time of a single observation but rather that for a single set of covariates) ... and the rest of the answer is not an answer. - Glutamine