How does cox.zph deal with time-dependent covariates?

Asked 30/6, 2014 at 10:54 Answered 19/8, 2014 at 17:32

I have a coxph model with 5 time-dependent and 2 time-independent variables. I want to test the proportional hazards assumption using cox.zph, in addition to martingale and deviance residuals. My question is: how does this function deal with time-dependent covariates?

After reading Grant et al. (2014), I am not sure whether this is the recommended goodness-of-fit test for assessing the PH assumption with time-varying covariates.

Model:

teste <- coxph(Surv(tempo1, tempo2, status) ~ sexo + CODE_06 + factor(clima) + TP_media7 +
                 ndvi + peso + epoca,
               data = newftable, na.action = na.fail)

> cox.zph(teste)
                         rho    chisq      p
sexoM                 0.0844  0.32363 0.5694
CODE_06Regadio        0.1531  0.66865 0.4135
CODE_06Sequeiro       0.2278  1.65735 0.1980
factor(clima)8       -0.1823  1.16522 0.2804
factor(clima)9        0.1051  0.24456 0.6209
factor(clima)15      -0.0193  0.00945 0.9226
TP_media7(12,22]      0.1689  0.75604 0.3846
TP_media7(22,32]      0.1797  1.03731 0.3084
TP_media7(32,41]      0.1060  0.34036 0.5596
ndvi(3e+03,4e+03]    -0.1595  1.00006 0.3173
ndvi(4e+03,5e+03]     0.0421  0.05233 0.8191
ndvi(5e+03,6e+03]     0.1750  0.98816 0.3202
ndvi(6e+03,8.05e+03] -0.0311  0.02880 0.8653
peso[850,1005]        0.2534  3.34964 0.0672
epocamid_inv_rep      0.0193  0.01219 0.9121
epocamid_pos_inv     -0.2193  0.93355 0.3339
epocamid_rep_pos      0.0231  0.01341 0.9078
epocapos_repr         0.2073  1.09893 0.2945
epocarepr             0.0766  0.12905 0.7194
GLOBAL                    NA 19.79229 0.4072

Yaron asked 30/6, 2014 at 10:54

As I understand it, cox.zph tests whether a covariate should enter the model as independent of time. If you already know that your predictor is time-dependent, then this does not seem to be the appropriate approach. I'm not aware of an easy way around this, and such a question may find a more receptive audience on Cross Validated.

For a reproducible example, we can use one from Therneau:

library(survival)
veteran$celltype <- relevel(veteran$celltype, ref="adeno")
f1 <- coxph(Surv(time, status) ~
            trt + celltype + karno + diagtime + age + prior,
            data=veteran)
(z1 <- cox.zph(f1, transform="log"))

                       rho   chisq        p
trt               -0.01561  0.0400 0.841486
celltypesquamous  -0.16278  3.8950 0.048431
celltypesmallcell -0.11908  2.2199 0.136238
celltypelarge      0.00942  0.0121 0.912551
karno              0.29329 11.8848 0.000566
diagtime           0.11317  1.6951 0.192930
age                0.20984  6.5917 0.010245
prior             -0.16683  3.9873 0.045844
GLOBAL                  NA 27.5319 0.000572

rho is Pearson's correlation between the scaled Schoenfeld residuals and g(t), where g is a function of time (the default is the Kaplan-Meier scale; here we are using log, as you can see on the scale of the x axis in the plot below). If the covariate's effect does not change over time, the slope of the plotted line should be zero. This is essentially what chisq tests.
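To make the rho column concrete, here is a minimal sketch that recomputes it by hand from the cox.zph object, assuming the components `x` (the transformed time g(t)) and `y` (the scaled Schoenfeld residuals, one column per coefficient) that cox.zph returns. It also fits the straight-line slope that chisq is, roughly, testing against zero. Recent versions of survival print chisq, df and p but no rho column, so computing it by hand is one way to recover it; `rho` and `slope_karno` are illustrative names.

```r
library(survival)

# Same model as above (Therneau's veteran example)
veteran$celltype <- relevel(veteran$celltype, ref = "adeno")
f1 <- coxph(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
            data = veteran)
z1 <- cox.zph(f1, transform = "log")

# z1$x: g(t), here log of the event times
# z1$y: scaled Schoenfeld residuals, one column per coefficient
rho <- cor(z1$x, z1$y)          # 1 x p matrix of Pearson correlations
print(round(rho, 4))

# Least-squares slope of the residuals on g(t) for one covariate;
# a slope near zero is consistent with a constant (proportional) effect
slope_karno <- coef(lm(z1$y[, "karno"] ~ z1$x))[2]
print(slope_karno)
```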

Update for @Didi Ingabire, in light of your comments:

Thus a low p-value indicates that:

• the scaled Schoenfeld residuals are not constant over time;
• there is evidence that the variable/predictor may be time-dependent;
• the proportional-hazards assumption (made when fitting the coxph model) may be violated by this variable.
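A minimal sketch for listing the covariates whose test falls below a chosen threshold, assuming the `table` component of cox.zph with its `p` column and a final `GLOBAL` row. The cutoff `alpha = 0.05` is an illustrative assumption; with many covariates some small p-values are expected by chance, so treat the list as a prompt to look at the plots rather than a verdict.

```r
library(survival)

veteran$celltype <- relevel(veteran$celltype, ref = "adeno")
f1 <- coxph(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
            data = veteran)
z1 <- cox.zph(f1, transform = "log")

alpha <- 0.05  # illustrative threshold

# Drop the GLOBAL row, then keep the terms whose test rejects at alpha
tab <- z1$table[rownames(z1$table) != "GLOBAL", , drop = FALSE]
suspect <- rownames(tab)[tab[, "p"] < alpha]
print(suspect)
```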

You can see this visually like so:

for (i in 1:(nrow(z1$table) - 1)) {  # the last row of z1$table is GLOBAL
    plot(z1[i], main = "Scaled Schoenfeld residuals by time with smooth spline\nValues < 0 indicate a protective effect")
    abline(h = 0, col = "black")      # reference line: no effect
}

which gives e.g.:

[Figure: scaled Schoenfeld residuals by time with smooth spline, one panel per covariate]
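If the default plot is too crowded to see the smooth (as asked in the comments), one option is to draw the same residuals by hand with base graphics, so that point and line colors can be set freely. This is a sketch, not plot.cox.zph itself: `plot_zph_term` is a hypothetical helper, and it swaps in a `lowess` smoother for the spline that plot.cox.zph fits, so the curve will differ slightly.

```r
library(survival)

veteran$celltype <- relevel(veteran$celltype, ref = "adeno")
f1 <- coxph(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
            data = veteran)
z1 <- cox.zph(f1, transform = "log")

# Hypothetical helper: plot one coefficient's scaled Schoenfeld residuals
# against transformed time with user-chosen colors
plot_zph_term <- function(zph, term, point_col = "grey70", line_col = "red",
                          xlab = "Transformed time") {
  x <- zph$x
  y <- zph$y[, term]
  plot(x, y, pch = 20, col = point_col,
       xlab = xlab, ylab = paste("Beta(t) for", term))
  lines(lowess(x, y), col = line_col, lwd = 2)  # lowess, not the zph spline
  abline(h = 0, lty = 2)                        # reference line: no effect
  invisible(NULL)
}

plot_zph_term(z1, "karno", xlab = "log(time)")
```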

Update for @JMarcelino: this is to say that cox.zph tests the final form of the model, checking that the residuals are relatively constant over time.

If one of the variables is already a function of time when it enters the model, this won't affect the test. In fact, the test should be more likely to produce a flat line with a high p-value if the influence of time is modeled correctly.

Also, testing proportional hazards means testing whether the hazard ratio is constant over time. Whether the variable is time-dependent or not when it enters the model is unimportant; what is being tested is the final form of the model.
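For a counting-process model like the one in the question, cox.zph is called in exactly the same way. A minimal sketch using the `heart` data shipped with survival, in which each row covers an interval (start, stop] and `transplant` switches from 0 to 1 at the time of transplant, so it is a time-dependent covariate. The martingale and deviance residuals mentioned in the question come from `residuals()` with the corresponding `type`; the object names are illustrative.

```r
library(survival)

# Counting-process fit: transplant changes value during follow-up
fit_td <- coxph(Surv(start, stop, event) ~ age + surgery + transplant,
                data = heart)

# The PH check is called exactly as for a model with fixed covariates
zph_td <- cox.zph(fit_td)
print(zph_td)

# Other residual checks mentioned in the question (one value per row/interval)
mart <- residuals(fit_td, type = "martingale")
dev  <- residuals(fit_td, type = "deviance")
summary(dev)
```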

For example, instead of karno we can enter a variable that is related both to it and to time, like so:

f2 <- coxph(Surv(time, status) ~
            trt + celltype + log(karno * time) + diagtime + age + prior,
            data=veteran)
(z2 <- cox.zph(f2, transform="log"))

                      rho  chisq     p
trt                0.0947 1.4639 0.226
celltypesquamous  -0.0819 1.1085 0.292
celltypesmallcell -0.0897 1.3229 0.250
celltypelarge      0.0247 0.0968 0.756
log(karno * time) -0.0836 0.6347 0.426
diagtime           0.0463 0.2723 0.602
age                0.0532 0.3493 0.554
prior             -0.0542 0.3802 0.538
GLOBAL                 NA 7.6465 0.469

This gives us a model that better fits the proportional hazards assumption. However, the interpretation of the coefficient of log(karno * time) is not particularly intuitive and is unlikely to be of great practical value.

Civility answered 10/7, 2014 at 5:19

I understand your point of view, but after asking the same question directly to Therneau, he answered: "The cox.zph function is fine for time-dependent covariates." He didn't explain why, so now I am confused. – Yaron 10/7, 2014 at 8:8

So you agree that cox.zph is an appropriate final-form test for time-dependent covariates, but not for assessing the PH assumption? – Yaron 13/7, 2014 at 14:42

I am also confused: the time-dependent variable (TDV) violates the PH assumption, but when I assess proportionality using cox.zph, it indicates that the TDV satisfies the PH assumption. – Quinnquinol 16/7, 2015 at 12:26

@Civility, do you know whether the plot function, as in plot(z1[1]), allows changing the color of the points and the line? I have so many points that I cannot see my line when plotting this graph. Thanks! – Metalloid 28/11, 2017 at 18:50

It's important to distinguish between a time-dependent variable and a variable that does not meet the PH assumption.

A time-dependent variable is one that varies with time. This could be blood pressure; it will vary on different occasions. Sex (gender), however, will not vary on different occasions.

Then there is a distinction between internal and external time-dependent variables:

• Internal time-dependent variables vary because of changes within the individual (e.g. blood pressure).

• External time-dependent variables reflect environmental/external changes that modify the hazard experienced by an individual (e.g. as industries proliferate in a city, air pollution increases with time, and so the hazard in the population increases for conditions such as myocardial infarction).

Regardless of the nature of a variable, fixed or time-dependent, it can violate the PH assumption. I could provide a few examples, but it is probably easier to just accept that any variable might violate the PH assumption. It can do so even if you try to accommodate the variation in an extended Cox model (e.g. using multiple observations per individual in counting-process format).
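As a sketch of the counting-process format, the `tmerge` function in the survival package can build (tstart, tstop] intervals from a baseline table plus a table of later measurements. This is a simplified version of the pbc/pbcseq example in Therneau's vignette on time-dependent covariates; the choice of covariates (bilirubin, prothrombin time, age) and treating transplant as censoring are assumptions made for illustration.

```r
library(survival)

# Baseline data: one row per subject (pbc ids 1..312 are the trial patients)
base <- subset(pbc, id <= 312, select = c(id:sex, stage))

# Step 1: set each subject's follow-up window and the terminal event
# (status == 2 is death; transplant is treated as censoring here)
pbc_cp <- tmerge(base, base, id = id,
                 death = event(time, as.integer(status == 2)))

# Step 2: lab values from later visits become time-dependent covariates
pbc_cp <- tmerge(pbc_cp, pbcseq, id = id,
                 bili    = tdc(day, bili),
                 protime = tdc(day, protime))

# Each row is now an interval (tstart, tstop] with the current lab values
fit_cp <- coxph(Surv(tstart, tstop, death) ~ log(bili) + log(protime) + age,
                data = pbc_cp)
zph_cp <- cox.zph(fit_cp)
print(zph_cp)
```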

The solution: you can enter any predictor into a Cox model and check whether it fulfills the PH assumption with the cox.zph function in Therneau's survival package. The corresponding statement in SAS would be the 'assess ph / resample' statement. If a variable violates PH, probably the simplest way to solve this is to introduce an interaction between that variable and your time variable.

For example, take this Cox formula:

Survival = age + sex + blood_pressure

Let's say blood pressure violates PH. Then introduce the following term:

Survival = age + sex + blood_pressure*survival_time_variable

This should solve it, but you can no longer interpret the main effect of blood pressure on its own, because its effect now depends on time.
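One caveat when doing this in R: if the survival time is written literally into the formula as an ordinary covariate, the outcome is being used to predict itself. The survival package's `tt()` mechanism instead evaluates the time interaction at each event time. A minimal sketch on the veteran data, with karno standing in for blood pressure; the `log(t + 20)` transform follows Therneau's vignette and is an assumption, not the only reasonable choice.

```r
library(survival)

veteran$celltype <- relevel(veteran$celltype, ref = "adeno")

# karno enters twice: a baseline effect, plus an effect that changes
# linearly in log(t + 20); tt() is evaluated at each event time t
fit_tt <- coxph(Surv(time, status) ~ trt + celltype + karno + tt(karno),
                data = veteran,
                tt = function(x, t, ...) x * log(t + 20))
print(coef(fit_tt))
```

With this model the log hazard ratio for a one-unit change in karno at time t is coef(karno) + coef(tt(karno)) * log(t + 20), which is why the karno main effect cannot be read on its own.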

Another solution is to stratify your model by the offending variable. However, this is not appropriate for a continuous variable, and for a categorical variable, once stratified, the variable is not included as a covariate in the resulting model (i.e. you won't get a hazard ratio for it).
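A minimal sketch of stratification with `strata()`, again on the veteran data: each cell type gets its own baseline hazard, and no hazard ratio for celltype appears among the coefficients. Which covariates to keep alongside it is an illustrative choice.

```r
library(survival)

veteran$celltype <- relevel(veteran$celltype, ref = "adeno")

# celltype gets a separate baseline hazard per level instead of a coefficient
fit_strat <- coxph(Surv(time, status) ~ trt + karno + age + strata(celltype),
                   data = veteran)
print(coef(fit_strat))      # only trt, karno and age appear

# The PH check still applies to the remaining covariates
zph_strat <- cox.zph(fit_strat)
print(zph_strat)
```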

Ironic answered 19/8, 2014 at 17:32

My question is: if I know from the start that my variable is time-dependent, can I test the proportional hazards assumption with cox.zph()? Terry Therneau says: "The cox.zph function is fine for time-dependent covariates" – Yaron 28/8, 2014 at 10:29

Yes you can test it with cox.zph. – Ironic 29/8, 2014 at 7:9

Sure, if the author says so, I accept that it is true. But why is that? It doesn't make much sense to me to test whether a time-dependent variable changes over time when I accept from the start that it does. – Yaron 29/8, 2014 at 11:4

I believe you should try to distinguish between: a) a time-dependent variable, meaning that the value of the variable will vary over time. Blood pressure on two occasions will not be the same; it is time-dependent! Gender and ethnicity, by contrast, are fixed. They do not change. b) testing the PH assumption (regardless of method), meaning that you are trying to figure out whether the danger associated with blood pressure (for example, having a blood pressure of 150 units) is the same throughout the study. Or, similarly, whether being male is equally dangerous throughout the study. – Ironic 29/8, 2014 at 14:9

Note that blood pressure changes over time and gender doesn't, but you have to assess the PH assumption for both of these variables. This follows from the assumptions of Cox regression: the hazard ratio (e.g. the hazard ratio associated with being male) should be constant (proportional hazards) throughout the study. – Ironic 29/8, 2014 at 14:9
