As I understand it cox.zph
is a test as to whether a covariate should enter the model as independent of time. If you already know that your predictor is time-dependent then this does not seem to be the appropriate approach. I'm not aware of an easy way to go about this and such a question may find a more receptive audience on Cross Validated.
For a reproducible example, we can use that from Therneau:
library(survival)
veteran$celltype <- relevel(veteran$celltype, ref="adeno")
f1 <- coxph(Surv(time, status) ~
trt + celltype + karno + diagtime + age + prior,
data=veteran)
(z1 <- cox.zph(f1, transform="log"))
rho chisq p
trt -0.01561 0.0400 0.841486
celltypesquamous -0.16278 3.8950 0.048431
celltypesmallcell -0.11908 2.2199 0.136238
celltypelarge 0.00942 0.0121 0.912551
karno 0.29329 11.8848 0.000566
diagtime 0.11317 1.6951 0.192930
age 0.20984 6.5917 0.010245
prior -0.16683 3.9873 0.045844
GLOBAL NA 27.5319 0.000572
rho is Pearson's correlation between the scaled Shoenfeld residuals and g(t) where g is a function of time (default is the Kaplan-Meier scale; here we are using log
, as you can see on the scale of the x axis in the plot below).
If the variable is time-invariant then the slope of the plotted line should be zero. This is essentially what chisq tests.
Update @Didi Ingabire - in light of your comments:
Thus a low p-value indicates:
- the Schoenfeld residuals are not constant over time
- there is evidence that the variable/predictor may be time-dependent
- the proportional-hazards assumption (made when generating the
coxph
model) may be violated by this variable
You can see this visually like so:
for (i in 1:(nrow(z1$table)-1)){
plot(z1[i], main="Scaled Schoenfeld residuals by time with smooth spline
If <0 indicates protective effect")
graphics::abline(a=0, b=0, col="black")
}
which gives e.g.:
Update @JMarcelino This is to say that cox.zph
is a test of the final form of the model, to ensure that the residuals are relatively constant over time.
If one of the variables is already a function of time (when it enters the model), this won't affect the test. In fact it should be more likely to produce a flat line with a high p-value if the influence of time is modeled correctly.
Also, testing proportional hazards means testing is the hazard ratio constant over time?. Whether the variable is time-dependent or not (when it enters the model) is unimportant. What is being tested is the final form of the model.
For example, instead of karno
we can enter a variable which is related to both it and to time like so:
f2 <- coxph(Surv(time, status) ~
trt + celltype + log(karno * time) + diagtime + age + prior,
data=veteran)
(z2 <- cox.zph(f2, transform="log"))
rho chisq p
trt 0.0947 1.4639 0.226
celltypesquamous -0.0819 1.1085 0.292
celltypesmallcell -0.0897 1.3229 0.250
celltypelarge 0.0247 0.0968 0.756
log(karno * time) -0.0836 0.6347 0.426
diagtime 0.0463 0.2723 0.602
age 0.0532 0.3493 0.554
prior -0.0542 0.3802 0.538
GLOBAL NA 7.6465 0.469
This gives us a model which better fits the proportional hazards assumption. However the interpretation of the coefficient log(karno * time)
is not particularly intuitive and unlikely to be of great practical value.