This is essentially a 3 part question:
- How to estimate time-varying effects?
- What is the difference between different specifications of time-varying effects using
survival::coxph
function
- How to decide what shape the time-variation has, i.e., linear, logarithmic, ...
I will try to answer these questions in the following using the veteran data example, which is featured in section 4.2 of the vignette on time-dependent covariates and time-dependent coefficients (also known as time-varying effects) in the survival
package:
library(dplyr)
library(survival)
data("veteran", package = "survival")
veteran <- veteran %>%
mutate(
trt = 1L * (trt == 2),
prior = 1L * (prior == 10))
head(veteran)
#> trt celltype time status karno diagtime age prior
#> 1 0 squamous 72 1 60 7 69 0
#> 2 0 squamous 411 1 70 5 64 1
#> 3 0 squamous 228 1 60 3 38 0
#> 4 0 squamous 126 1 60 9 63 1
#> 5 0 squamous 118 1 70 11 65 1
#> 6 0 squamous 10 1 20 5 49 0
1. How to estimate time-varying effects
There are different popular methods and implementations, e.g. survival::coxph
, timereg::aalen
or using GAMs after appropriate data transformation (see below).
Although the specific methods and their implementaitons differ, a general idea ist to create a long form data set where
- the follow-up is partitioned into intervals
- for each subject, the status is 0 in all intervals except the last (if an event)
- the time variable is updated in each interval
Then, the time (or a transformation of time, e.g. log(t)) is simply a covariate and time-varying effects can be estimated by an interaction between the covariate of interest and the (transformed) covariate of time.
If the functional form of the time-variation is known, you can use the tt()
aproach:
cph_tt <- coxph(
formula = Surv(time, status) ~ trt + prior + karno + tt(karno),
data = veteran,
tt = function(x, t, ...) x * log(t + 20))
2. What is the difference between different specifications of time-varying effects using survival::coxph
function
There is no difference. I assume the tt()
function is simply a short-cut for the estimation via transformation to the long-format. You can verify that the two approaches are equivalent using the code below:
transform to long format
veteran_long <- survSplit(Surv(time, status)~., data = veteran, id = "id",
cut = unique(veteran$time)) %>%
mutate(log_time = log(time + 20))
head(veteran_long) %>% select(id, trt, age, tstart, time, log_time, status)
#> id trt age tstart time log_time status
#> 1 1 0 69 0 1 3.044522 0
#> 2 1 0 69 1 2 3.091042 0
#> 3 1 0 69 2 3 3.135494 0
#> 4 1 0 69 3 4 3.178054 0
#> 5 1 0 69 4 7 3.295837 0
#> 6 1 0 69 7 8 3.332205 0
cph_long <- coxph(formula = Surv(tstart, time, status)~
trt + prior + karno + karno:log_time, data = veteran_long)
## models are equivalent, just different specification
cbind(coef(cph_long), coef(cph_tt))
#> [,1] [,2]
#> trt 0.01647766 0.01647766
#> prior -0.09317362 -0.09317362
#> karno -0.12466229 -0.12466229
#> karno:log_time 0.02130957 0.02130957
3. How to decide what shape the time-variation has?
As mentioned before, time-varying effects are simply interactions of a covariate x
and time t
, thus time-varying effects can have different specifications, equivalent to interactions in standard regression models, e.g.
x*t
: linear covariate effect, linearly time-varying effect
f(x)*t
: non-linear covariate effect, linearly time-varying effect
f(t)*x
: linear covariate effect, non-linearly time-varying (for categorical x) this essentially represents a stratified baseline hazard
f(x, t)
: non-linear, non-linearly time-varying effect
In each case, the functional form of the effect f
can either be estimated from the data or prespecified (e.g. f(t)*x = karno * log(t + 20)
above).
In most cases you would prefer to estimate f
from the data. The support for the (penalized) estimation of such effects is to my knowledge limited in the survival
package. However, you can use mgcv::gam
to estimate any of the effects specified above (after appropriate data transformation). An example is given below and shows that the effect of karno
goes towards 0 as time progresses, regardless of the Karnofsky score at the beginning of the follow-up (see here for details and also Section 4.2 here):
library(pammtools)
# data transformation
ped <- as_ped(veteran, Surv(time, status)~., max_time = 400)
# model
pam <- mgcv::gam(ped_status ~ s(tend) + trt + prior + te(tend, karno, k = 10),
data = ped, family = poisson(), offset = offset, method = "REML")
p_2d <- gg_tensor(pam)
p_slice <- gg_slice(ped, pam, "karno", tend = unique(tend), karno = c(20, 50, 80), reference = list(karno = 60))
gridExtra::grid.arrange(p_2d, p_slice, nrow = 1)