Diagnose
From the error message:
2 survfit(mod_init, newdata = base_case)
1 my_function(Surv(start, stop, event) ~ rx + number, data = bladder2)
the problem is clearly not with coxph
during model fitting, but with survfit
.
And from this message:
10 eval(predvars, data, env)
9 model.frame.default(formula = Surv(start, stop, event) ~ rx +
number, data = data)
I can tell that the problem is that during early stage of survfit
, the function model.frame.default()
can not find a model frame containing relevant data used in formula Surv(start, stop, event) ~ rx + number
. Hence it complains.
What is a model frame?
A model frame, is formed from the data
argument passed to fitting routine, like lm()
, glm()
and mgcv:::gam()
. It is a data frame with the same number of rows as data
, but:
- dropping all variables not referenced by
formula
- adding many attributes, the most important of which is
envrionement
Most model fitting routines, like lm()
, glm()
, and mgcv:::gam()
, will keep the model frame in their fitted object by default. This has advantage that if we later call predict
, and no newdata
is provided, it will find data from this model frame for evaluation. However, a clear disadvantage is that it will substantially increase the size of your fitted object.
However, survival:::coxph()
is an exception. It will by default not retain such model frame in their fitted object. Well, clearly, this makes the resulting fitted object much smaller in size, but, expose you to the problem you have encountered. If we want to ask survival:::coxph()
to keep this model frame, then use model = TRUE
of this function.
Test with survial:::coxph()
library(survival); data(bladder)
my_function <- function(myformula, mydata, keep.mf = TRUE) {
fit <- coxph(myformula, mydata, method = "breslow", model = keep.mf)
survfit(fit)
}
Now, this function call will fail, as you have seen:
my_function(Surv(start, stop, event) ~ rx + number, bladder2, keep.mf = FALSE)
but this function call will succeed:
my_function(Surv(start, stop, event) ~ rx + number, bladder2, keep.mf = TRUE)
Same behaviour for lm()
We can actually demonstrate the same behaviour in lm()
:
## generate some toy data
foo <- data.frame(x = seq(0, 1, length = 20), y = seq(0, 1, length = 20) + rnorm(20, 0, 0.15))
## a wrapper function
bar <- function(myformula, mydata, keep.mf = TRUE) {
fit <- lm(myformula, mydata, model = keep.mf)
predict.lm(fit)
}
Now this will succeed, by keeping model frame:
bar(y ~ x - 1, foo, keep.mf = TRUE)
while this will fail, by discarding model frame:
bar(y ~ x - 1, foo, keep.mf = FALSE)
Using argument newdata
?
Note that my example for lm()
is slightly artificial, because we can actually use newdata
argument in predict.lm()
to get through this problem:
bar1 <- function(myformula, mydata, keep.mf = TRUE) {
fit <- lm(myformula, mydata, model = keep.mf)
predict.lm(fit, newdata = lapply(mydata, mean))
}
Now whether we keep model frame, both will succeed:
bar1(y ~ x - 1, foo, keep.mf = TRUE)
bar1(y ~ x - 1, foo, keep.mf = FALSE)
Then you may wonder: can we do the same for survfit()
?
survfit()
is a generic function, in your code, you are really calling survfit.coxph()
. There is indeed a newdata
argument for this function. The documentation reads:
newdata:
a data frame with the same variable names as those that appear in the
‘coxph’ formula. ... ... Default is the mean of the covariates used in the
‘coxph’ fit.
So, let's try:
my_function1 <- function(myformula, mydata) {
mtrace.off()
fit <- coxph(myformula, mydata, method = "breslow")
survival:::survfit.coxph(fit, newdata = lapply(mydata, mean))
}
and we hope this work:
my_function1(Surv(start, stop, event) ~ rx + number, bladder2)
But:
Error in is.data.frame(data) (from #5) : object 'mydata' not found
1: my_function1(Surv(start, stop, event) ~ rx + number, bladder2)
2: #5: survival:::survfit.coxph(fit, lapply(mydata, mean))
3: stats::model.frame(object)
4: model.frame.coxph(object)
5: eval(temp, environment(formula$terms), parent.frame())
6: eval(expr, envir, enclos)
7: stats::model.frame(formula = Surv(start, stop, event) ~ rx + number, data =
8: model.frame.default(formula = Surv(start, stop, event) ~ rx + number, data
9: is.data.frame(data)
Note that although we pass in newdata
, it is not used in construction of model frame:
3: stats::model.frame(object)
Only object
, a copy of fitted model, is passed to model.frame.default()
.
This is very different from what happens in predict.lm()
, predict.glm()
and mgcv:::predict.gam()
. In these routines, newdata
is passed to model.frame.default()
. For example, in lm()
, there is:
m <- model.frame(Terms, newdata, na.action = na.action, xlev = object$xlevels)
I don't use survival
package, so not sure how newdata
works in this package. So I think we really need some expert explaining this.
model=T
to the original coxph call will return a coxph object that can then be used withsurvfit()
to get the desired result, esp. within a function where the data used in the original coxph call is not available as a global variable. – Goldy