Create formula call from character string

I use a best subset selection package to determine the best independent variables from which to build my model (I do have a specific reason for doing this instead of using the best subset object directly). I want to programmatically extract the feature names and use the resulting string to build my model formula. The result would be something like this:

x <- "x1 + x2 + x3"
y <- "Surv(time, event)"

Because I'm building a coxph model, the formula is as follows:

coxph(Surv(time, event) ~ x1 + x2 + x3)

Using these string fields, I tried to construct the formula like so:

form <- y ~ x

This creates an object of class formula, but when I call coxph it doesn't evaluate based on the references created from the formula object. I get the following error:

Error in model.frame.default(formula = y ~ x) : object is not a matrix

If I call eval on the objects y and x within the coxph call, I get the following:

Error in model.frame.default(formula = eval(y) ~ eval(x), data = df) : 
  variable lengths differ (found for 'eval(x)')

I'm not really sure how to proceed. Thanks for your input.

Handmedown answered 19/9, 2018 at 19:30 Comment(2)
as.formula(paste(y, "~", x)) – Unfolded
reformulate(x, y) # output: Surv(time, event) ~ x1 + x2 + x3 – Mechanism

Couldn't find a good dupe, so posting comment as an answer.

If you build the full formula as a string, including the ~, you can use as.formula on it, e.g.,

x = "x1 + x2 + x3"
y = "Surv(time, event)"
form = as.formula(paste(y, "~", x))
coxph(form, data = your_data)
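
To see why writing `y ~ x` directly fails while the pasted string works, it can help to inspect which variables each formula refers to. This is a minimal sketch using only base R; the names `literal` and `built` are illustrative and not from the question:

```r
x <- "x1 + x2 + x3"
y <- "Surv(time, event)"

# A formula written literally stores the symbols y and x themselves;
# the strings they hold are never substituted in.
literal <- y ~ x
vars_literal <- all.vars(literal)
print(vars_literal)   # "y" "x"

# Pasting first and converting afterwards parses the text of the strings.
built <- as.formula(paste(y, "~", x))
vars_built <- all.vars(built)
print(vars_built)     # "time" "event" "x1" "x2" "x3"
```

With the literal version, the model frame tries to use the character vectors `y` and `x` themselves as data, which is why coxph cannot fit the model.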

For a reproducible example, consider the first example at the bottom of the ?coxph help page:

library(survival)
test1 <- list(time=c(4,3,1,1,2,2,3), 
              status=c(1,1,1,0,1,1,0), 
              x=c(0,2,1,1,1,0,0), 
              sex=c(0,0,0,0,1,1,1)) 
# Fit a stratified model 
coxph(Surv(time, status) ~ x + strata(sex), test1)
# Call:
# coxph(formula = Surv(time, status) ~ x + strata(sex), data = test1)
# 
#    coef exp(coef) se(coef)    z    p
# x 0.802     2.231    0.822 0.98 0.33
# 
# Likelihood ratio test=1.09  on 1 df, p=0.3
# n= 7, number of events= 5 

lhs = "Surv(time, status)"
rhs = "x + strata(sex)"
form = as.formula(paste(lhs, "~", rhs))
form
# Surv(time, status) ~ x + strata(sex)
## formula looks good

coxph(form, test1)
# Call:
# coxph(formula = form, data = test1)
# 
#    coef exp(coef) se(coef)    z    p
# x 0.802     2.231    0.822 0.98 0.33

Same results either way.
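
If the selected features arrive as a character vector rather than a single string, `reformulate()` (mentioned in the comments on the question) may be a convenient alternative. The following is a sketch under assumptions: the feature names `x1`, `x2`, `x3` are hypothetical, and the response string is parsed explicitly with `parse()` so the result should not depend on how a given R version treats a character `response`:

```r
features <- c("x1", "x2", "x3")      # hypothetical selected feature names
lhs <- "Surv(time, event)"

# Turn the response string into an unevaluated call before passing it on.
response <- parse(text = lhs)[[1]]

# reformulate() joins the term labels with "+" and attaches the response.
form <- reformulate(termlabels = features, response = response)
form_text <- paste(deparse(form), collapse = " ")
print(form_text)   # "Surv(time, event) ~ x1 + x2 + x3"
```

The resulting `form` can then be passed to coxph together with a `data` argument, in the same way as the formula built with as.formula.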

Unfolded answered 19/9, 2018 at 19:39 Comment(3)
Thanks @Gregor! Not sure how I missed that when trying different options. Gracias! – Handmedown
@ToddShannon I added a reproducible example demonstrating it works. If you're getting an error, make sure (a) your variables are spelled right, (b) they all exist in the data frame you're passing in, and (c) the formula object you create looks right. If it still doesn't work, be more specific about the error you're getting and make a reproducible example. – Unfolded
Aaaahhhhh @Gregor I found the issue. For some reason, when assigning the lhs I used quote instead of putting the string in actual quotes, causing the object to be of class call instead of character. Thanks once again, good sir! – Handmedown
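
The pitfall described in the last comment can be checked with `class()`. This is a small base R sketch, not the asker's actual code; the variable names are illustrative:

```r
rhs <- "x1 + x2 + x3"

lhs_call   <- quote(Surv(time, event))   # unevaluated call, not a string
lhs_string <- "Surv(time, event)"        # what paste() needs

class_call   <- class(lhs_call)
class_string <- class(lhs_string)
print(class_call)     # "call"
print(class_string)   # "character"

# deparse() converts the call back to text, after which pasting works.
lhs_fixed <- deparse(lhs_call)
form <- as.formula(paste(lhs_fixed, "~", rhs))
print(form)
```

Checking the class of each piece before pasting is a quick way to catch this kind of mix-up.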
