predict() with arbitrary coefficients in r

Asked 6/9, 2014 at 0:6 Answered 12/9, 2024 at 16:7

I've got some coefficients for a logit model set by a non-r user. I'd like to import those coefficients into r and generate some goodness of fit estimates on the same dataset (ROC and confusion matrix) vs my own model. My first thought was to coerce the coefficients into an existing GLM object using something like

summary(fit)$coefficients[,1] <- y

summary(fit)$coefficients <- x

where y and x are matrices containing the coefficients I'm trying to use to predict and fit is a previously created dummy glm object fit to the dataset. Of course, this gives me only errors.

Is there any way to pass an arbitrary coefficient vector to the predict() function or to specify coefficients in a model? Can I somehow force this by passing a vector into the offset argument in GLM? Thanks

Edit: As mentioned in the comments, there's not much statistical basis for using the arbitrary coefficients. I have a business partner who believes he/she 'knows' the right coefficients and I'm trying to quantify the loss of predictive power based on those estimates versus the coefficients generated by a proper model.

Edit2: Per BondedDust's answer, I was able to coerce the coefficients, however wasn't able to clear the error messages that predict() returned due to the coercion, it would appear that predict.lm, which is called by predict, also looks at the rank of the coefficients and that is causing the error.

Chloral answered 6/9, 2014 at 0:6 Comment(5)

In response to this question I made a makeglm() function that sounds like it would be useful in this scenario. If you perhaps provided a more concrete sample, we might be able to try it out. – Microphyte 6/9, 2014 at 0:17

@Stencill Can you just calculate the predicted values manually by multiplying the relevant variables in your data by coefficients given to you? For example, coefVector %*% t(cbind(1, dataVariables)). (Where coefVector is the provided vector of coefficients and datavariables is your data for the relevant coefficients) – Befit 6/9, 2014 at 4:47

@BondedDust Sorry, was away from the pc for the weekend -- I'll keep to timely updates from now on. – Chloral 8/9, 2014 at 17:12

@Befit That would seem like the easiest solution. However, I would need to convert a number of factors into dummy variables. – Chloral 8/9, 2014 at 18:7

That's right - model.matrix would make this straight forward. – Befit 8/9, 2014 at 18:46

If you follow the code through predict.glm which passes the object to predict.lm, it appears that the node of the model list that needs to be altered is indeed fit$coefficients. However, altering the summary()-object will have no effect. The [['coefficients']] in the glm and lm objects are not matrices with columns: 'Estimate', 'Std. Error', 't value', 'Pr(>|t|)' such as produced by summary, but rather just a vector of coefficients.

 fit$coefficients <- y
 newpred <- predict(fit)

You might make a copy and work on it if you will need any further use of fit.

Transmontane answered 6/9, 2014 at 5:30 Comment(3)

This worked. Thanks a lot. It's a bit of a hack, since the rest of the glm object (Pvalues, etc.) is now mismatched, but I was able to pass this through predict() successfully. – Chloral 8/9, 2014 at 17:10

Scratch that, it seems I was passing fit incorrectly into predict. >pred <- predict(fit, newdata = sample1) Where fit is my glm object containing modified coefficients returns the following error: >Error in [.data.frame(beta, piv) : undefined columns selected In addition: Warning message: In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == : prediction from a rank-deficient fit may be misleading – Chloral 8/9, 2014 at 17:41

It was definitely a hack, but that was definitely what you asked for. The method you were using had no real statistical basis. I have no idea where you are in addressing this. You should have A) posted a data example in the first place, and B) updated your question using the edit process rather than posting a comment to my answer. – Transmontane 8/9, 2014 at 17:57

This is not an answer to your posted question - which BondedDust answered - but describes an alternate way in calculating the predicted probabilities yourself which might help in this case.

# Use the mtcars dataset for a minimum worked example
data(mtcars)

# Run a logistic regression and get predictions 
mod <- glm(vs ~ mpg + factor(gear) + factor(am), mtcars, family="binomial")
p1 <- predict(mod, type="response")

# Calculate predicted probabilities manually
m <- model.matrix(~ mpg + factor(gear) + factor(am), mtcars)[,]
p2 <- coef(mod) %*% t(m)
p2 <- plogis(p2)

all(p1 == p2)
#identical(as.numeric(p1), as.numeric(p2))

You can replace coef(mod) with the vector of coefficients given to you. model.matrix will generate the dummy variables required for the calculation - check that the ordering is the same as that of the coefficient vector.

Befit answered 8/9, 2014 at 19:20 Comment(4)

How would I change the above (i.e. plogis(p2)) if I wanted to predict based on a probit model (i.e. family = binomial(link = "probit") in the glm command)? – Beatific 15/9, 2017 at 23:41

@Beatific ; I'm not sure - what is the formula for the probit link? ( as plogis == 1/(1+e(-xb)) ). I think you will use the pnorm function (pnorm(p2)) but probably best to ask at stats.stackexchange.com/questions – Befit 15/9, 2017 at 23:54

@Beatific ; ok just ran through the example, changing the link to probit : family=binomial("probit") , and p2 <- pnorm(p2) , then all.equal(p1, as.numeric(p2), check.attributes = FALSE) , so seems pnorm is the way to go – Befit 15/9, 2017 at 23:58

@user20650: Thank you so much. I ran some tests too and can confirm that p2 <- pnorm(p2) is the correct way to do it. Thanks for your quick reply! – Beatific 15/9, 2017 at 23:59

Or, you can use something like this:

fit <- lm(Y ~ A + B + C, data=fakedata)

fit$coefficients <- c(1, 2, 3) # this would change the coefficients for A, B, C to 1, 2 and 3, respectively.

Y_hat_new <- predict(fit, new_fakedata) # this Y_hat_new will be calculated as your new predicted outcome given the new coefficients and/or new_fakedata.

The results should be the same if you follow the model.matrix route.

Margetmargette answered 22/6, 2016 at 16:42 Comment(0)

I've been looking to do something similar and this method worked for me.

coefficients <- predict(model, data=model$data, type="terms")
coefficients <- as.data.frame(coefficients) %>% rename_with(~paste0(., "_coef"))
data <- cbind(model$data, coefficients)

Quinidine answered 12/9, 2024 at 16:7 Comment(0)

Recommended topics

Hot tags