predict method for felm from lfe package
Asked Answered
S

6

26

Does anyone have a nice clean way to get predict behavior for felm models?

library(lfe)
model1 <- lm(data = iris, Sepal.Length ~ Sepal.Width + Species)
predict(model1, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Works

model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Does not work
Stila answered 27/5, 2015 at 19:32 Comment(4)
predict doesnt work because it creates felm class object and predict wont work for itMondrian
Just a note, you don't have to say data(iris), iris data is lazyloaded already.Ursas
as for adding predict to include to felm create a request to r-proj-c > methods("predict") [1] predict.ar* predict.Arima* predict.arima0* [4] predict.glm predict.HoltWinters* predict.lm [7] predict.loess* predict.mlm* predict.nls* [10] predict.poly* predict.ppr* predict.prcomp* [13] predict.princomp* predict.smooth.spline* predict.smooth.spline.fit* [16] predict.StructTS*Mondrian
I think quite a bit of re-engineering the felm() function (and the functions it calls) would be necessary as the current implementation does not store the fixed effect coefficients, or even apparently the intercept -- see this answer on a question that is at least a near duplicate of this one.Unknowable
O
16

UPDATE (2020-04-02): The answer from Grant below using the new package fixest provides a more parsimonious solution.

As a workaround, you could combine felm, getfe, and demeanlist as follows:

library(lfe)

lm.model <- lm(data=demeanlist(iris[, 1:2], list(iris$Species)), Sepal.Length ~ Sepal.Width)
fe <- getfe(felm(data = iris, Sepal.Length ~ Sepal.Width | Species))
predict(lm.model, newdata = data.frame(Sepal.Width = 3)) + fe$effect[fe$idx=="virginica"]

The idea is that you use demeanlist to center the variables, then lm to estimate the coefficient on Sepal.Width using the centered variables, giving you an lm object over which you can run predict. Then run felm+getfe to get the conditional mean for the fixed effect, and add that to the output of predict.

Olivier answered 28/12, 2015 at 23:16 Comment(7)
How do you do this for multiple fe?Ornie
You add the other FE to the demeanlist and getfe commands, then add another term onto the final sum.Olivier
This answer should get more attention, getfe is a very useful command and it's obvious how to predict once you have that. Furthermore it seems to be the only answer that actually answers the question in a general, correct wayOrnie
Well, it's not as general as I would like. You couldn't use my code to construct standard errors on yhat, or the confidence or prediction interval. I don't know how to do that, so I posted a similar question to this one to see if anyone else had thoughts. #48634949Olivier
In defense of this answer, as I read his question, he doesn't ask for inference, just point estimates. But I agree with you that the distribution would be nice. Hopefully someone answers the other question..Ornie
In the predict line, wouldn't you want to use a demeaned value for Sepal.Width since the original input for the model is demeaned, and you're adding the FE at the end?Netta
No, we want to use the original value, since the coefficients we estimate still represent the same thing they would have in the uncentered model. You can double check by running predict on the lm equivalent: lm2 <- lm(data = iris, Sepal.Length ~ Sepal.Width + factor(Species)) predict(lm2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))Olivier
J
11

Late to the party, but the new fixest package (link) has a predict method. It supports high-dimensional fixed effects (and clustering, etc.) using a very similar syntax to lfe. Somewhat remarkably, it is also considerably faster than lfe for the benchmark cases that I've tested.

library(fixest)

model_feols <- feols(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict(model_feols, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Works
Jerold answered 8/11, 2019 at 20:52 Comment(0)
S
6

This might not be the answer that you are looking for, but it seems that the author did not add any functionality to the lfe package in order to make predictions on external data by using the fitted felm model. The primary focus seems to be on the analysis of the group fixed effects. However, it's interesting to note that in the documentation of the package the following is mentioned:

The object has some resemblance to an 'lm' object, and some postprocessing methods designed for lm may happen to work. It may however be necessary to coerce the object to succeed with this.

Hence, it might be possible to coerce the felm object to an lm object in order to obtain some additional lm functionality (if all the required info is present in the object to perform the necessary computations).

The lfe package is intended to be run on very large datasets and effort was made to conserve memory: As a direct result of this, the felm object does not use/contain a qr decomposition, as opposed to the lm object. Unfortunately, the lm predict procedure relies on this information in order to compute the predictions. Hence, coercing the felm object and executing the predict method will fail:

> model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
> class(model2) <- c("lm","felm") # coerce to lm object
> predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
Error in qr.lm(object) : lm object does not have a proper 'qr' component.
 Rank zero or should not have used lm(.., qr=FALSE).

If you really must use this package to perform the predictions then you could maybe write your own simplified version of this functionality by using the information that you have available in the felm object. For example, the OLS regression coëfficients are available via model2$coefficients.

Stammer answered 5/7, 2015 at 14:5 Comment(0)
S
4

This should work for cases where you wish to ignore the group effects in the prediction, are predicting for new X's, and only want confidence intervals. It first looks for a clustervcv attribute, then robustvcv, then vcv.

predict.felm <- function(object, newdata, se.fit = FALSE,
                         interval = "none",
                         level = 0.95){
  if(missing(newdata)){
    stop("predict.felm requires newdata and predicts for all group effects = 0.")
  }

  tt <- terms(object)
  Terms <- delete.response(tt)
  attr(Terms, "intercept") <- 0

  m.mat <- model.matrix(Terms, data = newdata)
  m.coef <- as.numeric(object$coef)
  fit <- as.vector(m.mat %*% object$coef)
  fit <- data.frame(fit = fit)

  if(se.fit | interval != "none"){
    if(!is.null(object$clustervcv)){
      vcov_mat <- object$clustervcv
    } else if (!is.null(object$robustvcv)) {
      vcov_mat <- object$robustvcv
    } else if (!is.null(object$vcv)){
      vcov_mat <- object$vcv
    } else {
      stop("No vcv attached to felm object.")
    }
    se.fit_mat <- sqrt(diag(m.mat %*% vcov_mat %*% t(m.mat)))
  }
  if(interval == "confidence"){
    t_val <- qt((1 - level) / 2 + level, df = object$df.residual)
    fit$lwr <- fit$fit - t_val * se.fit_mat
    fit$upr <- fit$fit + t_val * se.fit_mat
  } else if (interval == "prediction"){
    stop("interval = \"prediction\" not yet implemented")
  }
  if(se.fit){
    return(list(fit=fit, se.fit=se.fit_mat))
  } else {
    return(fit)
  }
}
Stila answered 4/3, 2016 at 23:41 Comment(0)
Y
3

To extend the answer from pbaylis, I created a slightly longwinded function that extends nicely to allow for more than one fixed effect. Note that you have to manually enter the original dataset used in the felm model. The function returns a list with two items: the vector of predictions, and a dataframe based on the new_data that includes the predictions and fixed effects as columns.

predict_felm <- function(model, data, new_data) {

  require(dplyr)

  # Get the names of all the variables
  y <- model$lhs
  x <- rownames(model$beta)
  fe <- names(model$fe)

  # Demean according to fixed effects
  data_demeaned <- demeanlist(data[c(y, x)],
                             as.list(data[fe]),
                             na.rm = T)

  # Create formula for LM and run prediction
  lm_formula <- as.formula(
    paste(y, "~", paste(x, collapse = "+"))
  )

  lm_model <- lm(lm_formula, data = data_demeaned)
  lm_predict <- predict(lm_model,
                        newdata = new_data)

  # Collect coefficients for fe
  fe_coeffs <- getfe(model) %>% 
    select(fixed_effect = effect, fe_type = fe, idx)

  # For each fixed effect, merge estimated fixed effect back into new_data
  new_data_merge <- new_data
  for (i in fe) {

    fe_i <- fe_coeffs %>% filter(fe_type == i)

    by_cols <- c("idx")
    names(by_cols) <- i

    new_data_merge <- left_join(new_data_merge, fe_i, by = by_cols) %>%
      select(-matches("^idx"))

  }

  if (length(lm_predict) != nrow(new_data_merge)) stop("unmatching number of rows")

  # Sum all the fixed effects
  all_fixed_effects <- base::rowSums(select(new_data_merge, matches("^fixed_effect")))

  # Create dataframe with predictions
  new_data_predict <- new_data_merge %>% 
    mutate(lm_predict = lm_predict, 
           felm_predict = all_fixed_effects + lm_predict)

  return(list(predict = new_data_predict$felm_predict,
              data = new_data_predict))

}

model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict_felm(model = model2, data = iris, new_data = data.frame(Sepal.Width = 3, Species = "virginica"))
# Returns prediction and data frame
Yonder answered 26/5, 2019 at 9:2 Comment(0)
H
-2

I think what you're looking for might be the lme4 package. I was able to get a predict to work using this:

library(lme4)
data(iris)

model2 <- lmer(data = iris, Sepal.Length ~ (Sepal.Width | Species))
predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
       1 
6.610102 

You may have to play around a little to specify the particular effects you're looking for, but the package is well-documented so it shouldn't be a problem.

Hartung answered 2/7, 2015 at 18:22 Comment(6)
This doesn't seem to replicate the example above and has results2 where it should have model2.Stila
Fixed the results2 (typo). The difference I'm seeing between the two answers is .001, which could easily just come from slight differences between how the two models are implemented.Hartung
Still doesn't seem to be working on my machine. I get this error Error: sum(nb) == q is not TRUEStila
I updated with the complete code (loading in the library and data), and it works on both my Mac and PC. I'm using R 3.1.1 on my Mac. I'm not sure why it's not working for you - my original thought would be that it's due to NA, but we're only predicting on one observation so that shouldn't be a problem.Hartung
Yeah it's bizarre. I'm using 3.2.0 and also tried it on 3.1.3. Regardless, your answer doesn't answer the question, which specifically asked for the predict method for felm models.Stila
lmer implements RANDOM effects. lfe implements fixed effects. fixed effects are not shrunken, because the goal is typically inference about marginal effects, rather than prediction. If you want to fit a fixed effects model, don't use lmer.Cano

© 2022 - 2024 — McMap. All rights reserved.