predict method for felm from lfe package

Asked 27/5, 2015 at 19:32 Answered 8/11, 2019 at 20:52

Does anyone have a nice clean way to get predict behavior for felm models?

library(lfe)
model1 <- lm(data = iris, Sepal.Length ~ Sepal.Width + Species)
predict(model1, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Works

model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Does not work

Stila answered 27/5, 2015 at 19:32 Comment(4)

predict doesnt work because it creates felm class object and predict wont work for it – Mondrian 2/7, 2015 at 17:8

Just a note, you don't have to say data(iris), iris data is lazyloaded already. – Ursas 2/7, 2015 at 18:28

as for adding predict to include to felm create a request to r-proj-c > methods("predict") [1] predict.ar* predict.Arima* predict.arima0* [4] predict.glm predict.HoltWinters* predict.lm [7] predict.loess* predict.mlm* predict.nls* [10] predict.poly* predict.ppr* predict.prcomp* [13] predict.princomp* predict.smooth.spline* predict.smooth.spline.fit* [16] predict.StructTS* – Mondrian 2/7, 2015 at 19:49

I think quite a bit of re-engineering the felm() function (and the functions it calls) would be necessary as the current implementation does not store the fixed effect coefficients, or even apparently the intercept -- see this answer on a question that is at least a near duplicate of this one. – Unknowable 3/2, 2018 at 12:7

UPDATE (2020-04-02): The answer from Grant below using the new package fixest provides a more parsimonious solution.

As a workaround, you could combine felm, getfe, and demeanlist as follows:

library(lfe)

lm.model <- lm(data=demeanlist(iris[, 1:2], list(iris$Species)), Sepal.Length ~ Sepal.Width)
fe <- getfe(felm(data = iris, Sepal.Length ~ Sepal.Width | Species))
predict(lm.model, newdata = data.frame(Sepal.Width = 3)) + fe$effect[fe$idx=="virginica"]

The idea is that you use demeanlist to center the variables, then lm to estimate the coefficient on Sepal.Width using the centered variables, giving you an lm object over which you can run predict. Then run felm+getfe to get the conditional mean for the fixed effect, and add that to the output of predict.

Olivier answered 28/12, 2015 at 23:16 Comment(7)

How do you do this for multiple fe? – Ornie 3/2, 2018 at 4:0

You add the other FE to the demeanlist and getfe commands, then add another term onto the final sum. – Olivier 5/2, 2018 at 5:37

This answer should get more attention, getfe is a very useful command and it's obvious how to predict once you have that. Furthermore it seems to be the only answer that actually answers the question in a general, correct way – Ornie 5/2, 2018 at 18:39

Well, it's not as general as I would like. You couldn't use my code to construct standard errors on yhat, or the confidence or prediction interval. I don't know how to do that, so I posted a similar question to this one to see if anyone else had thoughts. #48634949 – Olivier 6/2, 2018 at 2:16

In defense of this answer, as I read his question, he doesn't ask for inference, just point estimates. But I agree with you that the distribution would be nice. Hopefully someone answers the other question.. – Ornie 6/2, 2018 at 15:30

In the predict line, wouldn't you want to use a demeaned value for Sepal.Width since the original input for the model is demeaned, and you're adding the FE at the end? – Netta 13/10, 2019 at 22:43

No, we want to use the original value, since the coefficients we estimate still represent the same thing they would have in the uncentered model. You can double check by running predict on the lm equivalent:

lm2 <- lm(data = iris, Sepal.Length ~ Sepal.Width +  factor(Species)) predict(lm2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))

– Olivier 15/10, 2019 at 5:2

Late to the party, but the new fixest package (link) has a predict method. It supports high-dimensional fixed effects (and clustering, etc.) using a very similar syntax to lfe. Somewhat remarkably, it is also considerably faster than lfe for the benchmark cases that I've tested.

library(fixest)

model_feols <- feols(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict(model_feols, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Works

Jerold answered 8/11, 2019 at 20:52 Comment(0)

This might not be the answer that you are looking for, but it seems that the author did not add any functionality to the lfe package in order to make predictions on external data by using the fitted felm model. The primary focus seems to be on the analysis of the group fixed effects. However, it's interesting to note that in the documentation of the package the following is mentioned:

The object has some resemblance to an 'lm' object, and some postprocessing methods designed for lm may happen to work. It may however be necessary to coerce the object to succeed with this.

Hence, it might be possible to coerce the felm object to an lm object in order to obtain some additional lm functionality (if all the required info is present in the object to perform the necessary computations).

The lfe package is intended to be run on very large datasets and effort was made to conserve memory: As a direct result of this, the felm object does not use/contain a qr decomposition, as opposed to the lm object. Unfortunately, the lm predict procedure relies on this information in order to compute the predictions. Hence, coercing the felm object and executing the predict method will fail:

> model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
> class(model2) <- c("lm","felm") # coerce to lm object
> predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
Error in qr.lm(object) : lm object does not have a proper 'qr' component.
 Rank zero or should not have used lm(.., qr=FALSE).

If you really must use this package to perform the predictions then you could maybe write your own simplified version of this functionality by using the information that you have available in the felm object. For example, the OLS regression coëfficients are available via model2$coefficients.

Stammer answered 5/7, 2015 at 14:5 Comment(0)

This should work for cases where you wish to ignore the group effects in the prediction, are predicting for new X's, and only want confidence intervals. It first looks for a clustervcv attribute, then robustvcv, then vcv.

predict.felm <- function(object, newdata, se.fit = FALSE,
                         interval = "none",
                         level = 0.95){
  if(missing(newdata)){
    stop("predict.felm requires newdata and predicts for all group effects = 0.")
  }

  tt <- terms(object)
  Terms <- delete.response(tt)
  attr(Terms, "intercept") <- 0

  m.mat <- model.matrix(Terms, data = newdata)
  m.coef <- as.numeric(object$coef)
  fit <- as.vector(m.mat %*% object$coef)
  fit <- data.frame(fit = fit)

  if(se.fit | interval != "none"){
    if(!is.null(object$clustervcv)){
      vcov_mat <- object$clustervcv
    } else if (!is.null(object$robustvcv)) {
      vcov_mat <- object$robustvcv
    } else if (!is.null(object$vcv)){
      vcov_mat <- object$vcv
    } else {
      stop("No vcv attached to felm object.")
    }
    se.fit_mat <- sqrt(diag(m.mat %*% vcov_mat %*% t(m.mat)))
  }
  if(interval == "confidence"){
    t_val <- qt((1 - level) / 2 + level, df = object$df.residual)
    fit$lwr <- fit$fit - t_val * se.fit_mat
    fit$upr <- fit$fit + t_val * se.fit_mat
  } else if (interval == "prediction"){
    stop("interval = \"prediction\" not yet implemented")
  }
  if(se.fit){
    return(list(fit=fit, se.fit=se.fit_mat))
  } else {
    return(fit)
  }
}

Stila answered 4/3, 2016 at 23:41 Comment(0)

To extend the answer from pbaylis, I created a slightly longwinded function that extends nicely to allow for more than one fixed effect. Note that you have to manually enter the original dataset used in the felm model. The function returns a list with two items: the vector of predictions, and a dataframe based on the new_data that includes the predictions and fixed effects as columns.

predict_felm <- function(model, data, new_data) {

  require(dplyr)

  # Get the names of all the variables
  y <- model$lhs
  x <- rownames(model$beta)
  fe <- names(model$fe)

  # Demean according to fixed effects
  data_demeaned <- demeanlist(data[c(y, x)],
                             as.list(data[fe]),
                             na.rm = T)

  # Create formula for LM and run prediction
  lm_formula <- as.formula(
    paste(y, "~", paste(x, collapse = "+"))
  )

  lm_model <- lm(lm_formula, data = data_demeaned)
  lm_predict <- predict(lm_model,
                        newdata = new_data)

  # Collect coefficients for fe
  fe_coeffs <- getfe(model) %>% 
    select(fixed_effect = effect, fe_type = fe, idx)

  # For each fixed effect, merge estimated fixed effect back into new_data
  new_data_merge <- new_data
  for (i in fe) {

    fe_i <- fe_coeffs %>% filter(fe_type == i)

    by_cols <- c("idx")
    names(by_cols) <- i

    new_data_merge <- left_join(new_data_merge, fe_i, by = by_cols) %>%
      select(-matches("^idx"))

  }

  if (length(lm_predict) != nrow(new_data_merge)) stop("unmatching number of rows")

  # Sum all the fixed effects
  all_fixed_effects <- base::rowSums(select(new_data_merge, matches("^fixed_effect")))

  # Create dataframe with predictions
  new_data_predict <- new_data_merge %>% 
    mutate(lm_predict = lm_predict, 
           felm_predict = all_fixed_effects + lm_predict)

  return(list(predict = new_data_predict$felm_predict,
              data = new_data_predict))

}

model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict_felm(model = model2, data = iris, new_data = data.frame(Sepal.Width = 3, Species = "virginica"))
# Returns prediction and data frame

Yonder answered 26/5, 2019 at 9:2 Comment(0)

-2

I think what you're looking for might be the lme4 package. I was able to get a predict to work using this:

library(lme4)
data(iris)

model2 <- lmer(data = iris, Sepal.Length ~ (Sepal.Width | Species))
predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
       1 
6.610102

You may have to play around a little to specify the particular effects you're looking for, but the package is well-documented so it shouldn't be a problem.

Hartung answered 2/7, 2015 at 18:22 Comment(6)

This doesn't seem to replicate the example above and has results2 where it should have model2. – Stila 2/7, 2015 at 20:28

Fixed the results2 (typo). The difference I'm seeing between the two answers is .001, which could easily just come from slight differences between how the two models are implemented. – Hartung 2/7, 2015 at 23:59

Still doesn't seem to be working on my machine. I get this error Error: sum(nb) == q is not TRUE – Stila 3/7, 2015 at 2:45

I updated with the complete code (loading in the library and data), and it works on both my Mac and PC. I'm using R 3.1.1 on my Mac. I'm not sure why it's not working for you - my original thought would be that it's due to NA, but we're only predicting on one observation so that shouldn't be a problem. – Hartung 3/7, 2015 at 14:32

Yeah it's bizarre. I'm using 3.2.0 and also tried it on 3.1.3. Regardless, your answer doesn't answer the question, which specifically asked for the predict method for felm models. – Stila 3/7, 2015 at 19:51

lmer implements RANDOM effects. lfe implements fixed effects. fixed effects are not shrunken, because the goal is typically inference about marginal effects, rather than prediction. If you want to fit a fixed effects model, don't use lmer. – Cano 30/12, 2015 at 20:48

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags