How to run a multinomial logit regression with both individual and time fixed effects in R

Long story short:

I need to run a multinomial logit regression with both individual and time fixed effects in R. I thought I could use the packages mlogit and survival for this purpose, but I cannot find a way to include fixed effects.

Now the long story:

I have found many questions on this topic on various Stack-related websites, but none of them has an answer. I have also noticed a lot of confusion about what a multinomial logit regression with fixed effects is (people use different names) and about the R packages that implement it. So I think it would be beneficial to provide some background before getting to the point.

Consider the following. In a multiple-choice question, each respondent makes one choice. Respondents are asked the same question every year. There is no a priori assumption about the extent to which the choice at time t is affected by the choice at t-1. Now imagine a panel dataset recording these choices. The data would look like this:

set.seed(123)
# number of individuals
n <- 100
# possible choices
possible_choice <- letters[1:4]
# number of years
years <- 3
# individual characteristics (one value per individual-year)
x1 <- runif(n * years, 5.0, 70.5)
x2 <- sample(1:n^2, n * years, replace = FALSE)
# actual choices in years 1 to 3
actual_choice_year_1 <- possible_choice[sample(1:4, n, replace = TRUE, prob = rep(1/4, 4))]
actual_choice_year_2 <- possible_choice[sample(1:4, n, replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1))]
actual_choice_year_3 <- possible_choice[sample(1:4, n, replace = TRUE, prob = c(0.2, 0.5, 0.2, 0.1))]
# create the dataset (one row per individual-year)
df <- data.frame(choice = c(actual_choice_year_1, actual_choice_year_2, actual_choice_year_3),
           x1 = x1, x2 = x2,
           individual_fixed_effect = as.character(rep(1:n, years)),
           time_fixed_effect = as.character(rep(1:years, each = n)),
           stringsAsFactors = FALSE)

I am new to this kind of analysis, but if I understand correctly, a multinomial logit regression is appropriate for estimating the effects of respondents' characteristics on their choice.

In order to take advantage of the longitudinal structure of the data, I want to include in my specification individual and time fixed effects.
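The time part of this can be sketched with tools that ship with R. With only a few periods, time fixed effects can enter a pooled multinomial logit as year dummies. The sketch below uses nnet::multinom on simulated data; the data frame panel, its column names (id, year) and the rescaling of x1 are assumptions made for illustration, not part of the data above. It does not handle individual fixed effects: adding one dummy per individual with so few periods runs into the incidental parameters problem.

```r
library(nnet)  # recommended package, normally installed with R

set.seed(123)
n <- 100
years <- 3

# Hypothetical panel shaped like the example data: one row per individual-year
panel <- data.frame(
  id     = factor(rep(1:n, years)),
  year   = factor(rep(1:years, each = n)),
  choice = factor(sample(letters[1:4], n * years, replace = TRUE)),
  x1     = runif(n * years, 5, 70.5) / 10  # rescaled to help the optimizer
)

# Pooled multinomial logit; the factor 'year' adds one dummy per year
# (year 1 is the reference), i.e. time fixed effects only
fit_time <- multinom(choice ~ x1 + year, data = panel, trace = FALSE)

# One row of coefficients per non-reference alternative (b, c, d versus a)
coef(fit_time)
```

Each row of the coefficient matrix compares one alternative with the reference alternative a, and the year2 and year3 columns are the time effects.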

To the best of my knowledge, the multinomial logit regression with fixed effects was first proposed by Chamberlain (1980, Review of Economic Studies 47: 225–238). Recently, Stata users have been provided with the routines to implement this model (femlogit).

In the vignette of the femlogit package, the author refers to the R function clogit, in the survival package.

According to the help page, clogit requires data to be rearranged in a different format:

library(mlogit)
# reshape the wide data (one row per choice situation) into mlogit's long format
data_mlogit <- mlogit.data(df, id.var = "individual_fixed_effect", 
            group.var = "time_fixed_effect", 
            choice = "choice", 
            shape = "wide")

Now, if I understand correctly how clogit works, fixed effects can be passed through the function strata (see this tutorial for additional details). However, it is not clear to me how to use this function, as no coefficient values are returned for the individual characteristic variables (i.e., I get only NAs).

library(survival)
fit <- clogit(formula("choice ~ alt + x1 + x2 + strata(individual_fixed_effect, time_fixed_effect)"), as.data.frame(data_mlogit))
summary(fit)
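For reference, the NA pattern can be reproduced on a small simulated example. A covariate that takes the same value on every row of a stratum (here, every alternative of one choice situation) carries no information in the conditional likelihood, so clogit typically reports NA for it. Interacting the covariate with the alternative gives it within-stratum variation. The sketch below is built with base R and survival only; the names chid, chosen, x1_b, x1_c and x1_d are assumptions introduced for illustration. Note that stratifying on the choice situation reproduces the pooled multinomial logit in conditional-logit form; it is not Chamberlain's fixed-effects estimator.

```r
library(survival)

set.seed(123)
n <- 100
years <- 3
alts <- letters[1:4]

# Hypothetical panel: one row per individual-year (= one choice situation)
df_sim <- data.frame(
  id     = rep(1:n, years),
  year   = rep(1:years, each = n),
  choice = sample(alts, n * years, replace = TRUE),
  x1     = runif(n * years, 5, 70.5) / 10,
  stringsAsFactors = FALSE
)
df_sim$chid <- seq_len(nrow(df_sim))

# Long format: cross join with the alternatives (no shared columns),
# giving one row per choice situation and alternative
long <- merge(df_sim, data.frame(alt = alts, stringsAsFactors = FALSE))
long$chosen <- as.integer(long$choice == long$alt)
long$alt <- factor(long$alt, levels = alts)

# x1 is constant within each stratum, so its coefficient is not identified
fit_bad <- clogit(chosen ~ alt + x1 + strata(chid), data = long)

# Alternative-specific slopes: x1 interacted with each non-reference alternative
for (a in alts[-1]) {
  long[[paste0("x1_", a)]] <- long$x1 * (long$alt == a)
}
fit_ok <- clogit(chosen ~ alt + x1_b + x1_c + x1_d + strata(chid), data = long)

coef(fit_bad)
coef(fit_ok)
```

In fit_ok the coefficients on x1_b, x1_c and x1_d play the role of the alternative-specific slopes of a multinomial logit with a as the reference alternative.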

Since I was not able to find a reason for this (there must be something that I am missing about the way these models are estimated), I have looked for a solution using other packages in R: e.g., glmnet, VGAM, nnet, globaltest, and mlogit.

Only the latter seems able to deal explicitly with panel structures using an appropriate estimation strategy. For this reason, I decided to give it a try. However, I was only able to run a multinomial logit regression without fixed effects.

# state formula
formula_mlogit <- formula("choice ~ 1| x1 + x2")

# run multinomial regression
fit <- mlogit(formula_mlogit, data_mlogit)
summary(fit)

If I understand correctly how mlogit works, here's what I have done.

By using the function mlogit.data, I have created a dataset compatible with the function mlogit. Here, I have also specified the id of each individual (id.var = "individual_fixed_effect") and the group to which individuals belong (group.var = "time_fixed_effect"). In my case, the group represents the observations recorded in the same year.

My formula specifies that there are no variables that are tied to a specific choice and randomly distributed among individuals (i.e., the variables before the |). Instead, choices are driven only by individual characteristics (i.e., x1 and x2).

The help page of the function mlogit says that the argument panel enables panel techniques. Setting panel = TRUE is what I am after here.

The problem is that panel can be set to TRUE only if another argument of mlogit, i.e. rpar, is not NULL.

The argument rpar is used to specify the distribution of the random variables, i.e., the variables before the |. The problem is that, since these variables do not exist in my case, I can't use the argument rpar and then set panel = TRUE.

An interesting question related to this is here. A few suggestions were given, and one seems to go in my direction. Unfortunately, no examples that I can replicate are provided, and I do not understand how to follow this strategy to solve my problem.

Moreover, I am not particularly attached to mlogit; any efficient way to perform this task would be fine for me (e.g., I am OK with survival or other packages).

Do you know of any solution to this problem?

Two caveats for those interested in answering:

  1. I am interested in fixed effects, not in random effects. However, if you believe there is no other way to take advantage of the longitudinal structure of my data in R (there is one in Stata, but I don't want to use it), please feel free to share your code.
  2. I am not interested in going Bayesian. So if possible, please do not suggest this approach.

Rotate answered 21/1, 2019 at 0:6

Comments (3)

Sliding: Did you find an R alternative to Stata's femlogit?
Rotate: No. Apparently femlogit is the only way to go at the moment.
Sliding: I used the hybrid method (also known as the between-within method) from Allison's 2009 textbook on fixed effect regressions. Basically, you include both mean and demeaned values (by individual) and draw inferences from the demeaned variable coefficients.
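
The hybrid approach described in the last comment could look roughly like the sketch below. It is a minimal illustration under assumptions, not the commenter's actual code: the data are simulated, the names panel, x1_mean and x1_within are hypothetical, and nnet::multinom is used as a plain multinomial logit without a random intercept, so the standard errors ignore the clustering of observations within individuals. In nonlinear models the within coefficients only approximate a fixed-effects estimate; they are not identical to Chamberlain's conditional estimator.

```r
library(nnet)  # recommended package, normally installed with R

set.seed(123)
n <- 100
years <- 3

# Hypothetical panel: one row per individual-year
panel <- data.frame(
  id     = rep(1:n, years),
  year   = factor(rep(1:years, each = n)),
  choice = factor(sample(letters[1:4], n * years, replace = TRUE)),
  x1     = runif(n * years, 5, 70.5) / 10  # rescaled to help the optimizer
)

# Between component: each individual's mean of x1 over the years
panel$x1_mean <- ave(panel$x1, panel$id)
# Within component: deviation from the individual's own mean
panel$x1_within <- panel$x1 - panel$x1_mean

# Multinomial logit with both components plus year dummies
fit_hybrid <- multinom(choice ~ x1_within + x1_mean + year,
                       data = panel, trace = FALSE)

# Inference is drawn from the within (demeaned) coefficients
coef(fit_hybrid)[, "x1_within"]
```

The x1_mean coefficients absorb stable between-individual differences in x1, which is why only the x1_within coefficients are interpreted.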
