Effects from multinomial logistic model in mlogit
Asked Answered
W

2

6

I received some good help getting my data formatted properly produce a multinomial logistic model with mlogit here (Formatting data for mlogit)

However, I'm trying now to analyze the effects of covariates in my model. I find the help file in mlogit.effects() to be not very informative. One of the problems is that the model appears to produce a lot of rows of NAs (see below, index(mod1) ).

  1. Can anyone clarify why my data is producing those NAs?
  2. Can anyone help me get mlogit.effects to work with the data below?
  3. I would consider shifting the analysis to multinom(). However, I can't figure out how to format the data to fit the formula for use multinom(). My data is a series of rankings of seven different items (Accessible, Information, Trade offs, Debate, Social and Responsive) Would I just model whatever they picked as their first rank and ignore what they chose in other ranks? I can get that information.

Reproducible code is below:

#Loadpackages 
library(RCurl)
library(mlogit)
library(tidyr)
library(dplyr)
#URL where data is stored
dat.url <- 'https://raw.githubusercontent.com/sjkiss/Survey/master/mlogit.out.csv'

#Get data
dat <- read.csv(dat.url)

#Complete cases only as it seems mlogit cannot handle missing values or tied data which in this case you might get because of median imputation
dat <- dat[complete.cases(dat),]

#Change the choice index variable (X) to have no interruptions, as a result of removing some incomplete cases
dat$X <- seq(1,nrow(dat),1)

#Tidy data to get it into long format
dat.out <- dat %>%
  gather(Open, Rank, -c(1,9:12)) %>%
  arrange(X, Open, Rank)

#Create mlogit object
mlogit.out <- mlogit.data(dat.out, shape='long',alt.var='Open',choice='Rank', ranked=TRUE,chid.var='X')

#Fit Model
mod1 <- mlogit(Rank~1|gender+age+economic+Job,data=mlogit.out)

Here is my attempt to set up a data frame similar to the one portrayed in the help file. It doesnt work. I confess although I know the apply family pretty well, tapply is murky to me.

with(mlogit.out, data.frame(economic=tapply(economic, index(mod1)$alt, mean)))

Compare from the help:

data("Fishing", package = "mlogit")
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
m <- mlogit(mode ~ price | income | catch, data = Fish)

# compute a data.frame containing the mean value of the covariates in
# the sample data in the help file for effects
z <- with(Fish, data.frame(price = tapply(price, index(m)$alt, mean),
                       catch = tapply(catch, index(m)$alt, mean),
                       income = mean(income)))

# compute the marginal effects (the second one is an elasticity
effects(m, covariate = "income", data = z)
Whitleather answered 16/6, 2015 at 19:0 Comment(10)
When I run your code up to mod1 <- mlogit(... it works fine. When I look at summary(mod1) it looks good. Looking at ?index, the help page points to mlogit.data, which sounds like it is intended for use on data, not on a model, the description is: "shape a data.frame in a suitable form for the use of the mlogit function." Nor do I see index used on a model in the help. Maybe you need to update your mlogit package?Diametrically
...though it looks like mlogit hasn't been updated since December 2013, so that's probably not your problem. The only index object I find in my namespace is from the zoo package. So, if you don't use index on your model (use summary() instead), do you still have a question?Diametrically
Gregor, this line in the help example uses the index() command on the model: z <- with(Fish, data.frame(price = tapply(price, index(m)$alt, mean), catch = tapply(catch, index(m)$alt, mean), income = mean(income)))Whitleather
But none of this changes the fact that I can't get effects() to work on an mlogit model.Whitleather
But in the help example Fish is a data frame, not a model!Diametrically
Hi Gregor, yes, Fish is a data frame in the help file. But m is the model. It is used to construct a sample data frame to provide to effects. See: with(Fish, data.frame( price=tapply(price, index(m)$alt,mean) ) ) But to be honest, all I really need is help getting the effects and / or predict function to work with the data and model with the data above. However it happens.Whitleather
It looks like not all of your respondents ranked every choice, which could be why you get the NA in the index(mod1) code. Did you try using effects on the Game model from the mlogit vignette to see if you encounter the same issue on another ranked order model?Applied
@Applied I thought that this line would deal with this: code dat<-dat[complete.cases(dat),] code It seems like I get a similar error with the Game data: code library(mlogit) #Load data data('Game2') #format game.dat<-mlogit.data(Game2, choice='ch', shape='long', alt.var='platform', ranked=TRUE) #Model game.mod<-mlogit(ch ~ own|hours, data=Game2, alt.var='platform', ranked=TRUE, shape='long', reflevel='PC') summary(game.mod) index(game.mod) #Sample data sample.dat<-expand.grid(platform=levels(dat$platform), hours=2, own=c(1)) #effects effects(mod, covariate='hours', data=test) codeWhitleather
Yes, I saw you removed the missing values, but what I was wondering is if these models can handle the unbalanced result of doing so. But if you get the same problem with the Game data then it seems like something else is going on - have you considered contacting the package author/maintainer about using effects with rank-ordered models?Applied
@aosmith, yes several times, but no response. I also added in a line of code to make the choice index variable sequential again after removing the incomplete cases. But the missing values seem to appear again.Whitleather
C
0

You are working with Ranked Data, not just Multinomial Choice Data. The structure for the Ranked data in mlogit is that first set of records for a person are all options, then the second is all options except the one ranked first, and so on. But the index assumes equal number of options each time. So a bunch of NAs. We just need to get rid of them.

> with(mlogit.out, data.frame(economic=tapply(economic, index(mod1)$alt[complete.cases(index(mod1)$alt)], mean)))
            economic
Accessible      5.13
Debate          4.97
Information     5.08
Officials       4.92
Responsive      5.09
Social          4.91
Trade.Offs      4.91
Councilman answered 23/6, 2016 at 21:43 Comment(0)
T
2

I'll try Option 3 and switch to multinom(). This code will model the log-odds of ranking an item as 1st, compared to a reference item (e.g., "Debate" in the code below). With K = 7 items, if we call the reference item ItemK, then we're modeling

    log[ Pr(Itemk is 1st) / Pr(ItemK is 1st) ] = αk + xTβk

for k = 1,...,K-1, where Itemk is one of the other (i.e. non-reference) items. The choice of reference level will affect the coefficients and their interpretation, but it will not affect the predicted probabilities. (Same story for reference levels for the categorical predictor variables.)

I'll also mention that I'm handling missing data a bit differently here than in your original code. Since my model only needs to know which item gets ranked 1st, I only need to throw out records where that info is missing. (E.g., in the original dataset record #43 has "Information" ranked 1st, so we can use this record even though 3 other items are NA.)

# Get data
dat.url <- 'https://raw.githubusercontent.com/sjkiss/Survey/master/mlogit.out.csv'
dat <- read.csv(dat.url)

# dataframe showing which item is ranked #1
ranks <- (dat[,2:8] == 1)

# for each combination of predictor variable values, count
# how many times each item was ranked #1
dat2 <- aggregate(ranks, by=dat[,9:12], sum, na.rm=TRUE)

# remove cases that didn't rank anything as #1 (due to NAs in original data)
dat3 <- dat2[rowSums(dat2[,5:11])>0,]

# (optional) set the reference levels for the categorical predictors
dat3$gender <- relevel(dat3$gender, ref="Female")
dat3$Job <- relevel(dat3$Job, ref="Government backbencher")

# response matrix in format needed for multinom()
response <- as.matrix(dat3[,5:11])

# (optional) set the reference level for the response by changing
# the column order
ref <- "Debate"
ref.index <- match(ref, colnames(response))
response <- response[,c(ref.index,(1:ncol(response))[-ref.index])]

# fit model (note that age & economic are continuous, while gender &
# Job are categorical)
library(nnet)
fit1 <- multinom(response ~ economic + gender + age + Job, data=dat3)

# print some results
summary(fit1)
coef(fit1)
cbind(dat3[,1:4], round(fitted(fit1),3)) # predicted probabilities

I didn't do any diagnostics, so I make no claim that the model used here provides a good fit.

Tacklind answered 21/6, 2015 at 2:25 Comment(2)
OK, thanks @dagremu. This seems to be the best approach. I had hoped to find a way to use the information lower down in the rankings, but this will do!Whitleather
I agree it's not appealing to waste all the other rankings. An alternative coming to mind is to repeat the above procedure for each rank. That is, you already modeled the log-odds of ranking an item #1. Then you would separately model the log-odds of ranking an item #2, and so on. I think you'd only need to edit the 3rd line of code above, setting ranks, changing the 1 on the right to 2, then 3, etc. Obviously it's not ideal to model all these things separately since there's information shared between the models. Whether it's worth it or not depends on what your scientific question is.Tacklind
C
0

You are working with Ranked Data, not just Multinomial Choice Data. The structure for the Ranked data in mlogit is that first set of records for a person are all options, then the second is all options except the one ranked first, and so on. But the index assumes equal number of options each time. So a bunch of NAs. We just need to get rid of them.

> with(mlogit.out, data.frame(economic=tapply(economic, index(mod1)$alt[complete.cases(index(mod1)$alt)], mean)))
            economic
Accessible      5.13
Debate          4.97
Information     5.08
Officials       4.92
Responsive      5.09
Social          4.91
Trade.Offs      4.91
Councilman answered 23/6, 2016 at 21:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.