Output each factor level as dummy variable in stargazer summary statistics table

I'm using the R package stargazer to create high-quality regression tables, and I would like to use it to create a summary statistics table. I have a factor variable in my data, and I would like the summary table to show me the percent in each category of the factor -- in effect, separate the factor into a set of mutually exclusive logical (dummy) variables, and then display those in the table. Here's an example:

> library(car)
> library(stargazer)
> data(Blackmore)
> stargazer(Blackmore[, c("age", "exercise", "group")], type = "text")

==========================================
Statistic  N   Mean  St. Dev.  Min   Max  
------------------------------------------
age       945 11.442  2.766   8.000 17.920
exercise  945 2.531   3.495   0.000 29.960
------------------------------------------

But I'm trying to get an additional row that shows me the percent in each group (% control and/or % patient, in these data). I'm sure this is just an option somewhere in stargazer, but I can't find it. Does anyone know what it is?

Edit: the dataset formerly named car::Blackmoor is now spelled car::Blackmore, and the code above uses the new spelling.

Jarvis answered 13/11, 2014 at 15:55 Comment(3)
Stargazer can't do this automatically. See this question too: #25474189 - Accouterment
But you could build your own summary table and then use pander or xtable to convert it to Markdown, Word, LaTeX, HTML, or whatever else you want. - Accouterment
Thanks. It's too bad they don't have an option for this yet. Your workaround is close to what I was looking for, but I wanted the % in the control condition and the % in the patient condition. I'll post my workaround, too. - Jarvis
A
5

Since Stargazer can't do this directly, you can create your own summary table as a data frame and output that using pander, xtable, or any other package. For example, here's how you can use dplyr and tidyr to create a summary table:

library(car)    # provides the Blackmore data (formerly spelled Blackmoor)
library(dplyr)
library(tidyr)

fancy.summary <- Blackmore %>%
  select(-subject) %>%  # Remove the subject column
  group_by(group) %>%  # Group by patient and control
  summarise_each(funs(mean, sd, min, max, length)) %>%  # Summary statistics per group (summarise_each() and funs() are deprecated in recent dplyr releases; across() is the current replacement)
  mutate(prop = age_length / sum(age_length)) %>%  # Calculate proportion
  gather(variable, value, -group, -prop) %>%  # Convert to long
  separate(variable, c("variable", "statistic")) %>%  # Split variable column
  mutate(statistic = ifelse(statistic == "length", "n", statistic)) %>%
  spread(statistic, value) %>%  # Make the statistics be actual columns
  select(group, variable, n, mean, sd, min, max, prop)  # Reorder columns

Which results in this if you use pander:

library(pander)

pandoc.table(fancy.summary)

------------------------------------------------------
 group   variable   n   mean   sd    min   max   prop 
------- ---------- --- ------ ----- ----- ----- ------
control    age     359 11.26  2.698   8   17.92 0.3799

control  exercise  359 1.641  1.813   0   11.54 0.3799

patient    age     586 11.55  2.802   8   17.92 0.6201

patient  exercise  586 3.076  4.113   0   29.96 0.6201
------------------------------------------------------
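
If only the per-level percentages are needed, the same idea can be sketched in base R without dplyr or tidyr. The toy `group` vector below is made up for illustration and stands in for `Blackmore$group`; the column names `level`, `n`, and `pct` are arbitrary choices, not anything a package requires.

```r
# Toy factor standing in for Blackmore$group (hypothetical data)
group <- factor(c("control", "patient", "patient", "control", "patient"))

counts <- table(group)  # number of observations per level

# One row per factor level, with its count and its percentage of the total
summary_tbl <- data.frame(
  level = names(counts),
  n     = as.vector(counts),
  pct   = as.vector(100 * prop.table(counts))
)

print(summary_tbl)
```

The resulting data frame can then be passed to pander, xtable, or a similar package like any other table.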
Accouterment answered 13/11, 2014 at 16:38 Comment(0)
J
2

Another workaround is to use model.matrix to create dummy variables in a separate step, and then use stargazer to create a table from that. To show this with the example:

> library(car)
> library(stargazer)
> data(Blackmore)
> 
> options(na.action = "na.pass")  # so that we keep missing values in the data
> X <- model.matrix(~ age + exercise + group - 1, data = Blackmore)
> X.df <- data.frame(X)  # stargazer only does summary tables of data.frame objects
> names(X) <- colnames(X)
> stargazer(X.df, type = "text")

=============================================
Statistic     N   Mean  St. Dev.  Min   Max  
---------------------------------------------
age          945 11.442  2.766   8.000 17.920
exercise     945 2.531   3.495   0.000 29.960
groupcontrol 945 0.380   0.486     0     1   
grouppatient 945 0.620   0.486     0     1   
---------------------------------------------

Edit: the dataset formerly named car::Blackmoor is now spelled car::Blackmore, and the code above uses the new spelling.
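
A note on column names: data.frame() passes names through make.names() by default, so a model.matrix column such as an interaction term can be renamed on conversion. The sketch below uses a small hand-built matrix (hypothetical values) to show the effect and one way to keep the original names. Whether that was the purpose of the names() line in the code above is an assumption on my part.

```r
# Hand-built stand-in for a model matrix with an interaction column (hypothetical values)
X <- cbind(`age:exercise` = c(10, 24), groupcontrol = c(1, 0))

df_default <- data.frame(X)                       # names are sanitised by make.names()
df_kept    <- data.frame(X, check.names = FALSE)  # names are kept exactly as in X

print(names(df_default))
print(names(df_kept))
```

With plain main effects such as age, exercise, and group, the names are already syntactically valid, so both conversions give the same result.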

Jarvis answered 14/11, 2014 at 17:7 Comment(3)
Your answer is still helping people 6 years later! I wonder if you can tell me why we use the line names(X) <- colnames(X)? - Vagarious
Also, when I use your method to create a summary table with more than one factor variable, e.g. more than just group, the other factor variables seem to exclude the base case. That is great for avoiding the dummy variable trap, but it's not so good for producing the summary table I'm trying to get. I wonder if you have advice? Should I open a new question? - Vagarious
I've opened up a new question here: #62315573 - Vagarious
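
For the multi-factor case raised in the comments above, one possible approach (a sketch, not necessarily what the linked question settled on) is to hand model.matrix a full set of indicator contrasts for every factor through its contrasts.arg argument, so that no level is dropped. The data frame and its column names below are hypothetical.

```r
# Hypothetical data with one numeric column and two factors
df <- data.frame(
  x  = c(1.5, 2.0, 3.5, 4.0, 5.5, 6.0),
  f1 = factor(c("a", "b", "a", "b", "a", "b")),
  f2 = factor(c("u", "v", "w", "u", "v", "w"))
)

# Identity ("one column per level") contrasts for every factor column
factor_cols    <- names(df)[sapply(df, is.factor)]
full_contrasts <- lapply(df[factor_cols], contrasts, contrasts = FALSE)

# Each factor now expands to all of its levels, not levels minus the base case
X    <- model.matrix(~ . - 1, data = df, contrasts.arg = full_contrasts)
X.df <- data.frame(X)

print(colnames(X))
print(colMeans(X.df))  # for the dummy columns, the mean is the share in that level
```

X.df can then be passed to stargazer in the same way as in the answer above.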
M
1

The package tables can be useful for this task.

library(car)
library(tables)
data(Blackmore)

# percent only:
(x <- tabular((Factor(group, "") ) ~ (Pct=Percent()) * Format(digits=4), 
    data=Blackmore))
##              
##         Pct  
## control 37.99
## patient 62.01

# percent and counts:
(x <- tabular((Factor(group, "") ) ~ ((n=1) + (Pct=Percent())) * Format(digits=4), 
    data=Blackmore))
##                      
##         n      Pct   
## control 359.00  37.99
## patient 586.00  62.01

Then it's straightforward to output this to LaTeX:

> latex(x)
\begin{tabular}{lcc}
\hline
  & n & \multicolumn{1}{c}{Pct} \\ 
\hline
control  & $359.00$ & $\phantom{0}37.99$ \\
patient  & $586.00$ & $\phantom{0}62.01$ \\
\hline 
\end{tabular}
Montserrat answered 10/7, 2015 at 17:9 Comment(0)
W
0

This has been a struggle for me. I like how Stargazer looks, but I do not like that it does not produce summary statistics for each level of a factor variable. The following worked for me; hopefully it saves someone headaches in the future.

You have to create the dummy variables first, and I use the fastDummies package to do this quickly. You will also have to create two lists of column names: one for the variables that are factors, and one for those that are not.

library('stargazer')
library('fastDummies')

factor_cols <- c("x", "y", "z")
nonfactor_cols <- c("u", "v")
df <- dummy_cols(df[, c(factor_cols, nonfactor_cols)])   # Adds one dummy column per factor level
df <- df[, !names(df) %in% factor_cols]        # This will remove the duplicate columns that were created.
stargazer(df, 
          type = "html",
          out = "summary.htm")

Note that the variable labels become messed up in the final output. But I usually change covariate names manually at the end anyway, so it is fine.
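
If renaming by hand gets tedious, the labels can be derived from the dummy column names. The sketch below assumes the `<column>_<level>` naming that fastDummies produces; the column names and levels are hypothetical. The commented-out call uses stargazer's covariate.labels argument, which I believe also applies to summary tables, so treat that line as an assumption to verify.

```r
# Hypothetical column names after dummy_cols() and dropping the original factors
cols        <- c("u", "v", "x_a", "x_b", "y_low", "y_high")
factor_cols <- c("x", "y")

# Turn "x_a" into "x = a" for every dummy column; leave other columns unchanged
labels <- cols
for (f in factor_cols) {
  is_dummy <- startsWith(cols, paste0(f, "_"))
  labels[is_dummy] <- sub(paste0("^", f, "_"), paste0(f, " = "), cols[is_dummy])
}

print(labels)

# Assumed usage, not run here:
# stargazer(df, type = "html", covariate.labels = labels, out = "summary.htm")
```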

Whitewood answered 8/6, 2021 at 23:23 Comment(0)
