I am trying to create a simple summary statistics table (min, max, mean, n, etc.) that handles both factor variables and continuous variables, even when there is more than one factor variable. I'm trying to produce good-looking HTML output, e.g. stargazer or huxtable output.
For a simple reproducible example, I'll use mtcars but change two of the variables to factors, and simplify to three variables.
library(tidyverse)
library(stargazer)
mtcars_df <- mtcars
mtcars_df <- mtcars_df %>%
  mutate(vs = factor(vs),
         am = factor(am)) %>%
  select(mpg, vs, am)
head(mtcars_df)
So the data has two factor variables, vs and am. mpg is left as a double:
#> mpg vs am
#> <dbl> <fctr> <fctr>
#> 1 21.0 0 1
#> 2 21.0 0 1
#> 3 22.8 1 1
#> 4 21.4 1 0
#> 5 18.7 0 0
#> 6 18.1 1 0
My desired output would look something like this (format only; the numbers aren't all correct for am0):
======================================================
Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
------------------------------------------------------
mpg 32 20.091 6.027 10 15.4 22.8 34
vs0 32 0.562 0.504 0 0 1 1
vs1 32 0.438 0.504 0 0 1 1
am0 32 0.594 0.499 0 0 1 1
am1 32 0.406 0.499 0 0 1 1
------------------------------------------------------
A straight call to stargazer does not handle factors (but we have a solution for summarising one factor, below):
# this doesn't give factors
stargazer(mtcars_df, type = "text")
======================================================
Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
------------------------------------------------------
mpg 32 20.091 6.027 10 15.4 22.8 34
------------------------------------------------------
This previous answer from @jake-fisher works very well to summarise one factor variable. https://mcmap.net/q/1708362/-output-each-factor-level-as-dummy-variable-in-stargazer-summary-statistics-table
The code below from the previous answer gives both values of the first factor vs, i.e. vs0 and vs1, but when it comes to the second factor, am, it only lists summary statistics for one value of am: am0 is missing.
I do realise that this is because we want to avoid the dummy variable trap when modeling, but my issue is not about modeling, it's about creating a summary table with all values of all factor variables.
options(na.action = "na.pass") # so that we keep missing values in the data
X <- model.matrix(~ . - 1, data = mtcars_df)
X.df <- data.frame(X) # stargazer only does summary tables of data.frame objects
#names(X) <- colnames(X)
stargazer(X.df, type = "text")
======================================================
Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
------------------------------------------------------
mpg 32 20.091 6.027 10 15.4 22.8 34
vs0 32 0.562 0.504 0 0 1 1
vs1 32 0.438 0.504 0 0 1 1
am1 32 0.406 0.499 0 0 1 1
------------------------------------------------------
While use of stargazer or huxtable would be preferred, if there's an easier way to produce this sort of summary table with a different library, that would still be very helpful.
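For what it's worth, the missing am0 column appears to come from model.matrix applying the default treatment contrasts to every factor after the first. A minimal sketch of one possible workaround, assuming base R only, is to pass a full indicator contrast matrix for each factor through the contrasts.arg argument. The names factor_cols and full_contrasts are just illustrative, and the data frame is rebuilt here without tidyverse so the snippet stands on its own:

```r
# Same example data as in the question: mpg numeric, vs and am as factors
mtcars_df <- data.frame(
  mpg = mtcars$mpg,
  vs  = factor(mtcars$vs),
  am  = factor(mtcars$am)
)

# Names of the factor columns (illustrative variable name)
factor_cols <- names(mtcars_df)[sapply(mtcars_df, is.factor)]

# For each factor, build a full indicator matrix (one column per level)
# instead of the default treatment contrasts, which drop one level
full_contrasts <- lapply(mtcars_df[factor_cols], contrasts, contrasts = FALSE)

# Every level of every factor now gets its own 0/1 column
X <- model.matrix(~ . - 1, data = mtcars_df, contrasts.arg = full_contrasts)
X.df <- data.frame(X)

# The column means of the dummies are the proportions in each level
colMeans(X.df)
```

X.df can then be passed to stargazer(X.df, type = "text") as in the code above. Note that the resulting columns are deliberately collinear, so this matrix is only suitable for a summary table, not for fitting a model.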
vs0, vs1, so that mean will show what proportion of vs is ==0 and ==1. For factors with more values, I'd be thinking to create more dummies, e.g. from mtcars: cyl4, cyl6, cyl8 – Asphaltite
epiDisplay::codebook(mtcars_df) gives appropriate summaries of numeric and factors. – Iatric
gtsummary might be helpful and has both gt and huxtable output – Holcman
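The idea in the first comment (one dummy per level, e.g. cyl4, cyl6, cyl8) can also be sketched by hand, without model.matrix. This is only an illustration under the assumption that a plain 0/1 column per level is what is wanted; factor_to_dummies is a hypothetical helper written for this example, not a function from any package:

```r
# Hypothetical helper: one 0/1 column per level of a factor.
# Missing values in f stay NA in every dummy column.
factor_to_dummies <- function(f, prefix) {
  out <- sapply(levels(f), function(lv) as.integer(f == lv))
  colnames(out) <- paste0(prefix, levels(f))
  as.data.frame(out)
}

# Three-level example from mtcars: cyl takes the values 4, 6 and 8
cyl_dummies <- factor_to_dummies(factor(mtcars$cyl), "cyl")

# Combine with a continuous variable to form the input for a summary table
summary_input <- cbind(mpg = mtcars$mpg, cyl_dummies)

# Mean of each dummy is the share of rows in that level
colMeans(summary_input)
```

summary_input is an ordinary data frame, so it could be handed to stargazer(summary_input, type = "text") in the same way as X.df in the question.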