dplyr: apply function table() to each column of a data.frame
Apply function table() to each column of a data.frame using dplyr

I often apply the table() function to each column of a data frame using plyr, like this:

library(plyr)
ldply( mtcars, function(x) data.frame( table(x), prop.table( table(x) ) )  )

Is it possible to do this in dplyr also?

My attempts fail:

mtcars %>%  do( table %>% data.frame() )
melt( mtcars ) %>%  do( table %>% data.frame() )
Oriane answered 26/12, 2014 at 17:9 Comment(2)
You could convert this to long form using gather from library(tidyr) and then do gather(mtcars, Var, Val) %>% group_by(Var) %>% dplyr::mutate(n=n()) %>% group_by(Var,Val) %>% dplyr::mutate(n1=n(), Percent=n1/n)%>% unique() - Gareth
Can you post a full answer using this approach? - Bittern
O
12

Using tidyverse (dplyr and purrr):

library(tidyverse)

mtcars %>%
    map( function(x) table(x) )

Or:

mtcars %>%
    map(~ table(.x) )

Or simply:

library(tidyverse)

mtcars %>%
    map( table )
Oriane answered 20/3, 2018 at 11:2 Comment(1)
Using the purrr anonymous function syntax, that would be mtcars %>% map(~table(.)) - Corley
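
map(table) returns a list of tables rather than the single data frame with counts and proportions that the plyr call in the question produces. A minimal sketch of one way to get there is shown below. It is not part of the original answer, and the column names var, value, n and pct as well as the object name freq_tables are assumptions chosen for illustration.

```r
library(dplyr)
library(purrr)

freq_tables <- mtcars %>%
  map(table) %>%
  # imap() passes each table together with the name of its column
  imap(function(tab, col_name) {
    data.frame(
      var   = col_name,                      # source column
      value = names(tab),                    # distinct values, as character
      n     = as.vector(tab),                # counts
      pct   = as.vector(prop.table(tab)),    # proportions within the column
      stringsAsFactors = FALSE
    )
  }) %>%
  bind_rows()

# Frequencies for a single column
freq_tables %>% filter(var == "cyl")
```

Storing value as character keeps the stacked column type-consistent, at the cost of losing the original numeric type.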
R
13

You can try the following which does not rely on the tidyr package.

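library(dplyr)

mtcars %>% 
   lapply(table) %>%                      # one frequency table per column
   lapply(as.data.frame) %>%              # each table becomes a data frame with a Freq column
   Map(cbind,var = names(mtcars),.) %>%   # tag each data frame with its column name
   bind_rows() %>%                        # rbind_all() is deprecated; bind_rows() replaces it
   group_by(var) %>% 
   mutate(pct = Freq / sum(Freq))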
Representational answered 26/12, 2014 at 17:59 Comment(1)
Can you elaborate on the answer? I am getting some errors due to a messier input data.frame and would like to troubleshoot. Can I use purrr::map instead of Map? The error is: Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 1, 0 - Bittern
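
One plausible cause of the error in the comment above is a column that contains only NA values: table() then returns an empty table, and cbind() cannot combine a length-1 name with a zero-row data frame. The sketch below assumes that is the situation. The data frame messy and the name empty_cols are hypothetical, and purrr's map(), imap(), keep() and discard() stand in for lapply() and Map().

```r
library(dplyr)
library(purrr)

# Hypothetical input: column `b` is entirely NA, so table(b) has no entries
messy <- data.frame(a = c(1, 1, 2), b = c(NA, NA, NA))

tabs <- map(messy, table)

# Columns whose table is empty; these are the ones that break cbind()
empty_cols <- names(keep(tabs, ~ length(.x) == 0))

result <- tabs %>%
  discard(~ length(.x) == 0) %>%                  # drop the empty tables
  map(as.data.frame, stringsAsFactors = FALSE) %>%
  imap(~ cbind(var = .y, .x)) %>%                 # .y is the column name
  bind_rows() %>%
  group_by(var) %>%
  mutate(pct = Freq / sum(Freq)) %>%
  ungroup()
```

Inspecting empty_cols first shows which columns were skipped, so they can be handled separately if needed.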
T
11

In general you probably would not want to run table() on every column of a data frame, because at least one of the variables is likely to be unique (an id field) and will produce very long output. However, you can use group_by() and tally() to obtain frequency tables in a dplyr chain, or count(), which does the group_by() for you.

> mtcars %>% 
    group_by(cyl) %>% 
    tally()
> # mtcars %>% count(cyl)

Source: local data frame [3 x 2]

  cyl  n
1   4 11
2   6  7
3   8 14

If you want to do a two-way frequency table, group by more than one variable.

> mtcars %>% 
    group_by(gear, cyl) %>% 
    tally()
> # mtcars %>% count(gear, cyl)

You can use spread() from the tidyr package to reshape that two-way output into the wide layout that table() returns when given two variables.
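
As a rough illustration of that reshaping step, the sketch below spreads the cyl values into columns. The object name two_way and the choice of fill = 0 for missing combinations are assumptions made here. spread() is superseded in current tidyr, where pivot_wider() is the suggested replacement.

```r
library(dplyr)
library(tidyr)

two_way <- mtcars %>%
  count(gear, cyl) %>%        # long form: one row per observed gear/cyl pair
  spread(cyl, n, fill = 0)    # wide form: one column per cyl value

two_way
```

Each row is then one gear value and each remaining column one cyl value, mirroring table(mtcars$gear, mtcars$cyl).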

Thomasthomasa answered 17/3, 2015 at 19:59 Comment(2)
mtcars %>% count(cyl) or mtcars %>% count(gear, cyl). I think the question is how to do this for every variable in one call. - Ankh
Fair enough; but I just wanted to point out that usually running this on every single column will result in really, really long output. At least one of the columns is likely to be a unique id variable. I updated my answer to include the use of count since it does the group_by for you. Thanks! - Thomasthomasa
B
0

The solution by Caner did not work for me, but the one from commenter akrun (credit goes to him) worked great. I am also using a much larger tibble to demo it, and I added ordering by percent, descending.

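library(dplyr)
library(tidyr)
library(nycflights13); dim(flights)

# Long form: one row per (column, value) pair
tte <- gather(flights, Var, Val) %>%
  group_by(Var) %>% dplyr::mutate(n = n()) %>%                          # n: rows per column
  group_by(Var, Val) %>% dplyr::mutate(n1 = n(), Percent = n1 / n) %>%  # n1: rows per value
  arrange(Var, desc(n1)) %>% unique()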
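
gather() is superseded in current tidyr, so the same idea can also be sketched with pivot_longer() and count(). The small data frame df below is a hypothetical stand-in for flights. Converting every column to character first is an assumption made here so that columns of different types can share one value column.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for a larger table with mixed column types
df <- data.frame(
  carrier = c("UA", "UA", "AA", "DL"),
  month   = c(1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

freq <- df %>%
  mutate(across(everything(), as.character)) %>%   # one common type for Val
  pivot_longer(everything(), names_to = "Var", values_to = "Val") %>%
  count(Var, Val, name = "n1") %>%                 # rows per column/value pair
  group_by(Var) %>%
  mutate(Percent = n1 / sum(n1)) %>%               # share within each column
  ungroup() %>%
  arrange(Var, desc(n1))
```

Because count() already collapses duplicates, no unique() step is needed in this variant.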
Bittern answered 7/6, 2019 at 19:37 Comment(0)
