Count number of rows by group using dplyr
Asked Answered
O

6

137

I am using the mtcars dataset. I want to find the number of records for a particular combination of data. Something very similar to the count(*) group by clause in SQL. ddply() from plyr is working for me

library(plyr)
ddply(mtcars, .(cyl,gear),nrow)

has output

  cyl gear V1
1   4    3  1
2   4    4  8
3   4    5  2
4   6    3  2
5   6    4  4
6   6    5  1
7   8    3 12
8   8    5  2

Using this code

library(dplyr)
g <- group_by(mtcars, cyl, gear)
summarise(g, length(gear))

has output

  length(cyl)
1          32

I found various functions to pass in to summarise() but none seem to work for me. One function I found is sum(G), which returned

Error in eval(expr, envir, enclos) : object 'G' not found

Tried using n(), which returned

Error in n() : This function should not be called directly

What am I doing wrong? How can I get group_by() / summarise() to work for me?

Oxidate answered 31/3, 2014 at 17:11 Comment(13)
I can't reproduce this. I get the same output as from ddply. What version of dplyr are you on? Try updating?Wangle
I have the latest version 0.1.3. Do you have 0.1.2?Oxidate
Nope. Your example works just fine for me with 0.1.3.Wangle
What version of R do you have? Could that be causing the difference in behavior? I also tried this on a computer at home which is using Ubuntu, same thing..Oxidate
I'm on 3.0.2, but I'd be surprised if that makes a difference, unless your version of R is very, very old.Wangle
I am on 3.0.2 as well. I have scoured through countless blogs and tutorials since last week before posting. If this does not work, is there another way I can count number of rows for a combination?Oxidate
<shrug> Start from the beginning, try it in a clean R session with no other packages loaded. BTW, your sum(G) example doesn't work because presumably G doesn't exist. Maybe you meant g? And your example also works fine for me using n().Wangle
The result you claim to be getting would only make sense if g was actually not grouped. So I would look for a reason why you might be somehow ungrouping your data frame.Wangle
It works for me too... You can try mtcars %.% group_by(cyl, gear) %.% summarise(length(gear)), it gives the same result.Desdamonna
Thanks!!! That solved it. If I have both plyr and dplyr packages attached, summarise does not work as expected. As soon as I restarted the session (and did not attach all normal packages by default) I was able to make it work. Phew.Oxidate
n() also works once I detached plyr package. So the main culprit plyr package that interferes with dplyr package. Thanks for the help again!!!Oxidate
FYI there's a warning on the package's github page that addresses this.Wangle
Thanks, you are right. Once I follow the recommended order it is working okay.Oxidate
S
191

There's a special function n() in dplyr to count rows (potentially within groups):

library(dplyr)
mtcars %>% 
  group_by(cyl, gear) %>% 
  summarise(n = n())
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
#    cyl  gear     n
#  (dbl) (dbl) (int)
#1     4     3     1
#2     4     4     8
#3     4     5     2
#4     6     3     2
#5     6     4     4
#6     6     5     1
#7     8     3    12
#8     8     5     2

But dplyr also offers a handy count function which does exactly the same with less typing:

count(mtcars, cyl, gear)          # or mtcars %>% count(cyl, gear)
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
#    cyl  gear     n
#  (dbl) (dbl) (int)
#1     4     3     1
#2     4     4     8
#3     4     5     2
#4     6     3     2
#5     6     4     4
#6     6     5     1
#7     8     3    12
#8     8     5     2
Sherman answered 18/1, 2016 at 13:47 Comment(0)
R
23

I think what you are looking for is as follows.

cars_by_cylinders_gears <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(count = n())

This is using the dplyr package. This is essentially the longhand version of the count () solution provided by docendo discimus.

Riannon answered 5/2, 2019 at 16:46 Comment(1)
Works for me. The crucial part is NOT specifying the quotes around column names to group by.Demonism
M
21

Another approach is to use the double colons as this will help avoid potential conflicts with functions with similar names from other packages.

mtcars %>% 
  dplyr::group_by(cyl, gear) %>%
  dplyr::summarise(length(gear))
Mesomorph answered 3/4, 2014 at 16:21 Comment(1)
Edited to add why using colons would be useful although I think the main point of this answer is the use of lengthAmok
J
3

Another option, not necesarily more elegant, but does not require to refer to a specific column:

mtcars %>% 
  group_by(cyl, gear) %>%
  do(data.frame(nrow=nrow(.)))

This is equivalent to using count():

library(dplyr, warn.conflicts = FALSE)
all.equal(mtcars %>% 
            group_by(cyl, gear) %>%
            do(data.frame(n=nrow(.))) %>% 
            ungroup(),
          count(mtcars, cyl, gear), check.attributes=FALSE)
#> [1] TRUE
Joan answered 16/1, 2016 at 2:40 Comment(3)
not sure what you mean, it gives nrow of each group, as in count?Joan
I'm not sure as this was 6 years ago. Likely some of the functions have changed. I've removed my old comment.Revivalism
thanks @Hack-R!Joan
M
1

Another option is using the function tally from dplyr. Here is a reproducible example:

library(dplyr)
mtcars %>% 
  group_by(cyl, gear) %>% 
  tally()
#> # A tibble: 8 × 3
#> # Groups:   cyl [3]
#>     cyl  gear     n
#>   <dbl> <dbl> <int>
#> 1     4     3     1
#> 2     4     4     8
#> 3     4     5     2
#> 4     6     3     2
#> 5     6     4     4
#> 6     6     5     1
#> 7     8     3    12
#> 8     8     5     2

Created on 2022-09-11 with reprex v2.0.2

Measures answered 11/9, 2022 at 16:6 Comment(0)
Y
1

One could simply do this using dplyr:

library(dplyr)

# Use the function add_count() and name the new variable as "count"

mtcars %>%
  add_count(cyl, gear, name = "count")

This way you do not need to group the columns as the add_count() function does that for you when you mention variables. Also, this allows you to retain other variables in the data frame (if any).

Yila answered 8/11, 2023 at 6:0 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.