Count number of rows per group and add result to original data frame

P

12

67

Say I have a data.frame object:

df <- data.frame(name=c('black','black','black','red','red'),
                 type=c('chair','chair','sofa','sofa','plate'),
                 num=c(4,5,12,4,3))

Now I want to count the number of rows (observations) of for each combination of name and type. This can be done like so:

table(df[ , c("name","type")])

or possibly also with plyr, (though I am not sure how).

However, how do I get the results incorporated into the original data frame? So that the results will look like this:

df
#    name  type num count
# 1 black chair   4     2
# 2 black chair   5     2
# 3 black  sofa  12     1
# 4   red  sofa   4     1
# 5   red plate   3     1

where count now stores the results from the aggregation.

A solution with plyr could be interesting to learn as well, though I would like to see how this is done with base R.

Prot answered 16/9, 2011 at 21:33 Comment(0)

G

87

Using data.table:

library(data.table)
dt = as.data.table(df)

# or coerce to data.table by reference:
# setDT(df)

dt[ , count := .N, by = .(name, type)]

For pre-data.table 1.8.2 alternative, see edit history.

Using dplyr:

library(dplyr)
df %>%
  group_by(name, type) %>%
  mutate(count = n())

Or simply:

add_count(df, name, type)

Using plyr:

plyr::ddply(df, .(name, type), transform, count = length(num))

Gametogenesis answered 16/9, 2011 at 21:37 Comment(1)

Do you need "setkeyv(dt, c('name', 'type'))"? – Offshoot 13/12, 2016 at 19:53

M

31

You can use ave:

df$count <- ave(df$num, df[,c("name","type")], FUN=length)

Mccool answered 16/9, 2011 at 21:36 Comment(2)

Could also do it a bit cleaner perhaps using transform(df, count = ave(num, name, type, FUN = length)) or with – Nowell 4/2, 2019 at 17:59

If you have lots of data, this command is SUPERSLOW – Titi 18/10, 2021 at 18:54

K

8

You can do this:

> ddply(df,.(name,type),transform,count = NROW(piece))
   name  type num count
1 black chair   4     2
2 black chair   5     2
3 black  sofa  12     1
4   red plate   3     1
5   red  sofa   4     1

or perhaps more intuitively,

> ddply(df,.(name,type),transform,count = length(num))
   name  type num count
1 black chair   4     2
2 black chair   5     2
3 black  sofa  12     1
4   red plate   3     1
5   red  sofa   4     1

Kenric answered 16/9, 2011 at 21:35 Comment(0)

P

5

This should do your work :

df_agg <- aggregate(num~name+type,df,FUN=NROW)
names(df_agg)[3] <- "count"
df <- merge(df,df_agg,by=c('name','type'),all.x=TRUE)

Pitchford answered 24/2, 2017 at 9:45 Comment(0)

P

3

The base R function aggregate will obtain the counts with a one-liner, but adding those counts back to the original data.frame seems to take a bit of processing.

df <- data.frame(name=c('black','black','black','red','red'),
                 type=c('chair','chair','sofa','sofa','plate'),
                 num=c(4,5,12,4,3))
df
#    name  type num
# 1 black chair   4
# 2 black chair   5
# 3 black  sofa  12
# 4   red  sofa   4
# 5   red plate   3

rows.per.group  <- aggregate(rep(1, length(paste0(df$name, df$type))),
                             by=list(df$name, df$type), sum)
rows.per.group
#   Group.1 Group.2 x
# 1   black   chair 2
# 2     red   plate 1
# 3   black    sofa 1
# 4     red    sofa 1

my.summary <- do.call(data.frame, rows.per.group)
colnames(my.summary) <- c(colnames(df)[1:2], 'rows.per.group')
my.data <- merge(df, my.summary, by = c(colnames(df)[1:2]))
my.data
#    name  type num rows.per.group
# 1 black chair   4              2
# 2 black chair   5              2
# 3 black  sofa  12              1
# 4   red plate   3              1
# 5   red  sofa   4              1

Pavis answered 14/1, 2017 at 15:56 Comment(0)

M

2

Using sqldf package:

library(sqldf)

sqldf("select a.*, b.cnt
       from df a,
           (select name, type, count(1) as cnt
            from df
            group by name, type) b
      where a.name = b.name and
            a.type = b.type")

#    name  type num cnt
# 1 black chair   4   2
# 2 black chair   5   2
# 3 black  sofa  12   1
# 4   red  sofa   4   1
# 5   red plate   3   1

Marola answered 3/12, 2018 at 20:21 Comment(0)

D

2

Another option using add_tally from dplyr. Here is a reproducible example:

df <- data.frame(name=c('black','black','black','red','red'),
                 type=c('chair','chair','sofa','sofa','plate'),
                 num=c(4,5,12,4,3))
library(dplyr)
df %>%
  group_by(name, type) %>%
  add_tally(name = "count")
#> # A tibble: 5 × 4
#> # Groups:   name, type [4]
#>   name  type    num count
#>   <chr> <chr> <dbl> <int>
#> 1 black chair     4     2
#> 2 black chair     5     2
#> 3 black sofa     12     1
#> 4 red   sofa      4     1
#> 5 red   plate     3     1

^{Created on 2022-09-11 with reprex v2.0.2}

Dervish answered 11/9, 2022 at 17:46 Comment(0)

J

1

A two line alternative is to generate a variable of 0s and then fill it in with split<-, split, and lengths like this:

# generate vector of 0s
df$count <-0L

# fill it in
split(df$count, df[c("name", "type")]) <- lengths(split(df$num, df[c("name", "type")]))

This returns the desired result

df
   name  type num count
1 black chair   4     2
2 black chair   5     2
3 black  sofa  12     1
4   red  sofa   4     1
5   red plate   3     1

Essentially, the RHS calculates the lengths of each name-type combination, returning a named vector of length 6 with 0s for "red.chair" and "black.plate." This is fed to the LHS with split <- which takes the vector and appropriately adds the values in their given spots. This is essentially what ave does, as you can see that the second to final line of ave is

split(x, g) <- lapply(split(x, g), FUN)

However, lengths is an optimized version of sapply(list, length).

Jerilynjeritah answered 5/9, 2017 at 14:1 Comment(0)

F

1

You were just one step away from incorporating the row count into the base dataset.

Using the tidy() function from the broom package, convert the frequency table into a data frame and inner join with df:

df <- data.frame(name=c('black','black','black','red','red'),
                         type=c('chair','chair','sofa','sofa','plate'),
                         num=c(4,5,12,4,3))
library(broom)
df <- merge(df, tidy(table(df[ , c("name","type")])), by=c("name","type"))
df
   name  type num Freq
1 black chair   4    2
2 black chair   5    2
3 black  sofa  12    1
4   red plate   3    1
5   red  sofa   4    1

Felixfeliza answered 23/2, 2018 at 17:12 Comment(0)

N

1

One simple line in base R:

df$count = table(interaction(df[, (c("name", "type"))]))[interaction(df[, (c("name", "type"))])]

Same in two lines, for clarity/efficiency:

fact = interaction(df[, (c("name", "type"))])
df$count = table(fact)[fact]

Nieman answered 7/9, 2020 at 14:26 Comment(0)

A

0

In collapse, with fcount. fcount is noticeably faster than any other options.

library(collapse)
df |> 
  fcount(name, type, add = TRUE, name = "count")

#    name  type num count
# 1 black chair   4     2
# 2 black chair   5     2
# 3 black  sofa  12     1
# 4   red  sofa   4     1
# 5   red plate   3     1

Angelineangelique answered 24/10, 2023 at 14:28 Comment(0)

P

-2

Another way that generalizes more:

df$count <- unsplit(lapply(split(df, df[c("name","type")]), nrow), df[c("name","type")])

Prot answered 19/2, 2013 at 23:39 Comment(1)

Please explain how does this generalize more? – Shaum 18/7, 2013 at 6:7

Recommended topics

Hot tags