Return most frequent string value for each group [duplicate]
Asked Answered
P

3

36
a <- c(rep(1:2,3))
b <- c("A","A","B","B","B","B")
df <- data.frame(a,b)

> str(b)
chr [1:6] "A" "A" "B" "B" "B" "B"

  a b
1 1 A
2 2 A
3 1 B
4 2 B
5 1 B
6 2 B

I want to group by variable a and return the most frequent value of b

My desired result would look like

  a b
1 1 B
2 2 B

In dplyr it would be something like

df %>% group_by(a) %>% summarize (b = most.frequent(b))

I mentioned dplyr only to visualize the problem.

Pareto answered 28/4, 2015 at 14:19 Comment(3)
OK, I found it myself df %>% group_by(a) %>% summarize (b =names(which.max(table(b))))Pareto
how do you get the counts as well.Vidovik
@ChirayuChamoli: To get the count as well, simply use max(table(b)). Full code would be: df %>% group_by(a) %>% summarize (b =names(which.max(table(b))), count_b = max(table(b)))Moriarty
E
31

The key is to start grouping by both a and b to compute the frequencies and then take only the most frequent per group of a, for example like this:

df %>% 
  count(a, b) %>%
  slice(which.max(n))

Source: local data frame [2 x 3]
Groups: a

  a b n
1 1 B 2
2 2 B 2

Of course there are other approaches, so this is only one possible "key".

Eudoca answered 28/4, 2015 at 14:25 Comment(2)
Hi, I tried this solution. Not really working. the function will only return the maximum n row, not maximum row per group. In this case, there is a draw --so you can see 2 rows(both n is 2, if group A==1 maximum b==B n is 3 and group A==2, maximum b==B is 2 then you will only have one row)Fibroid
I think @Eudoca may have forgot a line of code. Pipe the output of count into group_by before piping them into sliceNovocaine
P
10

Other answers ignore tied frequencies
What works for me:

# A and B are tied
a <- c(rep(1:2,5))
b <- c("A","A","A","A","B","B","B","B","C","C")
df3 <- data.frame(a,b)

library(data.table)
setDT(df3)[ , .N, by=.(a, b)][ , .SD[ N == max(N) ], by = a] # includes ties

library(dplyr)
df3 |>
  group_by(a) |>
  count(b) |>
  top_n(1) # includes ties
Puccoon answered 8/11, 2018 at 20:23 Comment(2)
Thank you but you need to be careful with duplicatesCabernet
This doesn't work for me.. I get the wrong output. Probably because of duplicatesThee
L
2

by() each value of a, create a table() of b and extract the names() of the largest entry in that table():

> with(df,by(b,a,function(xx)names(which.max(table(xx)))))
a: 1
[1] "B"
------------------------
a: 2
[1] "B"

You can wrap this in as.table() to get a prettier output, although it still does not exactly match your desired result:

> as.table(with(df,by(b,a,function(xx)names(which.max(table(xx))))))
a
1 2 
B B
Lethia answered 28/4, 2015 at 14:25 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.