How to aggregate categorical data in R?
Asked Answered
D

3

8

I have a dataframe which consists of two columns with categorical variables (Better, Similar, Worse). I would like to come up with a table which counts the number of times that these categories appear in the two columns. The dataframe I am using is as follows:

       Category.x  Category.y
1      Better      Better
2      Better      Better
3      Similar     Similar
4      Worse       Similar

I would like to come up with a table like this:

           Category.x    Category.y
Better     2             2
Similar    1             2
Worse      1             0

How would you go about it?

Drone answered 2/4, 2019 at 16:26 Comment(3)
Looks like you need table(df1)Advancement
Is it possible to reformat the table, so that I get it as a 3x2 table instead of a 3x3?Drone
I would convert to factor with common levels lvls <- unique(unlist(df1)); df1[] <- lapply(df1, factor, levels = lvls) and then do the table(df1)Advancement
P
8

As mentioned in the comments, table is standard for this, like

table(stack(DT))

         ind
values    Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

or

table(value = unlist(DT), cat = names(DT)[col(DT)])

         cat
value     Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

or

with(reshape(DT, direction = "long", varying = 1:2), 
  table(value = Category, cat = time)
)

         cat
value     x y
  Better  2 2
  Similar 1 2
  Worse   1 0
Prouty answered 2/4, 2019 at 16:48 Comment(0)
P
3
sapply(df1, function(x) sapply(unique(unlist(df1)), function(y) sum(y == x)))
#        Category.x Category.y
#Better           2          2
#Similar          1          2
#Worse            1          0
Peccant answered 2/4, 2019 at 16:33 Comment(0)
H
2

One dplyr and tidyr possibility could be:

df %>%
 gather(var, val) %>%
 count(var, val) %>%
 spread(var, n, fill = 0)

  val     Category.x Category.y
  <chr>        <dbl>      <dbl>
1 Better           2          2
2 Similar          1          2
3 Worse            1          0

It, first, transforms the data from wide to long format, with column "var" including the variable names and column "val" the corresponding values. Second, it counts per "var" and "val". Finally, it spreads the data into the desired format.

Or with dplyr and reshape2 you can do:

df %>%
 mutate(rowid = row_number()) %>%
 melt(., id.vars = "rowid") %>%
 count(variable, value) %>%
 dcast(value ~ variable, value.var = "n", fill = 0)

    value Category.x Category.y
1  Better          2          2
2 Similar          1          2
3   Worse          1          0
Hostility answered 2/4, 2019 at 16:41 Comment(1)
Is var = Category.x and val= c('Better', 'Similar', 'Worse')?Drone

© 2022 - 2024 — McMap. All rights reserved.