How to aggregate categorical data in R?

Asked 2/4, 2019 at 16:26 Answered 2/4, 2019 at 16:48

I have a dataframe which consists of two columns with categorical variables (Better, Similar, Worse). I would like to come up with a table which counts the number of times that these categories appear in the two columns. The dataframe I am using is as follows:

       Category.x  Category.y
1      Better      Better
2      Better      Better
3      Similar     Similar
4      Worse       Similar

I would like to come up with a table like this:

           Category.x    Category.y
Better     2             2
Similar    1             2
Worse      1             0

How would you go about it?

Drone answered 2/4, 2019 at 16:26 Comment(3)

Looks like you need table(df1) – Advancement 2/4, 2019 at 16:27

Is it possible to reformat the table, so that I get it as a 3x2 table instead of a 3x3? – Drone 2/4, 2019 at 16:29

I would convert to factor with common levels lvls <- unique(unlist(df1)); df1[] <- lapply(df1, factor, levels = lvls) and then do the table(df1) – Advancement 2/4, 2019 at 16:43

As mentioned in the comments, table is standard for this, like

table(stack(DT))

         ind
values    Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

table(value = unlist(DT), cat = names(DT)[col(DT)])

         cat
value     Category.x Category.y
  Better           2          2
  Similar          1          2
  Worse            1          0

with(reshape(DT, direction = "long", varying = 1:2), 
  table(value = Category, cat = time)
)

         cat
value     x y
  Better  2 2
  Similar 1 2
  Worse   1 0

Prouty answered 2/4, 2019 at 16:48 Comment(0)

sapply(df1, function(x) sapply(unique(unlist(df1)), function(y) sum(y == x)))
#        Category.x Category.y
#Better           2          2
#Similar          1          2
#Worse            1          0

Peccant answered 2/4, 2019 at 16:33 Comment(0)

One dplyr and tidyr possibility could be:

df %>%
 gather(var, val) %>%
 count(var, val) %>%
 spread(var, n, fill = 0)

  val     Category.x Category.y
  <chr>        <dbl>      <dbl>
1 Better           2          2
2 Similar          1          2
3 Worse            1          0

It, first, transforms the data from wide to long format, with column "var" including the variable names and column "val" the corresponding values. Second, it counts per "var" and "val". Finally, it spreads the data into the desired format.

Or with dplyr and reshape2 you can do:

df %>%
 mutate(rowid = row_number()) %>%
 melt(., id.vars = "rowid") %>%
 count(variable, value) %>%
 dcast(value ~ variable, value.var = "n", fill = 0)

    value Category.x Category.y
1  Better          2          2
2 Similar          1          2
3   Worse          1          0

Hostility answered 2/4, 2019 at 16:41 Comment(1)

Is var = Category.x and val= c('Better', 'Similar', 'Worse')? – Drone 2/4, 2019 at 16:56

Recommended topics

Hot tags