Create combinations by group and sum

Asked 1/7, 2022 at 1:22 Answered 1/7, 2022 at 13:16

I have data of names within an ID number along with a number of associated values. It looks something like this:

structure(list(id = c("a", "a", "b", "b"), name = c("bob", "jane", 
"mark", "brittney"), number = c(1L, 2L, 1L, 2L), value = c(1L, 
2L, 1L, 2L)), class = "data.frame", row.names = c(NA, -4L))

#   id     name number value
# 1  a      bob      1     1
# 2  a     jane      2     2
# 3  b     mark      1     1
# 4  b brittney      2     2

I would like to create all the combinations of name, regardless of how many there are, and paste them together separated with commas, and sum their number and value within each id. The desired output from the example above is then:

structure(list(id = c("a", "a", "a", "b", "b", "b"), name = c("bob", 
"jane", "bob, jane", "mark", "brittney", "mark, brittney"), number = c(1L, 
2L, 3L, 1L, 2L, 3L), value = c(1L, 2L, 3L, 1L, 2L, 3L)), class = "data.frame", row.names = c(NA, -6L))

#   id           name number value
# 1  a            bob      1     1
# 2  a           jane      2     2
# 3  a      bob, jane      3     3
# 4  b           mark      1     1
# 5  b       brittney      2     2
# 6  b mark, brittney      3     3

Thanks all!

Monanthous answered 1/7, 2022 at 1:22 Comment(0)

You can create pairwise indices using combn() and expand the data frame with these using slice(). Then just group by these row pairs and summarise. I'm assuming you want pairwise combinations but this can be adapted for larger sets if needed. Some code to handle groups < 2 is included but can be removed if these don't exist in your data.

library(dplyr)
library(purrr)

df1 %>%
  group_by(id) %>%
  slice(c(combn(seq(n()), min(n(), 2)))) %>%
  mutate(id2 = (row_number()-1) %/% 2) %>%
  group_by(id, id2) %>%
  summarise(name = toString(name),
            across(where(is.numeric), sum), .groups = "drop") %>%
  select(-id2) %>%
  bind_rows(df1 %>%
              group_by(id) %>%
              filter(n() > 1), .) %>%
  arrange(id) %>%
  ungroup()

# A tibble: 6 × 4
  id    name           number value
  <chr> <chr>           <int> <int>
1 a     bob                 1     1
2 a     jane                2     2
3 a     bob, jane           3     3
4 b     mark                1     1
5 b     brittney            2     2
6 b     mark, brittney      3     3

Edit:

To adapt for all possible combinations you can iterate over the values up to the max group size. Using edited data which has a couple of rows added to the first group:

map_df(seq(max(table(df2$id))), ~
         df2 %>%
         group_by(id) %>%
         slice(c(combn(seq(n()), .x * (.x <= n())))) %>%
         mutate(id2 = (row_number() - 1) %/% .x) %>%
         group_by(id, id2) %>%
         summarise(name = toString(name),
                   across(where(is.numeric), sum), .groups = "drop")
       ) %>%
  select(-id2) %>%
  arrange(id)

# A tibble: 18 × 4
   id    name                      number value
   <chr> <chr>                      <int> <int>
 1 a     bob                            1     1
 2 a     jane                           2     2
 3 a     sophie                         1     1
 4 a     jeremy                         2     2
 5 a     bob, jane                      3     3
 6 a     bob, sophie                    2     2
 7 a     bob, jeremy                    3     3
 8 a     jane, sophie                   3     3
 9 a     jane, jeremy                   4     4
10 a     sophie, jeremy                 3     3
11 a     bob, jane, sophie              4     4
12 a     bob, jane, jeremy              5     5
13 a     bob, sophie, jeremy            4     4
14 a     jane, sophie, jeremy           5     5
15 a     bob, jane, sophie, jeremy      6     6
16 b     mark                           3     5
17 b     brittney                       4     6
18 b     mark, brittney                 7    11

Data for df2:

df2 <- structure(list(id = c("a", "a", "a", "a", "b", "b"), name = c("bob", 
                                                                     "jane", "sophie", "jeremy", "mark", "brittney"), number = c(1L, 
                                                                                                                                 2L, 1L, 2L, 3L, 4L), value = c(1L, 2L, 1L, 2L, 5L, 6L)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                              -6L))

Subdue answered 1/7, 2022 at 2:1 Comment(4)

Thank you! This is great, clever. I do need all combinations of names, not just pairwise, within each id (apologies for the too-minimal example). So changing that is lines 3 and 4? Did you have a specific tweak in mind? – Monanthous 1/7, 2022 at 11:31

Can you expand on your expected output? If for example a group contained 4 names, would you want the output to contain 4 (single) + 6 (pair combination) + 4 (triplet combinations) + 1 (quad) rows (i.e. 15 rows for that group)? – Subdue 1/7, 2022 at 11:58

@Monanthous - see edit. – Subdue 1/7, 2022 at 12:37

Thank you! I really appreciate the help. This is super clever and something I will learn a lot from. – Monanthous 1/7, 2022 at 13:3

You could use group_modify() + add_row():

library(dplyr)

df %>%
  group_by(id) %>%
  group_modify( ~ .x %>%
    summarise(name = toString(name), across(c(number, value), sum)) %>%
    add_row(.x, .)
  ) %>%
  ungroup()

# # A tibble: 6 × 4
#   id    name           number value
#   <chr> <chr>           <int> <int>
# 1 a     bob                 1     1
# 2 a     jane                2     2
# 3 a     bob, jane           3     3
# 4 b     mark                1     1
# 5 b     brittney            2     2
# 6 b     mark, brittney      3     3

Adames answered 1/7, 2022 at 1:55 Comment(3)

I don't think this solves OPs problem - they're asking for combinations of cases. OPs example data and output was perhaps a bit too minimal. – Subdue 1/7, 2022 at 2:9

@RitchieSacramento Yes, maybe. A simplified description of questions leads to a simplified answer. Thanks for the useful comment. – Adames 1/7, 2022 at 2:18

Thanks Darren! Apologies for the too minimal example. This is great though. Yes; I need all combinations of names within each id, so I don't think this will work. Upvoted and appreciated though! – Monanthous 1/7, 2022 at 11:34

library(dplyr)
library(purrr)

df1 %>%
  group_by(id) %>%
  slice(c(combn(seq(n()), min(n(), 2)))) %>%
  mutate(id2 = (row_number()-1) %/% 2) %>%
  group_by(id, id2) %>%
  summarise(name = toString(name),
            across(where(is.numeric), sum), .groups = "drop") %>%
  select(-id2) %>%
  bind_rows(df1 %>%
              group_by(id) %>%
              filter(n() > 1), .) %>%
  arrange(id) %>%
  ungroup()

# A tibble: 6 × 4
  id    name           number value
  <chr> <chr>           <int> <int>
1 a     bob                 1     1
2 a     jane                2     2
3 a     bob, jane           3     3
4 b     mark                1     1
5 b     brittney            2     2
6 b     mark, brittney      3     3

Edit:

To adapt for all possible combinations you can iterate over the values up to the max group size. Using edited data which has a couple of rows added to the first group:

map_df(seq(max(table(df2$id))), ~
         df2 %>%
         group_by(id) %>%
         slice(c(combn(seq(n()), .x * (.x <= n())))) %>%
         mutate(id2 = (row_number() - 1) %/% .x) %>%
         group_by(id, id2) %>%
         summarise(name = toString(name),
                   across(where(is.numeric), sum), .groups = "drop")
       ) %>%
  select(-id2) %>%
  arrange(id)

# A tibble: 18 × 4
   id    name                      number value
   <chr> <chr>                      <int> <int>
 1 a     bob                            1     1
 2 a     jane                           2     2
 3 a     sophie                         1     1
 4 a     jeremy                         2     2
 5 a     bob, jane                      3     3
 6 a     bob, sophie                    2     2
 7 a     bob, jeremy                    3     3
 8 a     jane, sophie                   3     3
 9 a     jane, jeremy                   4     4
10 a     sophie, jeremy                 3     3
11 a     bob, jane, sophie              4     4
12 a     bob, jane, jeremy              5     5
13 a     bob, sophie, jeremy            4     4
14 a     jane, sophie, jeremy           5     5
15 a     bob, jane, sophie, jeremy      6     6
16 b     mark                           3     5
17 b     brittney                       4     6
18 b     mark, brittney                 7    11

Data for df2:

df2 <- structure(list(id = c("a", "a", "a", "a", "b", "b"), name = c("bob", 
                                                                     "jane", "sophie", "jeremy", "mark", "brittney"), number = c(1L, 
                                                                                                                                 2L, 1L, 2L, 3L, 4L), value = c(1L, 2L, 1L, 2L, 5L, 6L)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                              -6L))

Subdue answered 1/7, 2022 at 2:1 Comment(4)

@Monanthous - see edit. – Subdue 1/7, 2022 at 12:37

Thank you! I really appreciate the help. This is super clever and something I will learn a lot from. – Monanthous 1/7, 2022 at 13:3

A data.table option

setDT(df)[
  ,
  lapply(
    .SD,
    function(x) {
      unlist(
        lapply(
          seq_along(x),
          combn,
          x = x,
          function(v) {
            ifelse(all(is.character(v)), toString, sum)(v)
          }
        )
      )
    }
  ),
  id
]

gives

   id           name number value
1:  a            bob      1     1
2:  a           jane      2     2
3:  a      bob, jane      3     3
4:  b           mark      1     1
5:  b       brittney      2     2
6:  b mark, brittney      3     3

Backfill answered 1/7, 2022 at 13:16 Comment(1)

This is great! Thank you! Upvoted. Works and generalizes to cases with many values for names. I already accepted the other answer but will learn from this and it is great to have for others' future reference. – Monanthous 1/7, 2022 at 13:32

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags