How to summarize a list of combination

Asked 15/12, 2021 at 12:44 Answered 21/12, 2021 at 12:52

I have a list of 2 elements' combination like below.

cbnl <- list(
  c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
  c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
  c("D", "E"), c("C", "E")
)

I'd like to summarize above list. Expected result is like below list. Order of element in a vector doesn't matter here.

[[1]]
[1] "A" "B"

[[2]]
[1] "C" "D" "E"

[[3]]
[1] "F" "G"

[[4]]
[1] "H" "I"

[[5]]
[1] "J" "K"

(Rule 1) {A, B} is equivalent to {B, A}. To correspond this I think I can do this.

cbnl <- unique(lapply(cbnl, function(i) { sort(i) }))

(Rule 2) {A, B}, {B, C} (One of element is common) then take a union of two sets. It results {A, B, C}. I don't have clear nice idea to do this.

Any efficient way to do this?

Superaltar answered 15/12, 2021 at 12:44 Comment(2)

Yes, it's essentially the same. From graph theory view point this can be thought as connections of nodes. Thank you for your point. – Superaltar 15/12, 2021 at 21:57

I thought Merging Listed Vectors that share Elements in R is essentially the same, but R code doesn't work for this case. The answer here may not work for a list of numeric vectors..... – Superaltar 16/12, 2021 at 8:56

I know this answer is more like a traditional programming rather than "R like" but it solves the issue.

cbnl <- unique(lapply(cbnl, sort))

i <- 1
count <- 1
out <- list()

while (i <= length(cbnl) - 1) {
  if (sum(cbnl[[i]] %in% cbnl[[i + 1]]) == 0) {
    out[[count]] <- cbnl[[i]]
    } else {
      out[[count]] <- sort(unique(c(cbnl[[i]], cbnl[[i + 1]])))
      i <- i + 1        
    }
  count <- count + 1
  i <- i + 1 
}

out

gives,

[[1]]
[1] "A" "B"

[[2]]
[1] "C" "D" "E"

[[3]]
[1] "F" "G"

[[4]]
[1] "H" "I"

[[5]]
[1] "J" "K"

Wanderlust answered 15/12, 2021 at 12:58 Comment(1)

Great. Thank you for the answer. I got a expected result. – Superaltar 15/12, 2021 at 13:0

You can try the following igraph option

library(igraph)

graph_from_data_frame(do.call(rbind, cbnl)) %>%
  components() %>%
  membership() %>%
  stack() %>%
  with(., split(as.character(ind), values))

which gives

$`1`
[1] "A" "B"

$`2`
[1] "C" "E" "D"

$`3`
[1] "F" "G"

$`4`
[1] "H" "I"

$`5`
[1] "J" "K"

A shorter one

graph_from_data_frame(do.call(rbind, cbnl)) %>%
  decompose() %>%
  Map(function(x) names(V(x)), .)

which gives

[[1]]
[1] "A" "B"

[[2]]
[1] "C" "E" "D"

[[3]]
[1] "F" "G"

[[4]]
[1] "H" "I"

[[5]]
[1] "J" "K"

Dilettante answered 15/12, 2021 at 12:59 Comment(5)

Oh, this is shorter command. Expected result comes out. – Superaltar 15/12, 2021 at 13:5

@Superaltar you can find a shorter one in my update – Dilettante 15/12, 2021 at 13:35

Thank you. Very simple code!! – Superaltar 15/12, 2021 at 13:37

The funny thing is that few months ago I was searching for an algorithm / function to find records linked in this way and I remember I couldn't find an appropriate words to use in google (but I also didn't ask on SO) and now I see I should look around graph topic :D (I was sure that something like this must exists already!). – Hyehyena 15/12, 2021 at 13:59

It sounds a good info. If you can provide what a exact name of topic is, please let me know. – Superaltar 15/12, 2021 at 14:11

Base R: sorting union as FUN= in combn, then partly filling a matrix based on unique elements u and removing duplicated rows, and finally coercing as.list.

u <- Reduce(union, cbnl)  ## get unique elements

res <- combn(cbnl, 2, \(x) {
  if (length(intersect(x[[1]], x[[2]])) > 0) {
    union(x[[1]], x[[2]])
  } else {
    el(x)
  }
}, simplify=FALSE) |>
  unique() |>
  (\(x) sapply(x, \(i) replace(rep(NA, length(u)), match(i, u), i)))() |>
  (\(x) x[, !colSums(duplicated(x, MARGIN=1:2)) == nrow(x)])() |>
  (\(x) unname(lapply(as.list(as.data.frame(x)), \(x) x[!is.na(x)])))()

res
# [[1]]
# [1] "A" "B"
# 
# [[2]]
# [1] "C" "D" "E"
# 
# [[3]]
# [1] "F" "G"
# 
# [[4]]
# [1] "H" "I"
# 
# [[5]]
# [1] "J" "K"

Note:

> R.version.string
[1] "R version 4.1.2 (2021-11-01)"

Seanseana answered 15/12, 2021 at 13:7 Comment(7)

Thank you for your answer. But expected result is not shown. I need to think how to get expected list in your way. – Superaltar 15/12, 2021 at 13:13

@Superaltar Actually I missed a case handling, check update. I'm not yet sure, though, which rule exactly leads to your expected result? – Seanseana 15/12, 2021 at 13:23

Thank you very much. Almost completed. Just need to delete {C, D}, {D, E}, {C, E} because {C, D, E} includes them. – Superaltar 15/12, 2021 at 13:28

@Superaltar Got ya, see update! – Seanseana 15/12, 2021 at 13:57

Thank you. Nearly completed but still {D, E} remains... – Superaltar 15/12, 2021 at 14:13

@Superaltar Now it should work! – Seanseana 15/12, 2021 at 16:26

Finally, yes I could get expected result. Thank you very much. – Superaltar 15/12, 2021 at 20:27

I took a one line of code from @ThomasIsCoding and would like to show that we can achieve this using my package dedupewider.

library(dedupewider)
library(purrr)
library(magrittr)

cbnl <- list(
  c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
  c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
  c("D", "E"), c("C", "E")
)

cbnl_df <- data.frame(do.call(rbind, cbnl))

result <- dedupe_wide(cbnl_df, names(cbnl_df)) # it performs deduplication by connecting elements which are linked by transitive relation

result_list <- as.list(as.data.frame(t(result)))

result_list <- map(result_list, ~ .x[!is.na(.x)]) # remove NA
result_list
#> $V1
#> [1] "A" "B"
#> 
#> $V2
#> [1] "C" "E" "D"
#> 
#> $V3
#> [1] "F" "G"
#> 
#> $V4
#> [1] "H" "I"
#> 
#> $V5
#> [1] "J" "K"

A lot of steps are necessary, because list is an input and output, so with data.frame we would have less code than above.

Hyehyena answered 15/12, 2021 at 13:26 Comment(2)

Thank you for your answer. My original data is data.table so your data.frame way is more preferable for me. I could get my expected result. – Superaltar 15/12, 2021 at 13:34

That's even better, because dedupe_wide internally uses setDT (at the end removes data.table class if wasn't present at the beginning) if needed, so you should get data.table object as returned value when data.table is an input. – Hyehyena 15/12, 2021 at 13:51

Thank you for all supporters' wonderful answers.

Let me post my own solution by base R as below;

cbnl <- list(
  c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"),
  c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"),
  c("D", "E"), c("C", "E")
)

repeat {
  # Get A Count Table
  tbl <- table(unlist(cbnl))
  # No Duplicated Items Then break Out
  if (length(tbl[tbl > 1]) == 0) { break }
  # Take A First Duplicated Item And Get the Index
  idx <- which(sapply(seq_len(length(cbnl)), function(i) {
    any(cbnl[[i]] == names(tbl[tbl > 1])[1])
  }))
  # Create New vector By Taking Union
  newvec <- sort(unique(unlist(cbnl[idx])))
  # Append newvec To cbnl And Remove Original vectors
  cbnl <- c(cbnl, list(newvec))[-idx]
}

cbnl

The result is

[[1]]
[1] "A" "B"

[[2]]
[1] "C" "D" "E"

[[3]]
[1] "F" "G"

[[4]]
[1] "H" "I"

[[5]]
[1] "J" "K"

Here is data.table version.

cbn <- data.table(
  item1 = c("A", "B", "C", "E", "F", "H", "J", "I", "K", "G", "D", "E", "D", "C"),
  item2 = c("B", "A", "D", "D", "G", "I", "K", "H", "J", "F", "C", "C", "E", "E")
)

repeat {
  # Get A Count Table
  tbl <- table(as.vector(as.matrix(cbn)))
  # No Duplicated Items Then break Out
  if (length(tbl[tbl > 1]) == 0) { break }
  # Take A First Duplicated Item And Get Row Numbers Where The Item Is Located
  idx <- which(cbn == names(tbl[tbl > 1])[1], arr.ind = TRUE)[, 1]
  # Create New Row By Taking Union
  newrow <- setDT(as.list(sort(unique(as.vector(as.matrix(cbn[idx]))))))
  names(newrow) <- paste0("item", seq_len(ncol(newrow)))
  # Append newrow To cbn And Remove Original Rows
  cbn <- rbindlist(l = list(cbn, newrow), use.names = TRUE, fill = TRUE)[-idx]
}

cbn

This result is as below.

   item1 item2 item3
1:     A     B  <NA>
2:     C     D     E
3:     F     G  <NA>
4:     H     I  <NA>
5:     J     K  <NA>

Superaltar answered 21/12, 2021 at 12:52 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags