Merging Listed Vectors that share Elements in R

Asked 5/3, 2018 at 0:2 Answered 16/12, 2021 at 12:54

thanks for your help. I'm trying to write an R function that will take a list containing numeric vectors and merge all the list elements which share numbers. I'm not sure I'm explaining the problem properly, so I hope you don't mind if I use an analogy. An example list could look like this:

> list(c(1, 6), c(2, 3), c(3, 2), c(4, 5, 6), c(5, 4), c(1, 6, 4))
[[1]]
[1] 1 6

[[2]]
[1] 2 3

[[3]]
[1] 3 2

[[4]]
[1] 4 5 6

[[5]]
[1] 5 4

[[6]]
[1] 1 6 4

If you imagine 6 villages, the list would show which villages are connected by roads. So list element [[1]] shows that village 1 is connected to village 1 and village 6. List element [[6]] shows that 6 is connected to village 1, village 6, and village 4. And so on. I want my output to show which villages are connected by the same 'road network', so village 1 is clearly in the same network as 6, but it should also be grouped with 4 and 5, because it is connected to them through 6 and then 4. 2 and 3 should then be grouped seperately, as they do not share a connection to the other network.

I've managed to piece together a solution, but it is deeply inelegant, and takes far too long to run for more complicated inputs. My solution is this:

input <- list(c(1, 6), c(2, 3), c(3, 2), c(4, 5, 6), c(5, 4), c(1, 6, 4))
remaining <- 1:6                  # counter where i can store which numbers have not yet been evaluated
output <- vector("list", 6)

branch <- function(x) {           # function to recursively evaluate vector elements
  for(y in x) {                                           # repeat for each vector element
    if(y %in% remaining) {                                # check if the list element corresponding to y has been evaluated
      output[[i]] <- append(output[[i]], input[[y]])      # assign list element y to output element i
      assign("output", output, envir = globalenv())       #assign output to global environment
      remaining <- remaining[remaining != y]              # remove y from future evaluations
      assign("remaining", remaining, envir = globalenv()) # assign remaining to global environment
      branch(input[[y]])                                  # evaluate branches further from y
    }
  }
}

for(i in 1:6) {                    # repeat for each element of list
  if(i %in% remaining) {           # check if list element i has already been evaluated
    branch(input[[i]])             # evaluate list element
  }
}

output <- output[-which(sapply(output, is.null))]         # remove null elements from list
output <- lapply(output, unique)                          # remove redundant elements from vectors

output

> output
[[1]]
[1] 1 6 4 5

[[2]]
[1] 2 3

Sorry for the long question, but I feel like there must be a simpler way to do this that I'm missing. Is anyone able to help out?

Pectoralis answered 5/3, 2018 at 0:2 Comment(2)

You are building a graph - villages are nodes and roads are edges. A graph library like igraph will make your analysis much easier, you just need to move your data into igraph's required format first. – Horrocks 5/3, 2018 at 0:7

Thanks very much for that lead! It's the end of the day for me now, but I'll look into that in the morning. – Pectoralis 5/3, 2018 at 0:12

As mentioned in a comment, your problem is basically that you need to build a graph and find its components - so igraph is very useful.

It turns out your data is already more or less in the right format, so you can do:

library(igraph)

input <- list(c(1, 6), c(2, 3), c(3, 2), c(4, 5, 6), c(5, 4), c(1, 6, 4))

# mode = "all" so that connections are treated as two-way,
#   i.e. an 'undirected' graph
g = graph_from_adj_list(input, mode = "all")
comp = components(g)
groups(comp)

Output:

$`1`
[1] 1 4 5 6

$`2`
[1] 2 3

You can also do things like visualise your graph easily with plot(g):

PS: It doesn't affect this simple example, but the graph does contain a loop where 1 connects to itself - you might need to filter these self-connections out of the input data.

Horrocks answered 5/3, 2018 at 0:15 Comment(0)

After I placed similar question here, I knew your question. I thought your question and my question are essentially the same. But solution here is not applicable to my case. In reverse answers to my question don't work for this case.

So I created R code to work on both cases under base R.

input <- list(c(1, 6), c(2, 3), c(3, 2), c(4, 5, 6), c(5, 4), c(1, 6, 4))

repeat {
  # Get A Count Table
  tbl <- table(unlist(input))
  # No Duplicated Items Then break Out
  if (length(tbl[tbl > 1]) == 0) { break }
  # Take A First Duplicated Item And Get Index Where The Item Is Located
  idx <- which(sapply(seq_len(length(input)), function(i) {
    any(input[[i]] == names(tbl[tbl > 1])[1])
  }))
  # Create New vector By Union
  newvec <- sort(unique(unlist(input[idx])))
  # Append newvec To cbnl And Remove Original vectors
  input <- c(input, list(newvec))[-idx]
}

input

gives

[[1]]
[1] 2 3

[[2]]
[1] 1 4 5 6

Epicontinental answered 16/12, 2021 at 12:54 Comment(0)

Recommended topics

Hot tags