Is there an R function to get the unique edges in an undirected (not directed) network?

Asked 9/4, 2019 at 13:14 Answered 3/11, 2019 at 21:46

I want to count the number of unique edges in an undirected network, e.g., net:

   x  y
1  A  B
2  B  A
3  A  B

There should be only one unique edge in this edge list, because the edges A-B and B-A are the same in an undirected network.

For the directed network I can get the number of unique edges by:

nrow(unique(net[, c("x", "y")]))

But this doesn't work for the undirected network.

Cloud answered 9/4, 2019 at 13:14 Comment(1)

You could make them all alphabetical, then do the same analysis, try #47338232 – Twofaced 9/4, 2019 at 13:18
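
A minimal sketch of that idea in base R, assuming the question's edge list is a data frame called `net` with character columns `x` and `y` (the name `ordered_net` is made up for illustration). `pmin()` and `pmax()` put the alphabetically smaller endpoint first, after which the directed-network count from the question can be reused:

```r
# Example data from the question (assumed to be a data frame named `net`)
net <- data.frame(x = c("A", "B", "A"),
                  y = c("B", "A", "B"),
                  stringsAsFactors = FALSE)

# Put the alphabetically smaller endpoint first in every row
ordered_net <- data.frame(x = pmin(net$x, net$y),
                          y = pmax(net$x, net$y),
                          stringsAsFactors = FALSE)

n_directed   <- nrow(unique(net[, c("x", "y")]))  # counts A-B and B-A separately
n_undirected <- nrow(unique(ordered_net))         # counts A-B and B-A once
```

This only covers two-column edge lists; extra edge attributes would need to be carried along separately.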

Given that you are working with networks, an igraph solution:

library(igraph)

as_data_frame(simplify(graph_from_data_frame(dat, directed=FALSE)))

Then use nrow

Explanation

dat %>% 
  graph_from_data_frame(., directed=FALSE) %>% # convert to undirected graph
  simplify %>%                                 # remove loops / multiple edges
  as_data_frame                                # return remaining edges
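
As a rough sketch of the counting step, `ecount()` can be applied to the simplified graph instead of converting back to a data frame. The extra C-C row below is an assumption added only to show what `simplify()` does with self-loops and what its `remove.loops` argument changes:

```r
library(igraph)

# Edge list from the question plus one self-loop (C-C), added for illustration
dat <- data.frame(x = c("A", "B", "A", "C"),
                  y = c("B", "A", "B", "C"),
                  stringsAsFactors = FALSE)

g <- graph_from_data_frame(dat, directed = FALSE)

# Default: merge duplicate edges and drop self-loops
n_no_loops <- ecount(simplify(g))

# Keep self-loops, but still merge duplicate edges
n_with_loops <- ecount(simplify(g, remove.loops = FALSE))
```

Here `n_no_loops` counts only A-B, while `n_with_loops` also counts the C-C loop.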

Rehm answered 9/4, 2019 at 17:1 Comment(3)

Thanks! This way is more straightforward:) and less step-consuming:) – Cloud 10/4, 2019 at 9:32

What makes this method even better is that it excludes self-linked edges, which is great for me. – Cloud 11/4, 2019 at 7:28

@ZQu; yes, the default is to remove self-loops, although you can choose to keep them. See the arguments of ?igraph::simplify – Rehm 11/4, 2019 at 8:5

Try this,

df <- data.frame(x=c("A", "B", "A"), y = c("B", "A", "B"))
unique(apply(df, 1, function(x) paste(sort(unlist(strsplit(x, " "))),collapse = " ")))
[1] "A B"

So how does this work?

We apply a function to each row of the data frame, so we can look at one row at a time. Take the second row of df:
```
df[2,]
  x y
2 B A
```
We then split this with strsplit and unlist the result into a vector with one element per node. (We use as.matrix here to isolate the elements of the row.)
```
unlist(strsplit(as.matrix(df[2,]), " "))
[1] "B" "A"
```
Use the sort function to put the elements into alphabetical order, then paste them back together:
```
paste(sort(unlist(strsplit(as.matrix(df[2,]), " "))), collapse = " ")
[1] "A B"
```

The apply function does this for every row, because its margin argument is set to 1 (rows). Finally, the unique function keeps one copy of each distinct edge.
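
To get a count rather than the edges themselves, the same per-row key can be fed to `length()`. This is a small sketch, not part of the original answer: it skips the `strsplit()` step, which should make no difference as long as node names contain no spaces, and the names `edge_keys`, `unique_edges` and `n_unique` are made up for illustration.

```r
df <- data.frame(x = c("A", "B", "A"), y = c("B", "A", "B"),
                 stringsAsFactors = FALSE)

# One key per row: the sorted endpoints joined by a space
edge_keys <- apply(df, 1, function(row) paste(sort(row), collapse = " "))

unique_edges <- unique(edge_keys)   # distinct undirected edges
n_unique <- length(unique_edges)    # number of distinct undirected edges
```

Because the key is built with a space separator, node names that themselves contain spaces could produce ambiguous keys; a separator that cannot occur in the names would be safer in that case.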

Extension

This can be extended to n variables, for example n=3,

df <- data.frame(x=c("A", "B", "A"), y = c("B", "A", "B"),  z = c("C", "D", "D"))
unique(apply(df, 1, function(x) paste(sort(unlist(strsplit(x, " "))),collapse = " ")))
[1] "A B C" "A B D"

Node names longer than one letter also work, for example:

df <- data.frame(x=c("A", "BC", "A"), y = c("B", "A", "BC"))
df
   x  y
1  A  B
2 BC  A
3  A BC
unique(apply(df, 1, function(x) paste(sort(unlist(strsplit(x, " "))),collapse = " ")))
[1] "A B"  "A BC"

Old version

Using the tidyverse, define a function called rev that sorts the endpoints of each edge (note that this name masks base::rev). Then use mutate to create a new column that pastes the x and y columns together, separated by a comma, which is the format rev expects. Finally, pass the new column through rev and take the unique pairs.

library(tidyverse)
rev <- function(x){
  unname(sapply(x, function(x) {
    paste(sort(trimws(strsplit(x[1], ',')[[1]])), collapse=',')} ))
}
df <- data.frame(x=c("A", "B", "A"), y = c("B", "A", "B"))
rows <- df %>% 
  mutate(both = c(paste(x, y, sep = ", ")))

unique(rev(rows$both))

Twofaced answered 9/4, 2019 at 13:25 Comment(1)

Thank you very much! I tried your old version, and it worked well:) – Cloud 9/4, 2019 at 14:13

Here is a solution that does not use igraph and fits inside one pipe:

library(dplyr)  # provides tibble(), %>%, group_by() and mutate()
df <- tibble(x = c("A", "B", "A"), y = c("B", "A", "B"))

You can use group_by() and then, inside mutate(), sort() each pair of values and paste() them into a new column. The unique() call handles "true" duplicates: repeated A-B rows fall into the same group, so without it the key would contain each node more than once.

df %>%
  group_by(x, y) %>%
  mutate(edge_id = paste(sort(unique(c(x,y))), collapse=" "))

Once the sorted edge names are in a new column, it is straightforward to count unique values or filter duplicates out of your data frame.
If your edges carry additional variables, add them to the grouping.
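
One possible way to finish the pipe and get the count, sketched under the assumption that dplyr is installed; `edges` and `n_unique` are illustrative names, not part of the original answer:

```r
library(dplyr)

df <- tibble(x = c("A", "B", "A"), y = c("B", "A", "B"))

edges <- df %>%
  group_by(x, y) %>%
  mutate(edge_id = paste(sort(unique(c(x, y))), collapse = " ")) %>%
  ungroup()                           # drop the grouping before counting

n_unique <- n_distinct(edges$edge_id) # number of distinct undirected edges

# Keep one row per undirected edge
edges %>% distinct(edge_id, .keep_all = TRUE)
```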

No answered 2/11, 2019 at 17:41 Comment(0)

If you're not using {igraph}, or just want to know how to do it cleanly with minimal dependencies...

Here's your data...

your_edge_list <- data.frame(x = c("A", "B", "A"),
                             y = c("B", "A", "B"),
                             stringsAsFactors = FALSE)
your_edge_list
#>   x y
#> 1 A B
#> 2 B A
#> 3 A B

and here's a step-by-step breakdown...

`%>%` <- magrittr::`%>%`

your_edge_list %>% 
  apply(1L, sort) %>%              # sort dyads
  t() %>%                          # transpose resulting matrix to get the original shape back
  unique() %>%                     # get the unique rows
  as.data.frame() %>%              # back to data frame
  setNames(names(your_edge_list))  # reset column names
#>   x y
#> 1 A B

If we drop the pipes, the core of it looks like this...

unique(t(apply(your_edge_list, 1, sort)))
#>      [,1] [,2]
#> [1,] "A"  "B"

And we can wrap it up in a function that 1) handles both directed and undirected, 2) handles data frames and (the more common) matrices, and 3) can drop loops...

simplify_edgelist <- function(el, directed = TRUE, drop_loops = TRUE) {
  stopifnot(ncol(el) == 2)

  if (drop_loops) {
    el <- el[el[, 1] != el[, 2], , drop = FALSE]  # drop = FALSE keeps a one-row matrix as a matrix
  }

  if (directed) {
    out <- unique(el)
  } else {
    out <- unique(t(apply(el, 1, sort)))
  }

  colnames(out) <- colnames(el)

  if (is.data.frame(el)) {
    as.data.frame(out, stringsAsFactors = FALSE)
  } else {
    out
  }
}

el2 <- rbind(your_edge_list, 
             data.frame(x = c("C", "C"), y = c("C", "A"), stringsAsFactors = FALSE))
el2
#>   x y
#> 1 A B
#> 2 B A
#> 3 A B
#> 4 C C
#> 5 C A

simplify_edgelist(el2, directed = FALSE)
#>   x y
#> 1 A B
#> 5 A C

Bearskin answered 3/11, 2019 at 21:46 Comment(0)
