Friendship network identification in R

I want to identify networks in which all people in the same network are directly or indirectly connected through friendship nominations, while no students from different networks are connected.

I am using the Add Health data. Each student nominates up to 10 friends. Sample data may look like this:

ID    FID_1  FID_2  FID_3  FID_4  FID_5  FID_6  FID_7  FID_8  FID_9  FID_10
1     2      6      7      9      10     NA     NA     NA     NA     NA
2     5      9      12     45     13     90     87     6      NA     NA
3     1      2      4      7      8      9      10     14     16     18
100   110    120    122    125    169    178    190    200    500    520
500   100    110    122    125    169    178    190    200    500    520
700   800    789    900    NA     NA     NA     NA     NA     NA     NA
1000  789    2000   820    900    NA     NA     NA     NA     NA     NA

There are around 85,000 individuals. Could anyone please tell me how I can get a network ID for each of them? I would like the result to look like the following:

ID    network_ID        ID    network_ID
1     1                 700   3
2     1                 789   3
3     1                 800   3
4     1                 820   3
5     1                 900   3
6     1                 1000  3
7     1                 2000  3
8     1
9     1
10    1
12    1
13    1
14    1
16    1
18    1
45    1
87    1
90    1
100   2
110   2
120   2
122   2
125   2
169   2
178   2
190   2
200   2
500   2
520   2

So, everyone directly or indirectly connected to ID 1 belongs to network 1. 2 is a friend of 1, so everyone directly or indirectly connected to 2 is also in 1's network, and so on. 700 is not connected to 1, to a friend of 1, to a friend of a friend of 1, and so on. Thus 700 is in a different network, which is network 3.

Any help will be much appreciated...

Caudex answered 16/6, 2021 at 21:1 Comment(3)
To be clear, do you want the result to be a table with the headers Network_ID | Student_ID, in which each row indicates the membership of that student in that network? This would be awkward as a wide table (Network_ID | Student_1_ID | Student_2_ID | Student_3_ID | ...) because, theoretically, there could be a "friendship path" connecting all of your students, and the wide table would need a new column for every student. UPDATE: Just saw your edit. Looks like my guess was right! - Diacetylmorphine
Yes, that is exactly what I needed. I have just edited my question. If I first convert the data into long format, can I get the network ID then? - Caudex
Your clarification is helpful! Converting into long format is a piece of cake with tidyr::pivot_longer(). As for network analysis, there should definitely be something in this helpful article, though I'm still checking if any of those packages allow for the kind of tabulated output you desire. If not, a recursive algorithm should do the trick... - Diacetylmorphine
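For readers who want to see the grouping logic spelled out without any packages, here is a minimal base R sketch. It swaps the recursive idea from the comment above for a simple union-find pass, which reaches the same grouping. The helper name `label_networks` and the toy data frame are hypothetical and only for illustration; with around 85,000 students a dedicated graph package is likely the more practical choice.

```r
# Hypothetical helper: label connected groups from two parallel ID vectors.
label_networks <- function(from, to) {
  ids <- sort(unique(c(from, to)))
  parent <- seq_along(ids)          # every ID starts as its own root
  a <- match(from, ids)
  b <- match(to, ids)

  # Follow parent links until reaching a root (an index that points to itself)
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # Merge the two groups of every nomination; the smaller index becomes the root
  for (k in seq_along(a)) {
    ra <- find_root(a[k])
    rb <- find_root(b[k])
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }

  roots <- vapply(seq_along(ids), find_root, integer(1))
  # Number the groups 1, 2, ... in order of their smallest member ID
  data.frame(ID = ids, Network_ID = match(roots, unique(roots)))
}

# Toy wide data in the same shape as the question (made-up values)
df <- data.frame(ID = c(1, 2, 700), FID_1 = c(2, 5, 800), FID_2 = c(6, NA, 789))

friends <- as.matrix(df[, -1])
from <- rep(df$ID, times = ncol(friends))  # matches the column-major order below
to <- as.vector(friends)
keep <- !is.na(to)                         # skip unused nomination slots

res <- label_networks(from[keep], to[keep])
print(res)
```

Note that a student who nominates nobody and is nominated by nobody never appears in `from` or `to`, so such a student would not receive a label in this sketch.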

Update

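library(igraph)
library(dplyr)
library(data.table)

setDT(df) %>%
    # reshape to long format: one row per (student, nominated friend) pair
    melt(id.var = "ID", variable.name = "FID", value.name = "ID2") %>%
    # drop unused nomination slots
    na.omit() %>%
    # graph_from_data_frame() reads the first two columns as the edge list
    setcolorder(c("ID", "ID2", "FID")) %>%
    graph_from_data_frame() %>%
    # connected components, then each vertex's component number
    components() %>%
    membership() %>%
    # named vector -> two-column data frame
    stack() %>%
    setNames(c("Network_ID", "ID")) %>%
    rev() %>%
    type.convert(as.is = TRUE) %>%
    arrange(Network_ID, ID)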

gives

     ID Network_ID
1     1          1
2     2          1
3     3          1
4     4          1
5     5          1
6     6          1
7     7          1
8     8          1
9     9          1
10   10          1
11   12          1
12   13          1
13   14          1
14   16          1
15   18          1
16   45          1
17   87          1
18   90          1
19  100          2
20  110          2
21  120          2
22  122          2
23  125          2
24  169          2
25  178          2
26  190          2
27  200          2
28  500          2
29  520          2
30  700          3
31  789          3
32  800          3
33  820          3
34  900          3
35 1000          3
36 2000          3
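As a quick sanity check on a result like this, it can help to look at how many networks there are and how large each one is. The following is a minimal sketch, assuming a toy edge list (the `edges` data frame is made up and is not the Add Health data). It keeps the object returned by `components()` so that its `no`, `csize` and `membership` elements can be inspected directly.

```r
library(igraph)

# Toy nominations (hypothetical values): one row per nomination
edges <- data.frame(
  from = c(1, 2, 100, 700),
  to   = c(2, 5, 110, 800)
)

# Direction does not matter for grouping, so build an undirected graph
g <- graph_from_data_frame(edges, directed = FALSE)
comp <- components(g)

comp$no     # number of networks
comp$csize  # number of students in each network

# Same ID / Network_ID layout as above, built from the membership vector
network_table <- data.frame(
  ID = as.integer(names(comp$membership)),
  Network_ID = as.integer(comp$membership)
)
network_table <- network_table[order(network_table$Network_ID, network_table$ID), ]
print(network_table)
```

A very large first component together with many tiny ones is worth a second look, since a single stray nomination is enough to merge two otherwise separate groups.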

Data

> dput(df)
structure(list(ID = c(1L, 2L, 3L, 100L, 500L, 700L, 1000L), FID_1 = c(2L,
5L, 1L, 110L, 100L, 800L, 789L), FID_2 = c(6L, 9L, 2L, 120L,
110L, 789L, 2000L), FID_3 = c(7L, 12L, 4L, 122L, 122L, 900L,
820L), FID_4 = c(9L, 45L, 7L, 125L, 125L, NA, 900L), FID_5 = c(10L,
13L, 8L, 169L, 169L, NA, NA), FID_6 = c(NA, 90L, 9L, 178L, 178L,
NA, NA), FID_7 = c(NA, 87L, 10L, 190L, 190L, NA, NA), FID_8 = c(NA,
6L, 14L, 200L, 200L, NA, NA), FID_9 = c(NA, NA, 16L, 500L, 500L,
NA, NA), FID_10 = c(NA, NA, 18L, 520L, 520L, NA, NA)), class = "data.frame", row.names = c(NA,
-7L))

Are you looking for something like this?

library(igraph)
library(data.table)
library(dplyr)

setDT(df) %>%
    melt(id.var = "ID", variable.name = "FID", value.name = "ID2") %>%
    na.omit() %>%
    setcolorder(c("ID", "ID2", "FID")) %>%
    graph_from_data_frame() %>%
    plot(edge.label = E(.)$FID)


Data

structure(list(ID = 1:3, FID_1 = c(2L, 5L, 1L), FID_2 = c(6L,
9L, 2L), FID_3 = c(7L, 12L, 4L), FID_4 = c(9L, 45L, 7L), FID_5 = c(10L,
12L, 8L), FID_6 = c(NA, 90L, 9L), FID_7 = c(NA, 87L, 10L), FID_8 = c(NA,
6L, 14L), FID_9 = c(NA, NA, 16L), FID_10 = c(NA, NA, 18L)), class = "data.frame", row.names = c(NA,
-3L))
Kreiner answered 16/6, 2021 at 21:12 Comment(5)
Thank you so much for your quick reply. I have edited my question. I am looking for a column with network ID. Thanks again... - Caudex
Nice! You should probably include library(igraph) and any other dependencies, though, because some of these functions (like igraph::graph_from_data_frame()) are rather specialized and so obscure to newcomers. - Diacetylmorphine
@Diacetylmorphine Sorry that I forgot to add it to my code. Now fixed. Thanks a lot! - Kreiner
Thank you sooooo much! It was very quick! I think your code is exactly what's needed. Out of 856,270 individuals from 142 schools in my data, I now get 424 networks. The number of networks seems reasonable. Again, thanks a lot, you saved me so much time!!!! - Caudex
@Caudex You are welcome. If you think this answer is helpful, please feel free to upvote or accept it. Thanks! - Kreiner
