From dataframe to vertex/edge array
D

2

7

I have the dataframe

test <- structure(list(
     y2002 = c("freshman","freshman","freshman","sophomore","sophomore","senior"),
     y2003 = c("freshman","junior","junior","sophomore","sophomore","senior"),
     y2004 = c("junior","sophomore","sophomore","senior","senior",NA),
     y2005 = c("senior","senior","senior",NA, NA, NA)), 
              .Names = c("2002","2003","2004","2005"),
              row.names = c(c(1:6)),
              class = "data.frame")
> test
       2002      2003      2004   2005
1  freshman  freshman    junior senior
2  freshman    junior sophomore senior
3  freshman    junior sophomore senior
4 sophomore sophomore    senior   <NA>
5 sophomore sophomore    senior   <NA>
6    senior    senior      <NA>   <NA>

and I need to create a vertex/edge list (for use with igraph) with one row for every time the student category changes between consecutive years, ignoring years in which there is no change, as in

testvertices <- structure(list(
 vertex = 
  c("freshman","junior", "freshman","junior","sophomore","freshman",
    "junior","sophomore","sophomore","sophomore"),
 edge = 
  c("junior","senior","junior","sophomore","senior","junior",
    "sophomore","senior","senior","senior"),
 id =
  c("1","1","2","2","2","3","3","3","4","5")),
                       .Names = c("vertex","edge", "id"),
                       row.names = c(1:10),
                       class = "data.frame")
> testvertices
      vertex      edge id
1   freshman    junior  1
2     junior    senior  1
3   freshman    junior  2
4     junior sophomore  2
5  sophomore    senior  2
6   freshman    junior  3
7     junior sophomore  3
8  sophomore    senior  3
9  sophomore    senior  4
10 sophomore    senior  5

At this point I'm ignoring the ids; my graph should weight edges by count (e.g., freshman -> junior = 3). The idea is to make a tree graph. I know it is beside the main munging point, but that's in case you ask...

Deprecative answered 11/9, 2012 at 4:40 Comment(6)
To be honest, I don't quite get what your objective is here. What exactly is the rule for creating testvertices? What are the vertices and edges in your graph?Extradition
I just edited testvertices, so freshman -> junior, junior -> sophomore, sophomore -> senior. Students can skip years (freshman -> senior) but not go back (senior -> sophomore). I noticed that user1317221_G's response does not lead to the direction implied in the years (directional). Does that answer your question, @GaborCsardi?Deprecative
Well, not really, sorry. What is a vertex/edge list? What are the vertices and edges in your graph? Vertices are freshman, junior, sophomore and senior? Just these four? Or the students? What are the edges?Extradition
I am trying to map changes from category to category, so both edges and vertices can be freshman, junior, sophomore and senior. I would like a tree, showing the path from whatever to senior. If someone starts off as sophomore and changes to senior, I expect vertex=sophomore and edge=senior, and if someone goes through all stages, I need all stages to be depicted in the tree. edit So, as a start, I am expecting that all changes happening in consecutive years will become vertex=change_from and edge=change_to, with consecutive years without changes being ignored.Deprecative
I am confused a bit. In a graph, two vertices are connected by an edge. So if you have someone changing from sophomore to senior, then sophomore and senior are connected by a directed edge? This is what you mean?Extradition
@GaborCsardi, yes, that is correct. Although in igraph's implementation, the 'vertex' column will have the starting vertex, and the 'edge' column will have the ending vertex, so that one would not need an extra column specifying the direction of the edge. Or at least that's how I am getting it done.Deprecative
E
3

If I understand you correctly, you need something like this:

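elist <- lapply(seq_len(nrow(test)), function(i) {
  x <- as.character(test[i,])          # row i as a character vector
  x <- unique(na.omit(x))              # drop NAs and repeated categories
  x <- rep(x, each=2)                  # include each vertex twice
  x <- x[-1]                           # drop the first element ...
  x <- x[-length(x)]                   # ... and the last, leaving from/to pairs
  r <- matrix(x, ncol=2, byrow=TRUE)   # one edge per row
  # add the row id as a third column; a zero-row matrix is handled separately
  if (nrow(r) > 0) { r <- cbind(r, i) } else { r <- cbind(r, numeric()) }
  r
})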

do.call(rbind, elist)

#                              i  
# [1,] "freshman"  "junior"    "1"
# [2,] "junior"    "senior"    "1"
# [3,] "freshman"  "junior"    "2"
# [4,] "junior"    "sophomore" "2"
# [5,] "sophomore" "senior"    "2"
# [6,] "freshman"  "junior"    "3"
# [7,] "junior"    "sophomore" "3"
# [8,] "sophomore" "senior"    "3"
# [9,] "sophomore" "senior"    "4"
#[10,] "sophomore" "senior"    "5"

It is not the most efficient solution, but I think it is fairly didactic. We create edges separately for each row of your input data frame, hence the lapply. To create the edges from a row, we first remove NAs and duplicates, and then include each vertex twice. Finally, we drop the first and last element and format the result in two columns, which gives the edge list matrix for that row (actually it would be more efficient to leave it as a vector, but never mind).
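To make the steps concrete, here is a small trace of row 1 of `test` through the same operations. The row is typed in by hand so the snippet runs on its own; it is only an illustration of the steps described above.

```r
# Row 1 of `test`, written out as a character vector
x <- c("freshman", "freshman", "junior", "senior")

x <- unique(na.omit(x))   # "freshman" "junior" "senior"
x <- rep(x, each = 2)     # every vertex twice: f f j j s s
x <- x[-1]                # drop the first element: f j j s s
x <- x[-length(x)]        # drop the last element:  f j j s

# Consecutive pairs become the rows of the edge list:
# freshman -> junior, junior -> senior
r <- matrix(x, ncol = 2, byrow = TRUE)
r
```

Note that `unique()` removes every repeat, not only consecutive ones. That is fine here because, as stated in the comments on the question, students never go back to an earlier category.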

When adding the extra column, we must be careful to check whether our edge list matrix has zero rows.
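A quick sketch of the zero-row case, using row 6 of `test` (a student who never changes category). The row is again typed in by hand so the snippet is self-contained.

```r
# Row 6 of `test`: "senior" twice, then NAs
x <- c("senior", "senior", NA, NA)

x <- unique(na.omit(x))   # only "senior" is left
x <- rep(x, each = 2)     # "senior" "senior"
x <- x[-1]
x <- x[-length(x)]        # nothing left: character(0)

r <- matrix(x, ncol = 2, byrow = TRUE)
dim(r)                    # 0 rows, 2 columns: this student contributes no edges

# Binding a zero-length vector keeps the matrix at zero rows but gives it
# the third column, so it can be rbind-ed with the other students' edges
r <- cbind(r, numeric())
dim(r)
```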

The do.call function will just glue everything together. The result is a matrix, which you can convert to a data frame if you like, via as.data.frame(), and then you can also convert the third column to numeric. You can also change the column names if you like.
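As a rough sketch of that last step, the snippet below converts the matrix to a data frame, renames the columns, makes the id numeric, and then counts identical edges so the count can be used as a weight (as asked in the question). The matrix `m` is copied by hand from the printed output above, and the column names `from`, `to` and `weight` are my own choice, not something the question requires.

```r
# Edge list matrix, copied from the output of do.call(rbind, elist)
m <- cbind(
  c("freshman", "junior", "freshman", "junior", "sophomore",
    "freshman", "junior", "sophomore", "sophomore", "sophomore"),
  c("junior", "senior", "junior", "sophomore", "senior",
    "junior", "sophomore", "senior", "senior", "senior"),
  c("1", "1", "2", "2", "2", "3", "3", "3", "4", "5")
)

# Matrix -> data frame with named columns and a numeric id
edges <- as.data.frame(m, stringsAsFactors = FALSE)
names(edges) <- c("from", "to", "id")
edges$id <- as.numeric(edges$id)

# Count how many students share each from -> to change
weighted <- aggregate(id ~ from + to, data = edges, FUN = length)
names(weighted)[3] <- "weight"
weighted
```

If igraph is installed, a data frame shaped like `weighted` can be passed to `igraph::graph_from_data_frame(weighted, directed = TRUE)`; the first two columns are taken as the edge endpoints and the remaining column becomes an edge attribute.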

Extradition answered 12/9, 2012 at 6:23 Comment(4)
Thank you. It seems like I'm more lost than I assumed previously. Maybe you could help me with this other post? It spells out my problem more directly.Deprecative
Why are you lost? My code is giving you exactly the result you requested. Doesn't it?Extradition
Oh, I see. So this answers your question, and creates the data frame you thought you needed, but it turned out that this is not the data frame you needed.Extradition
Yep. It gives me a simpler answer than the one I'm seeking, but simpler is better than nothing. I thank you for helping me through so far. :-)Deprecative
O
1

Does this do what you want?

# test has four columns (2002..2005): pair each year with the following one
test1<-c(test[[1]],test[[2]],test[[3]])
test2<-c(test[[2]],test[[3]],test[[4]])
df<-data.frame(vertex=test1,edge=test2,stringsAsFactors=FALSE)
df1<-df[complete.cases(df),]
result<-df1[df1$vertex != df1$edge,]
Ousel answered 11/9, 2012 at 8:58 Comment(1)
your code ignores the direction of the changes, i.e., it will both lead to vertex=freshman and edge=junior and vice-versa. I expect that the direction of the changes will be preserved.Deprecative
