Apache Spark GraphX connected components
Asked Answered
G

2

8

How to use subgraph function to get a graph that would include only vertexes and edges from the specific connected component? Let's say I know the connected component id, the final goal is to create a new graph based on the connected component. I'd like to keep the vertex attributes from the original graph.

Gloze answered 25/5, 2015 at 21:24 Comment(0)
W
6

You have to join the graph with the component IDs to the original graph, filter (take the subgraph) by the component ID, and then discard the component ID.

import scala.reflect._
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib.ConnectedComponents

def getComponent[VD: ClassTag, ED: ClassTag](
    g: Graph[VD, ED], component: VertexId): Graph[VD, ED] = {
  val cc: Graph[VertexId, ED] = ConnectedComponents.run(g)
  // Join component ID to the original graph.
  val joined = g.outerJoinVertices(cc.vertices) {
    (vid, vd, cc) => (vd, cc)
  }
  // Filter by component ID.
  val filtered = joined.subgraph(vpred = {
    (vid, vdcc) => vdcc._2 == Some(component)
  })
  // Discard component IDs.
  filtered.mapVertices {
    (vid, vdcc) => vdcc._1
  }
}
Workable answered 26/5, 2015 at 15:11 Comment(4)
Thanks for this function! But, can we not assume that if you have the CC id thenyou have probably already constructed the CC, such that this can be pulled in as one of getComponent's arguments saving the potentially expensive task of recomputing. I am curious as to why this function works, but when I did the steps manually putting a Long or whatever in as the component it fails. Must it be cast Some or VertexId?Dg
You're right — now that I look at it it's quite strange how we would have a component ID before even running ConnectedComponents. I guess I just wanted to include ConnectedComponents somewhere. I'll rework the code to be more sensible. For your other question: VertexId is an alias for Long. A plain old number should work there. What do you mean by "fails" here? Do you get an exception?Workable
Do you know how to count the number of connected components in a graph?Logical
It should be cc.vertices.values.distinct.count.Workable
B
3

I take your question to be, given a VertexId in a source graph, create a new graph with the nodes and edges connected to this VertexId from the source graph.

Given that, here's what I would do:

val targetVertexId = ...
val graph = Graph(..., ...)
val newGraph = Graph(
  graph.vertices.filter{case (vid,attr) => vid == targetVertexId} ++
  graph.collectNeighbors(EdgeDirection.Either)
    .filter{ case (vid,arr) => vid == targetVertexId}
    .flatMap{ case (vid,arr) => arr},
  graph.edges
).subgraph(vpred = { case (vid,attr) => attr != null})

Couple of things to note:

You can change EdgeDirection.Either to EdgeDirection.In or EdgeDirection.Out as needed.

The .subgraph at the end removes all Vertices where the attribute is set to null. If the original val graph has Vertices with attributes set to null this isn't going to work. Otherwise this works, without having to know the Vertex attribute type in advance.

Blomquist answered 27/5, 2015 at 3:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.