Get all the nodes connected to a node in Apache Spark GraphX
Asked Answered
T

1

9

Suppose we have got the input in Apache GraphX as :

Vertex RDD:

val vertexArray = Array(
  (1L, "Alice"),
  (2L, "Bob"),
  (3L, "Charlie"),
  (4L, "David"),
  (5L, "Ed"),
  (6L, "Fran")
)

Edge RDD:

val edgeArray = Array(
  Edge(1L, 2L, 1),
  Edge(2L, 3L, 1),
  Edge(3L, 4L, 1),
  Edge(5L, 6L, 1)
)

I need all the components connected to a node in Apache Spark GraphX

1,[1,2,3,4]
5,[5,6]
Tempo answered 16/9, 2015 at 2:36 Comment(6)
OK, so we understand what you need. What have you tried? Or are you expecting SO to write your code for you?Fireproof
I do not expect the code but just basic outline for it.And for the question if it is required to write the stuff I have tried I think it will make the question a bit messy and not upto the point. Have seen the reference material for the Spark GraphX but was not able to get the solution for it.Tempo
Also there's collectNeighbours which seemingly does what you need: spark.apache.org/docs/latest/…Opprobrious
collectNeighbors will give the information of node -> adjacent to that node list and collectNeighborsId will give just the ID of node so that wont help me for getting all the connected componentsTempo
Output of collectNeighbors : 4 -> (3,Charlie),1 -> (2,Bob),6 -> (5,Ed),3 -> (2,Bob),(4,David),5 -> (6,Fran),2 -> (1,Alice),(3,Charlie)Tempo
You can achieve the goal manipulating by vertex and edge RDDS, the plan is the following: join vertex and edge RDD on vertex id, them map to other vertex id and finally group by key. As you are treating your graph as non-directional you may need additional manipulations, for example before joining you need to union your original edges with reversed edges (by the way you can do this and use collectNeighbours).Opprobrious
Y
11

You can use ConnectedComponents which returns

a graph with the vertex value containing the lowest vertex id in the connected component containing that vertex.

and reshape results

graph.connectedComponents.vertices.map(_.swap).groupByKey
Yawp answered 16/9, 2015 at 6:58 Comment(6)
If instead the graph was 6->5, 4->3->3->1, this would produce the wrong result, I think. It would still produce the same result and instead it should be (6, [5,6], 4, [1,2,3,4])?Fireproof
These are not strongly connected components and choice of the label is arbitrary. Using lowest id makes sense so I don't think there is a problem here.Yawp
If the label is arbitary, yes, agreed. If the OP wanted the start of the subgraph, then there's an issue. But only the OP knows this.Fireproof
Connected Components will find subgraphs, where each vertex has path to each other vertex. In fact it may be the whole initial graph if it is strongly connected.Opprobrious
you don't answer on his questionWringer
how to implement this in Java?Simard

© 2022 - 2024 — McMap. All rights reserved.