Graphframes: BFS between two lists of vertices in spark graphframes
Asked Answered
L

1

6

My aim is to find whether the max path length between two vertices is <= 4.

I have a graph dataframe and a test file of the below format.

I am trying to get the output column(OP) from bfs function of graph dataframes.

Col1, Col2, OP
a1,   a4,   true
a2,   a1,   false
a3,   a5,   true

Currently, I am looping through each and every row and applying bfs like below

gf.bfs.fromExpr("id = 'a1'").toExpr("id = 'a4'").maxPathLength(4).run()

Are there any better approaches where I can directly plugin list of vertices at source and destination to calculate the bfs in graph frames.

Livelong answered 2/12, 2019 at 9:45 Comment(0)
S
0

You can use motif finding for this, and join the result to your dataframe with Col1 and Col2. Something like this:

motifs= [
      "(start)-[e1]->(end)",
      "(start)-[e1]->(n1); (n1)-[e2]->(end)",
      "(start)-[e1]->(n1); (n1)-[e2]->(n2); (n2)-[e3]->(end)",
      "(start)-[e1]->(n1); (n1)-[e2]->(n2); (n2)-[e3]->(n3); (n3)-[e4]->(end)",

for motif in motifs:
    reachable = df.join(g.find(motif),[f.col("start")==f.col("Col1"),f.col("end")==f.col("Col2")],"inner")
    unreachable = df.join(g.find(motif),[f.col("start")==f.col("Col1"),f.col("end")==f.col("Col2")],"leftanti")

In each iteration of the loop, it will find the Col1-Col2 pairs that are reachable within 1/2/3/4 edges. Of course, you still need to union all reachable dataframes over the loop, and possibly do some deduplication for nodes that can be connected in multiple ways.

Saucier answered 10/4 at 12:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.