I'm having trouble figuring out how to update the networkx dag_find_longest_path() algorithm to return "N" for ties instead of returning the first max edge found, or returning a list of all edges that are tied for max weight.
I first created a DAG from pandas dataframe that contains an edgelist like the following subset:
edge1 edge2 weight
115252161:T 115252162:A 1.0
115252162:A 115252163:G 1.0
115252163:G 115252164:C 3.0
115252164:C 115252165:A 5.5
115252165:A 115252166:C 5.5
115252162:T 115252163:G 1.0
115252166:C 115252167:A 7.5
115252167:A 115252168:A 7.5
115252168:A 115252169:A 6.5
115252165:A 115252166:G 0.5
Then I use the following code to topologically sort the graph and then find the longest path according to the weights of the edges:
G = nx.from_pandas_edgelist(edge_df, source="edge1",
target="edge2",
edge_attr=['weight'],
create_using=nx.OrderedDiGraph())
longest_path = pd.DataFrame(nx.dag_longest_path(G))
This works great, except when there are ties for max weighted edge, it returns the first max edge found, and instead I would like it to just return an "N" representing "Null". So currently, the output is:
115252161 T
115252162 A
115252163 G
115252164 C
115252165 A
115252166 C
But what I actually need is:
115252161 T
115252162 N (or [A,T] )
115252163 G
115252164 C
115252165 A
115252166 C
The algorithm for finding the longest path is:
def dag_longest_path(G):
dist = {} # stores [node, distance] pair
for node in nx.topological_sort(G):
# pairs of dist,node for all incoming edges
pairs = [(dist[v][0] + 1, v) for v in G.pred[node]]
if pairs:
dist[node] = max(pairs)
else:
dist[node] = (0, node)
node, (length, _) = max(dist.items(), key=lambda x: x[1])
path = []
while length > 0:
path.append(node)
length, node = dist[node]
return list(reversed(path))
Copy-pastable definition of G
.
import pandas as pd
import networkx as nx
import numpy as np
edge_df = pd.read_csv(
pd.compat.StringIO(
"""edge1 edge2 weight
115252161:T 115252162:A 1.0
115252162:A 115252163:G 1.0
115252163:G 115252164:C 3.0
115252164:C 115252165:A 5.5
115252165:A 115252166:C 5.5
115252162:T 115252163:G 1.0
115252166:C 115252167:A 7.5
115252167:A 115252168:A 7.5
115252168:A 115252169:A 6.5
115252165:A 115252166:G 0.5"""
),
sep=r" +",
)
G = nx.from_pandas_edgelist(
edge_df,
source="edge1",
target="edge2",
edge_attr=["weight"],
create_using=nx.OrderedDiGraph(),
)
longest_path = pd.DataFrame(nx.dag_longest_path(G))
115252161:T
and115252162:T
. – Edmundedmunda