finding longest path in a graph

Asked 28/3, 2015 at 17:53 Answered 28/3, 2015 at 19:12

Solved python dictionary graph longest-path

I am trying to solve a program, where in I have to find the max number of cities connected for a given list of routes.

for eg: if the given route is [['1', '2'], ['2', '4'], ['1', '11'], ['4', '11']] then max cities connected will be 4 constraint is I can't visit a city which I already have visited.

I need ideas, as in how to progress.

For now, What I have thought is if I could be able to create a dictionary with cities as a key and how many other cities its connected to as its value, i get somewhere near to the solution(I hope). for eg: My dictionary will be {'1': ['2', '11'], '4': ['11'], '2': ['4']} for the above given input. I want help to proceed further and guidance if I am missing anything.

Postgraduate answered 28/3, 2015 at 17:53 Comment(3)

In terms of algorithms, you may look into depth-first search or breadth-first search – Comitia 28/3, 2015 at 17:59

@Comitia Thank you. I went through the pages. I would appreciate much more if you help me with its implementation or just elaborate how to implement it. – Postgraduate 28/3, 2015 at 18:31

The longest path problem is a NP hard problem. See en.m.wikipedia.org/wiki/Longest_path_problem. You may want to use a graph library or use a known algorithm to solve it. – Pliske 28/3, 2015 at 19:19

You can use a defaultdict to create your "Graph" from your list of edges/paths:

edges = [['1', '2'], ['2', '4'], ['1', '11'], ['4', '11']]

G = defaultdict(list)
for (s,t) in edges:
    G[s].append(t)
    G[t].append(s)

print G.items()

Output:

[
  ('1', ['2', '11']), 
  ('11', ['1', '4']), 
  ('2', ['1', '4']), 
  ('4', ['2', '11'])
]

Note that I added the edges in both directions, since you're working with an undirected graph. So with the edge (a,b), G[a] will include b and G[b] will include a.

From this, you can use an algorithm like depth-first search or breadth-first search to discover all the paths in the graph.

In the following code, I used DFS:

def DFS(G,v,seen=None,path=None):
    if seen is None: seen = []
    if path is None: path = [v]

    seen.append(v)

    paths = []
    for t in G[v]:
        if t not in seen:
            t_path = path + [t]
            paths.append(tuple(t_path))
            paths.extend(DFS(G, t, seen[:], t_path))
    return paths

Which you can use with:

G = defaultdict(list)
for (s,t) in edges:
    G[s].append(t)
    G[t].append(s)

print DFS(G, '1')

Output:

[('1', '2'), ('1', '2', '4'), ('1', '2', '4', '11'), ('1', '11'), ('1', '11', '4'), ('1', '11', '4', '2')]

So the full code, with the final bit that shows the longest path:

from collections import defaultdict

def DFS(G,v,seen=None,path=None):
    if seen is None: seen = []
    if path is None: path = [v]

    seen.append(v)

    paths = []
    for t in G[v]:
        if t not in seen:
            t_path = path + [t]
            paths.append(tuple(t_path))
            paths.extend(DFS(G, t, seen[:], t_path))
    return paths


# Define graph by edges
edges = [['1', '2'], ['2', '4'], ['1', '11'], ['4', '11']]

# Build graph dictionary
G = defaultdict(list)
for (s,t) in edges:
    G[s].append(t)
    G[t].append(s)

# Run DFS, compute metrics
all_paths = DFS(G, '1')
max_len   = max(len(p) for p in all_paths)
max_paths = [p for p in all_paths if len(p) == max_len]

# Output
print("All Paths:")
print(all_paths)
print("Longest Paths:")
for p in max_paths: print("  ", p)
print("Longest Path Length:")
print(max_len)

Output:

All Paths:
   [('1', '2'), ('1', '2', '4'), ('1', '2', '4', '11'), ('1', '11'), ('1', '11', '4'), ('1', '11', '4', '2')]
Longest Paths:
   ('1', '2', '4', '11')
   ('1', '11', '4', '2')
Longest Path Length:
   4

Note, the "starting point" of your search is specified by the second argument to the DFS function, in this case, it's '1'.

Update: As discussed in the comments the above code assumes you have a starting point in mind (specifically the code uses the node labelled '1').

A more general method, in the case that you have no such starting point, would be to perform the search starting at every node, and take the overall longest. (Note: In reality, you could be smarter than this)

Changing the line

all_paths = DFS(G, '1')

all_paths = [p for ps in [DFS(G, n) for n in set(G)] for p in ps]

would give you the longest path between any two points.

(This is a silly list comprehension, but it allows me to update only a single line. Put more clearly, it's equivalent to the following:

all_paths = []
for node in set(G.keys()):
    for path in DFS(G, node):
        all_paths.append(path)

from itertools import chain
all_paths = list(chain.from_iterable(DFS(G, n) for n in set(G)))

Comitia answered 28/3, 2015 at 19:3 Comment(11)

Great Explaination, Thankyou – Postgraduate 28/3, 2015 at 19:14

What about if this was a Directed Acyclic graph? Would I modify the for (s,t) in edges: loop? – Faerie 18/10, 2015 at 6:58

@ForeverLearning, Yes, you'd likely want to remove G[t].append(s) from that loop. – Comitia 18/10, 2015 at 7:33

also, how does this work without topologically sorting the graph for a DAG? my test cases pass but I don't understand how your implementation works without it. – Faerie 18/10, 2015 at 7:39

This approach is not the most efficient for the special case of DAGs (n.b. I'm not even sure it's the "most efficient" for any case quite frankly) where there are "tricks" you can do (i.e. topological sorting) to achieve a linear time algorithm for finding the longest path. The complexity of code in the answer is far from linear. For a general graph like the OP stated, such optimizations aren't believed possible. But for a DAG, they are, that's just not the approach I used here (because it wouldn't work for a general graph). – Comitia 18/10, 2015 at 8:22

Nice example, but I'm afraid there is a flaw in the DFS() function: The seen list must not be shared among different recursive calls of DFS() because nodes that have been visited on one path should not influence the construction of another path (namely by terminating too early). A quick fix would be to change the line paths.extend(DFS(G, t, seen, t_path)) to paths.extend(DFS(G, t, seen[:], t_path)), which will create a new copy each time. – Neoterism 20/12, 2016 at 19:4

@Neoterism Thanks for the correction -- you're absolutely right. – Comitia 6/1, 2017 at 2:52

I came here looking for a solution to a similar problem, but I'm not sure this code does what I think it should. If I add a new vertex ['5','1'] to the edges, I do not get a longer path, which it seems like I should. Have I misunderstood the question and/or the answer? – Befog 5/8, 2017 at 6:30

@Jedwards, if you can comment on that, you'd be of tremendous help for me as well. – Syncytium 3/9, 2017 at 2:12

@KellenMyers -- the answer assumes that you're only interested in paths starting at or ending at the node labelled "1" (See the final note at the bottom of the answer). This is probably a poor assumption and I'll edit the answer. But the short of it is that if the line all_paths = DFS(G,'1') were changed to DFS(G,'5'), (or 11) then a path of length 5 would be shown. For the more general case, changing that line to something like all_paths = [p for ps in [DFS(G, n) for n in set(G)] for p in ps] should work. – Comitia 8/9, 2017 at 16:32

@user1712447 stackoverflow only let me @ a single user per comment, I hope my previous comment helps. – Comitia 8/9, 2017 at 16:33

Here is my code which works for the input in the example but if I tweak the input a little bit, the code fails to give correct number of cities connected.

def dfs(graph, start, visited=None):
if visited is None:
    visited = set()
visited.add(start)
#had to do this for the key error that i was getting if the start doesn't
#have any val.
if isinstance(start,str) and start not in graph.keys():
    pass
else:
    for next in set(graph[start]) - visited:
        dfs(graph, next, visited)
return visited

def maxno_city(input1):
totalcities = []
max_nocity = 0
routedic = {}
#dup = []
rou = []
for cities in input1:
    cities = cities.split('#')
    totalcities.append(cities)
print (totalcities)
for i in totalcities:
    if i[0] in routedic.keys():
        routedic[i[0]].append(i[1])
    else:
        routedic.update({i[0]:[i[1]]})
print(routedic)
keys = routedic.keys()
newkeys = []
for i in keys:
    newkeys.append(i)
print (newkeys)
newkeys.sort()
print (newkeys)
expath = dfs(routedic,newkeys[0])
return(len(expath))

The output for the above given input is 4 and I get 4 but If the input is changed to something like this: ['1#2','2#3','1#11','3#11','4#11','4#5','5#6','5#7','6#7','4#12','8#12','9#12','8#10','9#10',8#9] My code fails.

Thanks, LearningNinja :D

Postgraduate answered 28/3, 2015 at 19:12 Comment(0)

Recommended topics

Hot tags