What data structures to use for Dijkstra's algorithm in Erlang?
Asked Answered
S

1

6

Disclaimer: The author is a newbie in Erlang.

Imagine, we have a graph consisting of 1M nodes, and each node has 0-4 neighbours (the edges are emanating from each node to those neighbours, so the graph is directed and connected).

Here is my choice of data structures:

To store the graph I use digraph, which is based on ETS tables. This allows fast (O(1)) access to the neighbours of a node.

For the list of unvisited nodes, I use gb_sets:take_smallest (the node is already sorted, and it is simultaneously deleted after fetching).

For the list of predecessors I use the dict structure, which allows to store the predecessors in the following way: {Node1,Node1_predecessor},{Node2,Node2_predecessor}.

For the list of visited nodes I use a simple list.

Problems:

  1. The code becomes very hard to read and maintain when I try to update the weight of a node both in the digraph structure and in the Unvisited_nodes structure. It doesn't seem the right way to keep one 'object' with the 'fields' that need to be updated in two data structures simultaneously. What is the right way to do that?
  2. The same question is about predecessors list. Where should I store the predecessor 'field' of a node 'object'? Maybe in the Graph (digraph structure)?
  3. Maybe I should rethink the whole Dijkstra's algorithm in terms of processes and messages instead of objects (nodes and edges) and their fields(weights)?

UPD:

Here is the code based on the recommendations of Antonakos:

dijkstra(Graph,Start_node_name) ->

    io:format("dijkstra/2: start~n"),

    Paths = dict:new(),
    io:format("dijkstra/2: initialized empty Paths~n"),

    Unvisited = gb_sets:new(),
    io:format("dijkstra/2: initialized empty Unvisited nodes priority queue~n"),

    Unvisited_nodes = gb_sets:insert({0,Start_node_name,root},Unvisited),
    io:format("dijkstra/2: Added start node ~w with the weight 0 to the Unvisited nodes: ~w~n", [Start_node_name, Unvisited_nodes]),

    Paths_updated = loop_through_nodes(Graph,Paths,Unvisited_nodes),
    io:format("dijkstra/2: Finished searching for shortest paths: ~w~n", [Paths_updated]).




loop_through_nodes(Graph,Paths,Unvisited_nodes) ->
    %% We need this condition to stop looping through the Unvisited nodes if it is empty
    case gb_sets:is_empty(Unvisited_nodes) of
        false -> 
            {{Current_weight,Current_name,Previous_node}, Unvisited_nodes_updated} = gb_sets:take_smallest(Unvisited_nodes),
            case dict:is_key(Current_name,Paths) of
                false ->
                    io:format("loop_through_nodes: Found a new smallest unvisited node ~w~n",[Current_name]),

                    Paths_updated = dict:store(Current_name,{Previous_node,Current_weight},Paths),
                    io:format("loop_through_nodes: Updated Paths: ~w~n",[Paths_updated]),

                    Out_edges = digraph:out_edges(Graph,Current_name),
                    io:format("loop_through_nodes: Ready to iterate through the out edges of node ~w: ~w~n",[Current_name,Out_edges]),

                    Unvisited_nodes_updated_2 = loop_through_edges(Graph,Out_edges,Paths_updated,Unvisited_nodes_updated,Current_weight),
                    io:format("loop_through_nodes: Looped through out edges of the node ~w and updated Unvisited nodes: ~w~n",[Current_name,Unvisited_nodes_updated_2]),

                    loop_through_nodes(Graph,Paths_updated,Unvisited_nodes_updated_2);
                    true ->
                    loop_through_nodes(Graph,Paths,Unvisited_nodes_updated)
            end;
        true -> 
            Paths
    end.

loop_through_edges(Graph,[],Paths,Unvisited_nodes,Current_weight) ->
                    io:format("loop_through_edges: No more out edges ~n"),
    Unvisited_nodes;

loop_through_edges(Graph,Edges,Paths,Unvisited_nodes,Current_weight) ->
                    io:format("loop_through_edges: Start ~n"),
    [Current_edge|Rest_edges] = Edges,
    {Current_edge,Current_node,Neighbour_node,Edge_weight} = digraph:edge(Graph,Current_edge),
    case dict:is_key(Neighbour_node,Paths) of
        false ->
                    io:format("loop_through_edges: Inserting new neighbour node ~w into Unvisited nodes~n",[Current_node]),
            Unvisited_nodes_updated = gb_sets:insert({Current_weight+Edge_weight,Neighbour_node,Current_node},Unvisited_nodes),
                    io:format("loop_through_edges: The unvisited nodes are: ~w~n",[Unvisited_nodes_updated]),
            loop_through_edges(Graph,Rest_edges,Paths,Unvisited_nodes_updated,Current_weight);
        true ->
            loop_through_edges(Graph,Rest_edges,Paths,Unvisited_nodes,Current_weight)
    end.
Spleen answered 17/7, 2011 at 16:40 Comment(0)
R
1

Your choice of data structures looks OK, so it is mostly a question of what to insert in them and how to keep them up to date. I'd suggest the following (I have changed the names a bit):

  • Result: A dict mapping Node to {Cost, Prev}, where Cost is the total cost of the path to Node and Prev is its predecessor on the path.

  • Open: A gb_sets structure of {Cost, Node, Prev}.

  • A graph with edges of the form {EdgeCost, NextNode}.

The result of the search is represented by Result and the graph isn't updated at all. There is no multiprocessing or message passing.

The algorithm goes as follows:

  • Insert {0, StartNode, Nil} in Open, where Nil is something that marks the end of the path.

  • Let {{Cost, Node, Prev}, Open1} = gb_sets:take_smallest(Open). If Node is already in Result then do nothing; otherwise add {Cost, Node, Prev} to Result, and for every edge {EdgeCost, NextNode} of Node add {Cost + EdgeCost, NextNode, Node} to Open1, but only if NextNode isn't already in Result. Continue with Open1 until the set is empty.

Dijkstra's algorithm asks for a decrease_key() operation on the Open set. Since this isn't readily supported by gb_sets we have used the workaround of inserting a tuple for NextNode even if NextNode might be present in Open already. That's why we check if the node extracted from Open is already in Result.


Extended discussion of the use of the priority queue

There are several ways of using a priority queue with Dijkstra's algorithm.

In the standard version of Wikipedia a node v is inserted only once but the position of v is updated when the cost and predecessor of v is changed.

alt := dist[u] + dist_between(u, v)
if alt < dist[v]:
    dist[v] := alt
    previous[v] := u
    decrease-key v in Q

Implementations often simplify by replacing decrease-key v in Q with add v to Q. This means that v can be added more than once, and the algorithm must therefore check that an u extracted from the queue hasn't already been added to the result.

In my version I am replacing the entire block above with add v to Q. The queue will therefore contain even more entries, but since they are always extracted in order it doesn't affect the correctness of the algorithm. If you don't want these extra entries, you can use a dictionary to keep track of the minimum cost for each node.

Reprint answered 17/7, 2011 at 20:33 Comment(6)
Thank you very much! Please, excuse me for my slow thinking, but I can not see the relax_neighbours function in your example (I mean the code that compares the weight of the adjacent node (the neighbour) with the sum of weights of the current node and of the edge between those nodes. And could you please explain why do we need the decrease_key() operation? P.S. (I use the Dijkstra's algorithm description from here: renaud.waldura.com/doc/java/dijkstra)Spleen
@Martin I've added a short discussion of these issues.Reprint
Thank you! I have updated my question with the code based on your recommendations. Unfortunately, it is slower than my previous version of code (it takes more than 5 seconds to compute paths in a graph of 1000 nodes; the previous version has taken 1 second on the same graph). Maybe I'm doing something wrong in this implementation?Spleen
@Martin Except for all the IO - which I hope you wasn't executed during testing - the program looks fine, and I think it is correct. Profiling and performance tuning is beyond the scope of my answer, which was to give the structure for the implementation of Dijkstra's algorithm in a functional language such as Erlang.Reprint
You are right. Thank you for your thorough answer. P.S. "Except for all the IO - which I hope you wasn't executed during testing" - what does that mean?Spleen
@Martin Erase the 'you' and the sentence is a little clearer. What I meant was that all those io:format() calls are very slow to execute, and therefore they should be removed from the code before any performance testing is done.Reprint

© 2022 - 2024 — McMap. All rights reserved.