First, be aware that Objective Caml does support a programming style which, despite syntactic differences, is fairly similar to C++, by means of mutable data structures (references, arrays, hash tables) and imperative constructs (for and while loops, variable assignment). I'm assuming below that you're actually trying to write your topological sort in idiomatic pure functional style.
Pure functional programming is mostly declarative : this function applied to that value is this other value. This is why the right-hand side of let x =
is an expression (evaluates to a value) instead of a sequence of actions that uses return
. Of course, trouble appears when adapting an algorithm that is commonly described as a series of steps instead.
Fortunately, there's a pattern (actually, a family of patterns) that lets you represent imperative-ish algorithms in functional style by turning "change the value of X" into "return a new object which is identical to the old one, except for the value of X".
A traditional DFS algorithm involves stepping through the graph while remembering which elements have already been visited - in general (so that you don't visit them more than once) and to get to your current position (so that you can detect cycles). So, the imperative state of a DFS algorithm is comprised of the current node, the set of visited nodes and the set of nodes in the current path. All this data will have to be provided to the recursive calls, and any permanent changes will have to be returned by those same recursive calls.
Using your graph structure from above, combined with a list representation for the two sets (it's hardly optimal - Set
would be a better choice here), the algorithm would look somewhat like this:
let dfs graph start_node =
let rec explore path visited node =
if List.mem node path then raise (CycleFound path) else
if List.mem node visited then visited else
let new_path = node :: path in
let edges = List.assoc node graph in
let visited = List.fold_left (explore new_path) visited edges in
node :: visited
in explore [] [] start_node
Three important details above: first, DFS says that one you are done exploring all the children of a given node, you should remove that node from the "current path" list and put it into the "visited" list. Since we're using immutable data structures, the first step is unnecessary: its only purpose was to undo the insertion of the node when the exploration started, and in our version the list new_path
is not modified by the recursive call to explore
. This is an example of case where functional data structures are more comfortable to work with than imperative structures.
Another important detail is the use of List.fold_left
. When we started making the state explicit, we replaced implicit imperative functions of type -> unit
with explicit functions of type -> state -> .. -> state
(accept the state as parameter, return new state). So, the imperative list exploration, which looked like this:
f edge_1 ; f edge_2 ; f edge_3
Now looks like this:
let state = f (f (f state edge_1) edge_2) edge_3)
Which is exactly what List.fold_left f state [edge_1 ; edge_2 ; edge_3]
does. Tooting my own horn, but I have a nice article about this here.
The third point is that the "add element to set" operation, when using lists to represent sets, is written simply as element :: set
, because this is an operation that returns a new set (list) which contains all the elements of the original set along with the new element. This leaves the original set untouched (which is good for the reasons described in step one) while using a constant amount of memory (it creates a cons cell - a simple head-tail pair containing a reference to the element and a reference to the set) : not only do you get undo capabilities, but you do so at no additional cost.
The above algorithm "inserts" nodes into visited
starting with the leaves of the DFS exploration, which in your case represent those nodes that should be done last. In short, the returned list is topologically sorted - but might not contain all elements because the starting point might not be the only root element (or even be a root element at all). So, there's an additional processing step involved here which consists in taking another node from the graph until all the graph has been explored.
Or, in other words, starting a new DFS exploration from every node in the graph, but ignoring any nodes previously explored - which is equivalent to keeping the list of visited elements from one DFS exploration to the next.
Using a small tweak to our previous algorithm, this takes only two lines:
let dfs graph visited start_node =
let rec explore path visited node =
if List.mem node path then raise (CycleFound path) else
if List.mem node visited then visited else
let new_path = node :: path in
let edges = List.assoc node graph in
let visited = List.fold_left (explore new_path) visited edges in
node :: visited
in explore [] visited start_node
let toposort graph =
List.fold_left (fun visited (node,_) -> dfs graph visited node) [] graph
The tweak consists in allowing the caller of dfs
to specify the list of already visited nodes. Carrying over the list of visited nodes while starting a DFS from every node is done using List.fold_left
exactly as before.
EDIT: aside from the type of the exception, there's nothing here that constrains the type of the elements in the graph. However, an exception cannot be polymorphic, so you have two possible solutions. The first is to give up on actually returning any data along with the exception:
exception CycleFound
... raise CycleFound ...
This will revert the type of toposort
back to a more generic ('a * ('a list)) list -> 'a list
.
The other solution is rather advanced OCaml : you need to make the module that contains the exception and the topological sort polymorphic in that specific type, as follows:
module type NODE = sig
type t
end
module type Topo = functor (Node:NODE) -> struct
exception CycleFound of Node.t list
let dfs ...
let sort ...
end
This would make the type of Topo(Node).sort
be (Node.t * (Node.t list)) list -> Node.t list
, which means you can sort any type you wish by defining a node module with that type:
type recipe = Eggs | Milk | Wheat | Mix | Cook | Serve
module Node = struct
type t = recipe
end
let graph = [ Wheat, [Eggs; Milk; Mix];
Milk, [Mix];
Eggs, [Mix];
Mix, [Cook];
Cook, [Serve];
Serve, [] ]
module RecipeTopo = Topo(Node)
let sorted = RecipeTopo.sort graph
let str_recipe = function
| Eggs -> "Eggs"
| Milk -> "Milk"
| Wheat -> "Wheat"
| Mix -> "Mix"
| Cook -> "Cook!"
| Serve -> "Serve"
let () = List.iter (fun i -> Printf.printf "%s " (str_recipe i)) sorted
// [Wheat; Milk; Eggs; Mix; Cook; Serve]
Try it out here.