Topological sort in OCaml
Asked Answered
M

4

21

I'm trying to write topological sorting in ocaml, but I'm a beginner (in OCaml & graphs algorithms) and I can't do this by myself.

It's easier for me to think about topological sorting in, for example, C++ (and there is a lot examples of topological sorting in C++ on the Internet), but I want to learn something new. Moreover, I've found some examples of topological sorting written in OCaml, but I don't understand them, to be frankly.

Let's say I have a list (int * int list) list, for example:

myList = [(1, [2]); (5, [6; 7]); (3, [2]); (6, [3; 7]); (8, [7]); (4, [3; 1])];;

and that means, that I need to "do" task 1 before task 2, task 4 before tasks 3 and 1 etc.

I think, that output for this list should be:

[8; 5; 6; 7; 4; 3; 1; 2]

(however I'm not sure, because I just made this example, so if I'm wrong, correct me please)

Also, I've read, that topological sort doesn't work for cycles in graphs, so there must be some kind of condition for cycles - when given graph has cycle(s) we raise exception (I think that is a good idea).

AFAIK, I need to use DFS in algorithm for topological sort, which (DFS) I don't know how to implement in OCaml (I understand main idea, but I don't feel, how that works in OCaml/functional programming).

I would really appreciate Your help to understand me this concepts (I mean topological sort, DFS in OCaml/functional programming). If You can, it would be great, if You show me example codes, because reading code is (for me) the best method to understand algorithm's concept.

I know, that for most of You this is a simple question, but I hope, that it won't keep You from helping me.

PS: I'm learning OCaml by myself (I'm in a high school), so I don't have solid theory background (either in OCaml or algorithms). I had started learning OCaml, because I wanted to understand recursion concept, and now this language seems to be interesting, because is really different from C++, so I'm still trying to learn something new in OCaml.

Memorial answered 11/1, 2011 at 3:27 Comment(1)
If your question had been phrased differently, I would have recommended you just use ocamlgraph.lri.fr . Nevertheless, when you are more comfortable with OCaml and in particular OCaml's module system, I recommend you re-visit this example. A nice introduction to Ocamlgraph is at alexleighton.wordpress.com/2009/04/22/intro-to-ocamlgraph .Nicolle
R
21

First, be aware that Objective Caml does support a programming style which, despite syntactic differences, is fairly similar to C++, by means of mutable data structures (references, arrays, hash tables) and imperative constructs (for and while loops, variable assignment). I'm assuming below that you're actually trying to write your topological sort in idiomatic pure functional style.

Pure functional programming is mostly declarative : this function applied to that value is this other value. This is why the right-hand side of let x = is an expression (evaluates to a value) instead of a sequence of actions that uses return. Of course, trouble appears when adapting an algorithm that is commonly described as a series of steps instead.

Fortunately, there's a pattern (actually, a family of patterns) that lets you represent imperative-ish algorithms in functional style by turning "change the value of X" into "return a new object which is identical to the old one, except for the value of X".

A traditional DFS algorithm involves stepping through the graph while remembering which elements have already been visited - in general (so that you don't visit them more than once) and to get to your current position (so that you can detect cycles). So, the imperative state of a DFS algorithm is comprised of the current node, the set of visited nodes and the set of nodes in the current path. All this data will have to be provided to the recursive calls, and any permanent changes will have to be returned by those same recursive calls.

Using your graph structure from above, combined with a list representation for the two sets (it's hardly optimal - Set would be a better choice here), the algorithm would look somewhat like this:

let dfs graph start_node = 
  let rec explore path visited node = 
    if List.mem node path    then raise (CycleFound path) else
    if List.mem node visited then visited else     
      let new_path = node :: path in 
      let edges    = List.assoc node graph in
      let visited  = List.fold_left (explore new_path) visited edges in
      node :: visited
  in explore [] [] start_node
  

Three important details above: first, DFS says that one you are done exploring all the children of a given node, you should remove that node from the "current path" list and put it into the "visited" list. Since we're using immutable data structures, the first step is unnecessary: its only purpose was to undo the insertion of the node when the exploration started, and in our version the list new_path is not modified by the recursive call to explore. This is an example of case where functional data structures are more comfortable to work with than imperative structures.

Another important detail is the use of List.fold_left. When we started making the state explicit, we replaced implicit imperative functions of type -> unit with explicit functions of type -> state -> .. -> state (accept the state as parameter, return new state). So, the imperative list exploration, which looked like this:

f edge_1 ; f edge_2 ; f edge_3

Now looks like this:

let state = f (f (f state edge_1) edge_2) edge_3)

Which is exactly what List.fold_left f state [edge_1 ; edge_2 ; edge_3] does. Tooting my own horn, but I have a nice article about this here.

The third point is that the "add element to set" operation, when using lists to represent sets, is written simply as element :: set, because this is an operation that returns a new set (list) which contains all the elements of the original set along with the new element. This leaves the original set untouched (which is good for the reasons described in step one) while using a constant amount of memory (it creates a cons cell - a simple head-tail pair containing a reference to the element and a reference to the set) : not only do you get undo capabilities, but you do so at no additional cost.

The above algorithm "inserts" nodes into visited starting with the leaves of the DFS exploration, which in your case represent those nodes that should be done last. In short, the returned list is topologically sorted - but might not contain all elements because the starting point might not be the only root element (or even be a root element at all). So, there's an additional processing step involved here which consists in taking another node from the graph until all the graph has been explored.

Or, in other words, starting a new DFS exploration from every node in the graph, but ignoring any nodes previously explored - which is equivalent to keeping the list of visited elements from one DFS exploration to the next.

Using a small tweak to our previous algorithm, this takes only two lines:

let dfs graph visited start_node = 
  let rec explore path visited node = 
    if List.mem node path    then raise (CycleFound path) else
    if List.mem node visited then visited else     
      let new_path = node :: path in 
      let edges    = List.assoc node graph in
      let visited  = List.fold_left (explore new_path) visited edges in
      node :: visited
  in explore [] visited start_node

let toposort graph = 
  List.fold_left (fun visited (node,_) -> dfs graph visited node) [] graph

The tweak consists in allowing the caller of dfs to specify the list of already visited nodes. Carrying over the list of visited nodes while starting a DFS from every node is done using List.fold_left exactly as before.

EDIT: aside from the type of the exception, there's nothing here that constrains the type of the elements in the graph. However, an exception cannot be polymorphic, so you have two possible solutions. The first is to give up on actually returning any data along with the exception:

exception CycleFound

... raise CycleFound ...

This will revert the type of toposort back to a more generic ('a * ('a list)) list -> 'a list.

The other solution is rather advanced OCaml : you need to make the module that contains the exception and the topological sort polymorphic in that specific type, as follows:

module type NODE = sig
  type t 
end

module type Topo = functor (Node:NODE) -> struct 
  exception CycleFound of Node.t list      
  let dfs ...
  let sort ...  
end

This would make the type of Topo(Node).sort be (Node.t * (Node.t list)) list -> Node.t list, which means you can sort any type you wish by defining a node module with that type:

type recipe = Eggs | Milk | Wheat | Mix | Cook | Serve

module Node = struct 
  type t = recipe 
end

let graph = [ Wheat, [Eggs; Milk; Mix];
              Milk,  [Mix];
              Eggs,  [Mix];
              Mix,   [Cook];
              Cook,  [Serve];
              Serve, [] ]

module RecipeTopo = Topo(Node)

let sorted = RecipeTopo.sort graph

let str_recipe = function
| Eggs -> "Eggs"
| Milk -> "Milk"
| Wheat -> "Wheat"
| Mix -> "Mix"
| Cook -> "Cook!"
| Serve -> "Serve"

let () = List.iter (fun i -> Printf.printf "%s " (str_recipe i)) sorted
//  [Wheat; Milk; Eggs; Mix; Cook; Serve]

Try it out here.

Regain answered 11/1, 2011 at 9:2 Comment(6)
First question I have, is about exception CycleFound. I'm not sure, how should I define it, it seems to be kind of parametrized exception. I mean, I can write exception CycleFound of int list;;, but then topsort has signature: val toposort : (int * ('a * int list)) list -> int list = <fun> and I wonder, what should I write, to have ('a * ('a * 'a list)?Memorial
Ah, seems like a mistake had slipped in (let _, edges = instead of let edges =) which caused the types to go bonkers. The signature should be (int * (int list)) list -> int list now - and you should indeed define exception CycleFound of int list (the parameter is the cycle : a list of nodes).Regain
And if I wanted to sort for example strings and int using the same function, what should I do? I know, I can write couple of function, each for particular type, one for string, one for int etc. But it's not a good idea, I suppose. Therefore, I'm thinking, how to edit Your code, to have general topological sort, not only for int type.Memorial
The answer's too long to hold in a comment, so I edited the answer itself.Regain
OK, explanation is good, but function toposort doesn't work for example tests: when I've tried to sort: myList = [(1, [2]); (5, [6; 7]); (3, [2]); (6, [3; 7]); (8, [7]); (4, [3; 1])];; - toposort myList, I get: Exception: Not_found.. What should I change?Memorial
That exception was thrown by List.assoc. Several of your tasks are pre-requisites for task 7, but there is no task 7 defined - you're expected to define it as (7, []) . The alternative: wrap the List.assoc as try List.assoc ... with Not_found -> [].Regain
M
4

[In case you don't know the term, where I write DAG below I mean "directed acyclic graph", or a collection of from/to edges connecting vertices such that there are no cycles.]

One way to do it is to extend your partial order (your DAG structure) into a total order (so for every pair of distinct vertices u and v, either u is a successor of v or vice versa). Then you can sort your vertices into order: u comes before v if v is a successor of u.

You can construct your total order by starting with the empty graph and adding one edge at a time from your original DAG. That is, an item (u, [v1; v2; ...; vn]) in your original DAG corresponds to the edges (u, v1), (u, v2), ..., (u, vn). For each such edge (u, v), find the predecessors P of u and the successors S of v from your total order. Then add (p, s) to your total order for all p in P U {u} and s in S U {v}.

NOW! A faster way to do it is to find a root in your original DAG (i.e., a vertex with no predecessors) and do a depth first search from that root, ensuring you never visit the same vertex twice. Every time you backtrack in your traversal, you "output" the vertex you are leaving. This way you construct the topological sort of your DAG. If there are any vertices left over, lather rinse, and repeat until they're all done.

Mendenhall answered 11/1, 2011 at 6:23 Comment(3)
I was thinking about method You mentioned first, but the second idea is very interesting. Thank's ;)Memorial
To be exact, you don't even need to start your iterations with a root node - any node will work, because non-root nodes have to be output before root nodes anyway.Regain
@Victor - Ah, sorry: I meant to add that if you can't find a root then your graph contains a cycle.Mendenhall
L
-1

You should try DFS first, it's easier and rewarding.

Louislouisa answered 11/1, 2011 at 8:23 Comment(1)
Easier than what? First before what? I suppose rewarding is nice, though.Presidium
N
-3

I don't know OCaml,but there's a simple algorithm in Wikipedia accredited to Kahn which I have used successfully (transcribing to Python). It's fairly simple so perhaps you could translate it into OCaml. Here it is:

L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    insert n into L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    output error message (graph has at least one cycle)
else 
    output message (proposed topologically sorted order: L)
None answered 11/1, 2011 at 7:9 Comment(5)
Yes, you can translate this verbatim into OCaml, but this style isn't really idiomatic. This version requires mutability and imperative loops, you basically end up writing C++ in a little different syntax. I'm sure the OP wanted something that works with immutable data structures and written in a more idiomatic, functional style.Inflate
@Juliet: Yes you are right. I should have read the OP's question more carefully, especially the second paragraph. But just as an aside, and being unfamiliar with functional languages, it does seem that a procedural approach using mutable data structures has the advantage of being understandable to the average programmer. And in the real world isn't that really more important than mathematical provability? As a mental exercise however, I do agree that pure functional programming is fascinating and educational.Backflow
"it does seem that a procedural approach using mutable data structures has the advantage of being understandable to the average programmer." That is pretty naive.Flabbergast
@Dan O'Donnel: regarding the "procedural approach ... being understandable to the average programmer", I'd call that the Blub Paradox. I'd compare it to querying a database with a cursor, looping over rows by hand, writing if-statements to filter out rows. It matches the familiar imperative style, and you can build enterprisey applications like this. Its just that cursors are a "last resort", they aren't idiomatic and they make for very ugly and clumsy SQL. Same with imperative programming style in OCaml: its a last resort only when functional idioms can't solve the problem reasonably well.Inflate
@Juliet: Thanks for your thoughtful comment. I meant no disrespect for functional languages, I know what a powerful paradigm it is, and have used functional concepts to great advantage in my Python programming. My reference to "the average programmer" here was based on the current TIOBE ratings which shows Java, C, C++, PHP and Python in the top 5 places, ML at position 38, and OCaml & F# somewhere below the top 50. But since this was a question about OCaml, directed toward OCaml programmers, I realize now that I was out of line for suggesting a procedural approach.Backflow

© 2022 - 2024 — McMap. All rights reserved.