Topological sort based on a comparator (rather than a graph)
Asked Answered
S

1

7

I have a set of items and a comparator function which defines a partial ordering -- given two items, it returns "=", "<", ">", or "no defined ordering" (say "<>"). I want to produce a sorted list of items that respects this partial ordering.

If I look for algorithms to do a topological sort, they generally start with a directed acyclic graph. But I don't have a DAG, and I can't see a simple way to construct one without doing a large number (N*N perhaps?) of comparisons. What I would like is some kind of QuickSort-like algorithm that works by comparing and swapping selected items in a list. Is there such an algorithm? I'm assuming most classical sorting algorithms would fail because of the indeterminism.

I thought of trying to use a classical sort algorithm and treating "<>" as "=", but it doesn't work because I can have the situation A < B, A <> C, B <> C, so I can't treat C as being equal to both A and B.

Any ideas or pointers?

Sonni answered 29/1, 2020 at 11:11 Comment(10)
Is the comparator function transitive with respect to < and >, or is it the transitive closure which defines the partial ordering? (Also, hello Michael!)Rentsch
The comparator is transitive (at least, if it isn't, it's broken!).Sonni
I think the search term you're looking for is "poset sorting", where poset = "partially-ordered set". There's a paper from 2009 with some algorithms, the most efficient one seems to be based on merge sort. The running time depends on the poset width, which is the maximum size of an "antichain" (a subset of mutually-incomparable elements); in the worst case, there might be only one pair of comparable elements, in which case you do have to do N*N comparisons to find which pair it is and put them in the right relative order.Rentsch
The fact that the comparator can return = in some cases makes it slightly different from regular poset sorting, but it's probably possible to adapt to that, perhaps by using a disjoint-set data structure to group equal elements together when you find them, and just sort the groups. I'd have to think a bit more about the details of that, though.Rentsch
@Rentsch Shouldn't it be the same? One would say that < means "less than" and anything else, >, = or <>, means "not less than" (since, from the poset definition, a and b are considered incomparable simply if a is not less than b and b is not less than a).Horsehair
@jdehesa You could treat = as meaning the same as <>, but that means throwing away some information, and hence losing some potential efficiency. For example, if the comparator always returns = then you only have to do O(n) comparisons to know that any ordering is fine; if the comparator always returns <> then you have to do O(n^2) comparisons to confirm that any ordering is fine. A poset can indeed be defined just using <, but what that means is that this comparator (which can return =) defines something that isn't exactly a poset.Rentsch
@Rentsch Ah I understand, you meant algorithms could be adapted to take advantage of the richer comparison semantics for efficiency, not because they could not possibly work as such, yes that makes sense.Horsehair
Thanks for all the input. I think the "=" case can be handled fairly easily by coalescing items into groups when the duplicates are found.Sonni
@Rentsch Many thanks for the reference to the 2009 paper; we will study it carefully.Sonni
Knuth's topological sort algorithm from TAOCP does not use a graph - it is very fast and compact - a C implementation is described at https://mcmap.net/q/1625740/-how-do-you-implement-knuth-39-s-toposort-in-cLindsylindy
L
4

You don't need to create a graph explicitly to use topological sort algorithm.

Let S be the set of elements you want to sort, and there is a partial order in S. Let used be the dictionary that maps each element from S to a boolean value (false by default) that will be true when we visit a 'node' with this element. Let stack be the stack of elements from S (empty by default).

Define a method dfs(x) (x is an element of S) that does the following:

  • Set used[x] to true

  • For each element y in S:

    • If used[y] is false and x is less than or equal to y (*):

      • Call dfs(y)
  • Add x to stack

Then:

  • For each element x in S:

    • If used[x] is false:

      • Call dfs(x)

After this loop, elements in stack will be ordered (first item to be popped from stack is minimal one (not necessarily minimum), last item is maximal one (not necessarily maximum)).

This algorithm obviously runs in O(n^2) because it's still a topological sort, just without creating graph explicitly.

(*): Like topological sort processes only edges that go from x to y and does not process cases where edge goes from y to x or there is no edge at all, this algorithm processes only 'less than or equal to' relations.

Lillith answered 29/1, 2020 at 12:1 Comment(6)
Thanks for this. Compared with the algorithms given in the paper cited by @kaya3, it at least has the virtue of simplicity! But the O(n^2) scares me a bit. In most cases the sets we are sorting will only contain 10-20 items, but there will be some cases of 100-200 encountered.Sonni
We may have some scope for optimization by partitioning the items into classes where certain pairs of classes are known to be incomparable, so when processing an item we only need to search the classes whose items are comparable.Sonni
O(n^2) is really good for 100 elements - algorithm runs in few milliseconds. You may divide your 'graph' into connected components and work with them independently, but building them still runs in O(n^2), because your partial order is actually an adjacency matrix.Lillith
Just wrote this algorithm in Kotlin - list of integers from 1 to 100 with ordering relation x <= y if y % x == 0 was sorted in 4 ms, list of 1000 integers was sorted in 89 ms, and list of 10000 integers was sorted in 1717 ms.Lillith
@ardent Thanks. My comparator function may be fairly expensive to evaluate. But I think I can exploit the structure of my data; for example there are some classes of items where the class internally has a total order and everything in the class is "<" everything in some other class.Sonni
Accepting this answer; both this answer and the comments on the question provided useful insights. For the moment, in the light of these responses, I'm tackling the problem a different way, by trying to enhance my comparator to compute a total ordering; but I may come back to partial ordering if this proves unsatisfactory.Sonni

© 2022 - 2024 — McMap. All rights reserved.