This is topological sorting on a directed acyclic graph. First build the graph: the vertices are letters, and there's an edge between two letters whenever the dictionary shows that one is lexicographically less than the other. A topological order of this graph then gives you the answer.
A contradiction shows up as a cycle, i.e. the directed graph is not acyclic. Uniqueness is determined by whether or not a Hamiltonian path exists, which for a DAG is testable in polynomial time.
Building the graph
You do this by comparing each pair of consecutive "words" in the dictionary. Let's say you have these two words appearing one after another:
x156@
x1$#2z
Then you find the longest common prefix, x1 in this case, and check the characters immediately following this prefix. Here we have 5 and $. Since the words appear in this order in the dictionary, we can determine that 5 must be lexicographically smaller than $.
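A minimal Python sketch of this comparison, assuming words are plain strings; the function name first_difference is hypothetical. It returns the pair of characters right after the longest common prefix, or None when one word is a prefix of the other:

```python
def first_difference(earlier, later):
    """Compare two consecutive dictionary words.

    Returns (a, b), meaning a < b in the unknown alphabet, taken from the
    first position after the longest common prefix. Returns None if one
    word is a prefix of the other (there is nothing to compare).
    """
    # zip stops at the end of the shorter word.
    for a, b in zip(earlier, later):
        if a != b:
            return (a, b)
    return None


print(first_difference("x156@", "x1$#2z"))  # ('5', '$')
```

Only the first differing position carries information; characters after it say nothing about the alphabet, so the loop stops there.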
Similarly, given the following words (appearing one after another in the dictionary):
jhdsgf
19846
19846adlk
We can tell that 'j' < '1' from the first pair (where the longest common prefix is the empty string). The second pair doesn't tell us anything useful: one word is a prefix of the other, so there are no characters to compare.
Now suppose later we see the following:
oi1019823
oij(*#@&$
Then we've found a contradiction, because this pair says that '1' < 'j'.
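Applying that comparison to every consecutive pair collects all the constraints in one pass. In this sketch (collect_constraints is a hypothetical name), the two example lists are scanned separately, since the second pair only appears "later" in the dictionary, not directly after the first list:

```python
def collect_constraints(words):
    """Scan consecutive word pairs once; return the set of (a, b) with a < b."""
    constraints = set()
    for earlier, later in zip(words, words[1:]):
        for a, b in zip(earlier, later):
            if a != b:
                constraints.add((a, b))
                break  # only the first difference carries information
    return constraints


c = collect_constraints(["jhdsgf", "19846", "19846adlk"])
print(c)  # {('j', '1')}
c |= collect_constraints(["oi1019823", "oij(*#@&$"])
print(("j", "1") in c and ("1", "j") in c)  # True: a two-letter cycle
```

Here the contradiction is a direct reversal of two letters; longer cycles such as a < b, b < c, c < a only become visible during the topological sort.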
The topological sort
There are two traditional ways to do topological sorting: Kahn's algorithm and depth-first search. The depth-first search approach is algorithmically simpler; here there's an edge from x to y if y < x.
The pseudocode of the algorithm is given on Wikipedia:
L ← Empty list that will contain the sorted nodes
S ← Set of all nodes with no incoming edges

function visit(node n)
    if n has not been visited yet then
        mark n as visited
        for each node m with an edge from n to m do
            visit(m)
        add n to L

for each node n in S do
    visit(n)
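Upon conclusion of the above algorithm, the list L contains the vertices in topological order. Because each letter is added to L only after every letter known to be smaller than it, L lists the alphabet from smallest to largest.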
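A Python sketch of the depth-first version, assuming the graph is given as a dict smaller_than that maps each letter x to the set of letters y with y < x (the edges from x to y described above); topological_order is a hypothetical name. Unlike the pseudocode, it also marks nodes that are still being explored, the usual way to detect a cycle (a contradiction) during DFS, and it starts a visit from every letter rather than only from S, which covers the same nodes for a DAG and also reaches cycles that have no source:

```python
def topological_order(letters, smaller_than):
    """Return the letters from smallest to largest (the list L).

    smaller_than maps each letter x to the set of letters y with y < x,
    i.e. the edges x -> y. Raises ValueError if the graph has a cycle.
    """
    IN_PROGRESS, DONE = 1, 2
    state = {}
    order = []

    def visit(n):
        if state.get(n) == DONE:
            return
        if state.get(n) == IN_PROGRESS:
            # n is reachable from itself: the graph is not acyclic.
            raise ValueError(f"contradiction: cycle through {n!r}")
        state[n] = IN_PROGRESS
        for m in smaller_than.get(n, ()):
            visit(m)
        state[n] = DONE
        order.append(n)  # every letter known to be smaller is already in order

    for n in letters:
        visit(n)
    return order


print(topological_order("cab", {"c": {"b"}, "b": {"a"}}))  # ['a', 'b', 'c']
```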
Checking uniqueness
The following is a quote from Wikipedia:
If a topological sort has the property that all pairs of consecutive vertices in the sorted order are connected by edges, then these edges form a directed Hamiltonian path in the DAG. If a Hamiltonian path exists, the topological sort order is unique; no other order respects the edges of the path. Conversely, if a topological sort does not form a Hamiltonian path, the DAG will have two or more valid topological orderings, for in this case it is always possible to form a second valid ordering by swapping two consecutive vertices that are not connected by an edge to each other. Therefore, it is possible to test in polynomial time whether a unique ordering exists, and whether a Hamiltonian path exists.
Thus, to check whether the order is unique, you simply check whether every pair of consecutive vertices in L (from the above algorithm) is connected by a direct edge. If so, the order is unique.
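A sketch of that check, assuming the same smaller_than dict of sets (x maps to the letters known to be smaller than x) and an order listed from smallest to largest; is_unique is a hypothetical name. For consecutive letters a, b the required edge is therefore b to a. Set membership is O(1) on average, which plays the role of the O(1) adjacency-matrix edge test from the complexity analysis:

```python
def is_unique(order, smaller_than):
    """True if every consecutive pair in `order` is joined by a direct edge.

    `order` is ascending, so for consecutive a, b we need a < b recorded
    directly, i.e. a in smaller_than[b].
    """
    return all(a in smaller_than.get(b, ()) for a, b in zip(order, order[1:]))


print(is_unique(["a", "b", "c"], {"c": {"b"}, "b": {"a"}}))  # True
print(is_unique(["a", "b", "c"], {"c": {"a", "b"}}))         # False
```

In the second case a and b are never compared directly, so swapping them gives another valid order, exactly as the quoted argument says.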
Complexity analysis
Once the graph is built, topological sort is O(|V| + |E|). The uniqueness check is O(|V| · edgeTest), where edgeTest is the complexity of testing whether two vertices are connected by an edge. With an adjacency matrix, this is O(1).
Building the graph requires only a single linear scan of the dictionary. If there are W words, it's O(W · cmp), where cmp is the complexity of comparing two words. You always compare two consecutive words, so all sorts of optimizations are possible if necessary, but otherwise a naive comparison is O(L), where L is the length of the words (not the list L from the topological sort).
You may also short-circuit reading the dictionary once you've determined that you have enough information about the alphabet, etc., but even a naive building step takes O(WL) time, i.e. linear in the size of the dictionary.
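Putting the steps together, here is a compact, self-contained sketch (alphabet_order is a hypothetical name). It collects letters and edges in one linear scan, so building costs O(WL) as described, then runs the depth-first sort with cycle detection and the consecutive-edge uniqueness test. Letters that appear in the words but in no comparison are still included, so an unconstrained letter makes the order non-unique, as it should. It does not implement the early short-circuit:

```python
def alphabet_order(words):
    """Return (order, unique) for words sorted in an unknown alphabet.

    order lists the letters from smallest to largest; unique says whether
    it is the only order consistent with the words. Raises ValueError if
    the words imply a cycle (a contradiction).
    """
    letters = {}        # dict used as an insertion-ordered set of letters
    smaller_than = {}   # edge x -> y means y < x

    # One linear scan: collect letters and compare each word with the previous one.
    for i, word in enumerate(words):
        for ch in word:
            letters.setdefault(ch, None)
        if i > 0:
            for a, b in zip(words[i - 1], word):
                if a != b:
                    smaller_than.setdefault(b, set()).add(a)  # a < b
                    break

    # Depth-first topological sort with cycle detection.
    IN_PROGRESS, DONE = 1, 2
    state, order = {}, []

    def visit(n):
        if state.get(n) == DONE:
            return
        if state.get(n) == IN_PROGRESS:
            raise ValueError(f"contradiction: cycle through {n!r}")
        state[n] = IN_PROGRESS
        for m in smaller_than.get(n, ()):
            visit(m)
        state[n] = DONE
        order.append(n)

    for n in letters:
        visit(n)

    # Unique iff every consecutive pair is joined by a direct edge.
    unique = all(a in smaller_than.get(b, ()) for a, b in zip(order, order[1:]))
    return order, unique


print(alphabet_order(["ba", "bc", "ac"]))  # (['b', 'a', 'c'], True)
print(alphabet_order(["ab", "ac"]))        # (['a', 'b', 'c'], False)
```

Feeding it the example words from above in one list (jhdsgf, 19846, 19846adlk, oi1019823, oij(*#@&$) raises ValueError, because they imply both 'j' < '1' and '1' < 'j'.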