How does Duval's algorithm handle odd-length strings?
Asked Answered
S

2

0

Finding the Lexicographically minimal string rotation is a well known problem, for which a linear time algorithm was proposed by Jean Pierre Duval in 1983. This blog post is probably the only publicly available resource that talks about the algorithm in detail. However, Duval's algorithms is based on the idea of pairwise comparisons ("duels"), and the blog conveniently uses an even-length string as an example.

How does the algorithm work for odd-length strings, where the last character wouldn't have a competing one to duel with?

Sciuroid answered 11/4, 2019 at 23:52 Comment(6)
I believe you just treat the string as wrapping back to the first char, so the last item is compared to the first.Ratliff
@Ratliff can you cite references to support your belief? a logical choice would be to simply let the lone char win.Sciuroid
Not really relevant to your question, but: the blog post says that it only requires constant memory space, but I don't see how that can be true; it seems like it would need at least O(log n) space to keep track of the winners who need to duel.Chap
@Chap wikipedia also claims constant memory. Unless, of course, the original array is mutated at each iteration, and the losers are set to null. If the goal is to only find the index, that could work.Sciuroid
@AbhijitSarkar: Setting the losers to null would completely break the algorithm, because even the losers get compared in subsequent duels. (Consider ABAC. You compare A to B and A to C; then you compare AB to AC. If you've nulled out B and C, then you can't do that comparison. Furthermore, using the array-elements themselves as storage would break the performance characteristics, because then you'd have to scan to find undefeated indices to duel against. So the algorithm would become O*(*n log n).)Chap
@Chap you're right that null would break the algorithm, and next iteration won't work, although I don't understand the scanning part. I'll have a better understanding after having implemented the algorithm.Sciuroid
C
1

One character can get a "bye", where it wins without participating in a "duel". The correctness of the algorithm does not rely on the specific duels that you perform; given any two distinct indices i and j, you can always conclusively rule out that one of them is the start-index of the lexicographically-minimal rotation (unless both are start-indices of identical lexicographically-minimal rotations, in which case it doesn't matter which one you reject). The reason to perform the duels in a specific order is performance: to get asymptotically linear time by ensuring that half the duels only need to compare one character, half of the rest only need to compare two characters, and so on, until the last duel only needs to compare half the length of the string. But a single odd character here and there doesn't change the asymptotic complexity, it just makes the math (and implementation) a little bit more complicated. A string of length 2n+1 still requires fewer "duels" than one of length 2n+1.

Chap answered 12/4, 2019 at 0:9 Comment(0)
S
1

OP here: I accepted ruakh's answer as it pertains to my question, but I wanted to provide my own explanation for others that might stumble across this post trying to understand Duval's algorithm.

Problem:

Lexicographically least circular substring is the problem of finding the rotation of a string possessing the lowest lexicographical order of all such rotations. For example, the lexicographically minimal rotation of "bbaaccaadd" would be "aaccaaddbb".

Solution:

A O(n) time algorithm was proposed by Jean Pierre Duval (1983).

Given two indices i and j, Duval's algorithm compares string segments of length j - i starting at i and j (called a "duel"). If index + j - i is greater than the length of the string, the segment is formed by wrapping around.

For example, consider s = "baabbaba", i = 5 and j = 7. Since j - i = 2, the first segment starting at i = 5 is "ab". The second segment starting at j = 7 is constructed by wrapping around, and is also "ab". If the strings are lexicographically equal, like in the above example, we choose the one starting at i as the winner, which is i = 5.

The above process repeated until we have a single winner. If the input string is of odd length, the last character wins without a comparison in the first iteration.

Time complexity:

The first iteration compares n strings each of length 1 (n/2 comparisons), the second iteration may compare n/2 strings of length 2 (n/2 comparisons), and so on, until the i-th iteration compares 2 strings of length n/2 (n/2 comparisons). Since the number of winners is halved each time, the height of the recursion tree is log(n), thus giving us a O(n log(n)) algorithm. For small n, this is approximately O(n).

Space complexity is O(n) too, since in the first iteration, we have to store n/2 winners, second iteration n/4 winners, and so on. (Wikipedia claims this algorithm uses constant space, I don't understand how).

Here's a Scala implementation; feel free to convert to your favorite programming language.

def lexicographicallyMinRotation(s: String): String = {
 @tailrec
 def duel(winners: Seq[Int]): String = {
   if (winners.size == 1) s"${s.slice(winners.head, s.length)}${s.take(winners.head)}"
   else {
     val newWinners: Seq[Int] = winners
       .sliding(2, 2)
       .map {
         case Seq(x, y) =>
           val range = y - x
           Seq(x, y)
             .map { i =>
               val segment = if (s.isDefinedAt(i + range - 1)) s.slice(i, i + range)
               else s"${s.slice(i, s.length)}${s.take(s.length - i)}"
               (i, segment)
             }
             .reduce((a, b) => if (a._2 <= b._2) a else b)
             ._1
         case xs => xs.head
       }
       .toSeq
     duel(newWinners)
   }
 }

 duel(s.indices)
}
Sciuroid answered 12/4, 2019 at 4:54 Comment(2)
The actual Duval's algorithm is more refined than the simple version described here. It does use constant space. Please check Lyndon factorization and Duval's algorithmSkinned
@Skinned It’s been severe years since I wrote this, so I can’t be certain, but I think I’ve seen the article you linked to. Links are not useful though; feel free to post your own answer with any possible improvements.Sciuroid

© 2022 - 2024 — McMap. All rights reserved.