How to find the lexicographically smallest string by reversing a substring?

Asked 15/9, 2017 at 16:30 Answered 15/9, 2017 at 19:22

I have a string S which consists of a's and b's. Perform the below operation once. Objective is to obtain the lexicographically smallest string.

Operation: Reverse exactly one substring of S

e.g.

if S = abab then Output = aabb (reverse ba of string S)
if S = abba then Output = aabb (reverse bba of string S)

My approach

Case 1: If all characters of the input string are same then output will be the string itself.

Case 2: if S is of the form aaaaaaa....bbbbbb.... then answer will be S itself.

otherwise: Find the first occurence of b in S say the position is i. String S will look like

aa...bbb...aaaa...bbbb....aaaa....bbbb....aaaaa...
     |
     i

In order to obtain the lexicographically smallest string the substring that will be reversed starts from index i. See below for possible ending j.

aa...bbb...aaaa...bbbb....aaaa....bbbb....aaaaa...
     |           |               |               |
     i           j               j               j

Reverse substring S[i:j] for every j and find the smallest string. The complexity of the algorithm will be O(|S|*|S|) where |S| is the length of the string.

Is there a better way to solve this problem? Probably O(|S|) solution.

What I am thinking if we can pick the correct j in linear time then we are done. We will pick that j where number of a's is maximum. If there is one maximum then we solved the problem but what if it's not the case? I have tried a lot. Please help.

Paramilitary answered 15/9, 2017 at 16:30 Comment(12)

Isn't the answer is always to write all 'a' first then all 'b'? – Wednesday 15/9, 2017 at 16:34

@AbdenaceurLichiheb you can perform the operation exactly once. It's not necessary that you can bring all a's in the beginning and all b's at the end always. e.g. S = ababba output=aabbab – Paramilitary 15/9, 2017 at 16:34

yes I know, imagine string abababa answer is aaaabbb, you only need to count the number of 'a' and 'b' and then write all 'a' then all 'b'. – Wednesday 15/9, 2017 at 16:36

I don't think there is a linear algorithm, there should be a linear-logarithmic algorithm though. – Cantone 15/9, 2017 at 16:58

@AbdenaceurLichiheb: Which one substring did you reverse to go from abababa to aaaabbb? – Isia 15/9, 2017 at 18:28

@JaredGoguen I think I found an O(n) algorithm. (At least for strings of reasonable length; for longer strings it would require BigIntegers.) – Littleton 15/9, 2017 at 22:27

@m69 I saw, it's beautiful – Cantone 15/9, 2017 at 22:46

@JaredGoguen Thanks; I didn't know the upvote was yours :-) – Littleton 15/9, 2017 at 22:47

Where does this problem come from? I'm pretty sure I have an O(n) solution, not restricted to binary strings, but I'd like to know more about the context. – Rosellaroselle 19/9, 2017 at 16:22

@Rosellaroselle my curiosity is killing me – Summers 19/9, 2017 at 19:46

@גלעדברקן: I understand, but I still have doubts about my solution. Although it passes all the tests I could think of, my attempt to prove it correct revealed the possibility of an error which could cause it to be O(n log n) instead of O(n). However, I can't trigger the error. Anyway, the draft answer I wrote last night got lost when my laptop crashed, so it will have to wait for later. Sorry. – Rosellaroselle 20/9, 2017 at 16:25

@Rosellaroselle I'm also looking forward to it. Btw, it seems like a self contained coding exercise to me; I don't think there is any real-world context. – Littleton 21/9, 2017 at 13:8

So, I came up with an algorithm, that seems to be more efficient that O(|S|^2), but I'm not quite sure of it's complexity. Here's a rough outline:

Strip of the leading a's, storing in variable start.
Group the rest of the string into letter chunks.
Find the indices of the groups with the longest sequences of a's.
If only one index remains, proceed to 10.
Filter these indices so that the length of the [first] group of b's after reversal is at a minimum.
If only one index remains, proceed to 10.
Filter these indices so that the length of the [first] group of a's (not including the leading a's) after reversal is at a minimum.
If only one index remains, proceed to 10.
Go back to 5, except inspect the [second/third/...] groups of a's and b's this time.
Return start, plus the reversed groups up to index, plus the remaining groups.

Since any substring that is being reversed begins with a b and ends in an a, no two hypothesized reversals are palindromes and thus two reversals will not result in the same output, guaranteeing that there is a unique optimal solution and that the algorithm will terminate.

My intuition says this approach of probably O(log(|S|)*|S|), but I'm not too sure. An example implementation (not a very good one albeit) in Python is provided below.

from itertools import groupby

def get_next_bs(i, groups, off):
    d = 1 + 2*off
    before_bs = len(groups[i-d]) if i >= d else 0
    after_bs = len(groups[i+d]) if i <= d and len(groups) > i + d else 0
    return before_bs + after_bs

def get_next_as(i, groups, off):
    d = 2*(off + 1)
    return len(groups[d+1]) if i < d else len(groups[i-d])

def maximal_reversal(s):
    # example input: 'aabaababbaababbaabbbaa'

    first_b = s.find('b')
    start, rest = s[:first_b], s[first_b:] 
    # 'aa', 'baababbaababbaabbbaa'

    groups = [''.join(g) for _, g in groupby(rest)]
    # ['b', 'aa', 'b', 'a', 'bb', 'aa', 'b', 'a', 'bb', 'aa', 'bbb', 'aa']

    try:
        max_length = max(len(g) for g in groups if g[0] == 'a')
    except ValueError:
        return s # no a's after the start, no reversal needed

    indices = [i for i, g in enumerate(groups) if g[0] == 'a' and len(g) == max_length]
    # [1, 5, 9, 11]

    off = 0
    while len(indices) > 1:
        min_bs = min(get_next_bs(i, groups, off) for i in indices)
        indices = [i for i in indices if get_next_bs(i, groups, off) == min_bs]
        # off 0: [1, 5, 9], off 1: [5, 9], off 2: [9]

        if len(indices) == 1:
            break

        max_as = max(get_next_as(i, groups, off) for i in indices)
        indices = [i for i in indices if get_next_as(i, groups, off) == max_as]
        # off 0: [1, 5, 9], off 1: [5, 9]

        off += 1

    i = indices[0]
    groups[:i+1] = groups[:i+1][::-1]

    return start + ''.join(groups)
    # 'aaaabbabaabbabaabbbbaa'

Cantone answered 15/9, 2017 at 16:38 Comment(2)

What if there are multiple occurences of such substring? Which one to choose? – Paramilitary 15/9, 2017 at 16:43

@Paramilitary Missed that point, I've written a new algorithm that addresses this issue. – Cantone 15/9, 2017 at 18:24

TL;DR: Here's an algorithm that only iterates over the string once (with O(|S|)-ish complexity for limited string lengths). The example with which I explain it below is a bit long-winded, but the algorithm is really quite simple:

Iterate over the string, and update its value interpreted as a reverse (lsb-to-msb) binary number.

If you find the last zero of a sequence of zeros that is longer than the current maximum, store the current position, and the current reverse value. From then on, also update this value, interpreting the rest of the string as a forward (msb-to-lsb) binary number.

If you find the last zero of a sequence of zeros that is as long as the current maximum, compare the current reverse value with the current value of the stored end-point; if it is smaller, replace the end-point with the current position.

So you're basically comparing the value of the string if it were reversed up to the current point, with the value of the string if it were only reversed up to a (so-far) optimal point, and updating this optimal point on-the-fly.

Here's a quick code example; it could undoubtedly be coded more elegantly:

function reverseSubsequence(str) {
    var reverse = 0, max = 0, first, last, value, len = 0, unit = 1;
    
    for (var pos = 0; pos < str.length; pos++) {
        var digit = str.charCodeAt(pos) - 97;                   // read next digit
        if (digit == 0) {
            if (first == undefined) continue;                   // skip leading zeros
            if (++len > max || len == max && reverse < value) { // better endpoint found
                max = len;
                last = pos;
                value = reverse;
            }
        } else {
            if (first == undefined) first = pos;                // end of leading zeros
            len = 0;
        }
        reverse += unit * digit;                                // update reverse value
        unit <<= 1;
        value = value * 2 + digit;                              // update endpoint value
    }
    return {from: first || 0, to: last || 0};
}
var result = reverseSubsequence("aaabbaabaaabbabaaabaaab");
document.write(result.from + "&rarr;" + result.to);

(The code could be simplified by comparing reverse and value whenever a zero is found, and not just when the end of a maximally long sequence of zeros is encountered.)

You can create an algorithm that only iterates over the input once, and can process an incoming stream of unknown length, by keeping track of two values: the value of the whole string interpreted as a reverse (lsb-to-msb) binary number, and the value of the string with one part reversed. Whenever the reverse value goes below the value of the stored best end-point, a better end-point has been found.

Consider this string as an example:

aaabbaabaaabbabaaabaaab

or, written with zeros and ones for simplicity:

00011001000110100010001

We iterate over the leading zeros until we find the first one:

0001
   ^

This is the start of the sequence we'll want to reverse. We will start interpreting the stream of zeros and ones as a reversed (lsb-to-msb) binary number and update this number after every step:

reverse = 1, unit = 1

Then at every step, we double the unit and update the reverse number:

0001        reverse = 1
00011       unit = 2; reverse = 1 + 1 * 2 = 3
000110      unit = 4; reverse = 3 + 0 * 4 = 3
0001100     unit = 8; reverse = 3 + 0 * 8 = 3

At this point we find a one, and the sequence of zeros comes to an end. It contains 2 zeros, which is currently the maximum, so we store the current position as a possible end-point, and also store the current reverse value:

endpoint = {position = 6, value = 3}

Then we go on iterating over the string, but at every step, we update the value of the possible endpoint, but now as a normal (msb-to-lsb) binary number:

00011001      unit = 16; reverse = 3 + 1 * 16 = 19
              endpoint.value *= 2 + 1 = 7
000110010     unit = 32; reverse = 19 + 0 * 32 = 19
              endpoint.value *= 2 + 0 = 14
0001100100    unit = 64; reverse = 19 + 0 * 64 = 19
              endpoint.value *= 2 + 0 = 28
00011001000   unit = 128; reverse = 19 + 0 * 128 = 19
              endpoint.value *= 2 + 0 = 56

At this point we find that we have a sequence of 3 zeros, which is longer that the current maximum of 2, so we throw away the end-point we had so far and replace it with the current position and reverse value:

endpoint = {position = 10, value = 19}

And then we go on iterating over the string:

000110010001         unit = 256; reverse = 19 + 1 * 256 = 275
                     endpoint.value *= 2 + 1 = 39
0001100100011        unit = 512; reverse = 275 + 1 * 512 = 778
                     endpoint.value *= 2 + 1 = 79
00011001000110       unit = 1024; reverse = 778 + 0 * 1024 = 778
                     endpoint.value *= 2 + 0 = 158
000110010001101      unit = 2048; reverse = 778 + 1 * 2048 = 2826
                     endpoint.value *= 2 + 1 = 317
0001100100011010     unit = 4096; reverse = 2826 + 0 * 4096 = 2826
                     endpoint.value *= 2 + 0 = 634
00011001000110100    unit = 8192; reverse = 2826 + 0 * 8192 = 2826
                     endpoint.value *= 2 + 0 = 1268
000110010001101000   unit = 16384; reverse = 2826 + 0 * 16384 = 2826
                     endpoint.value *= 2 + 0 = 2536

Here we find that we have another sequence with 3 zeros, so we compare the current reverse value with the end-point's value, and find that the stored endpoint has a lower value:

endpoint.value = 2536  < reverse = 2826

so we keep the end-point set to position 10 and we go on iterating over the string:

0001100100011010001      unit = 32768; reverse = 2826 + 1 * 32768 = 35594
                         endpoint.value *= 2 + 1 = 5073  
00011001000110100010     unit = 65536; reverse = 35594 + 0 * 65536 = 35594
                         endpoint.value *= 2 + 0 = 10146
000110010001101000100    unit = 131072; reverse = 35594 + 0 * 131072 = 35594
                         endpoint.value *= 2 + 0 = 20292
0001100100011010001000   unit = 262144; reverse = 35594 + 0 * 262144 = 35594
                         endpoint.value *= 2 + 0 = 40584

And we find another sequence of 3 zeros, so we compare this position to the stored end-point:

endpoint.value = 40584 > reverse = 35594

and we find it has a smaller value, so we replace the possible end-point with the current position:

endpoint = {position = 21, value = 35594}

And then we iterate over the final digit:

00011001000110100010001   unit = 524288; reverse = 35594 + 1 * 524288 = 559882  
                          endpoint.value *= 2 + 1 = 71189

So at the end we find that position 21 gives us the lowest value, so it is the optimal solution:

00011001000110100010001  ->  00000010001011000100111
   ^                 ^
start = 3         end = 21

Here's a C++ version that uses a vector of bool instead of integers. It can parse strings longer than 64 characters, but the complexity is probably quadratic.

#include <vector>

struct range {unsigned int first; unsigned int last;};

range lexiLeastRev(std::string const &str) {
    unsigned int len = str.length(), first = 0, last = 0, run = 0, max_run = 0;
    std::vector<bool> forward(0), reverse(0);
    bool leading_zeros = true;

    for (unsigned int pos = 0; pos < len; pos++) {
        bool digit = str[pos] - 'a';
        if (!digit) {
            if (leading_zeros) continue;
            if (++run > max_run || run == max_run && reverse < forward) {
                max_run = run;
                last = pos;
                forward = reverse;
            }
        }
        else {
            if (leading_zeros) {
                leading_zeros = false;
                first = pos;
            }
            run = 0;
        }
        forward.push_back(digit);
        reverse.insert(reverse.begin(), digit);
    }
    return range {first, last};
}

Obliteration answered 15/9, 2017 at 19:22 Comment(13)

How can this be O(n) when the length of numbers you're manipulating in general depends on n? – Fixer 16/9, 2017 at 9:0

I don't understand the logic for the point 3 i.e. comparison of endpoint.value and reverse. Why it will work? Apart from that everything is crystal clear. – Paramilitary 16/9, 2017 at 9:29

@Fixer You're probably right that it isn't O(n) in the strictest mathematical sense; let's say it's linear within the integer range of a particular implementation. – Littleton 16/9, 2017 at 12:40

@Paramilitary At any point, reverse holds the value of the string as if it were reversed up to the current position, whereas the value of the provisionally optimal endpoint holds the value of the string if it were reversed only up to that endpoint. If reversed is smaller, it means the string reversed up to the current point is lexicographically smaller than the string with only the part up to endpoint reversed, and so the current position becomes the new optimal endpoint. – Littleton 16/9, 2017 at 12:46

If the number of bits in each comparison is dependent on n, it would be linear if the number of comparisons was a constant. However, in this case the number of comparisons depends on n. Still it's a great answer. – Summers 16/9, 2017 at 13:31

@Paramilitary We can compare reverse and value at any point, without having parsed the whole string; if you stored and updated both endpoints and waited until the end of the string to compare them, you'd notice that they were both multiplied by and added to the same numbers, so if one is smaller than the other, that won't change while parsing the rest of the string. – Littleton 16/9, 2017 at 13:32

@גלעדברקן Hi again! (Btw, I also couldn't make sense of Rici's answer, but I was afraid to ask :-) I guess you could say it's linear within a certain bit range; i.e. if the string is 20 characters, it takes 20 steps (and a maximum of 10 possible endpoints), if it is 30 characters, there are 30 steps (and a maximum of 15 possible endpoints). If you go to a length that requires a larger integer size or BigIntegers, then of course there's a jump in complexity. How would you formally notate the complexity of this? – Littleton 16/9, 2017 at 13:40

It sounds like you're saying it's linear if n is restricted. But the whole idea of complexity is to take the influence of a varying n (the input length) into account. – Summers 16/9, 2017 at 13:51

Yes, I know. I'm not all that familiar with the mathematical details of the O() notation, maybe I should refrain from using it. I guess the point of my answer is that you only need to iterate over the string once to get a result, even with several runs of zeros with the same length, which is what the asker was looking for. I've changed my assessment of the complexity to "O(n)-ish for limited string sizes" for the time being. – Littleton 16/9, 2017 at 14:3

I think I figured out rici's idea - clearly the start of the reversal is the first b from the left, call it b_1. And what we need to reverse is the least suffix of the-suffix-starting-at-b_1-reversed. – Summers 16/9, 2017 at 14:12

@גלעדברקן Oh, nice. Still, I hope he expands his answer, because it's quite cryptic as it is now. – Littleton 16/9, 2017 at 14:20

@גלעדברקן Does the problem that Rici discovered with your algorithm mean that you can have more than two candidates at once? I originally started writing my answer as a method which required keeping track of several candidates, and deciding at the end which was best; I only realized that it could be decided immediately while working through the example. – Littleton 17/9, 2017 at 21:26

Yes, it could not efficiently deal with repetitive cycles in the string. – Summers 17/9, 2017 at 22:49

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags