TL;DR: Here's an algorithm that only iterates over the string once (with O(|S|)-ish complexity for limited string lengths). The example with which I explain it below is a bit long-winded, but the algorithm is really quite simple:
- Iterate over the string, and update its value interpreted as a reverse (lsb-to-msb) binary number.
- If you find the last zero of a sequence of zeros that is longer than the current maximum, store the current position, and the current reverse value. From then on, also update this value, interpreting the rest of the string as a forward (msb-to-lsb) binary number.
- If you find the last zero of a sequence of zeros that is as long as the current maximum, compare the current reverse value with the current value of the stored end-point; if it is smaller, replace the end-point with the current position.
So you're basically comparing the value of the string if it were reversed up to the current point, with the value of the string if it were only reversed up to a (so-far) optimal point, and updating this optimal point on-the-fly.
Here's a quick code example; it could undoubtedly be coded more elegantly:
function reverseSubsequence(str) {
    var reverse = 0, max = 0, first, last, value, len = 0, unit = 1;
    for (var pos = 0; pos < str.length; pos++) {
        var digit = str.charCodeAt(pos) - 97;        // read next digit: "a" is 0, "b" is 1
        if (first === undefined) {
            if (digit == 0) continue;                // skip leading zeros
            first = pos;                             // end of leading zeros
        }
        reverse += unit * digit;                     // update reverse value
        unit *= 2;                                   // unit <<= 1 would overflow after 31 digits
        value = value * 2 + digit;                   // update endpoint value (NaN until an endpoint is stored)
        if (digit == 0) {
            // compare only after both values include the current digit
            if (++len > max || len == max && reverse < value) { // better endpoint found
                max = len;
                last = pos;
                value = reverse;                     // the reversed part now ends at this zero
            }
        } else {
            len = 0;
        }
    }
    return {from: first || 0, to: last || 0};
}
var result = reverseSubsequence("aaabbaabaaabbabaaabaaab");
document.write(result.from + "→" + result.to);
(The code could be simplified by comparing reverse and value whenever a zero is found, and not just when the end of a maximally long sequence of zeros is encountered.)
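To sanity-check the single-pass result on short inputs, a brute-force reference can help. The sketch below assumes the goal is the lexicographically smallest string obtainable by reversing one substring; the name bruteForceLeast is hypothetical and not part of the code above.

```javascript
// Brute-force reference: try every substring reversal and keep the smallest result.
// Roughly cubic in the string length, so only suitable for short test inputs.
function bruteForceLeast(str) {
    let best = {from: 0, to: 0, result: str};       // reversing nothing is the baseline
    for (let from = 0; from < str.length; from++) {
        for (let to = from + 1; to < str.length; to++) {
            const middle = str.slice(from, to + 1).split("").reverse().join("");
            const candidate = str.slice(0, from) + middle + str.slice(to + 1);
            if (candidate < best.result) best = {from: from, to: to, result: candidate};
        }
    }
    return best;
}

const check = bruteForceLeast("aaabbaabaaabbabaaabaaab");
console.log(check.from + "→" + check.to, check.result);
```

Because it only keeps strictly smaller candidates, ties are resolved in favour of the first range found, so compare resulting strings rather than indices when checking another implementation against it.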
You can create an algorithm that only iterates over the input once, and can process an incoming stream of unknown length, by keeping track of two values: the value of the whole string interpreted as a reverse (lsb-to-msb) binary number, and the value of the string with one part reversed. Whenever the reverse value goes below the value of the stored best end-point, a better end-point has been found.
Consider this string as an example:
aaabbaabaaabbabaaabaaab
or, written with zeros and ones for simplicity:
00011001000110100010001
We iterate over the leading zeros until we find the first one:
0001
   ^
This is the start of the sequence we'll want to reverse. We will start interpreting the stream of zeros and ones as a reversed (lsb-to-msb) binary number and update this number after every step:
reverse = 1, unit = 1
Then at every step, we double the unit and update the reverse number:
0001 reverse = 1
00011 unit = 2; reverse = 1 + 1 * 2 = 3
000110 unit = 4; reverse = 3 + 0 * 4 = 3
0001100 unit = 8; reverse = 3 + 0 * 8 = 3
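The running update of the reverse value in the steps above can be reproduced with a few lines. This is only an illustrative sketch; the helper name reverseValues is hypothetical, and it takes the digits starting at the first one.

```javascript
// Running lsb-to-msb value: each new digit is worth twice as much as the previous one.
function reverseValues(bits) {
    const steps = [];
    let reverse = 0, unit = 1;
    for (const bit of bits) {
        reverse += unit * Number(bit);   // add the digit at its current weight
        steps.push(reverse);
        unit *= 2;                       // the next digit is one position more significant
    }
    return steps;
}

console.log(reverseValues("1100")); // [ 1, 3, 3, 3 ]
```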
At this point we find a one, and the sequence of zeros comes to an end. It contains 2 zeros, which is currently the maximum, so we store the current position as a possible end-point, and also store the current reverse value:
endpoint = {position = 6, value = 3}
Then we go on iterating over the string. At every step we still update the reverse value, and we now also update the value of the possible end-point, treating each new digit as the next digit of a normal (msb-to-lsb) binary number:
00011001 unit = 16; reverse = 3 + 1 * 16 = 19
    endpoint.value = 3 * 2 + 1 = 7
000110010 unit = 32; reverse = 19 + 0 * 32 = 19
    endpoint.value = 7 * 2 + 0 = 14
0001100100 unit = 64; reverse = 19 + 0 * 64 = 19
    endpoint.value = 14 * 2 + 0 = 28
00011001000 unit = 128; reverse = 19 + 0 * 128 = 19
    endpoint.value = 28 * 2 + 0 = 56
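The forward update of the stored end-point value is the usual "shift left and append a digit" step. A minimal sketch, with the hypothetical helper name extendForward, starting from the stored value 3:

```javascript
// Extend a stored value with further digits read msb-to-lsb:
// doubling shifts the existing digits left, then the new digit is appended.
function extendForward(value, bits) {
    const steps = [];
    for (const bit of bits) {
        value = value * 2 + Number(bit);
        steps.push(value);
    }
    return steps;
}

console.log(extendForward(3, "1000")); // [ 7, 14, 28, 56 ]
```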
At this point we find that we have a sequence of 3 zeros, which is longer than the current maximum of 2, so we throw away the end-point we had so far and replace it with the current position and reverse value:
endpoint = {position = 10, value = 19}
And then we go on iterating over the string:
000110010001 unit = 256; reverse = 19 + 1 * 256 = 275
    endpoint.value = 19 * 2 + 1 = 39
0001100100011 unit = 512; reverse = 275 + 1 * 512 = 787
    endpoint.value = 39 * 2 + 1 = 79
00011001000110 unit = 1024; reverse = 787 + 0 * 1024 = 787
    endpoint.value = 79 * 2 + 0 = 158
000110010001101 unit = 2048; reverse = 787 + 1 * 2048 = 2835
    endpoint.value = 158 * 2 + 1 = 317
0001100100011010 unit = 4096; reverse = 2835 + 0 * 4096 = 2835
    endpoint.value = 317 * 2 + 0 = 634
00011001000110100 unit = 8192; reverse = 2835 + 0 * 8192 = 2835
    endpoint.value = 634 * 2 + 0 = 1268
000110010001101000 unit = 16384; reverse = 2835 + 0 * 16384 = 2835
    endpoint.value = 1268 * 2 + 0 = 2536
Here we find that we have another sequence with 3 zeros, so we compare the current reverse value with the end-point's value, and find that the stored end-point has a lower value:
endpoint.value = 2536 < reverse = 2835
so we keep the end-point set to position 10 and we go on iterating over the string:
0001100100011010001 unit = 32768; reverse = 2835 + 1 * 32768 = 35603
    endpoint.value = 2536 * 2 + 1 = 5073
00011001000110100010 unit = 65536; reverse = 35603 + 0 * 65536 = 35603
    endpoint.value = 5073 * 2 + 0 = 10146
000110010001101000100 unit = 131072; reverse = 35603 + 0 * 131072 = 35603
    endpoint.value = 10146 * 2 + 0 = 20292
0001100100011010001000 unit = 262144; reverse = 35603 + 0 * 262144 = 35603
    endpoint.value = 20292 * 2 + 0 = 40584
And we find another sequence of 3 zeros, so we compare this position to the stored end-point:
endpoint.value = 40584 > reverse = 35603
and we find that the current reverse value is smaller, so we replace the possible end-point with the current position:
endpoint = {position = 21, value = 35603}
And then we iterate over the final digit:
00011001000110100010001 unit = 524288; reverse = 35603 + 1 * 524288 = 559891
    endpoint.value = 35603 * 2 + 1 = 71207
So at the end we find that position 21 gives us the lowest value, so it is the optimal solution:
00011001000110100010001 -> 00000010001011000100111
   ^                 ^
   start = 3         end = 21
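The result can be checked by actually reversing that range. A small sketch, with the hypothetical helper name reverseRange and positions counted from zero, inclusive on both ends:

```javascript
// Reverse the characters of str between positions from and to (both inclusive).
function reverseRange(str, from, to) {
    const middle = str.slice(from, to + 1).split("").reverse().join("");
    return str.slice(0, from) + middle + str.slice(to + 1);
}

const reversed = reverseRange("00011001000110100010001", 3, 21);
console.log(reversed);              // 00000010001011000100111
console.log(parseInt(reversed, 2)); // value of the resulting string read as a binary number
```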
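Here's a C++ version that uses a vector of bool instead of integers. It can parse strings longer than 64 characters, but inserting at the front of a vector and comparing two vectors are both linear in the string length, so the worst-case complexity is quadratic.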
#include <string>
#include <vector>

struct range {unsigned int first; unsigned int last;};

range lexiLeastRev(std::string const &str) {
    unsigned int len = str.length(), first = 0, last = 0, run = 0, max_run = 0;
    std::vector<bool> forward, reverse;
    bool leading_zeros = true;
    for (unsigned int pos = 0; pos < len; pos++) {
        bool digit = str[pos] - 'a';               // 'a' is 0, 'b' is 1
        if (leading_zeros) {
            if (!digit) continue;                  // skip leading zeros
            leading_zeros = false;
            first = pos;                           // first one: start of the reversal
        }
        forward.push_back(digit);                  // stored end-point, continued msb-to-lsb
        reverse.insert(reverse.begin(), digit);    // everything so far, reversed
        if (!digit) {
            // compare only after both vectors include the current digit
            if (++run > max_run || (run == max_run && reverse < forward)) {
                max_run = run;
                last = pos;
                forward = reverse;                 // better end-point found
            }
        }
        else {
            run = 0;
        }
    }
    return range {first, last};
}
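For readers following along in JavaScript, one way to lift the length limit without vectors is BigInt. The sketch below is a variant under stated assumptions, not the answer's original code: the name reverseSubsequenceBig is hypothetical, the input is assumed to contain only "a" and "b", and both values are updated with the current digit before they are compared. BigInt arithmetic on long numbers is itself linear in their length, so this is still quadratic in the worst case.

```javascript
// Single-pass search for the substring to reverse, using BigInt so the values never overflow.
function reverseSubsequenceBig(str) {
    let reverse = 0n, value = 0n, unit = 1n;
    let first = -1, last = -1, len = 0, max = 0;
    for (let pos = 0; pos < str.length; pos++) {
        const digit = str[pos] === "b" ? 1n : 0n;
        if (first < 0) {
            if (digit === 0n) continue;   // skip leading zeros
            first = pos;                  // first one: start of the reversal
        }
        reverse += unit * digit;          // lsb-to-msb value of str[first..pos]
        unit *= 2n;
        value = value * 2n + digit;       // stored end-point value, continued msb-to-lsb
        if (digit === 1n) { len = 0; continue; }
        len++;
        if (len > max || (len === max && reverse < value)) {
            max = len;                    // longer run of zeros, or same length with a lower value
            last = pos;
            value = reverse;
        }
    }
    if (first < 0) return {from: 0, to: 0};              // only zeros: nothing to reverse
    return {from: first, to: last < 0 ? first : last};   // no zero after the first one: nothing to reverse
}

const big = reverseSubsequenceBig("aaabbaabaaabbabaaabaaab");
console.log(big.from + "→" + big.to);
```

Returning from equal to to when no reversal helps is a choice made for this sketch, so that reversing the returned range always leaves a valid string.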
S = ababba output=aabbab – Paramilitary
abababa to aaaabbb ? – Isia