This is the fourth part in a series of educational regex articles. It show how the combination of nested reference (see: How does this regex find triangular numbers?) to "count" within assertions (see: How can we match a^n b^n with Java regex?) can be used to reverse a string. The programmatically generated pattern uses meta-pattern abstractions (see: How does this Java regex detect palindromes?). For the first time in the series, these techniques are used for replacement instead of whole string matching.
Complete working Java and C# implementations are provided. Inspirational quotes included.
Reversing a string using regular expressions never seemed like a good idea, nor was it even immediately obvious if it was at all possible, and if so, how one might attempt to do so.
While it's still not a good idea, at least now we know that it's possible, because here's one way to do it:
C# (also on ideone.com)
using System;
using System.Text.RegularExpressions;
public class TwoDollarReversal {
public static void Main() {
string REVERSE =
@"(?sx) . grab$2"
.Replace("grab$2",
ForEachDotBehind(
AssertSuffix(@"((.) \1?)")
)
);
Console.WriteLine(
Regex.Replace(
@"nietsniE treblA --
hguone llew ti dnatsrednu t'nod uoy ,ylpmis ti nialpxe t'nac uoy fI",
REVERSE, "$2"
)
);
// If you can't explain it simply, you don't understand it well enough
// -- Albert Einstein
}
// performs an assertion for each dot behind current position
static string ForEachDotBehind(string assertion) {
return "(?<=(?:.assertion)*)".Replace("assertion", assertion);
}
// asserts that the suffix of the string matches a given pattern
static string AssertSuffix(string pattern) {
return "(?=.*$(?<=pattern))".Replace("pattern", pattern);
}
}
Java (also on ideone.com)
class TwoDollarReversal {
public static void main(String[] args) {
String REVERSE =
"(?sx) . grab$2"
.replace("grab$2",
forEachDotBehind(
assertSuffix("((.) \\1?)")
)
);
System.out.println(
"taerG eht rednaxelA --\nyrt lliw ohw mih ot elbissopmi gnihton si erehT"
.replaceAll(REVERSE, "$2")
);
// There is nothing impossible to him who will try
// -- Alexander the Great"
}
static String forEachDotBehind(String assertion) {
return "(?<=^(?:.assertion)*?)".replace("assertion", assertion);
}
static String assertSuffix(String pattern) {
return "(?<=(?=^.*?pattern$).*)".replace("pattern", pattern);
}
}
Both the C# and Java versions seem to use the same overall algorithm, with minor variations only in the abstracted implementation details.
Clearly this is not the best, most straightforward, most efficient way to reverse a string. That said, in the interest of learning about regex; how to conceptualize patterns; how the engine works to match them; how to put various parts together to build what we want; how to do so in a way that is readable and maintainable; and just for the sheer joy of learning something new, can we have an explanation of how this works?
Appendix: Cheat sheet!
This is a brief description of the basic regex constructs used:
(?sx)
is the embedded flag modifiers.s
enables the "single-line" mode, allowing the dot to match ANY character (including newlines).x
enables the free-spacing mode, where unescaped whitespaces are ignored (and#
can be used for comments).^
and$
are the beginning and end-of-the-line anchors.?
as a repetition specifier denotes optional (i.e. zero-or-one of). As a repetition quantifier in e.g..*?
it denotes that the*
(i.e. zero-or-more of) repetition is reluctant/non-greedy.(…)
are used for grouping.(?:…)
is a non-capturing group. A capturing group saves the string it matches; it allows back/forward/nested references (e.g.\1
), replacement substitution (e.g.$2
), etc.(?=…)
is a positive lookahead; it looks to the right to assert that there's a match of the given pattern.(?<=…)
is a positive lookbehind; it looks to the left.
(?sx) . (?<=(?:.(?=.*$(?<=((.) \1?))))*)
– Karrykarst