codingBat repeatEnd using regex
Asked Answered
O

3

3

I'm trying to understand regex as much as I can, so I came up with this regex-based solution to codingbat.com repeatEnd:

Given a string and an int N, return a string made of N repetitions of the last N characters of the string. You may assume that N is between 0 and the length of the string, inclusive.

public String repeatEnd(String str, int N) {
  return str.replaceAll(
    ".(?!.{N})(?=.*(?<=(.{N})))|."
      .replace("N", Integer.toString(N)),
    "$1"
  );
}

Explanation on its parts:

  • .(?!.{N}): asserts that the matched character is one of the last N characters, by making sure that there aren't N characters following it.
  • (?=.*(?<=(.{N}))): in which case, use lookforward to first go all the way to the end of the string, then a nested lookbehind to capture the last N characters into \1. Note that this assertion will always be true.

  • |.: if the first assertion failed (i.e. there are at least N characters ahead) then match the character anyway; \1 would be empty.

  • In either case, a character is always matched; replace it with \1.

My questions are:

  • Is this technique of nested assertions valid? (i.e. looking behind during a lookahead?)
  • Is there a simpler regex-based solution?

Bonus question

Do repeatBegin (as analogously defined).

I'm honestly having troubles with this one!

Osterhus answered 9/4, 2010 at 9:17 Comment(0)
C
3

Nice one! I don't see a way to significantly improve on that regex, although I would refactor it to avoid the needless use of negative logic:

".(?=.{N})|.(?=.*(?<=(.{N})))"

This way the second alternative is never entered until you reach the final N characters, which I think makes the intent a little clearer.

I've never seen a reference that says it's okay to nest lookarounds, but like Bart, I don't see why it wouldn't be. I sometimes use lookaheads inside lookbehinds to get around limitations on variable-length lookbehind expressions.


EDIT: I just realized I can simplify the regex quite a bit by putting the alternation inside the lookahead:

".(?=.{N}|.*(?<=(.{N})))"

By the way, have you considered using format() to build the regex instead of replace()?

return str.replaceAll(
  String.format(".(?=.{%1$d}|.*(?<=(.{%1$d})))", N),
  "$1"
);
Cassandracassandre answered 9/4, 2010 at 12:4 Comment(2)
+1! Good job! Do check out my answer, though: it uses only 2 assertions, ?= and ?<=. Also check out my other regex/codingbat question (wordEnds). It just got me a Tumbleweed.Osterhus
@Alan: I have used format, but I think I prefer replace because it makes the regex part more readable, and almost an independent entity; with format, you can't just pass around the regex without specifying the arguments to format since their order matters. You can also see multiple occurrences more easily (e.g. if there are multiple N, M). Using replace also puts the before->after i.e. var->value mapping side by side, which I prefer.Osterhus
B
1

Whoa, that's some scary regex voodoo there! : )

  • Is this technique of nested assertions valid? (i.e. looking behind during a lookahead?)

Yes, that is perfectly valid in most PCRE implementations I know of.

  • Is there a simpler regex-based solution?

I didn't spend too much time on it, but I don't quickly see how that could be simplified or shortened with a single regex replacement.

Biofeedback answered 9/4, 2010 at 9:33 Comment(3)
Any reference on your answer for part 1?Osterhus
No reference. I just know it works because I've tried and used it in the past. Why shouldn't it be legal? I wouldn't know where to find a reference that told be I could use a group inside another one, I just know I can.Biofeedback
Regarding your voodoo comment: to fully understand something, you must not fear it =)Osterhus
O
0

Is there a simpler regex-based solution?

It took me a while, but eventually I managed to simplify the regex to:

"(?=.{0,N}$(?<=(.{N}))).|." // repeatEnd
          -or-
".(?<=^(?=(.{N})).{0,N})|." // repeatBegin

Like Alan Moore's answer, this removes the negative assertion, but doesn't even replace it with a positive one, so it now only has 2 assertions instead of 3.

I also like the fact that the "else" case is just a simple .. I prefer to put the bulk of my regex into the "working" side of the alternation, and keep the "non-working" side as simple as possible (usually a simple . or .*).

Osterhus answered 9/4, 2010 at 12:22 Comment(3)
When I mentioned needless use of negative logic, I wasn't referring to the negative lookahead per se, but to the contortions you went through to get that lone . all by itself in the second alternative. That may seem clearer to you, but to my eye it has the opposite effect.Cassandracassandre
It's definitely a matter of personal preference. I'd always prefer if(!a && b) X else Y to if (a) Y else if (b) X when they're interchangeable (which they aren't always). In any case, I'd appreciate your input on repeatBegin I mentioned in the update to the question; I'm honestly having a hard time with it. I have a solution that works, but it's quite different from repeatEnd.Osterhus
"(?<=.{N}|^(?=(.{N})).{0,N})." seems to work. It's essentially the mirror image of my repeatEnd regex, except I had to use {0,N} instead of * to satisfy Java's "odious maximum length" requirement, plus ^ to force the lookbehind to go all the way back.Cassandracassandre

© 2022 - 2024 — McMap. All rights reserved.