Can you salvage my negative lookbehind example for commifying numbers?
In the "Advanced Regular Expressions" chapter of Mastering Perl, I have a broken example for which I can't figure out a nice fix. The example is perhaps trying to be too clever for its own good, but maybe someone can fix it for me. There could be a free copy of the book in it for working fixes. :)

In the section talking about lookarounds, I wanted to use a negative lookbehind to implement a commifying routine for numbers with fractional portions. The point was to use a negative lookbehind because that was the topic.

I stupidly did this:

$_ = '$1234.5678';
s/(?<!\.\d)(?<=\d)(?=(?:\d\d\d)+\b)/,/g;  # $1,234.5678

The (?<!\.\d) asserts that the two characters just before the position matched by (?=(?:\d\d\d)+\b) are not a decimal point followed by a digit.

The stupid thing is not trying hard enough to break it. By adding another digit to the end, there is now a group of three digits not preceded by a decimal point and a digit:

$_ = '$1234.56789';
s/(?<!\.\d)(?<=\d)(?=(?:\d\d\d)+\b)/,/g;  # $1,234.56,789
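
A minimal sketch that reproduces both results side by side. The helper name commify_broken is hypothetical and only wraps the substitution shown above; nothing here is from the book.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the broken substitution from above.
sub commify_broken {
    my ($text) = @_;
    # The fixed-width lookbehind only inspects the two characters
    # immediately before the candidate comma position.
    $text =~ s/(?<!\.\d)(?<=\d)(?=(?:\d\d\d)+\b)/,/g;
    return $text;
}

for my $input ('$1234.5678', '$1234.56789') {
    printf "%-12s => %s\n", $input, commify_broken($input);
}
# $1234.5678   => $1,234.5678
# $1234.56789  => $1,234.56,789
```

In the second input the position between "56" and "789" is preceded by "56", not by a decimal point and a digit, so the two-character lookbehind has no way to know it is inside the fraction.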

If lookbehinds could be variable width in Perl, this would have been really easy. But they can't.

Note that it's easy to do this without a negative lookbehind, but that's not the point of the example. Is there a way to salvage this example?
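
For comparison, here is a rough sketch of one lookaround-free approach, loosely modeled on the looping substitution in perlfaq5. The helper name commify_loop and the \D* prefix (added so the leading dollar sign is skipped) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: no lookarounds at all. Each pass splits the last
# three digits off the first run of digits and puts a comma before them.
sub commify_loop {
    my ($text) = @_;
    # \D* skips anything before the first digit; \d+ stops at the
    # decimal point, so the fraction is never touched.
    1 while $text =~ s/^(\D*\d+)(\d{3})/$1,$2/;
    return $text;
}

print commify_loop('$1234.56789'), "\n";   # $1,234.56789
```

Because the pattern is anchored with ^, this sketch only handles the first number in the string.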

Absorber answered 24/2, 2010 at 23:22 Comment(5)
FYI, you left out the \b in (?:\d\d\d)+\b) when you posted this, so I added it. (I checked, and it is there in the book.) That's just a distraction, though; it has nothing to do with the lookbehind problem. - Breslau
Ah, yes, thanks. When I copied and pasted that from my email to test it, something converted the \b to a ^B and messed everything up. I forgot to re-add it. - Absorber
You might want to consider putting a bounty on this question for extra motivation (although the book is very thoughtful!), since you have the rep to spare. :) It may also get you more eyes, as it's possible to search for questions with active bounties from the front page. - Katinka
@FM: the point of the question is to use (?<!). I'm not looking for ways around it. - Absorber
Surprised no one edits the title... Was LOLing at the wordings - Maleki

I don't think it's possible without some form of variable-width look-behind. The addition of the \K assertion in 5.10 provides a way of faking variable-width positive look-behind. What we really need is variable-width negative look-behind but with a little creativity and a lot of ugliness we can make it work:

use 5.010;
$_ = '$1234567890.123456789';
s/(?<!\.)(?:\b|\G)\d+?\K(?=(?:\d\d\d)+\b)/,/g;
say;  # $1,234,567,890.123456789

If there was ever a pattern that begged for the /x notation it's this one:

s/
  (?<!\.)        # Negative look-behind assertion; we don't want to match
                 # digits that come after the decimal point.

  (?:            # Begin a non-capturing group; the contents anchor the \d
                 # which follows so that the assertion above is applied at
                 # the correct position.

    \b           # Either a word boundary (the beginning of the number)...

    |            # or (because \b won't match at subsequent positions where
                 # a comma should go)...

    \G           # the position where the previous match left off.

  )              # End anchor grouping

  \d+?           # One or more digits, non-greedily so the match proceeds
                 # from left to right. A greedy match would proceed from
                 # right to left, the \G above wouldn't work, and only the
                 # rightmost comma would get placed.

  \K             # Keep the preceding stuff; used to fake variable-width
                 # look-behind

                 # <- This is what we match! (i.e. a position, no text)

  (?=            # Begin a positive look-ahead assertion

    (?:\d\d\d)+  # A multiple of three digits (3, 6, 9, etc.)

    \b           # A word (digit) boundary to anchor the triples at the
                 # end of the number.

  )              # End positive look-ahead assertion.
/,/xg;
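
A small sketch for trying the substitution on more than one input. The wrapper name commify_k is hypothetical; the pattern is the one-line form from above, unchanged.

```perl
use strict;
use warnings;
use 5.010;    # \K needs Perl 5.10 or later; this also enables say

# Hypothetical wrapper around the substitution shown above.
sub commify_k {
    my ($text) = @_;
    $text =~ s/(?<!\.)(?:\b|\G)\d+?\K(?=(?:\d\d\d)+\b)/,/g;
    return $text;
}

say commify_k('$1234567890.123456789');   # $1,234,567,890.123456789
say commify_k('$1234.56789');             # the input that broke the original
```

The second call is the input from the question; the fraction should come back without a comma, because no match can start right after the decimal point and \G never lands inside the fraction.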
Nectarine answered 25/2, 2010 at 21:3 Comment(4)
Brilliant. I never considered using \K since I wrote Mastering Perl pre-5.10. I think I can make this work, if only for a completely too-clever example where I can point out the extreme difficulty of variable width lookbehinds. The \G is quite the bonus too. That's a free book for you. If you already have Mastering Perl, tell me which other book I can get you. :) - Absorber
As pleased as I am about finding a solution within the constraints of the problem, I'm somewhat appalled at my creation, particularly the use of an alternation between zero-width assertions. I needed use re 'debug' to figure out that the \G was necessary. I did benchmark it just for fun and it's about 10% faster than the FAQ answers. That's probably because it doesn't use captures. I don't have a copy of Mastering Perl so that would be great. Hmm... there's no PM system here, but you should be able to reach me via my CPAN author ID (MJCARMAN). - Nectarine
For what it's worth, I re-used this example in Mastering Perl, 2nd Edition, but not to show off lookaheads. I used it to illustrate \K :) - Absorber
@briandfoy: Cool! I'm glad to have helped. - Nectarine

If you have to post on Stack Overflow asking if somebody can figure out how to do this with negative lookbehind, then it's obviously not a good example of negative lookbehind. You'd be better off thinking up a new example rather than trying to salvage this one.

In that spirit, how about an automatic spelling corrector?

s/(?<![Cc])ei/ie/g; # Put I before E except after C

(Obviously, that's not a hard and fast rule in English, but I think it's a more realistic application of negative lookbehind.)
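
A quick sketch of the rule in action, using a hypothetical helper named fix_ie and three sample words chosen for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper applying the "I before E except after C" rule.
sub fix_ie {
    my ($word) = @_;
    # Swap "ei" to "ie" unless the character just before it is C or c.
    $word =~ s/(?<![Cc])ei/ie/g;
    return $word;
}

for my $word (qw(beleive receive weird)) {
    printf "%-8s => %s\n", $word, fix_ie($word);
}
# beleive  => believe
# receive  => receive
# weird    => wierd
```

The first word is repaired, the second is protected by the lookbehind, and the third is a correctly spelled word that the rule damages, which is the "not a hard and fast rule" caveat in practice.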

Whodunit answered 25/2, 2010 at 0:10 Comment(4)
Yes, I think I'll have to abandon the example, which is too bad. I already have simpler examples like the one that you describe, though. However, I should also say that just because I can't figure it out doesn't mean it's not a good example. The best way to learn anything is to write a book on it. I have learned quite a bit from my technical reviewers. :) - Absorber
I just noticed the italicized "you". I think there are many people much smarter and better at Perl than me on Stack Overflow. I'm just here a lot. :) - Absorber
@Whodunit Can you please add examples of the spelling rule you are referring to? I did not know English even has spelling rules. .-) Also I think you mean "swap i and e" rather than "put i before e". - Rag
@AloisMahdal, there are plenty of examples on Wikipedia. - Whodunit

I don't think this is what you are after (especially because the negative look-behind assertion has been dropped), but I guess your only option is to slurp up the decimal places like in this example:

s/
  (?:
    (?<=\d)
    (?=(?:\d\d\d)+\b)
   |
    ( \d{0,3} \. \d+ )
  )
 / $1 ? $1 : ',' /exg;
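
A sketch of the same substitution wrapped in a hypothetical helper, commify_slurp, to show how the two branches cooperate: the first branch matches an empty position and yields a comma, while the second consumes the fraction and puts it back unchanged.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution above.
sub commify_slurp {
    my ($text) = @_;
    $text =~ s/
        (?:
            (?<=\d) (?=(?:\d\d\d)+\b)   # a comma position: matches no text
          |
            ( \d{0,3} \. \d+ )          # or swallow the fraction whole
        )
    / $1 ? $1 : ',' /exg;
    return $text;
}

print commify_slurp('$1234.56789'), "\n";   # $1,234.56789
```

Once the fraction has been consumed by the second branch, the search resumes after it, so the digits to the right of the decimal point are never offered to the first branch.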

P.S. I think it is a good example when not used as the first one in the book, as it demonstrates some of the pitfalls and limitations of look-around assertions.

Giaour answered 25/2, 2010 at 8:42 Comment(1)
It is in fact the last example in the book for these things. The problem with this answer, however, is that the (?<!\.) doesn't do anything. If you remove it you get the same answer. :) - Absorber
