Regex negation?

I'm playing Regex Golf (http://regex.alf.nu/) and I'm doing the Abba hole. I have the following regex that matches the wrong side entirely (which is what I was trying to do):

(([\w])([\w])\3\2)

However, I'm trying to negate it now so it matches the other side. I can't seem to figure that part out. I tried:

(?!([\w])([\w])\3\2)

But that didn't work. Any tips from the regex masters?

Indigestive answered 25/12, 2013 at 15:47 Comment(1)
The wrong side? What side? Please provide complete context. - Jacynth
I
25

You can make it much shorter (and get more points) by simply using . and removing unnecessary parens:

^(?!.*(.)(.)\2\1)

It simply makes sure that there is no "abba" anywhere in the string ("abba" here means four characters in that x-y-y-x arrangement, which is what we don't want to match), without having to consume the whole word.
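As a quick way to check this outside the game, here is a minimal Python sketch. The sample words are made up for illustration and are not the actual Regex Golf lists; only the pattern itself comes from the answer above.

```python
import re

# Pattern from the answer: succeed only if no x-y-y-x run appears anywhere.
NO_ABBA = re.compile(r"^(?!.*(.)(.)\2\1)")

# Hypothetical sample words (assumption: not the real Regex Golf word lists).
samples = ["abba", "anallagmatic", "banana", "regex"]

# A word "passes" when the pattern finds a (zero-length) match at the start.
results = {word: bool(NO_ABBA.search(word)) for word in samples}
print(results)
```

Words containing an x-y-y-x run ("abba", and "anallagmatic" via "alla") are rejected; the others pass.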

Impassion answered 25/12, 2013 at 16:35 Comment(4)
I know this is old, but could you explain how ?! works please? More specifically, why (?!(.)(.)\2\1) matches everything. - Aalborg
@AdiBradfield (?!a)a will never match anything: (?!a) prevents a match whenever what follows it is an a, and the pattern then requires an a at that very position. Similarly, (?!a)b will always match a b: (?!a) only blocks a match when it is followed by an a, which can never happen at a position where a b follows. By extension, ^(?!.*a) prevents a match if the line contains an a anywhere. The anchor and the .* are important because otherwise the pattern will simply match at a position after the last a (from that point on, there is no a left to prevent the match). - Impassion
Okay, yeah that makes perfect sense. Thanks for the clarification. - Aalborg
@Zikato It is working; the regex simply produces an empty (zero-length) match, which doesn't matter if you are only checking whether there is a match or not, see i.sstatic.net/5MzvR.png - Impassion
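For anyone who wants to try the three patterns from the comment above, a small Python sketch follows. The test strings are my own choice, purely for illustration.

```python
import re

# (?!a)a demands "not followed by a" and then an "a" at the same spot: impossible.
never = re.search(r"(?!a)a", "aaa")

# (?!a)b finds the "b": wherever a "b" sits, the lookahead is trivially satisfied.
always_b = re.search(r"(?!a)b", "ab")

# Anchored and combined with .*, the lookahead rejects any line containing "a".
with_a = re.search(r"^(?!.*a)", "xyz a")
without_a = re.search(r"^(?!.*a)", "xyz")

print(never, always_b, with_a, without_a)
```

The first and third searches return None; the second matches the "b", and the last one gives a zero-length match at the start of the string.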
I
2

Using the explanation here: https://mcmap.net/q/25789/-regular-expression-to-match-a-line-that-doesn-39-t-contain-a-word

I came up with: ^((?!((\w)(\w)\4\3)).)*$
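A rough Python comparison, assuming plain alphabetic words (the word list is hypothetical), suggests this longer pattern accepts and rejects the same words as the shorter ^(?!.*(.)(.)\2\1) from the other answer. The difference is that this one consumes the whole line, while the short one only produces an empty match.

```python
import re

# Pattern from this answer: every character must not start an x-y-y-x run.
TEMPERED = re.compile(r"^((?!((\w)(\w)\4\3)).)*$")
# Shorter pattern from the accepted answer, for comparison.
SHORT = re.compile(r"^(?!.*(.)(.)\2\1)")

# Hypothetical sample words (assumption: letters only, no whitespace).
words = ["abba", "noon", "banana", "regex", "unneeded"]

accepted = [w for w in words if TEMPERED.search(w)]
agree = all(bool(TEMPERED.search(w)) == bool(SHORT.search(w)) for w in words)
print(accepted, agree)
```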

Inferential answered 25/12, 2013 at 16:06 Comment(0)
T
2

The key here turns out to be the leading caret, ^, and the .*

(?! ...) is a look-ahead construct, and so does not advance the regex processing engine.

/(?! ...)/ on its own correctly reports no match at positions where the expression within matches; at positions where it does not match, the regex engine continues processing. However, if your regex contains only the (?! ), there is nothing left to process, and the regex processing position never advances. (See this great answer).

Apparently, since the remaining regex is empty, it matches a zero-width segment of the string, i.e. it matches any string.

[begin SWAG]

With the caret ^ present, the regex engine is able to recognize that you are looking for a real answer and that you do not want it to tell you the string contains zero-width components.

[end SWAG]

Thus it is able to correctly fail to match when the expression inside the (?! ) matches.
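To see the difference concretely, here is a small Python sketch (my own illustration, not taken from the linked answer) that runs the bare lookahead and the anchored version against the same string:

```python
import re

text = "abba"

# Bare lookahead: fails at position 0, but the engine retries at position 1,
# where fewer than four characters remain, so an empty match is found there.
unanchored = re.search(r"(?!(.)(.)\2\1)", text)

# Anchored version: only position 0 may be tried, so the search fails outright.
anchored = re.search(r"^(?!.*(.)(.)\2\1)", text)

print(unanchored.span(), anchored)
```

The unanchored search reports a zero-length match one character into the string, which is why the bare negative lookahead appears to match everything.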

Thirst answered 8/1, 2014 at 10:48 Comment(1)
To make your SWAG more precise, the caret does not allow the engine to find zero-length matches that are not at the beginning of the string. - Moderator
