A Regex that will never be matched by anything

What's your thought: what would a regex look like that will never be matched by any string, ever?

Edit: Why do I want this? Well, first because I find it interesting to think about such an expression, and second because I need it for a script.

In that script I define a dictionary of type Dictionary<string, Regex>, which, as you can see, maps a string to an expression.

Based on that dictionary I create methods that all use it as their only reference for how to do their work; one of them matches the regexes against a parsed logfile.

If an expression matches, a value returned by the expression is added to another Dictionary<string, long>. So, to catch any log messages that are not matched by any expression in the dictionary, I created a new group called "unknown".

Everything that didn't match any other expression is added to this group. But to prevent the "unknown" expression from accidentally matching a log message, I had to create an expression that is guaranteed never to match, no matter what string I give it.
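The setup above can be approximated in Python as a minimal sketch. The original script is C#, and the category names, the patterns, and the choice to count lines rather than extract a value are hypothetical. The "unknown" entry uses `(?!)`, which is discussed in the answers below. Because it can never match, a line only reaches "unknown" through the fall-through branch.

```python
import re
from collections import Counter

# Matches nothing: an empty negative lookahead fails at every position.
NEVER = re.compile(r"(?!)")

# Hypothetical categories standing in for the script's Dictionary<string, Regex>.
PATTERNS = {
    "error": re.compile(r"\bERROR\b"),
    "warning": re.compile(r"\bWARN(?:ING)?\b"),
    "unknown": NEVER,  # kept in the table, but can never claim a line by matching
}

def classify(lines):
    """Count each line under the first category whose regex matches it;
    lines that match nothing fall through to "unknown"."""
    counts = Counter()
    for line in lines:
        for name, rx in PATTERNS.items():
            if rx.search(line):
                counts[name] += 1
                break
        else:  # no break: no pattern matched this line
            counts["unknown"] += 1
    return counts

print(classify(["ERROR disk full", "WARNING low memory", "user logged in"]))
```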

Pontonier answered 12/11, 2009 at 15:46 Comment(19)

Note that it is very hard to prove a negative. – Jacqulynjactation 12/11, 2009 at 15:52

I don't think it should have been closed, but I don't want to vote to reopen without some background info. – Premillenarian 13/11, 2009 at 14:26

Well, as you can see: "Not a real question"... I don't know more either, since it was a real question that I actually needed answered for a project... – Pontonier 13/11, 2009 at 14:49

@ApoY2k if you specify the use case, it might get reopened. – Paint 16/11, 2009 at 13:3

I'll try to, still it's quite specific... – Pontonier 16/11, 2009 at 13:6

Regex actually isn't the same in Python / JavaScript / PHP / whatever, so which language are you trying to use it in? – Carcajou 4/12, 2009 at 5:44

Also, put the language in the title and tags so that people can help you more easily. – Carcajou 4/12, 2009 at 5:45

Interesting. Where would you use such a regex? – Untutored 4/12, 2009 at 5:46

yoda, as mentioned above, I'm using Python and Javascript, but don't want to limit the responses only to those languages. I'd be happy to examine solutions in other syntaxes, and believe I could port the solution easily enough. Yes, different languages aren't identical, but they're pretty darn close in this area, 9 times out of 10. – Musing 4/12, 2009 at 6:2

Charlie, the most recent use case was where I'm building a regex programmatically, with groups like (foo|bar|baz) built from external inputs. I want all groups to be present, but some may have no external input and should thus never match. If I do nothing, I'll have empty groups of () which, at least in Python, match between every character. I want all groups to be present so a regex.sub() call with a callback routine performing the replacement can be simplified, both for readability and for performance in a loop. – Musing 4/12, 2009 at 6:9
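Here is a minimal Python sketch of the programmatic case just described. The group names, the inputs, and the helper `build_pattern` are all hypothetical. A group with no inputs gets the never-matching `(?!)` instead of an empty `()`. As a result, every named group still exists, and a single `sub()` callback can dispatch on `match.lastgroup`.

```python
import re

NEVER = r"(?!)"  # empty negative lookahead: fails at every position

def build_pattern(groups):
    """Build one alternation with a named group per key.
    A key with no inputs gets NEVER instead of an empty group, because
    an empty group "()" would match the empty string between every character."""
    parts = []
    for name, words in groups.items():
        body = "|".join(map(re.escape, words)) if words else NEVER
        parts.append(f"(?P<{name}>{body})")
    return re.compile("|".join(parts))

# Hypothetical inputs: "color" has no external input this time.
pattern = build_pattern({"fruit": ["apple", "pear"], "color": [], "animal": ["cat"]})

def replace(match):
    # Every group is always present, so the callback can dispatch on lastgroup.
    return match.lastgroup.upper()

print(pattern.sub(replace, "apple cat red pear"))  # FRUIT ANIMAL red FRUIT
```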

this has been asked #1723682 – Paint 4/12, 2009 at 6:52

Why the complexity tag? I cannot see how it applies here. – Padauk 12/2, 2010 at 16:42

I'll note here for the record that many of the comments above, and answers to this question, were originally from #1845578 which is one I asked. Marc Gravell merged them, which I think makes many of these responses kind of bizarre without the precise original context, to the point some comments don't appear to make sense. (Probably also steals away potential future rep points, too.) I would suggest that questions with such detailed backgrounds could never be "exact duplicates". Whatever... – Musing 31/1, 2011 at 22:19

Here's another reason to use this, specific to Perl: it's something to put on one side of the conditional regex construct if one of the results you want is "don't match at all". E.g. s/(?(?{ defined $ENV{FOO} })foo|(*F))/bar/g: "substitute bar for foo if $FOO, otherwise do nothing". – Corot 5/3, 2013 at 15:25

This question has been added to the Stack Overflow Regular Expressions FAQ, under "Advanced Regex-Fu". – Remake 15/4, 2014 at 15:4

@CharlieSalts: I need this because I have a class that captures all input lines starting with the line matching regex 1, ending with the one matching regex 2. If regex 2 is impossible-to-match, this allows them to get all lines after the start-line. Sort of like the {min,max} and {min,} regex quantifiers. – Remake 15/4, 2014 at 15:23
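A minimal Python sketch of that start/end capture follows. The names `capture_blocks` and `NEVER` are hypothetical, and it assumes the start line itself is not tested against the end regex. When the end regex can never match, the capture runs from the first start line to the end of the input, much like an open-ended `{min,}` quantifier.

```python
import re

NEVER = re.compile(r"(?!)")  # an end regex that can never match

def capture_blocks(lines, start, end):
    """Yield lists of lines from a line matching `start` through the next line
    matching `end`. An unterminated final block is yielded as-is."""
    block = None
    for line in lines:
        if block is None:
            if start.search(line):
                block = [line]
        else:
            block.append(line)
            if end.search(line):
                yield block
                block = None
    if block is not None:
        yield block

lines = ["a", "BEGIN", "x", "END", "y", "BEGIN", "z"]
print(list(capture_blocks(lines, re.compile("BEGIN"), re.compile("END"))))
print(list(capture_blocks(lines, re.compile("BEGIN"), NEVER)))
```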

"Note that it is very hard to prove a negative" -- this is widely believed yet utterly and obviously false ... as we've known at least since Euclid proved that there's no greatest prime. And any proof of P is a proof of the negation of (not P). What is true is that it's difficult to prove an empirical universal, positive or negative, e.g., "all ravens are black" or "no raven is white". Algorithms are analytical, not empirical, so this is a particularly bad misapplication of the bogus rule. e.g., a proof that the pattern 'a' doesn't match any string that starts with 'b' is not "very hard". – Mitsukomitt 21/7, 2019 at 3:22

" ^.{0}$ " I believe that this would probably be the only expression actually ever capable of never matching anything. Due to the simple fact that if it does actually have the possibility of matching anything, than it also means that there is nothing being matched. – Toaster 24/5, 2021 at 21:9

Even better and simpler than that one. ` ^{0} ` No invalid syntax errors and even if you added any character to the end it wouldn't matter. ` ^{0}.* ` with or without anchoring $ at the end. – Toaster 24/5, 2021 at 22:4

This is actually quite simple, ~~although it depends on the implementation / flags~~*:

$a

This requires a character a after the end of the string. Good luck.
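The claim can be spot-checked in Python (assuming Python's `re` module, where `$` is always an anchor). The check runs with and without multiline mode. It uses a few sample strings, so it illustrates the point rather than proving it. In both modes, `$` only matches at the end of the string or just before a newline, so the next character can never be `a`.

```python
import re

samples = ["", "a", "$a", "banana", "a\na", "abc\n"]

def never_matches(pattern, flags=0):
    """True if the pattern finds no match in any of the sample strings."""
    rx = re.compile(pattern, flags)
    return all(rx.search(s) is None for s in samples)

print(never_matches(r"$a"))                # True
print(never_matches(r"$a", re.MULTILINE))  # True
print(never_matches(r"a"))                 # False: a plain "a" obviously matches
```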

WARNING:
This expression is expensive -- it will scan the entire line, find the end-of-line anchor, and only then not find the a and return a negative match. (See comment below for more detail.)

* Originally I did not give much thought to multiline mode, where $ also matches the end of a line. In fact, it would match the empty string right before the newline, so an ordinary character like a can never appear after $.

Backfill answered 12/11, 2009 at 15:46 Comment(11)

This expression is expensive -- it will scan the entire line, find the end-of-line anchor, and only then not find the "a" and return a negative match. I see it take ~480ms to scan a ~275k line file. The converse "a^" takes about the same time, even if it might seem more efficient. On the other hand, a negative lookahead need not scan anything: "(?!x)x" (anything not followed by an x also followed by an x, i.e. nothing) takes about 30ms, or less than 7% of the time. (Measured with gnu time and egrep.) – Berbera 23/3, 2011 at 13:38

In Perl, that will match the current value of $a (the variable is interpolated). Its Perl equivalent, $(?:a), is also very slow: perl -Mre=debug -e'$_=a x 50; /$(?:a)/'. – Utilitarian 6/4, 2013 at 14:56

@Berbera , please see my answer regarding timing, as I found the exact opposite measured with timeit and python3. – Lossa 4/11, 2017 at 20:59

It's not shocking that six years and a major version of Python might change things. – Berbera 8/11, 2017 at 21:23

Here's a JavaScript comparison of some methods covered here: jsperf.com/regex-that-never-matches – Natalyanataniel 7/6, 2018 at 0:33

In POSIX BRE syntax, $a will match the literal text $a, because $ is invalid as an anchor in that pattern. – Thwart 26/8, 2018 at 11:18

What about a^? It shouldn't match anything either, and it is at the beginning of the string. – Laburnum 19/9, 2018 at 17:2

@VladimirKondenko IIRC it will still scan the string looking for a's, but ^o^ would work, I guess. – Lanciform 19/1, 2019 at 11:57

I tried using "a^", but this does not work: That string matches itself! Apparently a ^ that's not at the beginning of a regular expression matches a normal ^ character. Instead, I found this to work: "^(?!x)x" – Shanel 18/5, 2021 at 13:7

An anchor isn't an anchor if it isn't being used as an anchor. In the regular expression ^.*$, the characters [$^] are being used for their "special purposes" of matching the empty position at their designated side. Notice how $.*^ won't completely disregard and overpower syntax rules, anchoring themselves flipped inside out dumping core. If the ^ isn't the first character of the regular expression, nor the first character of a bracket expression meaning "not", such as [^a-z], then anywhere else it has no special meaning. Likewise with a $, but as the last character of the regex. – Toaster 24/5, 2021 at 20:49

echo '$a$a$a$a$a$a' | grep '$a' Am I doing something wrong maybe? – Toaster 24/5, 2021 at 21:49

Leverage negative lookahead:

>>> import re
>>> x=r'(?!x)x'
>>> r=re.compile(x)
>>> r.match('')
>>> r.match('x')
>>> r.match('y')

This RE is a contradiction in terms and therefore will never match anything.

NOTE:
In Python, re.match() implicitly adds a beginning-of-string anchor (\A) to the start of the regular expression. This anchor is important for performance: without it, the entire string will be scanned. Those not using Python will want to add the anchor explicitly:

\A(?!x)x
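The note about anchoring can be illustrated in Python. `re.match()` only attempts a match at position 0. `re.search()` keeps trying later positions unless the pattern is anchored with `\A`. The 10,000-character input is an arbitrary assumption, and this sketch says nothing about how a particular engine optimizes the anchored case.

```python
import re

text = "x" * 10_000
unanchored = re.compile(r"(?!x)x")
anchored = re.compile(r"\A(?!x)x")

print(re.match("b", "ab") is None)   # True: match() only tries position 0
print(re.search("b", "ab").start())  # 1: search() moves on to later positions

# Without an anchor, search() may attempt the pattern at position after position.
print(unanchored.search(text))  # None
# With \A, the pattern can only ever succeed at position 0, and it fails there.
print(anchored.search(text))    # None
```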

Lost answered 4/12, 2009 at 5:46 Comment(7)

@Chris, yep -- also, (?=x)(?!x) and so on (concatenations of contradictory lookaheads, and same for lookbehinds), and many of those also work for arbitrary values of x (lookbehinds need xs that match strings of fixed-length). – Lost 4/12, 2009 at 6:18
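Here is a Python spot-check of a few contradictory lookaround combinations, including a multi-character x and fixed-width lookbehinds. The sample strings are arbitrary, so this only illustrates the idea and does not prove that the patterns never match.

```python
import re

# Each pattern asserts something and its opposite at the same position.
CONTRADICTIONS = [
    r"(?=x)(?!x)",              # followed by x, and not followed by x
    r"(?!x)x",                  # not followed by x, then consumes an x
    r"(?=foo|bar)(?!foo|bar)",  # a longer, arbitrary x works for lookaheads too
    r"(?<=x)(?<!x)",            # lookbehind version (x must be fixed-width)
    r"(?<=ab)(?<!ab)",
]
SAMPLES = ["", "x", "xx", "foo", "barfoo", "ab", "xabx" * 5]

def matches_anything(pattern):
    """True if the pattern finds a match in at least one sample string."""
    rx = re.compile(pattern)
    return any(rx.search(s) for s in SAMPLES)

print([matches_anything(p) for p in CONTRADICTIONS])  # [False, False, False, False, False]
```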

Appears to work well. But what about just (?!) instead? Since () will always match, wouldn't (?!) be guaranteed never to match? – Musing 4/12, 2009 at 6:31

@Peter, yes, if Python accepts that syntax (and recent releases appear to), then it would be self-contradictory as well. Another idea (not quite as elegant, but the more ideas you get the likelier you are to find one working across all RE engines of interest): r'a\bc', looking for a word-boundary immediately surrounded by letters on both sides (variant: nonword characters on both sides). – Lost 4/12, 2009 at 15:17
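Both word-boundary ideas can be checked in Python on a handful of sample strings. The sample strings are arbitrary. In both patterns, `\b` is asked to sit between two characters of the same class (word/word or non-word/non-word), and a word boundary can never appear there.

```python
import re

letters_around_boundary = re.compile(r"a\bc")     # \b between two word characters
nonwords_around_boundary = re.compile(r"\W\b\W")  # \b between two non-word characters
samples = ["ac", "a c", "abc", "a-c", "!!", "- -", ""]

print(any(letters_around_boundary.search(s) for s in samples))   # False
print(any(nonwords_around_boundary.search(s) for s in samples))  # False
```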

(?!) does seem to work with Python, but not with Javascript (in Firefox 3.5). Modifying it to (?!()) works for both. I'm not sure I'd want to rely on it in JS though. Performance could also be a consideration... some of these may be relatively slow, if they test at every character, while others may short-circuit the test. – Musing 4/12, 2009 at 20:24
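On the Python side, the snippet below compares `()`, which matches the empty string at every position, with `(?!)` and `(?!())`, which match nowhere. The JavaScript behaviour described above is not covered here.

```python
import re

print(re.findall(r"()", "abc"))  # ['', '', '', '']: () matches at all four positions

for pattern in (r"(?!)", r"(?!())"):
    rx = re.compile(pattern)
    print(pattern, [rx.search(s) for s in ("", "a", "abc")])
# (?!) [None, None, None]
# (?!()) [None, None, None]
```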

Interestingly, my original with a simple literal that I "know" won't appear in my input turns out to be fastest, in Python. With a 5MB input string, and using this in a sub() operation, (?!x)x takes 21% longer, (?!()) is 16%, and ($^) 6% longer. May be significant in some cases, though not in mine. – Musing 4/12, 2009 at 21:23
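A rough `timeit` harness for repeating this kind of comparison in Python follows. The "known-absent" literal, the repeated log line, and the roughly 5 MB size are hypothetical stand-ins, and `$a` from the first answer is included as well. Timings vary by machine and Python version, so no numbers are shown. One caveat: `$^` does match the empty string in Python, because both anchors succeed at position 0. It only behaves as "never matches" on non-empty input.

```python
import re
import timeit

# Hypothetical stand-in for "a simple literal that I know won't appear in my input".
LITERAL = "@@never-in-input@@"
CANDIDATES = {
    "literal": re.escape(LITERAL),
    "(?!x)x": r"(?!x)x",
    "(?!())": r"(?!())",
    "($^)": r"($^)",
    "$a": r"$a",
}

def time_patterns(text, number=3):
    """Return total seconds spent running rx.sub() `number` times per candidate."""
    results = {}
    for name, pattern in CANDIDATES.items():
        rx = re.compile(pattern)
        assert rx.sub("", text) == text  # sanity check: nothing was replaced
        results[name] = timeit.timeit(lambda: rx.sub("", text), number=number)
    return results

if __name__ == "__main__":
    big = "some log text\n" * 350_000  # roughly 5 MB of assumed input
    for name, secs in time_patterns(big).items():
        print(f"{name:>8}: {secs:.3f}s")
```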

That can be quite slow: perl -Mre=debug -e'$_=x x 8; /(?!x)x/'. You can make it faster by anchoring it at the beginning, \A(?!x)x, or at the end, (?!x)x\z: perl -Mre=debug -e'$_=x x 8; /(?!x)x\z/; /\A(?!x)x/' – Utilitarian 6/4, 2013 at 15:3

@Brad Gilbert, It turns out that Python's re.match() implicitly adds the \A to the beginning of the regexp. The rest of us, as you note, need to add the \A explicitly. – Maternal 12/11, 2013 at 16:15
