Regular expression that doesn't contain certain string [duplicate]
D

7

92

I have something like this

aabbabcaabda

To select the minimal group wrapped by a, I have /a([^a]*)a/, which works just fine.

But I have a problem with groups wrapped by aa, where I'd need something like /aa([^aa]*)aa/, which doesn't work. I can't use the first form as /aa([^a]*)aa/ either, because it would stop at the first occurrence of a, which I don't want.

In general, is there a way to say "does not contain this string" in the same way that [^a] says "does not contain this character"?

Simply put, I need aa, followed by any characters except the sequence aa, ending with aa.
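
To make the problem concrete, here is a minimal Python sketch of the three patterns from the question (Python's re module is my assumption; the question does not name an engine). It shows that [^aa] is the same character set as [^a], so neither aa-delimited attempt can get past a lone a.

```python
import re

text = "aabbabcaabda"

# Single-character delimiter: a negated character class is enough.
single = re.findall(r"a([^a]*)a", text)

# Repeating a character inside a class adds nothing, so [^aa] == [^a].
# Both patterns refuse to cross the lone "a" inside "bbabc".
doubled_class = re.search(r"aa([^aa]*)aa", text)
plain_class = re.search(r"aa([^a]*)aa", text)

print(single)
print(doubled_class, plain_class)
```

Both aa-delimited searches find nothing in this input, because the text between the two aa pairs contains a single a.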

Debark answered 4/4, 2009 at 19:22 Comment(0)
A
20

In general it's a pain to write a regular expression not containing a particular string. We had to do this for models of computation - you take an NFA, which is easy enough to define, and then reduce it to a regular expression. The expression for things not containing "cat" was about 80 characters long.

Edit: I just finished and yes, it's:

aa([^a]|a[^a])*aa

Here is a very brief tutorial. I found some great ones before, but I can't see them anymore.

Azotemia answered 4/4, 2009 at 19:30 Comment(3)
Do you know of any tutorial which explains this? - Debark
There's a good regex tutorial here: regular-expressions.info - Microgram
Hello, are you sure about that? Can somebody tell us if there is something wrong with my answer: https://mcmap.net/q/224669/-regular-expression-that-doesn-39-t-contain-certain-string-duplicate - Shaunna
C
236

I found a blog post from 2007 which gives the following regex that matches strings which don't contain a certain substring:

^((?!my string).)*$

It works as follows: it looks for zero or more (*) characters (.) that do not begin (?! - negative lookahead) your string, and it stipulates that the entire string must be made up of such characters (by using the ^ and $ anchors). Or, to put it another way:

The entire string must be made up of characters which do not begin a given string, which means that the string doesn't contain the given substring.
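
As a rough sketch of how that pattern behaves, here it is in Python (my choice of language; the helper name lacks_substring is hypothetical). It also shows the newline caveat: without a dot-matches-newline flag, a multi-line input fails to match even when the substring is absent.

```python
import re

# The forbidden substring from the answer; re.escape guards against
# regex metacharacters if you swap in another string.
forbidden = "my string"
body = r"^((?!" + re.escape(forbidden) + r").)*$"

single_line = re.compile(body)
# re.DOTALL lets "." consume newlines as well.
any_line = re.compile(body, re.DOTALL)

def lacks_substring(s: str, pattern=single_line) -> bool:
    """True if the pattern accepts s, i.e. no position starts the forbidden string."""
    return pattern.match(s) is not None

print(lacks_substring("nothing to see"))
print(lacks_substring("this is my string here"))
print(lacks_substring("line one\nline two"))
print(lacks_substring("line one\nline two", any_line))
```

In practice a plain substring test (forbidden not in s) is simpler when you control the code; the regex form is mainly useful where only a pattern can be supplied.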

Corniculate answered 5/3, 2010 at 13:39 Comment(7)
According to the docs, this is negative lookahead, not lookbehind. - Alton
(from the cited blog) full regexp ref: regular-expressions.info/refadv.html - Equine
The exact solution for the question is: ^aa(?!.*aa.*aa).*aa$ i.e. start with aa, look ahead and discard selections that continue with [anything]aa[anything]aa, and finish with aa. - Equine
In place of the period, you can match past a single line with something like this: ^((?!my string)(\s|\S))*$ - Bhayani
@Bhayani - yes, that's correct - using (\s|\S) instead of . is a workaround for when you can't specify regex flags such as s (dot matches anything, including newline). - Corniculate
I suppose it depends on the engine? According to MDN, dot doesn't match line terminators in JavaScript. SublimeText's "find/replace" regex doesn't match newlines with dot either. - Bhayani
I used to think of myself as a Regex Jedi until I came upon this problem. - Shrader
M
12

All you need is a reluctant quantifier:

regex: /aa.*?aa/

aabbabcaabda   => aabbabcaa

aaaaaabda      => aaaa

aabbabcaabda   => aabbabcaa

aababaaaabdaa  => aababaa, aabdaa

You could use negative lookahead, too, but in this case it's just a more verbose way to accomplish the same thing. Also, it's a little trickier than gpojd made it out to be. The lookahead has to be applied at each position before the dot is allowed to consume the next character.

/aa(?:(?!aa).)*aa/

As for the approach suggested by Claudiu and finnw, it'll work okay when the sentinel string is only two characters long, but (as Claudiu acknowledged) it's too unwieldy for longer strings.
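
Here is a small Python sketch comparing the two patterns on the sample strings above (Python is my assumption, not the answer's). The last part is a hypothetical extension, not from the answer: once something must follow the closing aa, the reluctant quantifier is free to run past an inner aa, while the lookahead version is not.

```python
import re

lazy = re.compile(r"aa.*?aa")
tempered = re.compile(r"aa(?:(?!aa).)*aa")

# The sample strings from the answer above.
samples = ["aabbabcaabda", "aaaaaabda", "aababaaaabdaa"]
lazy_results = {s: lazy.findall(s) for s in samples}
tempered_results = {s: tempered.findall(s) for s in samples}

# Hypothetical extension: require an "X" right after the closing "aa".
text = "aabaacaaX"
lazy_x = re.search(r"aa.*?aaX", text).group()
tempered_x = re.search(r"aa(?:(?!aa).)*aaX", text).group()

print(lazy_results == tempered_results)
print(lazy_x)
print(tempered_x)
```

For the bare aa...aa case the two agree on every sample. With the trailing X, the lazy pattern returns the whole string, inner aa included, so the lookahead form is the safer choice when the delimited group is only part of a larger pattern.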

Microgram answered 5/4, 2009 at 7:32 Comment(2)
I think our way is the only method that'll work with a non-backtracking implementation (swtch.com/~rsc/regexp/regexp1.html), but yeah, it is terribly annoying. I just don't know regex well enough to know about these lookahead things =). - Azotemia
Most modern regex flavors, especially those built into programming languages, are of the backtracking, NFA type. Even JavaScript, one of the least featureful flavors, supports lookaheads and reluctant quantifiers. regular-expressions.info/refflavors.html - Microgram
E
7
/aa([^a]|a[^a])*aa/
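
One way to read this pattern, sketched in Python with re.VERBOSE so each piece can carry a comment (the annotations are my reading, not the answerer's):

```python
import re

pattern = re.compile(r"""
    aa              # opening delimiter
    (?:
        [^a]        # any character other than a
      | a[^a]       # or one a followed immediately by a non-a
    )*              # repeated; neither branch can consume two a's in a row
    aa              # closing delimiter
""", re.VERBOSE)

first = pattern.findall("aabbabcaabda")
second = pattern.findall("aababaaaabdaa")
print(first)
print(second)
```

Because it uses only character classes, alternation and repetition, this form needs no lookaround, at the cost of growing quickly as the delimiter gets longer.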
Esdraelon answered 4/4, 2009 at 19:24 Comment(1)
15 years later... Can you add an explanation to this please? - Hydrophane
S
6

I'm not sure it's a standard construct, but I think you should have a look at "negative lookahead" (written "?!", without the quotes). It's far easier than all the answers in this thread, including the accepted one.

Example: the regex "^(?!123)[0-9]*\w" matches, at the start of a string, zero or more digits followed by one word character, UNLESS the string begins with 123.

https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference (Microsoft page, but quite comprehensive) for lookahead / lookbehind

PS: it works well for me (.NET). But if I'm wrong about something, please let us know. I find this construct very simple and effective, so I'm surprised by the accepted answer.
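
A quick check of that example pattern, sketched in Python rather than .NET (the test strings are made up for illustration):

```python
import re

pattern = re.compile(r"^(?!123)[0-9]*\w")

# Hypothetical inputs: only those starting with "123" should be rejected.
inputs = ["456abc", "12a", "abc", "123abc", "1234"]
accepted = [s for s in inputs if pattern.match(s)]
print(accepted)
```

Note that the lookahead is checked only once, at the start of the string, which is why this form does not by itself solve the "no aa anywhere between the delimiters" problem from the question.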

Shaunna answered 21/11, 2014 at 11:27 Comment(0)
K
5

I had the following code and I had to replace and add a GET-parameter to all references to JS-files EXCEPT one.

<link rel="stylesheet" type="text/css" href="/login/css/ABC.css" />
<script type="text/javascript" language="javascript" src="/localization/DEF.js"></script>
<script type="text/javascript" language="javascript" src="/login/jslib/GHI.js"></script>
<script type="text/javascript" language="javascript" src="/login/jslib/md5.js"></script>
sendRequest('/application/srvc/EXCEPTION.js', handleChallengeResponse, null);
sendRequest('/application/srvc/EXCEPTION.js",handleChallengeResponse, null);

This is the regex I used:

(?<!EXCEPTION)(\.js)

What that does is look for all occurrences of ".js" and, if they are preceded by the "EXCEPTION" string, discard that result from the result array. That's called negative lookbehind. Since I spent a day finding out how to do this, I thought I should share.
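
A minimal sketch of the replacement step in Python, using two lines from the snippet above (the GET parameter ?v=1 is a made-up placeholder; the answer does not say which parameter or tool was used):

```python
import re

source = (
    '<script type="text/javascript" language="javascript" '
    'src="/login/jslib/GHI.js"></script>\n'
    "sendRequest('/application/srvc/EXCEPTION.js', handleChallengeResponse, null);"
)

# Append the hypothetical parameter to every ".js" that is not
# immediately preceded by "EXCEPTION".
result = re.sub(r"(?<!EXCEPTION)(\.js)", r"\1?v=1", source)
print(result)
```

Python's re requires lookbehind patterns to have a fixed width, which a literal like EXCEPTION satisfies.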

Kistler answered 22/11, 2012 at 12:18 Comment(0)
B
3
".*[^(\\.inc)]\\.ftl$"

In Java this will find all files ending in ".ftl" but not ending in ".inc.ftl", which is exactly what I wanted.

Bluegrass answered 1/12, 2011 at 19:17 Comment(1)
[] splits inc into the single characters i, n, c. So both "a.i.ftl".matches(".*[^(\\.inc)]\\.ftl$") and "a.inc.ftl".matches(".*[^(\\.inc)]\\.ftl$") return false. - Candiecandied
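
The comment's point can be checked with a short sketch, here in Python instead of Java (re.fullmatch plays the role of String.matches). The lookbehind variant is my suggestion for what the answer seems to intend, not something from the answer itself.

```python
import re

# The pattern from the answer: the brackets form a character class that
# excludes each of "(", ".", "i", "n", "c", ")" individually.
broken = re.compile(r".*[^(\.inc)]\.ftl$")

# Assumed fix: a negative lookbehind on the literal ".inc".
fixed = re.compile(r".*(?<!\.inc)\.ftl$")

names = ["a.ftl", "a.i.ftl", "a.inc.ftl"]
broken_hits = [n for n in names if broken.fullmatch(n)]
fixed_hits = [n for n in names if fixed.fullmatch(n)]
print(broken_hits)
print(fixed_hits)
```

With the character class, a.i.ftl is wrongly rejected along with a.inc.ftl; with the lookbehind, only a.inc.ftl is rejected.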
