How do I write a regular expression that excludes rather than matches, e.g., not (this|string)?
S

8

37

I am stumped trying to create an Emacs regular-expression that excludes groups. [^] excludes individual characters in a set, but I want to exclude specific sequences of characters: something like [^(not|this)], so that strings containing "not" or "this" are not matched.

In principle, I could write ([^n][^o][^t]|[^...]), but is there another way that's cleaner?

Saddlebag answered 7/2, 2010 at 19:16 Comment(2)
Click the "regex-negation" tag to see some similar questions. Ebracteate
There is a patch (not accepted) for lookahead assertions which makes this possible: debbugs.gnu.org/db/53/5393.html Douglasdouglashome
P
22

First of all: [^n][^o][^t] is not a solution. This would also exclude words like nil ([^n] does not match), bob ([^o] does not match) or cat ([^t] does not match).

But it is possible to build a regular expression with basic syntax that matches strings containing neither not nor this:

^([^nt]|n($|[^o]|o($|[^t]))|t($|[^h]|h($|[^i]|i($|[^s]))))*$

The idea behind this regular expression is to allow any character that cannot start one of the words, and to allow proper prefixes of the words, but never a whole word.
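For anyone who wants to try this inside Emacs, here is a minimal sketch that transcribes the pattern into Emacs syntax (grouping and alternation need backslashes, doubled inside a Lisp string) and checks it against a few inputs. The names my/no-not-this-re and my/no-not-this-p and the test strings are made up for illustration; they are not part of the answer. One caveat: the [^o] and [^h] branches consume the character that follows a prefix, so an input such as "nnot" still slips through. Treat the pattern as an illustration of the technique rather than a watertight filter.

```elisp
;; Hypothetical names; the pattern is the one above, rewritten in Emacs syntax.
(defvar my/no-not-this-re
  "^\\([^nt]\\|n\\($\\|[^o]\\|o\\($\\|[^t]\\)\\)\\|t\\($\\|[^h]\\|h\\($\\|[^i]\\|i\\($\\|[^s]\\)\\)\\)\\)*$"
  "Regexp intended to match strings containing neither \"not\" nor \"this\".")

(defun my/no-not-this-p (string)
  "Return t if STRING is accepted by `my/no-not-this-re', else nil."
  (and (string-match-p my/no-not-this-re string) t))

(my/no-not-this-p "nil")      ; accepted
(my/no-not-this-p "cat")      ; accepted
(my/no-not-this-p "knot")     ; rejected, contains "not"
(my/no-not-this-p "do this")  ; rejected, contains "this"
(my/no-not-this-p "nnot")     ; accepted even though it contains "not" (the caveat)
```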

Pernas answered 7/2, 2010 at 19:52 Comment(2)
+1, and if I was ever tempted to switch to Emacs, this would be reason enough not to. How can anyone live without lookaheads? :P Annulet
Been enjoying Emacs very much so far, this is my first "what the ..." Rosenwald
G
34

This is not easily possible. Regular expressions are designed to match things, and this is all they can do.

First off: [^] does not designate an "excludes group"; it designates a negated character class. Character classes do not support grouping in any shape or form. They support single characters (and, for convenience, character ranges). Your attempt [^(not|this)] is 100% equivalent to [^)(|hinots], as far as the regex engine is concerned.
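A quick way to convince yourself of this in Emacs is to evaluate a few calls to string-match-p. This is only a sketch with sample inputs of my own choosing; string-match-p returns the index of the first match, or nil if there is none.

```elisp
;; Inside [...], the characters "(", "|" and ")" are ordinary set members.
(string-match-p "[^(not|this)]" "t")   ; nil: "t" is in the excluded set
(string-match-p "[^(not|this)]" "|")   ; nil: so is the literal "|"
(string-match-p "[^(not|this)]" "a")   ; 0: any character outside the set matches
(string-match-p "[^(not|this)]" "ton") ; nil: every character is excluded
(string-match-p "[^(not|this)]" "tan") ; 1: the "a" matches, no word logic involved

;; The reordered class gives the same results.
(string-match-p "[^)(|hinots]" "ton")  ; nil
(string-match-p "[^)(|hinots]" "tan")  ; 1
```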

Three ways can lead out of this situation:

  1. match (not|this) and exclude any matches with the help of the environment you are in (negate match results)
  2. use negative look-ahead, if supported by your regex engine and feasible in the situation
  3. rewrite the expression so it can match: see a similar question I asked earlier
Gorrian answered 7/2, 2010 at 19:28 Comment(5)
I wonder why this answer has so few upvotes; it is the clearest answer here! Gefen
@Yagamy Because it more or less says "doesn't work" while clearly there is a way to make it work (even though an impractical one that's more of a last resort). Gorrian
I don't see a statement of "doesn't work" here; on the contrary, you showed three ways that could solve the problem, and the third one is just like the accepted answer. Gefen
@Yagamy True, but pulling a "magic trick" is way more impressive than a cautionary answer. That's not to diminish the accepted answer, doing it that way is the only option sometimes, but it's damn unwieldy most of the time. I mentioned this option last for a reason. I suppose people like answers with a wow-effect better. :) Gorrian
This is a really great answer in that it helps you understand the problem in a way that's more easily solved. In Emacs, try M-x keep-lines to drop the lines that don't match what you want. Genuflect
A
13

Hard to believe that the accepted answer (from Gumbo) was actually accepted! Unless it was accepted because it indicated that you cannot do what you want. Unless you have a function that generates such regexps (as Gumbo shows), composing them would be a real pain.

What is the real use case -- what are you really trying to do?

As Tomalak indicated, (a) this is not what regexps do; (b) see the other post he linked to, for a good explanation, including what to do about your problem.

The answer is to use a regexp to match what you do not want, and then subtract that from the initial domain. IOW, do not try to make the regexp do the excluding (it cannot); do the excluding after using a regexp to match what you want to exclude.

This is how every tool that uses regexps works (e.g., grep): they offer a separate option (e.g. via syntax) that carries out the subtraction -- after matching what needs to be subtracted.
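In Emacs Lisp the same "match, then subtract" idea might look like the sketch below. The helper name my/remove-matching and the sample list are invented for illustration; seq-remove comes from the bundled seq library.

```elisp
(require 'seq)

(defun my/remove-matching (regexp strings)
  "Return the elements of STRINGS that do not match REGEXP."
  (seq-remove (lambda (s) (string-match-p regexp s)) strings))

;; Match what should be excluded ("not" or "this"), then subtract it.
(my/remove-matching "not\\|this" '("nil" "knot" "do this" "bob"))
;; => ("nil" "bob")
```

The regexp stays trivial because the negation lives in the surrounding code, not in the pattern.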

Aquaplane answered 21/8, 2011 at 21:56 Comment(0)
M
10

It sounds like you are trying to do negative lookahead. i.e. you are trying to stop matching once you reach some delimiter.

Emacs doesn't support lookahead directly, but it does support the non-greedy version of the *, +, and ? operators (*?, +?, ??), which can be used for the same purpose in most cases.

So for instance, to match the body of this JavaScript function:

bar = function (args) {
    if (blah) {
        foo();
    }
};

You can use this Emacs regex:

function ([^)]+) {[[:ascii:]]+?};

Here we're stopping once we find the two-character sequence "};". [[:ascii:]] is used instead of the "." operator because "." does not match a newline, whereas [[:ascii:]] does, so the match can span multiple lines.

This is a little different from negative lookahead because the }; sequence itself is matched. However, if your goal is to extract everything up to that point, you can just wrap that part in a capturing group, \( and \).
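As a rough sketch of the capturing-group variant, the function below runs the regex from this answer, with \( and \) added around the body, against a string. The function name my/function-body is hypothetical and the sample text is the snippet shown earlier.

```elisp
(defun my/function-body (text)
  "Return the text between the opening brace and the first \"};\" in TEXT."
  ;; \\( ... \\) captures the body; +? stops at the first "};" it can reach.
  (when (string-match "function ([^)]+) {\\([[:ascii:]]+?\\)};" text)
    (match-string 1 text)))

(my/function-body
 "bar = function (args) {\n    if (blah) {\n        foo();\n    }\n};")
;; => "\n    if (blah) {\n        foo();\n    }\n"
```

Note that this stops at the first "};" in the text, so a nested block that also ends in "};" would cut the match short.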

See the Emacs regex manual: http://www.gnu.org/software/emacs/manual/html_node/emacs/Regexps.html

As a side note, if you are writing any kind of Emacs regex, be sure to invoke M-x re-builder, which will bring up a little IDE for writing your regex against the current buffer.

Maziar answered 30/3, 2013 at 3:9 Comment(0)
M
7

Try M-x flush-lines.
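flush-lines deletes every line that contains a match, so the regexp itself can stay positive. It can also be called from Lisp; the sketch below does so in a temporary buffer. The helper name my/flush-from-string and the sample text are invented for illustration.

```elisp
(defun my/flush-from-string (regexp text)
  "Return TEXT with every line that matches REGEXP removed."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))   ; flush-lines works from point to the end
    (flush-lines regexp)
    (buffer-string)))

(my/flush-from-string "not\\|this" "nil\nknot\ndo this\nbob\n")
;; => "nil\nbob\n"
```

M-x keep-lines is the complement: it keeps only the lines that contain a match.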

Malawi answered 7/2, 2010 at 23:47 Comment(0)
R
2

For the use case of testing a string in a logical condition, I do this:

;; Match strings that end with '-region' but exclude those that contain 'mouse'.
M-x ielm RET
*** Welcome to IELM ***  Type (describe-mode) for help.
ELISP> (setq str1 "mouse-drag-region" str2 "mou-drag-region" str3 "mou-region-drag")
"mou-region-drag"
ELISP> (and (string-match-p "-region$" str1) (not (string-match-p "mouse" str1)))
nil
ELISP> (and (string-match-p "-region$" str2) (not (string-match-p "mouse" str2))) 
t
ELISP> (and (string-match-p "-region$" str3) (not (string-match-p "mouse" str3)))
nil

I use this approach to avoid the bug in the function that I discussed in another post.

Rosenwald answered 3/8, 2015 at 21:6 Comment(0)
T
1

My problem was how to pass a negated regexp to delete-lines; the solution was to pass the regexp to M-x keep-lines instead.

Tadd answered 6/4, 2021 at 14:48 Comment(0)
E
0

If you are trying to use regex to find or replace text in a buffer you can use https://github.com/benma/visual-regexp-steroids.el/

Visual regexp steroids allows you to replace, search, etc. using Python regex. Python regex supports negative lookahead and negative lookbehind.

Eyebrow answered 11/5, 2020 at 16:52 Comment(1)
Welcome to Stack Overflow. Please include all the key details in your answer. As written your answer will have little value if the external link changes. See How to Answer for more details. Pemberton
