How to use a regex to match if any pattern appears once out of many times in a given sequence

It's hard to word this correctly, so here's the TL;DR:

Given a specific sequence of words (let's say "THE TREE IS GREEN"), I want to match it in a text if any of the spaces between those words is doubled (or more).

Example:

"In this text,
THE TREE IS GREEN should not match,
THE  TREE IS GREEN should
and so should THE  TREE   IS GREEN
but  double-spaced  TEXT  SHOULD  NOT BE  FLAGGED outside the pattern."

My initial approach would be

/THE( {2,})TREE( {2,})IS( {2,})GREEN/

but this only matches if every space in the sequence is doubled. I'd like any one of the groups to be enough to trigger a full match. Am I going the wrong way, or is there a way to make this work?
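To see why the first attempt falls short, here is a small check in Python. This is an assumption on my part: the question targets Node 10, but Python's `re` treats these simple groups and `{2,}` quantifiers the same way. The sample strings come from the example text above, plus one hypothetical line where every gap is doubled.

```python
import re

# The asker's first attempt: every gap between the words needs 2+ spaces
all_doubled = re.compile(r"THE( {2,})TREE( {2,})IS( {2,})GREEN")

samples = [
    "THE TREE IS GREEN",     # no doubled gap: should not match
    "THE  TREE IS GREEN",    # one doubled gap: should match
    "THE  TREE   IS GREEN",  # two doubled gaps: should match
    "THE  TREE  IS  GREEN",  # hypothetical: every gap doubled
]

results = [bool(all_doubled.search(s)) for s in samples]
for s, matched in zip(samples, results):
    print(f"{s!r}: {matched}")
```

Only the last string matches, because a single-spaced gap anywhere makes the whole pattern fail.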

Tympanum asked 8/7, 2019 at 19:32
Comments:
What regex engine are you using? – Backwards
I'll be using the Node 10 engine for this, but I'm curious to see variants for other engines. – Tympanum

You can use a negative lookahead, if that's an option for you.

First, put the exact sentence that should fail (in your case "THE TREE IS GREEN") inside the negative lookahead, then follow it with the most generic pattern that catches the cases you want.

(?!THE TREE IS GREEN)(THE[ ]+TREE[ ]+IS[ ]+GREEN)

https://regex101.com/r/EYDU6g/2
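A minimal sketch of this lookahead in Python, run against the sample text from the question. The asker uses Node 10, but Python's `re` supports the same `(?!...)` syntax; the variable names are illustrative.

```python
import re

# Reject a match that starts with the exact single-spaced sentence,
# otherwise accept one or more spaces between the words
pattern = re.compile(r"(?!THE TREE IS GREEN)(THE[ ]+TREE[ ]+IS[ ]+GREEN)")

text = (
    "In this text,\n"
    "THE TREE IS GREEN should not match,\n"
    "THE  TREE IS GREEN should\n"
    "and so should THE  TREE   IS GREEN\n"
    "but  double-spaced  TEXT  SHOULD  NOT BE  FLAGGED outside the pattern."
)

matches = [m.group() for m in pattern.finditer(text)]
for m in matches:
    print(repr(m))
```

The lookahead is only tested at the position where a match would start, so the single-spaced sentence is rejected as a whole, while any variant with at least one wider gap still matches. Double spaces outside the word sequence are ignored.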

Lookout answered 8/7, 2019 at 20:52
Comments:
Yeah, I was about to change it to ( +). – Lookout
Negative lookahead seems like the way to go! Clean and elegant, accepted. – Tympanum
FYI, the solution and the regex101 link don't match. – Hollah

You can just search for the spaces that you're looking for:

/ {2,}/ matches two or more consecutive space characters (https://regexr.com/4h4d4).

You can capture the result by surrounding it with parentheses: /( {2,})/

You may want to broaden it a bit: /\s{2,}/ matches any run of two or more whitespace characters (\s means any whitespace: space, tab, newline, etc.).

No need to match the whole string, just the piece that's of interest.
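A short Python sketch of this idea, applied line by line to the question's sample text, assuming what you want to know is which lines contain a run of two or more spaces. It also flags the last line, which is exactly the behavior the asker objects to in the comment on this answer.

```python
import re

text = (
    "In this text,\n"
    "THE TREE IS GREEN should not match,\n"
    "THE  TREE IS GREEN should\n"
    "and so should THE  TREE   IS GREEN\n"
    "but  double-spaced  TEXT  SHOULD  NOT BE  FLAGGED outside the pattern."
)

# Keep every line that contains 2+ consecutive spaces, wherever they occur
flagged_lines = [line for line in text.splitlines() if re.search(r" {2,}", line)]
for line in flagged_lines:
    print(line)
```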

Ransdell answered 8/7, 2019 at 19:34
Comments:
Thank you for the example, but I want to quickly scan text for this pattern occurring only within a specific sequence of words, like "the tree is green"; I don't mind other text being double-spaced. So I'm looking for a one-off, regex-only solution. – Tympanum

If I'm not mistaken, you want to match the whole line if it contains 2 or more spaces between two uppercase words.

If that is the case, you might use:

^.*[A-Z]+ {2,}[A-Z]+.*$
  • ^ Start of string
  • .*[A-Z]+ Match any char except a newline 0+ times, then match [A-Z] 1+ times
  • [ ]{2,} Match a space 2 or more times (square brackets used here for clarity)
  • [A-Z]+ Match an uppercase char 1+ times
  • .*$ Match any char except a newline 0+ times until the end of the string

Regex demo
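A Python sketch of this pattern, using `re.MULTILINE` so that `^` and `$` anchor at each line of the question's sample text (an assumption: the flags used in the regex demo are not shown here). As the asker notes in the comments, the last line is flagged too, because the pattern only looks for uppercase words around a double space.

```python
import re

text = (
    "In this text,\n"
    "THE TREE IS GREEN should not match,\n"
    "THE  TREE IS GREEN should\n"
    "and so should THE  TREE   IS GREEN\n"
    "but  double-spaced  TEXT  SHOULD  NOT BE  FLAGGED outside the pattern."
)

# MULTILINE makes ^ and $ match at every line boundary, so each
# matching line is returned in full
pattern = re.compile(r"^.*[A-Z]+ {2,}[A-Z]+.*$", re.MULTILINE)
flagged = pattern.findall(text)
for line in flagged:
    print(line)
```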

Southerly answered 8/7, 2019 at 19:53
Comments:
This works for my example because I used uppercase for clarity, but it doesn't work for a specific word sequence. I've edited my example: your regex incorrectly flags double-spaced uppercase words outside the base pattern. – Tympanum
So you mean that the pattern is always THE TREE IS GREEN, and the whole sentence should match if there is at least one double space between the words of the pattern? – Southerly
I mean that for a given sentence like 'THE TREE IS GREEN', I'd like a pattern that matches the sentence itself (matching the whole line is fine but not necessary) if it contains any double space between those words. As @3limin4t0r mentioned, something like /THE {2,}TREE +IS +GREEN|THE +TREE {2,}IS +GREEN|THE +TREE +IS {2,}GREEN/gm works, but it is already quite inelegant and doesn't scale well to longer sentences. – Tympanum
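The alternation from the last comment can be generated instead of typed by hand. This is a sketch using a hypothetical helper, `build_alternation`, that emits one alternative per gap: that gap requires two or more spaces, and every other gap accepts one or more.

```python
import re

def build_alternation(words):
    """Hypothetical helper: one alternative per gap between the words.

    In alternative i, gap i must contain 2+ spaces and all other gaps 1+.
    """
    escaped = [re.escape(w) for w in words]
    alternatives = []
    for i in range(len(escaped) - 1):
        parts = [escaped[0]]
        for j in range(1, len(escaped)):
            sep = " {2,}" if j - 1 == i else " +"
            parts.append(sep + escaped[j])
        alternatives.append("".join(parts))
    return "|".join(alternatives)

pattern = build_alternation(["THE", "TREE", "IS", "GREEN"])
print(pattern)

text = (
    "THE TREE IS GREEN should not match,\n"
    "THE  TREE IS GREEN should\n"
    "and so should THE  TREE   IS GREEN\n"
)
matches = re.findall(pattern, text)
print(matches)
```

This removes the hand-typing, but the generated pattern still grows roughly quadratically with the number of words, so the negative lookahead from the accepted answer stays the more compact option.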

You could do this:

import re

# One or more spaces are allowed between the words of the target sentence
pattern = r"THE +TREE +IS +GREEN"

test_str = ("In this text,\n"
    "THE TREE IS GREEN should not match,\n"
    "THE  TREE IS GREEN should\n"
    "and so should THE TREE   IS GREEN\n"
    "but  double-spaced  TEXT  SHOULD  NOT BE  FLAGGED outside the pattern.")

# Any match that differs from the single-spaced sentence must contain
# at least one run of 2+ spaces, so print only those
for match in re.finditer(pattern, test_str):
    if match.group() != "THE TREE IS GREEN":
        print(match.group())
Sweyn answered 8/7, 2019 at 21:10
Comments:
After posting this, I noticed that it's along the same lines as @karthick's answer above. – Sweyn
