Multiple words in any order using regex [duplicate]
R

8

87

As the title says, I need to find two specific words in a sentence. They can appear in any order and in any casing. How do I go about doing this using regex?

For example, I need to extract the words test and long from the following sentence, regardless of which of the two comes first.

This is a very long sentence used as a test

UPDATE: What I did not mention in the first part is that it needs to be case insensitive as well.

Ruysdael answered 24/7, 2009 at 11:30 Comment(5)
Do you care about multiple occurrences of the words? Do you know what words you want to extract, or do you want to match words that fit a particular pattern? Do you want to find out what position they're at? - Zellazelle
I know the exact words, don't care about multiples, and don't need the position. I do need it to be case insensitive. - Ruysdael
Which regex flavor are you using? JavaScript, .NET, PHP...? And how important is performance? Are you working with very large strings, or doing a great many matches? Several viable answers have been posted already, but none of them is particularly efficient. - Hypertonic
I think the most important thing (which I found out today) is that the checks are done in .NET, so I am not sure whether all the answers below apply. I have tried them all and, sadly, .NET does not treat any of them as case insensitive. - Ruysdael
Ehh, whether it's case sensitive or not should not depend on the regex. You're better off programming the software to be case insensitive. However, to recognize multiple words in any order using regex, I'd suggest using a quantifier: (\b(james|jack)\b.*){2,}. Unlike lookarounds or mode modifiers, this works in most regex flavours. - Susumu
V
44

Use a capturing group if you want to extract the matches: (test)|(long) Then depending on the language in use you can refer to the matched group using $1 and $2, for example.
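
As a rough illustration of the alternation approach, here is a minimal Python sketch. Python is an assumption made only for this example (the question turns out to involve .NET), and the helper name `find_words` is hypothetical. Iterating with `finditer` collects every occurrence of either word, which plays the role that a global flag plays in other flavors, and `re.IGNORECASE` covers the casing requirement.

```python
import re

# Same alternation as in the answer: group 1 captures "test", group 2 "long".
# re.IGNORECASE makes the match independent of casing.
PATTERN = re.compile(r"(test)|(long)", re.IGNORECASE)

def find_words(sentence):
    """Return every occurrence of either word, in order of appearance."""
    return [m.group(0) for m in PATTERN.finditer(sentence)]

words = find_words("This is a very LONG sentence used as a Test")
# Both words were found if the lowercased matches cover the whole set.
found_both = {w.lower() for w in words} == {"test", "long"}
print(words, found_both)
```

Because there are no word boundaries in this pattern, it would also pick up "test" inside "testy"; wrapping the alternatives in \b avoids that.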

Vasculum answered 24/7, 2009 at 11:49 Comment(2)
I used this answer in conjunction with the (?i) from the answer below. This resulted in the following output: (?i)(test(long)?), because it turns out I had to test for test first and then long. Whether it is the correct way is another story, but it worked for me. - Ruysdael
Given the requirement is to match test AND long, this solution needs the g flag. - Groundnut
B
64

You can use

(?=.*test)(?=.*long)

Source: MySQL SELECT LIKE or REGEXP to match multiple words in one record
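
A minimal Python sketch of how the two lookaheads behave, assuming Python's `re` module purely for illustration. The function name `contains_both` is hypothetical, and the leading `^` and the `re.DOTALL` flag are additions that are not part of the pattern above. The pattern only tests for the presence of both words; it does not return them.

```python
import re

# Each lookahead scans ahead from the start of the string for one word,
# so the order of the words does not matter.
# re.IGNORECASE ignores casing; re.DOTALL lets '.' cross newlines.
BOTH = re.compile(r"^(?=.*test)(?=.*long)", re.IGNORECASE | re.DOTALL)

def contains_both(text):
    """True when both words occur somewhere in text, in either order."""
    return BOTH.search(text) is not None

print(contains_both("This is a very long sentence used as a test"))
print(contains_both("This is a very long sentence"))
```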

Biggs answered 11/7, 2014 at 12:49 Comment(8)
This fails: regex101.com/r/Z8KOLp/2 - Groundnut
@Groundnut The regexp above does not return the words test or long. It matches if it finds both test and long, in any order, and does not match otherwise. - Biggs
Sure, but the requirement is to find/extract the words, not simply to test for them. - Groundnut
Another variant: (?is)^(?=.*\b(test)\b)(?=.*?\b(long)\b).* which also captures the words and matches the whole string. It is also anchored to the ^ start, which improves performance considerably. \b matches a word boundary. - Acrostic
Is there any way to match either or both of these two words? E.g. if that regexp is used with the text "This is a very long sentence", the word long will not be found. It would be good to add an optional modifier. Is it possible? - Fredia
OK, for my previous comment I guess just \b(test)|\b(long) will be enough. - Fredia
If the answer's regex is too slow, you can add ^ to the start of it to considerably increase performance, as said in bobblebubble's answer. - Disc
MariaDB word boundaries are [[:<:]] and [[:>:]]. So instead of \b(test)\b you write [[:<:]](test)[[:>:]]. - Disc
P
15

I assume (always dangerous) that you want to find whole words, so "test" would match but "testy" would not. Thus the pattern must search for word boundaries, so I use the "\b" word boundary pattern.

/(?i)(\btest\b.*\blong\b|\blong\b.*\btest\b)/
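
To make the effect of the word boundaries concrete, here is a small Python sketch. Python and the name `has_both_whole_words` are assumptions for illustration only; the pattern is the one above with the inline (?i) replaced by the equivalent `re.IGNORECASE` flag.

```python
import re

# Either "test ... long" or "long ... test", each as a whole word.
WHOLE_WORDS = re.compile(
    r"\btest\b.*\blong\b|\blong\b.*\btest\b",
    re.IGNORECASE,
)

def has_both_whole_words(sentence):
    """True when both whole words occur in sentence, in either order."""
    return WHOLE_WORDS.search(sentence) is not None

print(has_both_whole_words("This is a very long sentence used as a test"))
print(has_both_whole_words("A testy, long sentence"))
```

The second call does not match because "testy" is not the whole word "test".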
Pneumatometer answered 24/7, 2009 at 13:55 Comment(0)
A
9

Without knowing which language you are using:

 /test.*long/ 

or

/long.*test/

or

/test/ && /long/
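
The last form, two separate searches combined with a logical AND, could look roughly like this in Python. The language and the function name `has_test_and_long` are assumptions for illustration, and the `re.IGNORECASE` flag is added here to cover the question's casing requirement; it is not part of the patterns above.

```python
import re

def has_test_and_long(sentence):
    """Run one search per word and AND the results; order is irrelevant."""
    has_test = re.search(r"test", sentence, re.IGNORECASE) is not None
    has_long = re.search(r"long", sentence, re.IGNORECASE) is not None
    return has_test and has_long

print(has_test_and_long("This is a very long sentence used as a test"))
print(has_test_and_long("This is only a test"))
```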
Alper answered 24/7, 2009 at 11:35 Comment(1)
I'd add word boundaries, e.g. /\btest\b/. - Cameliacamella
F
4

Try this:

/(?i)(?:test.*long|long.*test)/

That will match either test and then long, or long and then test. It will ignore case differences.

Flam answered 24/7, 2009 at 11:58 Comment(1)
You don't need (?i) unless you want it to be case insensitive, but this works. - Allisonallissa
M
3

Vim has a branch operator \& that allows an even terser regex when searching for a line containing any number of words, in any order.

For example,

/.*test\&.*long

will match a line containing test and long, in any order.

See this answer for more information on usage. I am not aware of any other regex flavor that implements branching; the operator is not even documented on the Regular Expression wikipedia entry.

Messere answered 4/11, 2019 at 13:35 Comment(0)
A
2

I was using libpcre with C, where I could define callouts. They helped me to easily match not just words, but any subexpressions in any order. The regexp looks like:

(?C0)(expr1(?C1)|expr2(?C2)|...|exprn(?Cn)){n}

and the callout function ensures that every subexpression is matched exactly once, like:

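#include <string.h>
#include <pcre.h>

/* Callout for (?C0)...(?Cn): counts how often each subexpression matched.
   Register it before matching with: pcre_callout = mycallout; */
int mycallout(pcre_callout_block *b) {
    static int subexpr[255];
    if (b->callout_number == 0) {
        /* callout (?C0): clear all counts to 0 */
        memset(subexpr, 0, sizeof(subexpr));
        return 0;
    } else {
        /* a return value > 0 makes the match fail at this point */
        return subexpr[b->callout_number - 1]++;
    }
}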

Something like that should be possible in Perl as well.

Astrahan answered 7/3, 2011 at 0:5 Comment(0)
F
-9

I don't think that you can do it with a single regex. You'll need to do a logical AND of two regexes, one searching for each word.

Fortyfour answered 24/7, 2009 at 11:38 Comment(0)
