Regex: I want this AND that AND that... in any order
I'm not even sure if this is possible or not, but here's what I'd like.

String: "NS306 FEBRUARY 20078/9/201013B1-9-1Low31 AUGUST 19870"

I have a text box where I type in the search parameters, and they are space delimited. Because of this, I want to return a match if string1 is in the string and then string2 is in the string, OR string2 is in the string and then string1 is in the string. I don't care what order the strings are in, but they ALL (there will sometimes be more than 2) have to be in the string.

So for instance, in the provided string I would want:

"FEB Low"

or

"Low FEB"

...to return as a match.

I'm REALLY new to regex, only read some tutorials on here but that was a while ago and I need to get this done today. Monday I start a new project which is much more important and can't be distracted with this issue. Is there any way to do this with regular expressions, or do I have to iterate through each part of the search filter and permute the order? Any and all help is extremely appreciated. Thanks.

UPDATE: The reason I don't want to iterate through a loop, and am looking for the best-performing option, is that the dataTable I'm using unfortunately calls this function on every key press, and I don't want it to bog down.

UPDATE: Thank you everyone for your help, it was much appreciated.

CODE UPDATE:

Ultimately, this is what I went with.

string sSearch = nvc["sSearch"]; // assumes nvc is a NameValueCollection, which returns null for a missing key
if (!string.IsNullOrEmpty(sSearch))
{
  // "FEB Low" becomes "^(?=.*FEB)(?=.*Low).*$": one lookahead per space-delimited term
  sSearch = sSearch.Replace(" ", ")(?=.*");
  Regex r = new Regex("^(?=.*" + sSearch + ").*$", RegexOptions.IgnoreCase);
  _AdminList = _AdminList.Where<IPB>(
                                       delegate(IPB ipb)
                                       {
                                          //Concatenated all elements of IPB into a string
                                          bool returnValue = r.IsMatch(strTest); //strTest is the concatenated string
                                          return returnValue;
                                       }).ToList<IPB>();
}

The IPB class has X number of elements, and in no one table throughout the site I'm working on are the columns in the same order. Therefore, I needed an any-order search and I didn't want to have to write a lot of code to do it. There were other good ideas in here, but I know my boss really likes Regex (preaches them) and therefore I thought it'd be best if I went with that for now. If for whatever reason the site's performance slips (intranet site) then I'll try another way. Thanks everyone.

Faun answered 20/8, 2010 at 17:38 Comment(1)
The elegance of the question as asked, and of the answers which answer it directly, is their applicability not just to a simple regex search/replace within C# code, but also to a search/replace in the IDE, whether VS, VSCode, Notepad++, or any other editor which supports regex searching. - Nannienanning
141

You can use (?=…) positive lookahead; it asserts that a given pattern can be matched. You'd anchor at the beginning of the string, and one by one, in any order, look for a match of each of your patterns.

It'll look something like this:

^(?=.*one)(?=.*two)(?=.*three).*$

This will match a string that contains "one", "two", "three", in any order (as seen on rubular.com).
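
A pattern of this shape can also be generated from a space-delimited search string. The following is only a minimal sketch, not code from the question: the class name AllTermsRegex and both method names are hypothetical, and it assumes each term should be matched literally, which is why every term goes through Regex.Escape before it is placed in a lookahead.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AllTermsRegex
{
    // Builds ^(?=.*term1)(?=.*term2)....*$ from a space-delimited search string.
    public static string BuildPattern(string search)
    {
        var lookaheads = search
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            // Escape so characters such as '.' or '(' in a term are matched literally.
            .Select(term => "(?=.*" + Regex.Escape(term) + ")");
        return "^" + string.Concat(lookaheads) + ".*$";
    }

    // True when every term occurs somewhere in the input, in any order, ignoring case.
    public static bool ContainsAll(string input, string search)
    {
        return Regex.IsMatch(input, BuildPattern(search), RegexOptions.IgnoreCase);
    }
}
```

With the string from the question, AllTermsRegex.ContainsAll(text, "FEB Low") and AllTermsRegex.ContainsAll(text, "Low FEB") should both report a match, because each lookahead starts scanning from the beginning of the string.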

Depending on the context, you may want to anchor on \A and \Z, and use single-line mode so the dot matches everything.
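
To illustrate why single-line mode matters, here is a small sketch for .NET (the class and method names are made up for this example). It uses \A and \z, where \z is the strict end-of-string anchor; \Z would additionally allow one trailing newline. Without RegexOptions.Singleline the dot stops at a line break, so a term on a later line is never reached by its lookahead.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultiLineInput
{
    private const string Pattern = @"\A(?=.*one)(?=.*two).*\z";

    // Default options: '.' does not match '\n', so terms must share the first line.
    public static bool MatchDefault(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    // RegexOptions.Singleline: '.' also matches '\n', so terms may sit on different lines.
    public static bool MatchSingleline(string input)
    {
        return Regex.IsMatch(input, Pattern, RegexOptions.Singleline);
    }
}
```

For the input "one\ntwo", MatchDefault fails while MatchSingleline succeeds; for "one two" both succeed.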

This is not the most efficient solution to the problem. The best solution would be to parse out the words in your input and put them into an efficient set representation, etc.

More practical example: password validation

Let's say that we want our password to:

  • Contain between 8 and 15 characters
  • Contain an uppercase letter
  • Contain a lowercase letter
  • Contain a digit
  • Contain one of the special symbols

Then we can write a regex like this:

^(?=.{8,15}$)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[!@#$%^&*]).*$
 \__________/\_________/\_________/\_________/\______________/
    length      upper      lower      digit        symbol
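
As a rough sketch of how the pattern above could be used from C# (the class name PasswordCheck and the sample passwords are invented for illustration), each lookahead enforces one rule and all of them are evaluated from the start of the string:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordCheck
{
    // Same pattern as above: one lookahead per rule, all anchored at the start.
    private static readonly Regex Rule = new Regex(
        @"^(?=.{8,15}$)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[!@#$%^&*]).*$");

    // True only when the length, uppercase, lowercase, digit and symbol rules all hold.
    public static bool IsValid(string password)
    {
        return Rule.IsMatch(password);
    }
}
```

For example, PasswordCheck.IsValid("Passw0rd!") should be true, while "password1!" (no uppercase letter) and "Pw0!" (too short) should be rejected.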
Broca answered 20/8, 2010 at 17:54 Comment(5)
I'm not quite sure what you mean by "efficient set representation." Couldn't I do this: string strTest = "^(?=.*" + strSearch.Replace(" ", ")(?=.*") + ").*$" That would make everything into one string without having to iterate wouldn't it? - Faun
Thanks, that worked pretty smoothly, didn't have to wait longer than 2 seconds on 1600+ records. - Faun
@Faun What pattern did you use for your test ? - Engler
Will add code in just a minute. I have to manually type it out. - Faun
It would be worth it to mention that * allows for ?=. to match after any amount of characters. It took me a little searching and testing to figure that out without being explicitly mentioned. - Ranie
4

Why not just do a simple check for the text since order doesn't matter?

string test = "NS306 FEBRUARY 20078/9/201013B1-9-1Low31 AUGUST 19870";
test = test.ToUpper();
bool match = ((test.IndexOf("FEB") >= 0) && (test.IndexOf("LOW") >= 0));

Do you need it to use regex?

Chinchy answered 20/8, 2010 at 17:58 Comment(7)
Because I was trying to avoid iteration if possible. I had heard the Regex engine was stronger than any foreach or for loop, so I wanted to use that. - Faun
@Faun regex is sloooooow :) It's great for pattern matching complex patterns and making the matching process easier to code... but for speed not so much. - Chinchy
@Chinchy - Slower than a for loop? :) - Faun
@Faun There is no for loop in my example :P Internally, IndexOf probably has some type of iterating but internally so does Regex, except it has a whole bunch of language specific regex details to deal with as well. - Chinchy
Well, what I'm saying is that your example shows 2 statements of IndexOf, when I may have an infinite number of statements, so I'd have to iterate somehow over all of the possibilities, most likely with a for loop. - Faun
@Faun I don't see how Regex would help you if you have a possible infinite number of patterns to match against either. It sounds like there is more to this question than just a simple string check. Also keep in mind, with an if check using and, if any of the checks fail, it does not check the rest. - Chinchy
Yeah, that's why I used the REPLACE function to replace all spaces with a specific string then added a string to the front and back so that it only took two statements. I'll post my code in a minute. - Faun
3

I think the most expedient thing for today will be to string.Split(' ') the search terms and then iterate over the results confirming that sourceString.Contains(searchTerm)

var source = @"NS306 FEBRUARY 20078/9/201013B1-9-1Low31 AUGUST 19870".ToLowerInvariant();
var search = "FEB Low";

var terms = search.Split(' ');

bool all_match = !terms.Any(term => !(source.Contains(term.ToLowerInvariant())));

Notice that we use Any() to set up a short-circuit, so if the first term fails to match, we skip checking the second, third, and so forth.
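
The same check can be wrapped in a reusable helper. This is a sketch under a few assumptions: the names TermSearch and ContainsAllTerms are hypothetical, empty terms produced by repeated spaces are dropped, and case is ignored through StringComparison.OrdinalIgnoreCase rather than by lower-casing both strings.

```csharp
using System;
using System.Linq;

public static class TermSearch
{
    // True when every space-delimited term occurs in source, ignoring case.
    public static bool ContainsAllTerms(string source, string search)
    {
        var terms = search.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        // All() stops at the first missing term, just like the negated Any() above.
        return terms.All(term => source.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

An empty search string yields no terms, so All() returns true and every row passes the filter, which is usually what a live search box wants.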


This is not a great use case for RegEx. The string manipulation necessary to take an arbitrary number of search strings and convert that into a pattern almost certainly negates the performance benefit of matching the pattern with the RegEx engine, though this may vary depending on what you're matching against.

You've indicated in some comments that you want to avoid a loop, but RegEx is not a one-pass solution. It is not hard to create horrifically non-performant searches that loop and step character by character, such as the infamous catastrophic backtracking, where a very simple match takes thousands of steps to return false.

Capitate answered 20/8, 2010 at 17:52 Comment(3)
I figured a for loop in my code (for whatever reason, just thinking) would be more performance damage than whatever the regex engine does in the library. Yeah, it all gets chopped up into machine code, but I don't know how the regex is written, just that it was written by people smarter than me (I'm willing to admit it). :) - Faun
@Faun Yes, it is easy to make mistakes that forfeit performance, but it is always a question of "right tool for the job." I think the consensus here is that RegEx isn't it for this particular job. In the end, though, you may not see a difference, but it wouldn't be too hard to try it both ways for yourself and choose whatever works best, either (except for that looming deadline). - Capitate
Exactly. Like I said in my last update, there were many great ideas and all of them could work probably, that's why I'm glad I can keep this question here and have the options available to me. I did try yours however it didn't seem to work. I could have missed something, but once I saw the Regex option I quit what I was doing. Thanks for your help. - Faun
2

The answer by @polygenelubricants is both complete and perfect, but I had a case where I wanted to match a date and something else, e.g. a 10-digit number. I could not do that with just lookaheads, so I used named groups:

(?:.*(?P<1>[0-9]{10}).*(?P<2>2[0-9]{3}-(?:0?[0-9]|1[0-2])-(?:[0-2]?[0-9]|3[0-1])).*)+

and this way the number is always group 1 and the date is always group 2. Of course it has a few flaws but it was very useful for me and I just thought I should share it! ( take a look https://www.debuggex.com/r/YULCcpn8XtysHfmE )
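
For comparison, here is a sketch of a different technique for the same goal in .NET, not the pattern above: in .NET, named groups captured inside a positive lookahead stay available on the resulting Match, so two lookaheads can find the number and the date in either order. The names NumberAndDate, TryExtract, number and date are invented for this example, and the date rules are only as loose as the original ones (years 2000 to 2999, no per-month day check).

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberAndDate
{
    // Each lookahead scans from the start of the string, so the order of the two items is irrelevant.
    // Two-digit alternatives come first so that "26" is not captured as just "2".
    private static readonly Regex Pattern = new Regex(
        @"^(?=.*?(?<!\d)(?<number>\d{10})(?!\d))" +
        @"(?=.*?(?<date>2\d{3}-(?:1[0-2]|0?[1-9])-(?:[12]\d|3[01]|0?[1-9])))");

    // Returns false when either item is missing; otherwise fills both out parameters.
    public static bool TryExtract(string input, out string number, out string date)
    {
        Match m = Pattern.Match(input);
        number = m.Success ? m.Groups["number"].Value : string.Empty;
        date = m.Success ? m.Groups["date"].Value : string.Empty;
        return m.Success;
    }
}
```

Called on "id 0123456789 on 2016-11-06" or on "2016-11-06 id 0123456789", TryExtract should return the same number and date.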

Karolinekaroly answered 6/11, 2016 at 18:40 Comment(0)
1
var text = @"NS306Low FEBRUARY 2FEB0078/9/201013B1-9-1Low31 AUGUST 19870";   
var matches = Regex.Matches(text, @"(FEB)|(Low)");
foreach (Match match in matches) {
    Console.WriteLine(match.Value);
}

output:

Low
FEB
FEB
Low

should get you started
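
To turn that list of matches into a yes/no answer, one option is to tick each term off a set as it is seen. This is a sketch with hypothetical names (AlternationSearch, FoundAllTerms); it assumes literal terms and case-insensitive matching.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationSearch
{
    // Runs one alternation regex (e.g. "FEB|Low") and checks that every term was matched at least once.
    public static bool FoundAllTerms(string text, params string[] terms)
    {
        var remaining = new HashSet<string>(terms, StringComparer.OrdinalIgnoreCase);
        if (remaining.Count == 0) return true; // nothing to look for

        string pattern = string.Join("|", remaining.Select(t => Regex.Escape(t)));
        foreach (Match match in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
        {
            remaining.Remove(match.Value);
            if (remaining.Count == 0) return true; // every term has been seen
        }
        return false;
    }
}
```

One limitation to keep in mind: if one term is a prefix of another (say "FEB" and "FEBRUARY"), the alternation always consumes the shorter one first, so the longer term is never reported and the method returns false even though both occur.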

Engler answered 20/8, 2010 at 17:55 Comment(2)
Yeah, I had that, but I wanted to avoid a for loop as much as possible, thinking it would hinder performance. - Faun
@XstreamINsanity. The loop is just for printing them out. The matches collection has the results of the Regex, which only executes once. - Engler
0

You don't have to test each permutation, just split your search into multiple parts "FEB" and "Low" and make sure each part matches. That will be far easier than trying to come up with a regex which matches the whole thing in one go (which I'm sure is theoretically possible, but probably not practical in reality).

Cal answered 20/8, 2010 at 17:46 Comment(1)
That may be one way I go about it. However, I'd really like to see if there's a Regex solution to it because it is my understanding, and I hope I'm correct and don't sound foolish, that the regex engine would be able to find it faster than me iterating through the strings in the array. - Faun
0

Use string.Split(). It will return an array of substrings that are delimited by a specified string/char. The code will look something like this:

int maximumSize = 100; // upper bound on the number of substrings returned
string myString = "NS306 FEBRUARY 20078/9/201013B1-9-1Low31 AUGUST 19870";
string[] individualString = myString.Split(new[] { ' ' }, maximumSize);

For more information, see http://msdn.microsoft.com/en-us/library/system.string.split.aspx

Edit: If you really want to use regular expressions, the pattern [^ ]+ will work with Regex.Matches(). The code will be something like this:

string myString = "NS306 FEBRUARY 20078/9/201013B1-9-1Low31 AUGUST 19870";
string pattern = "[^ ]+"; // one or more non-space characters, so no empty matches
Regex rgx = new Regex(pattern);
foreach (Match match in rgx.Matches(myString))
{
    // do stuff with match.Value
}

Farant answered 20/8, 2010 at 17:55 Comment(1)
Splitting it is easy, I'm trying to not reiterate through a loop. - Faun
