Matching strings with wildcard
S

11

105

I would like to match strings with a wildcard (*), where the wildcard means "any". For example:

*X = string must end with X
X* = string must start with X
*X* = string must contain X

Also, some compound uses such as:

*X*YZ* = string contains X and contains YZ
X*YZ*P = string starts with X, contains YZ and ends with P.

Is there a simple algorithm to do this? I'm unsure about using regex (though it is a possibility).

To clarify, users will type the above into a filter box (as simple a filter as possible); I don't want them to have to write regular expressions themselves. So something I can easily transform from the above notation would be good.

Safeguard answered 18/5, 2015 at 9:36 Comment(2)
Should YZ ABC X match *X*YZ*, i.e. do the substrings need to appear in the same order in both the string and the pattern or not? I'd assume it shouldn't match, but "string contains X and contains YZ" doesn't make it clear. If it should match, all the current answers are wrong.Benedetto
That would be a no. In the example given, X must appear before YZ.Safeguard
J
38

You could use the VB.NET Like-Operator:

string text = "x is not the same as X and yz not the same as YZ";
bool contains = LikeOperator.LikeString(text,"*X*YZ*", Microsoft.VisualBasic.CompareMethod.Binary);  

Use CompareMethod.Text if you want to ignore the case.

You need to add using Microsoft.VisualBasic.CompilerServices; and add a reference to the Microsoft.VisualBasic.dll.

Since it ships with the .NET Framework, using this class adds no third-party dependency; for .NET Core, see the comments below regarding version support.
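
One caveat that may matter for a user-facing filter box: besides *, the Like operator also gives special meaning to ?, # and [...]. The sketch below is an illustration only; the class name LikeFilter and its methods are hypothetical, and it assumes a reference to Microsoft.VisualBasic is available. It wraps those characters in square brackets so that only * behaves as a wildcard.

```csharp
using System.Text.RegularExpressions;
using Microsoft.VisualBasic;                  // CompareMethod
using Microsoft.VisualBasic.CompilerServices; // LikeOperator

// Hypothetical helper, not part of the answer above.
public static class LikeFilter
{
    // The Like operator treats ?, # and [ as special; wrapping each one in
    // square brackets makes it match literally, so only * stays a wildcard.
    public static string EscapeForLike(string userPattern) =>
        Regex.Replace(userPattern, @"[\[?#]", "[$0]");

    // Matches text against a user pattern in which * is the only wildcard.
    public static bool IsMatch(string text, string userPattern, bool ignoreCase = false) =>
        LikeOperator.LikeString(
            text,
            EscapeForLike(userPattern),
            ignoreCase ? CompareMethod.Text : CompareMethod.Binary);
}
```

With this helper, LikeFilter.IsMatch("A1", "A#") is false because # is matched literally, whereas passing "A#" straight to LikeString would accept any digit after the A.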

Jackiejackinoffice answered 18/5, 2015 at 9:43 Comment(6)
Hmm, adding "using" results in: Type or namespace name 'CompilerServices' does not exist in namespace 'Microsoft.VisualBasic' (are you missing an assembly reference?)Theorbo
You need to add a reference to the Microsoft.VisualBasic.dll: https://mcmap.net/q/205641/-how-to-add-reference-to-microsoft-visualbasic-dllJackiejackinoffice
It appears that this is no longer available in .Net 4.6. :(Successive
@AndrewRondeau Are you sure? May have to update the correct answer in that case, i.e. I'm guessing right now it's a bug waiting to happen for me.Safeguard
I'm using 4.7 and it works fine. There is a note on the website saying it's not supported in .NET Core and .NET Standard projects though.Gagger
It is now supported in .NET Core, version 3.0 onwards: learn.microsoft.com/en-us/dotnet/api/…Unlace
F
193

Wildcards often come in two kinds:

  ? - any character  (one and only one)
  * - any characters (zero or more)

so you can easily convert these rules into an appropriate regular expression:

// If you want to implement both "*" and "?"
private static String WildCardToRegular(String value) {
  return "^" + Regex.Escape(value).Replace("\\?", ".").Replace("\\*", ".*") + "$"; 
}

// If you want to implement "*" only
private static String WildCardToRegular(String value) {
  return "^" + Regex.Escape(value).Replace("\\*", ".*") + "$"; 
}

And then you can use Regex as usual:

  String test = "Some Data X";

  Boolean endsWithEx = Regex.IsMatch(test, WildCardToRegular("*X"));
  Boolean startsWithS = Regex.IsMatch(test, WildCardToRegular("S*"));
  Boolean containsD = Regex.IsMatch(test, WildCardToRegular("*D*"));

  // Starts with S, ends with X, contains "me" and "a" (in that order) 
  Boolean complex = Regex.IsMatch(test, WildCardToRegular("S*me*a*X"));
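
For the filter-box scenario in the question, the conversion can be done once and the resulting Regex reused for every item. This is a minimal sketch under stated assumptions: the class name WildcardFilter, the ignoreCase parameter and the use of RegexOptions.Singleline are additions for illustration, not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the wildcard-to-regex conversion.
public static class WildcardFilter
{
    // Builds an anchored regex in which * means "zero or more characters"
    // and ? means "exactly one character"; everything else is literal.
    public static Regex ToRegex(string wildcard, bool ignoreCase = false)
    {
        string body = Regex.Escape(wildcard).Replace("\\?", ".").Replace("\\*", ".*");
        RegexOptions options = RegexOptions.Singleline; // let . match newlines too
        if (ignoreCase) options |= RegexOptions.IgnoreCase;
        return new Regex("^" + body + "$", options);
    }

    // Converts the pattern once, then reuses the same Regex for every item.
    public static List<string> Apply(IEnumerable<string> items, string wildcard, bool ignoreCase = false)
    {
        Regex regex = ToRegex(wildcard, ignoreCase);
        return items.Where(item => regex.IsMatch(item)).ToList();
    }
}
```

For example, WildcardFilter.Apply(new[] { "X-YZ", "YZ ABC X" }, "*X*YZ*") keeps only "X-YZ", because the literal parts must appear in the same order as in the pattern, which is what the asker confirmed in the comments on the question.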
Formicary answered 18/5, 2015 at 10:15 Comment(9)
It's not as easy as you claim. For example, one specialty is that when using Directory.GetFiles, a three letter extension .htm would also match .html, but a two letter extension .ai would not match aix or aifg. Windows wildcards are trivial on first sight, but under the hood, they're a bunch of grown legacy hypercomplex rulesets.Tetrameter
@Sebastian Mach: Thank you for mentioning the nuance! I agree that the MS DOS (and Windows) interpretation of wildcards differs from the standard one (en.wikipedia.org/wiki/Wildcard_character). However, the question is about strings and doesn't mention files; that's why I've given the simplest solution, assuming * means any characters (zero or more) and ? means exactly one character.Formicary
The original question was for string identifiers, not the filesystem, correct.Safeguard
If you worry about performance, here's a C# implementation of a wildcard matching algorithm which is a lot faster than RegEx for this specific problem.Breadthways
@Sebastian Mach... those are the 8.3 filenames that it matches.Gujral
If you're going to use RegEx, you'll need to escape special RegEx token characters first, such as periods, backslashes, etc., like so: "^" + Regex.Escape(pattern).Replace("\\*", ".*").Replace("\\?", ".") + "$";Eerie
@dynamichael: Yes, you are quite right, escaping is mandatory for an arbitrary value. However, escaping with Replace is bad practice: there are more symbols than you mentioned, and the list is open-ended (what if regex introduces more of them in future versions?). That's why I use Regex.Escape(value) and then Replace only the wildcards ? and *.Formicary
the wildcard matching algorithm (recommended by @Breadthways above) is a piece of bad code - be prepared to fix IndexOutOfRangeException here and there in marginal cases (when the text is shorter than the pattern)Beveridge
@DmitryBychenko That's exactly what my example does: RegEx.Escape(pattern) first, followed by replacing the wildcards.Eerie
C
34

For those using .NET Core 2.1+ or .NET 5+, you can use the FileSystemName.MatchesSimpleExpression method in the System.IO.Enumeration namespace.

string text = "X is a string with ZY in the middle and at the end is P";
bool isMatch = FileSystemName.MatchesSimpleExpression("X*ZY*P", text);

Both parameters are actually ReadOnlySpan<char>, but you can pass string arguments too. There's also an optional third parameter, ignoreCase, if you want to turn case-insensitive matching on or off. Matching is case insensitive by default, as Chris mentioned in the comments.
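
As a small illustration of the case-sensitivity switch, the sketch below wraps the two call shapes; the class and method names are hypothetical. As far as I know, the documentation also describes the backslash as an escape character in these expressions, so patterns containing backslashes may need extra care.

```csharp
using System.IO.Enumeration;

// Hypothetical wrapper showing both call shapes.
public static class SimpleExpressionDemo
{
    // Default behaviour: the comparison ignores case.
    public static bool MatchIgnoringCase(string pattern, string text) =>
        FileSystemName.MatchesSimpleExpression(pattern, text);

    // Pass ignoreCase: false for an exact-case comparison.
    public static bool MatchExactCase(string pattern, string text) =>
        FileSystemName.MatchesSimpleExpression(pattern, text, ignoreCase: false);
}
```

With the text from the example above, MatchIgnoringCase("x*zy*p", text) succeeds, while MatchExactCase("x*zy*p", text) does not, because the text starts with an upper-case X.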

Clapp answered 3/3, 2021 at 22:3 Comment(4)
Very nice. Thank you for this.Safeguard
In a .NET Framework application you can use the NuGet package Microsoft.IO.Redist and extension method string.AsSpan().Novotny
Is there a Nuget package available for this? I can't seem to find one.Microhenry
Exactly what I needed and works perfectly. Case insensitive by default.Phycomycete
R
21

Using WildcardPattern from System.Management.Automation may be an option.

var pattern = new WildcardPattern(patternString);
bool isMatch = pattern.IsMatch(stringToMatch);

The Visual Studio UI may not allow you to add the System.Management.Automation assembly to your project's References. Feel free to add it manually, as described here.
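
A slightly fuller sketch, assuming the System.Management.Automation assembly is referenced; the wrapper class PowerShellWildcard is hypothetical. Note that PowerShell wildcards also give meaning to ? and [...], not just *, so user input containing those characters will not be treated literally.

```csharp
using System.Management.Automation;

// Hypothetical wrapper, not part of the answer above.
public static class PowerShellWildcard
{
    // Matches text against a PowerShell-style wildcard pattern.
    public static bool IsMatch(string text, string pattern, bool ignoreCase = true)
    {
        WildcardOptions options = ignoreCase ? WildcardOptions.IgnoreCase : WildcardOptions.None;
        var wildcard = new WildcardPattern(pattern, options);
        return wildcard.IsMatch(text);
    }
}
```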

Randazzo answered 26/2, 2016 at 15:21 Comment(2)
Although this is a great solution, unfortunately WildcardPattern is not supported by .NET Core apps (it is up to .NET Standard 2.1 / recent Framework though). Further, some users report that System.Management.Automation is not meant to be used directly. I have fallen into the trap of relying on this answer (which works great!) but now I am required to upgrade to .NET Core and I found myself out of luck.Microhenry
I will try this answer for an alternative.Microhenry
W
7

A wildcard * can be translated as .* or .*? regex pattern.

You might need to use a singleline mode to match newline symbols, and in this case, you can use (?s) as part of the regex pattern.

You can set it for the whole or part of the pattern:

X* => @"X(?s:.*)"
*X => @"(?s:.*)X"
*X* => @"(?s).*X.*"
*X*YZ* => @"(?s).*X.*YZ.*"
X*YZ*P => @"(?s:X.*YZ.*P)"
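
Note that these translations are not anchored, so Regex.IsMatch also succeeds when the pattern matches only part of the input. The sketch below contrasts the two behaviours; the class name AnchorDemo is hypothetical, and using \A and \z as anchors is my choice for illustration rather than something the answer prescribes.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo of anchored versus unanchored translations of "X*".
public static class AnchorDemo
{
    // "X*" translated without anchors: succeeds if X occurs anywhere.
    public static bool StartsWithXUnanchored(string input) =>
        Regex.IsMatch(input, @"X(?s:.*)");

    // Same translation with \A and \z, so the whole input must match.
    public static bool StartsWithXAnchored(string input) =>
        Regex.IsMatch(input, @"\AX(?s:.*)\z");
}
```

For the input "aXb" the unanchored version returns true and the anchored one returns false. For "X\nmore" both return true, because (?s:.*) lets the dot cross the newline.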
Westfall answered 18/5, 2015 at 9:45 Comment(1)
They are not equivalent in every case. E.g., a windows wildcard'd *.htm will also match *.html.Tetrameter
O
5

*X*YZ* = string contains X and contains YZ

@".*X.*YZ"

X*YZ*P = string starts with X, contains YZ and ends with P.

@"^X.*YZ.*P$"
Olivas answered 18/5, 2015 at 9:39 Comment(4)
OK so with regex there's nothing I can simply replace * with to get what I want? These queries are going to be run by users and I don't expect them to understand regular expressions.Safeguard
Yep, but you need to specify anchors for the start and end: ^ for the start, $ for the end.Olivas
OK thanks Avinash. It's just replace * with .* for the regex then.Safeguard
This answer really needs more explanations.Limit
S
4

Keep in mind that Regex.IsMatch returns true for XYZ when checking it against Y*, because an unanchored pattern can match anywhere in the string. To avoid this, I use the "^" anchor:

isMatch(str1, "^" + str2.Replace("*", ".*?"));  

So, full code to solve your problem is

bool isMatchStr(string str1, string str2)
{
    string s1 = str1.Replace("*", ".*?");
    string s2 = str2.Replace("*", ".*?");
    bool r1 = Regex.IsMatch(s1, "^" + s2);
    bool r2 = Regex.IsMatch(s2, "^" + s1);
    return r1 || r2;
}
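
The pitfalls raised in the comments below (no escaping, no end anchor) can be reproduced with a short comparison. This is only a sketch; the class and method names are hypothetical, and the escaped variant follows the Regex.Escape approach from the higher-voted answer rather than this one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical comparison of the naive and the escaped conversion.
public static class NaiveVersusEscaped
{
    // Naive conversion: no escaping, anchored only at the start.
    public static bool NaiveIsMatch(string text, string wildcard) =>
        Regex.IsMatch(text, "^" + wildcard.Replace("*", ".*?"));

    // Escaped and anchored at both ends; one argument is the text,
    // the other is the pattern.
    public static bool EscapedIsMatch(string text, string wildcard) =>
        Regex.IsMatch(text, "^" + Regex.Escape(wildcard).Replace("\\*", ".*") + "$");

    // Returns true when the naive conversion produces an invalid regex.
    public static bool NaiveThrows(string text, string wildcard)
    {
        try { NaiveIsMatch(text, wildcard); return false; }
        catch (ArgumentException) { return true; }
    }
}
```

With the pattern "*.ab", the naive version accepts both "foo.abc" (no end anchor) and "fooXab" (the dot matches any character), while the escaped version rejects both. A pattern such as "a[b*" makes the naive version throw, because the unescaped [ starts a character class that is never closed.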
Sixtasixteen answered 22/1, 2017 at 22:19 Comment(4)
Welcome to Stack Overflow! While you may have solved the asker's problem, code-only answers are not very helpful to others who come across this question. Please edit your answer to explain why your code solves the original problem.Brawl
This solution would work if you're simply matching alphanumeric characters and a few others, but it'd fail if you were trying to match any other character that is part of regular expression syntax, for example "/" or "[".Deforest
Besides, you should also add $ to the end, so that *.ab does not match foo.abc, and escape the . character itself (and whatever other regular expression characters you may want to use). And why do you first match s1 against s2 and then s2 against s1? Make one parameter the pattern, and the other parameter the matched string.Novotny
Also, why on Earth do you replace * with .*??Novotny
G
3

This is kind of an improvement on the popular answer from @Dmitry Bychenko above (https://mcmap.net/q/203999/-matching-strings-with-wildcard). To support ? and * as literal characters to match, there has to be a way to escape them. In a regular C# string literal, write \\? or \\* to escape them (or \? and \* in a verbatim string).

Also, a precompiled regex will improve performance when the pattern is reused.

public class WildcardPattern
{
    private readonly string _expression;
    private readonly Regex _regex;

    public WildcardPattern(string pattern)
    {
        if (string.IsNullOrEmpty(pattern)) throw new ArgumentNullException(nameof(pattern));
       
        _expression = "^" + Regex.Escape(pattern)
            .Replace("\\\\\\?","??").Replace("\\?", ".").Replace("??","\\?")
            .Replace("\\\\\\*","**").Replace("\\*", ".*").Replace("**","\\*") + "$";
        _regex = new Regex(_expression, RegexOptions.Compiled);
    }

    public bool IsMatch(string value)
    {
        return _regex.IsMatch(value);
    }
}

Usage:

new WildcardPattern("Hello *\\**\\?").IsMatch("Hello W*rld?");
new WildcardPattern(@"Hello *\**\?").IsMatch("Hello W*rld?");
Gujral answered 7/8, 2022 at 10:15 Comment(0)
C
0

For those working with C# and Excel (where only part of a worksheet name is known), though not only them, here is my code using a wildcard (ddd*). Briefly: the code goes through all worksheet names, and if today's weekday abbreviation (ddd) matches the first three letters of a worksheet name (the bool is true), that name is stored in a string that is used after the loop.

using System;
using Microsoft.Office.Interop.Excel;
using System.Runtime.InteropServices;
using Range = Microsoft.Office.Interop.Excel.Range;
using System.Diagnostics;
using System.Reflection;
using System.IO;
using System.Text.RegularExpressions;

...
string weekDay = DateTime.Now.ToString("ddd*");

Workbook sourceWorkbook4 = xlApp.Workbooks.Open(LrsIdWorkbook, 0, false, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
Workbook destinationWorkbook = xlApp.Workbooks.Open(masterWB, 0, false, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);

            static String WildCardToRegular(String value)
            {
                return "^" + Regex.Escape(value).Replace("\\*", ".*") + "$";
            }

            string wsName = null;
            foreach (Worksheet works in sourceWorkbook4.Worksheets)
            {
                Boolean startsWithddd = Regex.IsMatch(works.Name, WildCardToRegular(weekDay + "*"));

                    if (startsWithddd == true)
                    {
                        wsName = works.Name.ToString();
                    }
            }

            Worksheet sourceWorksheet4 = (Worksheet)sourceWorkbook4.Worksheets.get_Item(wsName);

...
Conchiferous answered 13/5, 2021 at 15:32 Comment(0)
A
-1
public class Wildcard
{
    private readonly string _pattern;

    public Wildcard(string pattern)
    {
        _pattern = pattern;
    }

    public static bool Match(string value, string pattern)
    {
        int start = -1;
        int end = -1;
        return Match(value, pattern, ref start, ref end);
    }

    public static bool Match(string value, string pattern, char[] toLowerTable)
    {
        int start = -1;
        int end = -1;
        return Match(value, pattern, ref start, ref end, toLowerTable);
    }

    public static bool Match(string value, string pattern, ref int start, ref int end)
    {
        return new Wildcard(pattern).IsMatch(value, ref start, ref end);
    }

    public static bool Match(string value, string pattern, ref int start, ref int end, char[] toLowerTable)
    {
        return new Wildcard(pattern).IsMatch(value, ref start, ref end, toLowerTable);
    }

    public bool IsMatch(string str)
    {
        int start = -1;
        int end = -1;
        return IsMatch(str, ref start, ref end);
    }

    public bool IsMatch(string str, char[] toLowerTable)
    {
        int start = -1;
        int end = -1;
        return IsMatch(str, ref start, ref end, toLowerTable);
    }

    public bool IsMatch(string str, ref int start, ref int end)
    {
        if (_pattern.Length == 0) return false;
        int pindex = 0;
        int sindex = 0;
        int pattern_len = _pattern.Length;
        int str_len = str.Length;
        start = -1;
        while (true)
        {
            bool star = false;
            if (_pattern[pindex] == '*')
            {
                star = true;
                do
                {
                    pindex++;
                }
                while (pindex < pattern_len && _pattern[pindex] == '*');
            }
            end = sindex;
            int i;
            while (true)
            {
                int si = 0;
                bool breakLoops = false;
                for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
                {
                    si = sindex + i;
                    if (si == str_len)
                    {
                        return false;
                    }
                    if (str[si] == _pattern[pindex + i])
                    {
                        continue;
                    }
                    if (si == str_len)
                    {
                        return false;
                    }
                    if (_pattern[pindex + i] == '?' && str[si] != '.')
                    {
                        continue;
                    }
                    breakLoops = true;
                    break;
                }
                if (breakLoops)
                {
                    if (!star)
                    {
                        return false;
                    }
                    sindex++;
                    if (si == str_len)
                    {
                        return false;
                    }
                }
                else
                {
                    if (start == -1)
                    {
                        start = sindex;
                    }
                    if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
                    {
                        break;
                    }
                    if (sindex + i == str_len)
                    {
                        if (end <= start)
                        {
                            end = str_len;
                        }
                        return true;
                    }
                    if (i != 0 && _pattern[pindex + i - 1] == '*')
                    {
                        return true;
                    }
                    if (!star)
                    {
                        return false;
                    }
                    sindex++;
                }
            }
            sindex += i;
            pindex += i;
            if (start == -1)
            {
                start = sindex;
            }
        }
    }

    public bool IsMatch(string str, ref int start, ref int end, char[] toLowerTable)
    {
        if (_pattern.Length == 0) return false;

        int pindex = 0;
        int sindex = 0;
        int pattern_len = _pattern.Length;
        int str_len = str.Length;
        start = -1;
        while (true)
        {
            bool star = false;
            if (_pattern[pindex] == '*')
            {
                star = true;
                do
                {
                    pindex++;
                }
                while (pindex < pattern_len && _pattern[pindex] == '*');
            }
            end = sindex;
            int i;
            while (true)
            {
                int si = 0;
                bool breakLoops = false;

                for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
                {
                    si = sindex + i;
                    if (si == str_len)
                    {
                        return false;
                    }
                    char c = toLowerTable[str[si]];
                    if (c == _pattern[pindex + i])
                    {
                        continue;
                    }
                    if (si == str_len)
                    {
                        return false;
                    }
                    if (_pattern[pindex + i] == '?' && c != '.')
                    {
                        continue;
                    }
                    breakLoops = true;
                    break;
                }
                if (breakLoops)
                {
                    if (!star)
                    {
                        return false;
                    }
                    sindex++;
                    if (si == str_len)
                    {
                        return false;
                    }
                }
                else
                {
                    if (start == -1)
                    {
                        start = sindex;
                    }
                    if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
                    {
                        break;
                    }
                    if (sindex + i == str_len)
                    {
                        if (end <= start)
                        {
                            end = str_len;
                        }
                        return true;
                    }
                    if (i != 0 && _pattern[pindex + i - 1] == '*')
                    {
                        return true;
                    }
                    if (!star)
                    {
                        return false;
                    }
                    sindex++;
                    continue;
                }
            }
            sindex += i;
            pindex += i;
            if (start == -1)
            {
                start = sindex;
            }
        }
    }
}
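
Since the class above is posted without explanation, here is a compact, commented sketch of a common non-regex approach: scan greedily and, on a mismatch, backtrack to the most recent *. It is not a rewrite of the code above. The class name SimpleWildcard is hypothetical, it does not report start and end positions, and it lets ? match any single character (including '.').

```csharp
// Hypothetical stand-alone matcher: * = zero or more characters, ? = exactly one.
public static class SimpleWildcard
{
    public static bool IsMatch(string text, string pattern)
    {
        int t = 0, p = 0;   // current positions in text and pattern
        int starP = -1;     // position of the most recent * in the pattern
        int starT = 0;      // text position where that * started matching

        while (t < text.Length)
        {
            if (p < pattern.Length && pattern[p] == '*')
            {
                // Remember the star and first try matching it against nothing.
                starP = p++;
                starT = t;
            }
            else if (p < pattern.Length && (pattern[p] == '?' || pattern[p] == text[t]))
            {
                // Single-character match: advance both positions.
                t++;
                p++;
            }
            else if (starP != -1)
            {
                // Mismatch: let the last * absorb one more character and retry.
                p = starP + 1;
                t = ++starT;
            }
            else
            {
                return false;
            }
        }

        // The text is exhausted; only trailing stars may remain in the pattern.
        while (p < pattern.Length && pattern[p] == '*') p++;
        return p == pattern.Length;
    }
}
```

The star check comes before the literal comparison on purpose: otherwise a * in the pattern would be consumed as a literal whenever the text happens to contain a * at the same position.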
Arthur answered 22/1, 2021 at 4:56 Comment(1)
A code only answer is not very useful. Giving a big piece of code with no explanation of what it does, or why it answers the question is not helpful to anyone. Writing identical code with no explanations on two different questions is not helpful.Epiblast
S
-4

C# Console application sample

Command line Sample:
C:/> App_Exe -Opy PythonFile.py 1 2 3
Console output:
Argument list: -Opy PythonFile.py 1 2 3
Found python filename: PythonFile.py

using System;
using System.Text.RegularExpressions;           //Regex

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            string cmdLine = String.Join(" ", args);

            bool bFileExtFlag = false;
            int argIndex = 0;
            Regex regex;
            foreach (string s in args)
            {
                //Search for the 1st occurrence of the "*.py" pattern
                regex = new Regex(@"(?s:.*)\056py", RegexOptions.IgnoreCase);
                bFileExtFlag = regex.IsMatch(s);
                if (bFileExtFlag == true)
                    break;
                argIndex++;
            };

            Console.WriteLine("Argument list: " + cmdLine);
            if (bFileExtFlag == true)
                Console.WriteLine("Found python filename: " + args[argIndex]);
            else
                Console.WriteLine("Python file with extension <.py> not found!");
        }


    }
}
Subchloride answered 17/3, 2017 at 15:16 Comment(2)
So you solve an issue with an external application? Do you realize how many resources are wasted unnecessarily?Uranometry
@Uranometry I think we're supposed to analyse the code and figure out what's useful. Anyway I fail to see what this brings over other answers.Limit
