string.IndexOf search for whole word match

I am looking for a way to search a string for an exact, whole-word match. Regex.Match and Regex.IsMatch don't seem to get me where I want to be.
Consider the following scenario:

using System;
using System.Text.RegularExpressions;

namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
            int indx = str.IndexOf("TOTAL");
            string amount = str.Substring(indx + "TOTAL".Length, 10);
            string strAmount = Regex.Replace(amount, "[^.0-9]", "");

            Console.WriteLine(strAmount);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }
}

The output of the above code is:

// 34.37
// Press any key to continue...

The problem is that I don't want SUBTOTAL, but IndexOf finds the first occurrence of TOTAL, which is inside SUBTOTAL, and that yields the incorrect value of 34.37.

So the question is: is there a way to force IndexOf to find only an exact match, or is there another way to force an exact whole-word match, so that I can find the index of that match and then do something useful with it? Regex.IsMatch and Regex.Match are, as far as I can tell, simply boolean searches. In this case, it isn't enough to know that the exact match exists. I need to know where it is in the string.

Any advice would be appreciated.

Memoirs answered 26/6, 2014 at 18:4 Comment(1)
str.IndexOf(" TOTAL "); But it's ugly. - Gilreath
12

You can use Regex

string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var indx = Regex.Match(str, @"\WTOTAL\W").Index; // will be 18, the index of the space before TOTAL (the word itself starts at 19)
Helfant answered 26/6, 2014 at 18:6 Comment(4)
Thanks! That's much cleaner! Who knew there was a ".Index" hanging off of RegEx.Match? :) :) :) - Memoirs
A bit ago, there was a post on this answer using a RegEx pattern that returned the number following the exact match for "TOTAL". Did anyone else see it? Anyone care to weigh in on such a pattern? - Memoirs
@DJ Are you looking for something like var val = Regex.Match(str, @"\WTOTAL\W\s*([0-9\.]+)").Groups[1].Value; - Helfant
WOW! I have got to learn more about RegEx. It seems very powerful, if not very intuitive. Thanks LB! - Memoirs
6

My method should be faster than the accepted answer because it does not use Regex.

string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var indx = str.IndexOfWholeWord("TOTAL");

public static int IndexOfWholeWord(this string str, string word)
{
    for (int j = 0; j < str.Length && 
        (j = str.IndexOf(word, j, StringComparison.Ordinal)) >= 0; j++)
        if ((j == 0 || !char.IsLetterOrDigit(str, j - 1)) && 
            (j + word.Length == str.Length || !char.IsLetterOrDigit(str, j + word.Length)))
            return j;
    return -1;
}
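
For anyone pasting this into a project: an extension method has to be declared in a static, non-nested class. Here is a minimal self-contained sketch, assuming a hypothetical class name StringWordExtensions. The logic is the same as the method above; only the wrapper class, the named locals and the small demo are added for illustration.

```csharp
using System;

// Hypothetical container class (the name is an assumption, not from the answer).
public static class StringWordExtensions
{
    // Returns the index of the first occurrence of `word` that has no letter
    // or digit directly before or after it, or -1 if there is none.
    public static int IndexOfWholeWord(this string str, string word)
    {
        for (int j = 0; j < str.Length &&
            (j = str.IndexOf(word, j, StringComparison.Ordinal)) >= 0; j++)
        {
            bool startOk = j == 0 || !char.IsLetterOrDigit(str, j - 1);
            int end = j + word.Length;
            bool endOk = end == str.Length || !char.IsLetterOrDigit(str, end);
            if (startOk && endOk)
                return j;
        }
        return -1;
    }
}

public static class WholeWordDemo
{
    public static void Main()
    {
        string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
        Console.WriteLine(str.IndexOfWholeWord("TOTAL"));              // 19
        Console.WriteLine("SUBTOTAL 34.37".IndexOfWholeWord("TOTAL")); // -1
    }
}
```

Note that underscores count as separators here, because char.IsLetterOrDigit returns false for '_'.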
Laclos answered 4/12, 2017 at 17:34 Comment(1)
This is also more flexible as it returns -1 if TOTAL is NOT in the line. The Regex above returns 0. - Parlormaid
3

You can use word boundaries, \b, and the Match.Index property:

var text = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var idx = Regex.Match(text, @"\bTOTAL\b").Index;
// => 19

The \bTOTAL\b matches TOTAL when it is not enclosed with any other letters, digits or underscores.

If a word enclosed with underscores should also count as a whole word, use

var idx = Regex.Match(text, @"(?<![^\W_])TOTAL(?![^\W_])").Index;

where (?<![^\W_]) is a negative lookbehind that fails the match if there is a letter or digit (a word character other than an underscore) immediately to the left of the current location (so there can be the start of the string, or a character that is neither a letter nor a digit), and (?![^\W_]) is a similar negative lookahead that only matches if there is the end of the string or a character other than a letter or digit immediately to the right of the current location.

If the boundaries are whitespace or the start/end of the string, use

var idx = Regex.Match(text, @"(?<!\S)TOTAL(?!\S)").Index;

where (?<!\S) requires start of string or a whitespace immediately on the left, and (?!\S) requires the end of string or a whitespace on the right.

NOTE: \b, (?<!...) and (?!...) are non-consuming patterns, that is, the regex index does not advance when matching them, so Match.Index is the exact position of the word you search for.
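
To make the difference between the three patterns concrete, here is a small sketch that runs each of them against a few inputs. The BoundaryDemo class and its Find helper are hypothetical names introduced only for this illustration; Find returns -1 when there is no match instead of the 0 that Match.Index reports on a failed match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Hypothetical helper: index of the first match, or -1 when there is none.
    public static int Find(string text, string pattern)
    {
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        const string WordBoundary = @"\bTOTAL\b";
        const string LetterDigit = @"(?<![^\W_])TOTAL(?![^\W_])";
        const string Whitespace = @"(?<!\S)TOTAL(?!\S)";

        // Underscore is a word character, so \b does not separate it from TOTAL.
        Console.WriteLine(Find("_TOTAL_ 5", WordBoundary)); // -1
        Console.WriteLine(Find("_TOTAL_ 5", LetterDigit));  // 1
        Console.WriteLine(Find("_TOTAL_ 5", Whitespace));   // -1

        // A trailing colon is fine for \b but not for the whitespace-only variant.
        Console.WriteLine(Find("TOTAL: 5", WordBoundary));  // 0
        Console.WriteLine(Find("TOTAL: 5", Whitespace));    // -1

        Console.WriteLine(Find("SUBTOTAL 34.37 TAX TOTAL 37.43", Whitespace)); // 19
    }
}
```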

Leverage answered 3/12, 2020 at 10:41 Comment(0)
2

To make the accepted answer a little bit safer (since IndexOf returns -1 for unmatched):

// Regex.Escape keeps any regex metacharacters in findTxt literal.
string pattern = String.Format(@"\b{0}\b", Regex.Escape(findTxt));
Match mtc = Regex.Match(queryTxt, pattern);
if (mtc.Success)
{
    return mtc.Index;
}
else
{
    return -1;
}
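
Going one step further toward what the question actually needs (the amount after TOTAL), here is a rough sketch that combines the whole-word pattern with a capture group. TotalParser and TryGetAmountAfter are hypothetical names, and the sketch assumes amounts are written with a '.' decimal separator and that the label starts and ends with a letter or digit (otherwise \b will not behave as a word boundary around it).

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class TotalParser
{
    // Hypothetical helper: finds `label` as a whole word and parses the
    // number that follows it. Returns false if either step fails.
    public static bool TryGetAmountAfter(string text, string label, out decimal amount)
    {
        string pattern = @"\b" + Regex.Escape(label) + @"\b\s*([0-9]+(?:\.[0-9]+)?)";
        Match m = Regex.Match(text, pattern);
        if (m.Success)
        {
            return decimal.TryParse(m.Groups[1].Value, NumberStyles.Number,
                CultureInfo.InvariantCulture, out amount);
        }
        amount = 0m;
        return false;
    }

    public static void Main()
    {
        string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
        if (TryGetAmountAfter(str, "TOTAL", out decimal total))
        {
            Console.WriteLine(total.ToString(CultureInfo.InvariantCulture)); // 37.43
        }
    }
}
```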
Reneta answered 17/2, 2021 at 8:7 Comment(0)
0

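While this may be a hack that works only for your example, try searching for the word with a leading space:

int indx = str.IndexOf(" TOTAL");
string amount = str.Substring(indx + " TOTAL".Length);

The extra space before TOTAL never occurs inside SUBTOTAL, so the search skips the word you don't want and finds only the isolated TOTAL. Note that the length argument of 10 is dropped here, because fewer than 10 characters follow TOTAL in the sample string and Substring would throw.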

Newmann answered 26/6, 2014 at 18:6 Comment(1)
LOL!!! Why didn't I see that! It is a bit "hacky" but for my example only, it should work. I would really like to see if there is a way to force the whole word match in a more clean approach, but will mark this as the answer if I don't see a more refined answer in a day or so. THANKS MUCH!!! :) - Memoirs
0

I'd recommend the Regex solution from L.B. too, but if you can't use Regex, then you could use String.LastIndexOf("TOTAL"), assuming TOTAL always comes after SUBTOTAL.

http://msdn.microsoft.com/en-us/library/system.string.lastindexof(v=vs.110).aspx
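
A quick sketch of what that looks like, and of how it breaks when the assumption about ordering does not hold. LastIndexDemo and LastIndexOfTotal are hypothetical names used only for this example.

```csharp
using System;

public static class LastIndexDemo
{
    // Hypothetical helper wrapping the LastIndexOf call suggested above.
    public static int LastIndexOfTotal(string text)
    {
        return text.LastIndexOf("TOTAL", StringComparison.Ordinal);
    }

    public static void Main()
    {
        // TOTAL comes last, so the last occurrence is the standalone word.
        Console.WriteLine(LastIndexOfTotal("SUBTOTAL 34.37 TAX TOTAL 37.43")); // 19

        // Order reversed: the last occurrence is now inside SUBTOTAL.
        Console.WriteLine(LastIndexOfTotal("TOTAL 37.43 SUBTOTAL 34.37"));     // 15
    }
}
```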

Phenice answered 26/6, 2014 at 18:8 Comment(0)
