Is there any way to find a word in a string without splitting?
O

5

6

I have some strings:

"rose with ribbon"
"roses in concrete"
"roses on bed"

I have to write a program that finds the strings in which a given word occurs.

E.g. find the strings containing "on", so I need to get only "roses on bed".

I used this code:

foreach (KeyWord key in cKeyWords)
{
    foreach (string word in userWords)
    {
        if (key.keyWord.IndexOf(word) != -1)
        {
            ckeyList.Add(key);
        }
    }
}

but I get all the strings, because IndexOf finds "on" in every one of them.

Is there another way to find a separate word in a string without splitting? Maybe it is possible with LINQ or Regex? I'm not good at using them, so examples would be nice.

Orthopter answered 15/9, 2012 at 17:40 Comment(8)
Why don't you want to split the string? - Overslaugh
You could search for " on " with spaces to eliminate the hits you don't want. - Stilwell
@Stilwell That won't work if the word is at the start or end of the string. - Overslaugh
@gjvdkamp, that wouldn't catch the cases where the strings either start or end with "on", so two more cases to handle. - Elwell
That can be remedied by adding a space on both ends of the string before searching, but it is a bit of a hack. - Stilwell
@DotNetRookie That has the same issue as IndexOf(). - Overslaugh
It's a pity that you don't want to split the string (why, btw?). LINQ would allow you to do it in a one-liner: var linesWithOn = from c in cKeyWords where c.Split(' ').Contains("on") select c; - Unique
@Overslaugh I don't want to split because I am working with a list of several thousand strings, and splitting would take a lot of time. - Orthopter
T
6

Using regex with \bon\b should do it.

\b is the regex anchor for word boundary, so that regex will match a word boundary immediately followed by on immediately followed by another word boundary.

The following C# example...

using System;
using System.Text.RegularExpressions;

string[] sArray = new string[]
    {
        "rose with ribbon",
        "roses on bed",
        "roses in concrete"
    };

// \b matches a word boundary, so "on" only matches as a whole word.
Regex re = new Regex(@"\bon\b");
foreach (string s in sArray)
{
    Console.Out.WriteLine("{0} match? {1}", s, re.IsMatch(s));

    // Report where the whole-word match starts, if there is one.
    Match m = re.Match(s);
    if (m.Success)
    {
        Console.Out.WriteLine("    Match found at position {0}", m.Index);
    }
}

... will generate the following output:

rose with ribbon match? False
roses on bed match? True
    Match found at position 6
roses in concrete match? False
Therein answered 15/9, 2012 at 17:45 Comment(4)
Could you explain what that regex does? - Overslaugh
\b is the regex anchor for word boundary, so that regex looks for a word boundary followed by on followed by another word boundary. See regular-expressions.info/wordboundaries.html. - Therein
I think you should include that in your answer. - Overslaugh
Can you explain how to use that? I have no idea. - Orthopter
E
1

Yes, you can find a word in a string by using Regex. Try this:

string regexPattern;

foreach (KeyWord key in cKeyWords)
{
  foreach (string word in userWords)
  {
    regexPattern = string.Format(@"\b{0}\b", System.Text.RegularExpressions.Regex.Escape(word));
    if (System.Text.RegularExpressions.Regex.IsMatch(key.keyWord, regexPattern))
    {
        ckeyList.Add(key);
    }
  }
}

Use the ToLower() method on both strings if the match should not be case-sensitive.

foreach (KeyWord key in cKeyWords)
{
  foreach (string word in userWords)
  {
    regexPattern = string.Format(@"\b{0}\b", System.Text.RegularExpressions.Regex.Escape(word.ToLower()));
    if (System.Text.RegularExpressions.Regex.IsMatch(key.keyWord.ToLower(), regexPattern))
    {
        ckeyList.Add(key);
    }
  }
}
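
As an alternative to lowercasing both strings, .NET's RegexOptions.IgnoreCase flag makes the match itself case-insensitive. The following is only a minimal sketch under some assumptions: it works on plain strings rather than the KeyWord objects from the question, and KeywordFilter, BuildWordRegex and Filter are hypothetical names chosen for illustration. It also combines all user words into one pattern that is built once instead of inside the inner loop; whether that is actually faster for your data is something to measure, not a given.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordFilter
{
    // Builds one regex of the form \b(?:word1|word2)\b from the user words.
    // Regex.Escape keeps characters such as '.' or '+' literal.
    // Assumes userWords contains at least one non-empty word.
    public static Regex BuildWordRegex(IEnumerable<string> userWords)
    {
        string alternatives = string.Join("|", userWords.Select(w => Regex.Escape(w)));
        return new Regex(@"\b(?:" + alternatives + @")\b", RegexOptions.IgnoreCase);
    }

    // Returns the strings that contain at least one user word as a whole word.
    public static List<string> Filter(IEnumerable<string> keys, IEnumerable<string> userWords)
    {
        Regex re = BuildWordRegex(userWords);
        return keys.Where(k => re.IsMatch(k)).ToList();
    }

    public static void Main()
    {
        var keys = new[] { "rose with ribbon", "roses in concrete", "Roses ON bed" };
        foreach (string hit in Filter(keys, new[] { "on" }))
        {
            Console.WriteLine(hit); // Roses ON bed
        }
    }
}
```

Note one difference from the loops above: here each string is returned at most once, whereas the nested loops add a key once for every user word it contains.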
Explicable answered 16/9, 2012 at 7:47 Comment(0)
L
0

Use regular expressions, read this article: http://www.dotnetperls.com/regex-match

And here is another good article to study regex: http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial

Linguini answered 15/9, 2012 at 17:43 Comment(1)
This doesn't actually answer the question. How would you solve this specific problem using regex? - Overslaugh
G
0

The problem is that you're searching for "on" which is found in all strings (ribb*on*, c*on*crete)

You should be searching for " on ".

A better solution would be to parse the strings into arrays of words and iterate over those.

Glinys answered 15/9, 2012 at 17:45 Comment(2)
Including spaces before and after won't work if the word you're looking for appears at the beginning or end of the string. Parsing the strings into words is unnecessary, not to mention non-performant if you have 1000's of long sentences. - Therein
As I mentioned in a comment, this won't work if the word is at the start or end of the string. And the question specifically asks about solutions that don't involve splitting the string (for whatever reason). - Overslaugh
E
0

In a nutshell, this is what you could do, using the StartsWith and EndsWith methods of the C# String class.

foreach (KeyWord key in cKeyWords)
{
    foreach (string word in userWords)
    {
        // Word in the middle, at the start, at the end, or the whole string.
        if (key.keyWord.IndexOf(" " + word + " ") != -1
            || key.keyWord.StartsWith(word + " ")
            || key.keyWord.EndsWith(" " + word)
            || key.keyWord == word)
        {
            ckeyList.Add(key);
        }
    }
}
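
The space-based checks above treat only a space as a word separator, so "on," or "on." would be missed. A variation on the same idea, sketched here under assumptions, is to keep using IndexOf but inspect the characters on either side of each hit. WholeWord and Contains are hypothetical names, and "word character" is assumed to mean a letter or digit, which is close to but not the same as what the regex \b uses.

```csharp
using System;

public static class WholeWord
{
    // True if 'word' occurs in 'text' with no letter or digit directly
    // before or after it. The text is never split into an array of words.
    public static bool Contains(string text, string word)
    {
        if (string.IsNullOrEmpty(word)) return false;

        int start = 0;
        while (true)
        {
            int i = text.IndexOf(word, start, StringComparison.Ordinal);
            if (i < 0) return false;

            int end = i + word.Length;
            bool leftOk = i == 0 || !char.IsLetterOrDigit(text[i - 1]);
            bool rightOk = end == text.Length || !char.IsLetterOrDigit(text[end]);
            if (leftOk && rightOk) return true;

            start = i + 1; // keep looking after this partial hit
        }
    }

    public static void Main()
    {
        Console.WriteLine(Contains("rose with ribbon", "on"));  // False
        Console.WriteLine(Contains("roses on bed", "on"));      // True
        Console.WriteLine(Contains("on the bed", "on"));        // True
        Console.WriteLine(Contains("lights on", "on"));         // True
    }
}
```

Passing StringComparison.OrdinalIgnoreCase instead of StringComparison.Ordinal would make the search case-insensitive.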
Elwell answered 15/9, 2012 at 17:47 Comment(2)
StartsWith() and EndsWith() don't return an integer. - Overslaugh
I was a bit lazy, so I just left the note about correcting them in the answer :) Anyway, corrected it now. - Elwell
