Case insensitive std::string.find()
V

11

90

I am using std::string's find() method to test whether a string is a substring of another. Now I need a case-insensitive version of the same thing. For string comparison I can always turn to stricmp(), but there doesn't seem to be a stristr().

I have found various answers and most suggest using Boost which is not an option in my case. Additionally, I need to support std::wstring/wchar_t. Any ideas?

Vasectomy answered 30/6, 2010 at 18:28 Comment(3)
There's a GotW about this very subject: gotw.ca/gotw/029.htm - Micco
stristr is not there, but "char *strcasestr(const char *haystack, const char *needle);" is. Isn't this OK? - Vagus
@Nasir, strcasestr is not available under Windows. - Benny
S
90

You could use std::search with a custom predicate.

#include <locale>
#include <iostream>
#include <algorithm>
#include <string>   // std::string and std::wstring are used below
using namespace std;

// templated version of my_equal so it could work with both char and wchar_t
template<typename charT>
struct my_equal {
    my_equal( const std::locale& loc ) : loc_(loc) {}
    bool operator()(charT ch1, charT ch2) {
        return std::toupper(ch1, loc_) == std::toupper(ch2, loc_);
    }
private:
    const std::locale& loc_;
};

// find substring (case insensitive)
template<typename T>
int ci_find_substr( const T& str1, const T& str2, const std::locale& loc = std::locale() )
{
    typename T::const_iterator it = std::search( str1.begin(), str1.end(), 
        str2.begin(), str2.end(), my_equal<typename T::value_type>(loc) );
    if ( it != str1.end() ) return it - str1.begin();
    else return -1; // not found
}

int main(int argc, char *argv[])
{
    // string test
    std::string str1 = "FIRST HELLO";
    std::string str2 = "hello";
    int f1 = ci_find_substr( str1, str2 );

    // wstring test
    std::wstring wstr1 = L"ОПЯТЬ ПРИВЕТ";
    std::wstring wstr2 = L"привет";
    int f2 = ci_find_substr( wstr1, wstr2 );

    return 0;
}
Spielman answered 30/6, 2010 at 18:35 Comment(6)
Why are you using templates here? - Yonina
@rstackhouse, the template is there to support different character types (char and wchar_t). - Spielman
Thanks, Kirill. For those as clueless as I am, insert std::advance( it, offset ); after the declaration of the iterator to start the search from an offset. - Halvaard
For those (like me) who are not familiar with templates, can you also post a standard version without templates and without locales? Just for wstring, for example, @KirillV.Lyadvinsky? - Osmose
Does the call to std::toupper actually work for wide characters? Wouldn't you need to call std::towupper? - Tweeddale
Please add a string.find_first_of or wstring.find_first_of implementation. - Joerg
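Regarding the questions in the comments: the two-argument std::toupper from <locale> is a template over the character type, so the answer's predicate does handle wchar_t. For readers who asked for a version without templates or locales, the following is a minimal sketch for std::wstring only. The name ci_find_w and the offset parameter are made up for illustration, and std::towupper follows the current C locale, so non-ASCII letters are only folded if a suitable locale has been set.

```cpp
#include <algorithm>
#include <cstddef>
#include <cwctype>
#include <string>

// Hypothetical helper: case-insensitive search in a std::wstring, starting at
// `offset`. Returns the index of the first match or std::wstring::npos.
std::size_t ci_find_w(const std::wstring& haystack, const std::wstring& needle,
                      std::size_t offset = 0)
{
    if (offset > haystack.size()) return std::wstring::npos;
    if (needle.empty()) return offset;  // same convention as std::wstring::find

    auto start = haystack.begin() + static_cast<std::ptrdiff_t>(offset);
    auto it = std::search(start, haystack.end(), needle.begin(), needle.end(),
        [](wchar_t a, wchar_t b) { return std::towupper(a) == std::towupper(b); });

    if (it == haystack.end()) return std::wstring::npos;
    return static_cast<std::size_t>(it - haystack.begin());
}
```

Returning std::size_t with npos, rather than int with -1, keeps the result directly comparable with what std::wstring::find returns.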
S
76

The new C++11 style:

#include <algorithm>
#include <string>
#include <cctype>

/// Try to find in the Haystack the Needle - ignore case
bool findStringIC(const std::string & strHaystack, const std::string & strNeedle)
{
  auto it = std::search(
    strHaystack.begin(), strHaystack.end(),
    strNeedle.begin(),   strNeedle.end(),
    [](unsigned char ch1, unsigned char ch2) { return std::toupper(ch1) == std::toupper(ch2); }
  );
  return (it != strHaystack.end() );
}

An explanation of std::search can be found on cplusplus.com.

Savoirvivre answered 7/11, 2013 at 15:8 Comment(8)
What if I want to find a char c in a string str using the same function? Calling it with findStringIC(str, (string)c) doesn't work. - Toledo
This type of char-to-string cast does not work; you have to actually create the string object, like std::string(1, 'x'). See coliru.stacked-crooked.com/a/af4051dd1d15972e If you do this a lot, it might be worth creating a specific function that does not require creating a new object every time. - Savoirvivre
In most cases, it is preferable to use tolower() when doing a case-insensitive search. Even Ada changed it to lowercase! There are reasons that Unicode.org probably explains somewhere, but I do not know exactly why. - Rhodia
Upper case is better msdn.microsoft.com/en-us/library/bb386042.aspx but of course not perfect. If you need Turkish, that's going to be hard https://mcmap.net/q/74456/-upper-vs-lower-case and haacked.com/archive/2012/07/05/… - Savoirvivre
... did they do away with templates in C++11? I must have missed the memo :) - Decaliter
No template needed in this case. For C++17 you might want to take a look at string_view instead of std::string skebanga.github.io/string-view - Savoirvivre
That was a great read on string_view! Something new and shiny, and fast! :) - Morpheus
Absolute bananas that I have to write my own function for this in C++. Why isn't this kind of stuff part of the standard library? It very much is in C#, Java and Python. And C++ is out there, right there together with those 3 titans. - Lambaste
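Two points from the comments can be combined in one sketch: taking std::string_view (C++17) instead of std::string, and searching for a single character without building a temporary std::string. The name find_ic and the choice to return a position rather than a bool are assumptions made for illustration, not part of the answer above.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper (C++17): index of `needle` in `haystack`, ignoring case,
// or std::string_view::npos if it does not occur.
std::size_t find_ic(std::string_view haystack, std::string_view needle)
{
    if (needle.empty()) return 0;  // same convention as std::string::find

    auto it = std::search(haystack.begin(), haystack.end(),
                          needle.begin(), needle.end(),
        [](unsigned char a, unsigned char b) {
            return std::toupper(a) == std::toupper(b);
        });

    if (it == haystack.end()) return std::string_view::npos;
    return static_cast<std::size_t>(it - haystack.begin());
}

// Single-character overload: views the char in place, no allocation.
std::size_t find_ic(std::string_view haystack, char c)
{
    return find_ic(haystack, std::string_view(&c, 1));
}
```

Because std::string and string literals both convert to std::string_view, callers can pass either without copying, for example find_ic(str, "DE") or find_ic(str, 'x').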
L
21

Why not use Boost.StringAlgo?

#include <boost/algorithm/string/find.hpp>

bool Foo()
{
   //case insensitive find

   std::string str("Hello");

   boost::iterator_range<std::string::const_iterator> rng;

   rng = boost::ifind_first(str, std::string("EL"));

   return !rng.empty(); // true if "EL" was found, ignoring case
}
Lukasz answered 4/11, 2014 at 11:22 Comment(1)
Typically, unless a C++ question is tagged for Boost, it's assumed Boost isn't an option. - Morpheus
O
20

Why not just convert both strings to lowercase before you call find()?

tolower

Notice:

Outstretch answered 30/6, 2010 at 18:34 Comment(3)
Because it is very inefficient for larger strings. - Trigg
This is also not really a good idea if your software ever needs to be localized. See the Turkey test: haacked.com/archive/2012/07/05/… - Abeyant
The arguments you'll uncover for doing basic upcase and downcase operations in C++ on anything not encoded as ANSI will overwhelm you xD Simply put, it's not trivial for the standard library to handle as of C++17. - Morpheus
J
9

Since you're doing substring searches (std::string) and not element (character) searches, there's unfortunately no existing solution I'm aware of that's immediately accessible in the standard library to do this.

Nevertheless, it's easy enough to do: simply convert both strings to upper case (or both to lower case - I chose upper in this example).

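#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

std::string upper_string(const std::string& str)
{
    std::string upper;
    // Go through unsigned char: passing a negative char to std::toupper is undefined.
    std::transform(str.begin(), str.end(), std::back_inserter(upper),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return upper;
}

std::string::size_type find_str_ci(const std::string& str, const std::string& substr)
{
    return upper_string(str).find(upper_string(substr));
}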

This is not a fast solution (bordering on pessimization territory), but it's the only one I know of off-hand. It's also not that hard to implement your own case-insensitive substring finder if you are worried about efficiency.

Additionally, I need to support std::wstring/wchar_t. Any ideas?

tolower/toupper in <locale> work on wide strings as well, so the solution above should be just as applicable (simply change std::string to std::wstring and use the locale-aware overloads).

[Edit] An alternative, as pointed out, is to adapt your own case-insensitive string type from basic_string by specifying your own character traits. This works if you can accept all string searches, comparisons, etc. to be case-insensitive for a given string type.
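The character-traits alternative can be sketched as follows. This is only an illustration of the idea, in the spirit of the GotW linked in the comments on the question; the names ci_char_traits and ci_string are illustrative, and the sketch covers char only and relies on std::toupper in the current C locale.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical traits class: like std::char_traits<char>, but every
// comparison ignores case.
struct ci_char_traits : public std::char_traits<char> {
    static char up(char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }

    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char a) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], a)) return s + i;
        return nullptr;
    }
};

// A string type whose find(), compare() and operator== all ignore case.
using ci_string = std::basic_string<char, ci_char_traits>;
```

Note that ci_string is a distinct type from std::string, so an existing std::string has to be converted first, for example with ci_string(s.data(), s.size()).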

Jingo answered 30/6, 2010 at 18:41 Comment(1)
The answer I was looking for, thanks. - Tandem
R
2

If you want “real” comparison according to Unicode and locale rules, use ICU’s Collator class.

Rowenarowland answered 30/6, 2010 at 18:58 Comment(0)
F
1

It also makes sense to provide a Boost version. Note that this modifies the original strings.

#include <boost/algorithm/string.hpp>
#include <string>

std::string str1 = "hello world!!!";
std::string str2 = "HELLO";
boost::algorithm::to_lower(str1);
boost::algorithm::to_lower(str2);

if (str1.find(str2) != std::string::npos)
{
    // str1 contains str2
}

Or use the Boost.Xpressive library:

#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
....
std::string long_string( "very LonG string" );
std::string word("long");
smatch what;
sregex re = sregex::compile(word, boost::xpressive::icase);
if( regex_search( long_string, what, re ) ) // regex_match would require the whole string to match
{
    cout << word << " found!" << endl;
}

In this example, make sure that your search word doesn't contain any regex special characters.
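If the search word may contain regex special characters, one option is to escape them before compiling the pattern. The sketch below swaps in the standard library's std::regex (C++11) instead of Boost.Xpressive so that it has no external dependency; escape_regex and contains_ic are hypothetical helper names, and the list of escaped characters assumes the default ECMAScript grammar.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: put a backslash in front of every character that has a
// special meaning in the ECMAScript regex grammar.
std::string escape_regex(const std::string& text)
{
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : text) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: true if `needle` occurs anywhere in `haystack`,
// ignoring case. The needle is treated as literal text.
bool contains_ic(const std::string& haystack, const std::string& needle)
{
    const std::regex re(escape_regex(needle), std::regex::icase);
    return std::regex_search(haystack, re);
}
```

Compiling a regex for every call is relatively expensive, so for repeated searches with the same word it would make sense to build the std::regex once and reuse it.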

Forgiven answered 31/12, 2013 at 12:15 Comment(1)
"... I have found various answers and most suggest using Boost which is not an option in my case". - Ainslee
B
0
#include <iostream>
using namespace std;

template <typename charT>
struct ichar {
    operator charT() const { return toupper(x); }
    charT x;
};
template <typename charT>
static basic_string<ichar<charT> > *istring(basic_string<charT> &s) { return (basic_string<ichar<charT> > *)&s; }
template <typename charT>
static ichar<charT> *istring(const charT *s) { return (ichar<charT> *)s; }

int main()
{
    string s = "The STRING";
    wstring ws = L"The WSTRING";
    cout << istring(s)->find(istring("str")) << " " << istring(ws)->find(istring(L"wstr"))  << endl;
}

A little bit dirty, but short & fast.

Bare answered 6/8, 2015 at 10:49 Comment(0)
M
0

I love the answers from Kirill V. Lyadvinsky and CC., but my problem was a little more specific than just case-insensitivity. I needed a lazy, Unicode-supporting command-line argument parser that could eliminate false positives and negatives in alphanumeric string searches where the base string may contain special characters used to format the keywords I was searching against. For example, Wolfjäger shouldn't match jäger, but <jäger> should.

It's basically just Kirill's/CC's answer with extra handling for alphanumeric exact-length matches.

/* Needs <algorithm>, <cwctype> and <string>.
   Undefined behavior when a non-alpha-num substring parameter is used. */
bool find_alphanum_string_CI(const std::wstring& baseString, const std::wstring& subString)
{
    /* Fail fast if the base string was smaller than what we're looking for */
    if (subString.length() > baseString.length()) 
        return false;

    auto it = std::search(
        baseString.begin(), baseString.end(), subString.begin(), subString.end(),
        [](wchar_t ch1, wchar_t ch2)
        {
            /* The strings are wide, so use the wide-character conversion;
               taking char here would truncate each wchar_t. */
            return std::towupper(ch1) == std::towupper(ch2);
        }
    );

    if(it == baseString.end())
        return false;

    size_t match_start_offset = it - baseString.begin();

    std::wstring match_start = baseString.substr(match_start_offset, std::wstring::npos);

    /* Typical special characters and whitespace to split the substring up. */
    size_t match_end_pos = match_start.find_first_of(L" ,<.>;:/?\'\"[{]}=+-_)(*&^%$#@!~`");

    /* Pass fast if the remainder of the base string where
       the match started is the same length as the substring. */
    if (match_end_pos == std::wstring::npos && match_start.length() == subString.length()) 
        return true;

    std::wstring extracted_match = match_start.substr(0, match_end_pos);

    return (extracted_match.length() == subString.length());
}
Morpheus answered 27/6, 2018 at 23:34 Comment(2)
The last 3 lines of code should be return (extracted_match.length() == subString.length()); - Encomiast
"should" might be a bit strong for wording, but I agree that it's an improvement! :) Ty & updated ^_^ - Morpheus
S
0

The Most Efficient Way

Simple and Fast.

Performance is guaranteed to be linear, with an initialization cost of 2 * NEEDLE_LEN comparisons. (glibc)

#include <cstring>
#include <string>
#include <iostream>

int main(void) {

    std::string s1{"abc de fGH"};
    std::string s2{"DE"};

    auto pos = strcasestr(s1.c_str(), s2.c_str());

    if(pos != nullptr)
        std::cout << pos - s1.c_str() << std::endl;

    return 0;
}
Soissons answered 5/12, 2022 at 14:57 Comment(1)
strcasestr seems to be a GNU-only extension to stdlib. - Maddock
M
-2

wxWidgets has a very rich string API in wxString.

It can be done like this (using the case-conversion approach):

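int Contains(const wxString& SpecProgramName, const wxString& str)
{
  // Upper-case copies of both strings so the comparison ignores case.
  wxString SpecProgramName_ = SpecProgramName.Upper();
  wxString str_ = str.Upper();
  // Search the upper-cased copy, not the original string.
  int found = SpecProgramName_.Find(str_);
  if (wxNOT_FOUND == found)
  {
    return 0;
  }
  return 1;
}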
Magistracy answered 24/12, 2019 at 7:6 Comment(0)
