How to convert an instance of std::string to lower case
P

31

1006

I want to convert a std::string to lowercase. I am aware of the function tolower(). However, in the past I have had issues with this function and it is hardly ideal anyway as using it with a std::string would require iterating over each character.

Is there an alternative which works 100% of the time?

Posology answered 24/11, 2008 at 11:49 Comment(14)
How else would you convert each element of a list of anything to something else, without iterating through the list? A string is just a list of characters; if you need to apply some function to each character, you're going to have to iterate through the string. No way around that.Banff
Why exactly does this question merit a down-vote? I don't have a problem with iterating through my string, but I am asking if there are other functions apart from tolower(), toupper() etc.Posology
If you have a C style char array, then I guess you may be able to add 0x20202020 to each block of 4 characters (provided they are ALL already uppercase) to convert 4 characters to lowercase at a time.Banff
@Dan: If they might already be lowercase, but are definitely A-Z or a-z, you can OR with 0x20 instead of adding. One of those so-smart-it's-probably-dumb optimisations that are almost never worth it...Crystallization
I don't know why it would've been down-voted... certainly it's worded a little oddly (because you do have to iterate through every item somehow), but it's a valid questionWiltz
Note: tolower() doesn't work 100% of the time. Lowercase/uppercase operations only apply to characters, and std::string is essentially an array of bytes, not characters. Plain tolower is nice for ASCII string, but it will not lowercase a latin-1 or utf-8 string correctly. You must know string's encoding and probably decode it before you can lowercase its characters.Landbert
When I type questions I just tend to dump what is in my mental buffer at the time. It doesn't always make sense. ;)Posology
@onebyone: Ah, never thought of that! Well, I never really meant this was a useful way of doing it, just that it's possible. Actually, I'd be more interested in trying something like that on large texts on a GPU, just for a laugh.Banff
This is a good question. Most scripting languages handle it just the way you would expect it to be handled.Tedda
Note that the answer you selected potentially has undefined behaviour. Despite all the up-votes, it is unsafe.Chu
I think what is meant by "iterating over each character" is "explicitly iterating over each character", such as to reduce code bloat, or verbose code.Uprear
After reading through all these answers and back-and-forth comments, I'm not so certain that this is something you'd want to directly deal with inside your program. You may want to use a standalone module that takes strings and encoding/locale arguments and gives only a good result if it can be verifiably converted, which seems to require using the ICU library for maximum robustness. Alternatively, you can always play it even safer and remove the requirement for using case-checks as verification unless the app's entire point is getting those letters to lower-case.Athanor
DevSolar gives an excellent answer which contains a very good example of why this can't be solved as a pure software exercise. He seems to agree as well as disagree with me on this and apparently won't include that you must be aware of cultural changes for any solution to work. It cannot be solved perfectly for all time in all cases.Opec
I would not expect in an object-oriented language to be forced to dig into the object to manipulate its inner elements. When I call std::string.clear() I don't have to cycle through inner elements and clear one of them at a time.Muttonhead
F
1121

Adapted from Not So Frequently Asked Questions:

#include <algorithm>
#include <cctype>
#include <string>

std::string data = "Abc";
std::transform(data.begin(), data.end(), data.begin(),
    [](unsigned char c){ return std::tolower(c); });

You're really not going to get away without iterating through each character. There's no way to know whether the character is lowercase or uppercase otherwise.

If you really hate tolower(), here's a specialized ASCII-only alternative that I don't recommend you use:

char asciitolower(char in) {
    if (in <= 'Z' && in >= 'A')
        return in - ('Z' - 'z');
    return in;
}

std::transform(data.begin(), data.end(), data.begin(), asciitolower);

Be aware that tolower() can only do a per-single-byte-character substitution, which is ill-fitting for many scripts, especially if using a multi-byte-encoding like UTF-8.
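To make that caveat concrete, here is a minimal sketch of an ASCII-only helper (the name `ascii_lower` is made up for illustration, it is not a standard function). It assumes an ASCII-compatible encoding such as UTF-8: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so the helper passes those bytes through untouched. That means it will not corrupt UTF-8 text, but it also will not lowercase anything outside A-Z.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: lowercases only the bytes 'A'..'Z' (ASCII assumed)
// and copies every other byte through unchanged.
std::string ascii_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>((c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c);
    });
    return s;
}
```

For instance, given the UTF-8 bytes for "ÉT" (`"\xC3\x89" "T"`), only the trailing T is lowercased; the two bytes that encode É come back as they went in.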

Frottage answered 24/11, 2008 at 11:59 Comment(41)
That is amazing, ive always wondered what the best way to do it. I had no idea to use std::transform. :)Shiverick
uberjumper: There's actually a whole lot of overhead associated with the STL calls, especially for small"ish" strings. Solutions using a for loop and tolower are probably much faster.Frottage
(Old it may be, the algorithms in question have changed little) @Stefan Mai: What kind of "whole lot of overhead" is there in calling STL algorithms? The functions are rather lean (i.e. simple for loops) and often inlined as you rarely have many calls to the same function with the same template parameters in the same compile unit.Gosh
@eq Fair point, my benchmarks agree with you when compiling with -O3 (though the STL actually outperforms the more hand-tuned code so I'm wondering whether the compiler is pulling some tricks). Debugging STL code is still a bear though ;).Frottage
This non-portable solution could be faster. You can avoid the branch this way: inChar |= 0x20. I think it is the fastest way to convert ASCII upper to lower. If you want to convert lower to upper, then: inChar &= ~0x20.Xylem
@MichalW This works if you have only letters, which isn't always the case. If you're in that realm, you can probably do even better by using bitmasks on longs -- take on 8 characters at a time ;)Frottage
Every time you assume characters are ASCII, God kills a kitten. :(Victualage
Your first example potentially has undefined behaviour (passing char to ::tolower(int).) You need to ensure you don't pass a negative value.Chu
While this would be the canonical way to do this in a sane world, it has too many problems to recommend it. First, tolower from ctype.h doesn't work with Unicode. Secondly, <locale>, which is included by many of the other std library headers, defines a conflicting tolower that causes headaches, see https://mcmap.net/q/54349/-why-can-39-t-quot-transform-s-begin-s-end-s-begin-tolower-quot-be-complied-successfully/339595. It is best to use std::locale or boost::locale::to_lower as other answers suggest.Hindsight
::towlower if you're being international/using wide charsMoneywort
@MichalW Hey, can you explain what you wrote there? Also, why do we use :: in ::tolower ?Quartered
@StefanMai Hi. Why is the "::" needed before "tolower"? I don't understand that.Db
Note that this works for Unicode if you're using a std::u32string and your C locale is compatible with Unicode.Figured
The :: is needed before tolower to indicate that it is in the outermost namespace. If you use this code in another namespace, there may be a different (possibly unrelated) definition of tolower which would end up being preferentially selected without the ::.Circulation
std::transform(data.begin(), data.end(), data.begin(), easytolower); is dangerous, since the behavior of std::tolower is undefined if the input is not representable as unsigned char and is not equal to EOF.Carin
@BrianGordon - But its much easier, and there really are way too many cats in the world already.Giraffe
@BrianGordon That is blatantly false, as proven by the fact that there are still kittens in the world! =)Hypothesis
What makes the 2nd solution non-portable? Can I just do this? pastebin.com/MPRMpQJSPishogue
@BrianGordon there are also cases when you know that the input is ASCII (e.g. the wire format of domain names).Palila
@Palila I didn't know that. How does DNS handle international domain names which can be in unicode?Victualage
@BrianGordon applications have to convert them into an all-ASCII encoding called "Punycode" (RFC 3492)Palila
@TypicalHog: Because there is no guarantee that 'A' to 'Z' is a continuous range (EBCDIC); but more importantly because there are letters outside that range ('Ü', 'á', ...). It's very, very sad that the authors prefer to harvest more upvotes for answers with non-portable solutions instead of properly pointing out their shortcomings...Pearlinepearlman
@DevSolar: easytolower seems a perfectly valid solution for latin ASCII symbols to me. Going to use it for normalizing HTML tag names.Dantedanton
@Cheersandhth.-Alf c99 doesn't mention that it's UB: it either returns lower char, or unmodified. std::tolower, however, mentions ubGoogly
@L.F. I fixed your fix.Towill
@Towill To be honest, I have always had trouble understanding why the char has to be converted to unsigned char first. Isn't the value of a (signed) char supposed to be nonnegative, anyway? What is the point of tolowering a negative char? I guess I am missing the point, so would you mind explaining it to me a little bit, please :)Cupellation
@L.F. No, char can be analogous to signed char, and a signed char can be negative. tolower only accepts unsigned char and -1. Anything outside its domain is UB, and you don't want to conflate with -1 either. While all members of the basic execution character set are non-negative, that does not necessarily hold for the (complete) execution character set. See the current draft.Towill
@Towill Thank you! I didn't know a char can validly be negative. But then, doesn't converting to unsigned char just change the value?Cupellation
@L.F. char -> unsigned char (value-preserving, modulo 2**CHAR_BIT) -> implicit to int (value-preserving). Of course, if sizeof(int) == 1, things pretty much fall apart.Towill
@Towill OK ... I think I missed that ... Then the int is converted to char, I think, so the resulting value is implementation-defined before C++20 and guaranteed to be the original value since C++20?Cupellation
@L.F. Converting the result from tolower() (int) back to char is also an interesting story, yes.Towill
I don't understand why the tolower here is wrapped in a lambda rather than just passing it to transform on its own.Chyou
@Chyou 1) to make sure that the character is first converted to unsigned char (see Deduplicator's comments above); 2) to enable overload resolution to select the int tolower( int ch ); overload defined in <cctype> instead of the template< class charT > charT tolower( charT ch, const locale& loc ); overload defined in <locale>.Cupellation
happily coding in Java and the time comes to switch over to a CPP module... comes along a simple string case issue Me: "I'll just look up the std::string toLower() or whatever the standard has for normalizing text case... Hmm, I wonder how they handle all the encoding and localization complexities a 'simple' task like that could entail when std::string is just raw text data?" finds this question... sad requiring that ingest data follows a case convention noisesEachern
I don't think you need to wrap std::tolower in a lambda.Parasitism
@ccj yeah, the distinct lack of "normal" library functions when I started doing C++ was quite disturbingOudh
@Cheersandhth.-Alf what is "UB" in "...it's UB for non-ASCII input."?Alkylation
@Milan: The answer has been edited in July 2019 to remove the original problem, by replacing char with unsigned char. For that original problem, cppreference notes about std::tolower: ❝If the value of ch is not representable as unsigned char and does not equal EOF, the behavior is undefined❞. And since most all C++ compilers have char as a signed type by default, any non-ASCII character is in practice encoded with one or more negative char values, which if used directly as argument to std::tolower will encounter the quoted UB. Conversion to unsigned char avoids that problem.Aileenailene
@Cheersandhth.-Alf Thanks for your response. Out of curiosity, what is the full form of 'UB'?Alkylation
@Milan: Undefined Behavior. eel.is/c++draft/intro.defs#defns.undefined en.cppreference.com/w/cpp/language/ubAileenailene
Visual studio 2019 refused to compile this because of an int to char conversion (warning treated as error). I had to use: std::transform(data.begin(), data.end(), data.begin(), [](const char c) {return static_cast<char>(std::tolower(c)); }); to solve the problem.Choroiditis
N
373

Boost provides a string algorithm for this:

#include <boost/algorithm/string.hpp>

std::string str = "HELLO, WORLD!";
boost::algorithm::to_lower(str); // modifies str

Or, for non-in-place:

#include <boost/algorithm/string.hpp>

const std::string str = "HELLO, WORLD!";
const std::string lower_str = boost::algorithm::to_lower_copy(str);
Neptunian answered 24/11, 2008 at 11:57 Comment(8)
Fails for non-ASCII-7.Pearlinepearlman
This is pretty slow, see this benchmark: godbolt.org/z/neM5jsva1Gladiatorial
@Gladiatorial slow? Well, slow is to debug code because your own implementation has a bug because it was more complicated than to just call the boost library ;) If the code is critical, like called a lot and provides a bottleneck, then, well, it can be worth to think about slownessUta
I believe Boost isn't a C++ standard library solution, is it?Graphophone
No, it isn't. It's one of these extremely unfortunate answers you see on EVERY SINGLE C++ question on this website... because adding an entire library just to do something so simple is apparently the most popular route!Diabolism
Unfortunately if you know Unicode you know that you need a library to do it correctly. But this doesn't mean boost is the one, because it also requires ICU. Welcome to transitive dependency monsters (and ICU has very unstable ABI to make it worse).Marceline
I find this answer helpful as I already have Boost in my project, and I do need the non-in-place version to_lowerExternalization
Not everyone uses Boost.Chaucerian
P
348

tl;dr

Use the ICU library. If you don't, your conversion routine will break silently on cases you are probably not even aware of existing.


First you have to answer a question: What is the encoding of your std::string? Is it ISO-8859-1? Or perhaps ISO-8859-8? Or Windows Codepage 1252? Does whatever you're using to convert upper-to-lowercase know that? (Or does it fail miserably for characters over 0x7f?)

If you are using UTF-8 (the only sane choice among the 8-bit encodings) with std::string as container, you are already deceiving yourself if you believe you are still in control of things. You are storing a multibyte character sequence in a container that is not aware of the multibyte concept, and neither are most of the operations you can perform on it! Even something as simple as .substr() could result in invalid (sub-) strings because you split in the middle of a multibyte sequence.
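A small sketch of what "not aware of the multibyte concept" means in practice. The helper names below are invented for illustration, and the code assumes the input is already valid UTF-8 (it does no validation). It relies only on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx.

```cpp
#include <cstddef>
#include <string>

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
bool is_utf8_continuation(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Counts code points by counting the bytes that start a sequence.
// Assumes the input is valid UTF-8; no validation is done.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char byte : s)
        if (!is_utf8_continuation(byte)) ++n;
    return n;
}

// True if cutting the string at byte offset pos would split a sequence,
// which is exactly what a careless .substr() call can do.
bool splits_utf8_sequence(const std::string& s, std::size_t pos) {
    return pos < s.size() &&
           is_utf8_continuation(static_cast<unsigned char>(s[pos]));
}
```

With `std::string s = "\xC3\xA4" "b";` (the UTF-8 bytes for "äb"), `s.size()` is 3 while `utf8_code_points(s)` is 2, and `splits_utf8_sequence(s, 1)` is true: `s.substr(0, 1)` would hand back half of the ä.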

As soon as you try something like std::toupper( 'ß' ), or std::tolower( 'Σ' ) in any encoding, you are in trouble. Because 1), the standard only ever operates on one character at a time, so it simply cannot turn ß into SS as would be correct. And 2), the standard only ever operates on one character at a time, so it cannot decide whether Σ is in the middle of a word (where σ would be correct), or at the end (ς). Another example would be std::tolower( 'I' ), which should yield different results depending on the locale -- virtually everywhere you would expect i, but in Turkey ı (LATIN SMALL LETTER DOTLESS I) is the correct answer (which, again, is more than one byte in UTF-8 encoding).

So, any case conversion that works on a character at a time, or worse, a byte at a time, is broken by design. This includes all the std:: variants in existence at this time.

Then there is the point that the standard library, for what it is capable of doing, depends on which locales are supported on the machine your software is running on... and what do you do if your target locale is not among those supported on your client's machine?

So what you are really looking for is a string class that is capable of dealing with all this correctly, and that is not any of the std::basic_string<> variants.

(C++11 note: std::u16string and std::u32string are better, but still not perfect. C++20 brought std::u8string, but all these do is specify the encoding. In many other respects they still remain ignorant of Unicode mechanics, like normalization, collation, ...)

While Boost looks nice, API wise, Boost.Locale is basically a wrapper around ICU. If Boost is compiled with ICU support... if it isn't, Boost.Locale is limited to the locale support compiled for the standard library.

And believe me, getting Boost to compile with ICU can be a real pain sometimes. (There are no pre-compiled binaries for Windows that include ICU, so you'd have to supply them together with your application, and that opens a whole new can of worms...)

So personally I would recommend getting full Unicode support straight from the horse's mouth and using the ICU library directly:

#include <unicode/unistr.h>
#include <unicode/ustream.h>
#include <unicode/locid.h>

#include <iostream>

int main()
{
    /*                          "Odysseus" */
    char const * someString = u8"ΟΔΥΣΣΕΥΣ";
    icu::UnicodeString someUString( someString, "UTF-8" );
    // Setting the locale explicitly here for completeness.
    // Usually you would use the user-specified system locale,
    // which *does* make a difference (see ı vs. i above).
    std::cout << someUString.toLower( "el_GR" ) << "\n";
    std::cout << someUString.toUpper( "el_GR" ) << "\n";
    return 0;
}

Compile (with G++ in this example):

g++ -Wall example.cpp -licuuc -licuio

This gives:

ὀδυσσεύς

Note the Σ<->σ conversion in the middle of the word, and the Σ<->ς conversion at the end of the word. No <algorithm>-based solution can give you that.

Pearlinepearlman answered 5/6, 2014 at 15:6 Comment(11)
This is the correct answer in the general case. The standard gives nothing for handling anything except "ASCII" except lies and deception. It makes you think you can maybe deal with maybe UTF-16, but you can't. As this answer says, you cannot get the proper character-length (not byte-length) of a UTF-16 string without doing your own unicode handling. If you have to deal with real text, use ICU. Thanks, @PearlinepearlmanManaging
Is ICU available by default on Ubuntu/Windows, or does it need to be installed separately? Also, how about this answer: https://mcmap.net/q/53206/-how-to-convert-an-instance-of-std-string-to-lower-case?Eddi
icu::UnicodeString::length() is technically also lying to you (although less frequently), as it reports the number of 16bit code units rather than the number of code points. ;-)Ununa
@masaers: To be completely fair, with things like combining characters, zero-width joiners and right-to-left markers, the number of code points is rather meaningless. I will remove that remark.Pearlinepearlman
@Pearlinepearlman Agreed! The concept of length is rather meaningless on text (we could add ligatures to the list of offenders). That said, since people are used to tabs and control chars taking up one length unit, code points would be the more intuitive measure. Oh, and thanks for giving the correct answer, sad to see it so far down :-(Ununa
Actually, std::string not being aware that it contains text in a multi-byte character-encoding is a feature, not a bug. It's the only sane way to do it, which is why just about everyone does it. Not having proper standard apis for handling anything but basic text from days gone by which never really were at all is a problem though, yes. It would have to be optional even in a hosted environment though, as it is quite hefty, and there are many cases where it isn't needed.Towill
@Deduplicator: Sorry, but that's just dodging it in all possible ways. There are standards (Unicode), there are quasi-standard APIs for handling it (ICU), and if your intention is to write code that properly converts text to lowercase, unless you can guarantee your code will only ever see ASCII-7 (which would be a rather special case), all the other "solutions" here are 80--20 at best.Pearlinepearlman
That is why there should be such standard APIs. Doesn't negate the fact that much string-manipulation is best done ignoring all but it being a sequence of code-units. And that many use-cases never need anything more sophisticated.Towill
@Towill And that standard API is currently the ICU library, which is what this answer is about.Pearlinepearlman
@Towill I heard that std::text is underway, perhaps even in time for C++23. Let's not give up all hope yet.Pearlinepearlman
icu::UnicodeString seems to be a good class. QString can also do the job. However, it is a pain to use in big programs with many libraries. I hope std::text will be a real thing soon.Conclave
F
39

Using the range-based for loop of C++11, simpler code would be:

#include <iostream>       // std::cout
#include <string>         // std::string
#include <locale>         // std::locale, std::tolower

int main ()
{
  std::locale loc;
  std::string str="Test String.\n";

  for(auto elem : str)
    std::cout << std::tolower(elem,loc);
}
Fromenty answered 9/10, 2013 at 8:0 Comment(4)
However, on a French machine, this program doesn't convert non-ASCII characters allowed in the French language. For instance, a string 'Test String123. É Ï\n' will be converted to 'test string123. É Ï\n', although the characters É and Ï and their lower case counterparts 'é' and 'ï' are allowed in French. It seems that no solution for that was provided by other messages of this thread.Fromenty
I think you need to set a proper locale for that.Overpower
@incises, this then someone posted an answer about ICU and that's certainly the way to go. Easier than most other solutions that would attempt to understand the locale.Avaria
I'd prefer to not use external libraries when possible, personally.Athanor
A
35

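Another approach, using a range-based for loop with a reference variable:

#include <cctype>
#include <iostream>
#include <string>

std::string test = "Hello World";
for (auto& c : test)
{
   // Cast to unsigned char first: passing a negative char to tolower is undefined behaviour.
   c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

std::cout << test << std::endl;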
Astrograph answered 10/1, 2017 at 19:53 Comment(1)
I guess it won't work for UTF-8, will it?Tracheid
S
33

If the string contains UTF-8 characters outside of the ASCII range, then boost::algorithm::to_lower will not convert those. Better use boost::locale::to_lower when UTF-8 is involved. See http://www.boost.org/doc/libs/1_51_0/libs/locale/doc/html/conversions.html

Snatchy answered 10/10, 2012 at 7:24 Comment(1)
A working example?Inaction
T
28

This is a follow-up to Stefan Mai's response: if you'd like to place the result of the conversion in another string, you need to pre-allocate its storage space prior to calling std::transform. Since STL stores transformed characters at the destination iterator (incrementing it at each iteration of the loop), the destination string will not be automatically resized, and you risk memory stomping.

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

int main (int argc, char* argv[])
{
  std::string sourceString = "Abc";
  std::string destinationString;

  // Allocate the destination space
  destinationString.resize(sourceString.size());

  // Convert the source string to lower case
  // storing the result in destination string.
  // The lambda takes unsigned char because passing a negative
  // char to std::tolower is undefined behaviour.
  std::transform(sourceString.begin(),
                 sourceString.end(),
                 destinationString.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

  // Output the result of the conversion
  std::cout << sourceString
            << " -> "
            << destinationString
            << std::endl;
}
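As an alternative to the manual resize above, here is a sketch that lets the destination grow by itself through `std::back_inserter`. The function name `lower_copy` is invented for this example; the `reserve` call is optional and only there to avoid repeated reallocation.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: returns a lowercased copy, leaving the source untouched.
std::string lower_copy(const std::string& source) {
    std::string destination;
    destination.reserve(source.size()); // optional: avoids repeated reallocation
    std::transform(source.begin(), source.end(),
                   std::back_inserter(destination), // appends via push_back
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return destination;
}
```

Because every output character is appended with `push_back`, there is no way to write past the end of the destination, so the memory stomping risk goes away. Like the original, this only handles single-byte characters in the current C locale.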
Tasiatasiana answered 28/3, 2013 at 6:25 Comment(2)
This did not convert Ä into ä for meSancho
Could also use a back inserter iterator here instead of manual resize.Whine
L
10

The simplest way to convert a string to lowercase without bothering about the std namespace is as follows

1:string with/without spaces

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
using namespace std;
int main(){
    string str;
    getline(cin,str);
//------------function to convert string into lowercase---------------
    // go through unsigned char: a negative char passed to tolower is undefined behaviour
    transform(str.begin(), str.end(), str.begin(),
              [](unsigned char c){ return static_cast<char>(tolower(c)); });
//--------------------------------------------------------------------
    cout<<str;
    return 0;
}

2:string without spaces

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
using namespace std;
int main(){
    string str;
    cin>>str;
//------------function to convert string into lowercase---------------
    // go through unsigned char: a negative char passed to tolower is undefined behaviour
    transform(str.begin(), str.end(), str.begin(),
              [](unsigned char c){ return static_cast<char>(tolower(c)); });
//--------------------------------------------------------------------
    cout<<str;
    return 0;
}
Lindi answered 12/6, 2015 at 6:50 Comment(2)
This is plain wrong: if you check the documentation, you will see that std::tolower cannot work with char, it only supports unsigned char. So this code is UB if str contains characters outside of 0x00-0x7F.Bolitho
This is also false by virtue of using an identifier starting with str in the global namespace, which is strictly reserved.Thunderclap
O
7

I wrote this simple helper function:

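#include <cctype>   // std::tolower
#include <iostream> // std::cout
#include <string>   // std::string

std::string to_lower(std::string s) {
    for(char &c : s)
        // Cast to unsigned char first; tolower on a negative char is undefined behaviour.
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

Usage:

std::string s = "TEST";
std::cout << to_lower("HELLO WORLD"); // output: "hello world"
std::cout << to_lower(s); // won't change the original variable.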
Orthodontist answered 29/9, 2020 at 22:52 Comment(0)
V
6

My own template functions which perform upper / lower case conversion.

#include <algorithm>
#include <cctype>   // std::tolower, std::toupper
#include <string>

//
//  Lowercases string
//
template <typename T>
std::basic_string<T> lowercase(const std::basic_string<T>& s)
{
    std::basic_string<T> s2 = s;
    std::transform(s2.begin(), s2.end(), s2.begin(),
        [](const T v){ return static_cast<T>(std::tolower(v)); });
    return s2;
}

//
// Uppercases string
//
template <typename T>
std::basic_string<T> uppercase(const std::basic_string<T>& s)
{
    std::basic_string<T> s2 = s;
    std::transform(s2.begin(), s2.end(), s2.begin(),
        [](const T v){ return static_cast<T>(std::toupper(v)); });
    return s2;
}
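A possible variation on the templates above, offered as a sketch rather than a drop-in replacement: the two-argument `std::tolower` from `<locale>` is itself a template on the character type, so it is well-defined for both `char` and `wchar_t`, whereas the `<cctype>` version expects an `int` holding an `unsigned char` value. The name `lowercase_loc` and its defaulted locale parameter are made up for this example.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical variant: uses the two-argument std::tolower from <locale>,
// which is a template on the character type.
template <typename CharT>
std::basic_string<CharT> lowercase_loc(std::basic_string<CharT> s,
                                       const std::locale& loc = std::locale())
{
    std::transform(s.begin(), s.end(), s.begin(),
        [&loc](CharT c) { return std::tolower(c, loc); });
    return s;
}
```

It still maps one character to one character, so the limitations described in the ICU answer (ß, final sigma, multi-byte encodings) apply here just the same.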
Vocal answered 18/5, 2019 at 14:40 Comment(2)
This is what I needed. I just used the towlower for wide characters which supports the UTF-16.Duvalier
::tolower and ::toupper are needed instead of tolower and toupperAleen
C
5

std::ctype::tolower() from the standard C++ Localization library will correctly do this for you. Here is an example extracted from the tolower reference page

#include <locale>
#include <iostream>
#include <string>

int main () {
  std::locale::global(std::locale("en_US.utf8"));
  std::wcout.imbue(std::locale());
  std::wcout << "In US English UTF-8 locale:\n";
  auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());
  std::wstring str = L"HELLo, wORLD!";
  std::wcout << "Lowercase form of the string '" << str << "' is ";
  f.tolower(&str[0], &str[0] + str.size());
  std::wcout << "'" << str << "'\n";
}
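The same facet also works on a narrow std::string. Below is a minimal sketch, assuming single-byte characters and using the classic "C" locale as the default so that it does not depend on which locales are installed. The function name `lower_in_place` is hypothetical. The facet is looked up once and then applied to the whole range, instead of being fetched for every character.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: lowercases a narrow string in place using the
// ctype facet of the given locale, fetched once rather than per character.
void lower_in_place(std::string& s,
                    const std::locale& loc = std::locale::classic()) {
    if (s.empty()) return;
    const auto& facet = std::use_facet<std::ctype<char>>(loc);
    facet.tolower(&s[0], &s[0] + s.size()); // range overload converts [first, last)
}
```

For a const source string, take a copy first and call the helper on the copy.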
Chablis answered 29/1, 2016 at 2:25 Comment(6)
Nice, as long as you can convert the characters in place. What if your source string is const? That seems to make it a bit more messy (e.g. it doesn't look like you can use f.tolower() ), since you need to put the characters in a new string. Would you use transform() and something like std::bind1st( std::mem_fun() ) for the operator?Abbie
For a const string, we can just make a local copy and then convert it in place.Chablis
Yeah, though, making a copy adds more overhead.Abbie
You could use std::transform with the version of ctype::tolower that does not take pointers. Use a back inserter iterator adapter and you don't even need to worry about pre-sizing your output string.Whine
Great, especially because in libstdc++'s tolower with locale parameter, the implicit call to use_facet appears to be a performance bottleneck. One of my coworkers has achieved a several 100% speed increase by replacing boost::iequals (which has this problem) with a version where use_facet is only called once outside of the loop.Kinslow
This won't work in Windows where you'd have to call std::locale("English_United States.UTF8").Bolitho
J
4

An alternative to Boost is POCO (pocoproject.org).

POCO provides two variants:

  1. The first variant makes a copy without altering the original string.
  2. The second variant changes the original string in place.
    "In Place" versions always have "InPlace" in the name.

Both versions are demonstrated below:

#include "Poco/String.h"
using namespace Poco;

std::string hello("Stack Overflow!");

// Copies "STACK OVERFLOW!" into 'newString' without altering 'hello.'
std::string newString(toUpper(hello));

// Changes newString in-place to read "stack overflow!"
toLowerInPlace(newString);
Jamijamie answered 18/9, 2013 at 20:20 Comment(0)
C
4

None of the answers so far mention the Ranges library, which has been part of the standard library since C++20 and is also available separately on GitHub as range-v3, so I would like to add a way to perform this conversion using it.

To modify the string in-place:

str |= action::transform([](unsigned char c){ return std::tolower(c); });

To generate a lowercased view of the original string (from which a new string can be built):

auto new_string = original_string
    | view::transform([](unsigned char c){ return std::tolower(c); });

(Don't forget to #include <cctype> and the required Ranges headers.)

Note: the use of unsigned char as the argument to the lambda is inspired by cppreference, which states:

Like all other functions from <cctype>, the behavior of std::tolower is undefined if the argument's value is neither representable as unsigned char nor equal to EOF. To use these functions safely with plain chars (or signed chars), the argument should first be converted to unsigned char:

char my_tolower(char ch)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

Similarly, they should not be directly used with standard algorithms when the iterator's value type is char or signed char. Instead, convert the value to unsigned char first:

std::string str_tolower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), 
                // static_cast<int(*)(int)>(std::tolower)         // wrong
                // [](int c){ return std::tolower(c); }           // wrong
                // [](char c){ return std::tolower(c); }          // wrong
                   [](unsigned char c){ return std::tolower(c); } // correct
                  );
    return s;
}
Cupellation answered 15/4, 2019 at 9:36 Comment(0)
S
3

On Microsoft platforms you can use the strlwr family of functions: http://msdn.microsoft.com/en-us/library/hkxwh33z.aspx

// crt_strlwr.c
// compile with: /W3
// This program uses _strlwr and _strupr to create
// uppercase and lowercase copies of a mixed-case string.
#include <string.h>
#include <stdio.h>
#include <stdlib.h> // free

int main( void )
{
   char string[100] = "The String to End All Strings!";
   char * copy1 = _strdup( string ); // make two copies
   char * copy2 = _strdup( string );

   _strlwr( copy1 ); // C4996
   _strupr( copy2 ); // C4996

   printf( "Mixed: %s\n", string );
   printf( "Lower: %s\n", copy1 );
   printf( "Upper: %s\n", copy2 );

   free( copy1 );
   free( copy2 );
}
Seaden answered 29/8, 2014 at 17:18 Comment(0)
A
2

There is a way to convert upper case to lower WITHOUT doing if tests in the conversion loop, and it's pretty straightforward. The isupper() function/macro's use of clocale.h should take care of problems relating to your locale, but if not, you can always tweak the UtoL[] to your heart's content.

Given that C's characters are really just 8-bit ints (ignoring the wide character sets for the moment) you can create a 256 byte array holding an alternative set of characters, and in the conversion function use the chars in your string as subscripts into the conversion array.

Instead of a 1-for-1 mapping though, give the upper-case array members the BYTE int values for the lower-case characters. You may find islower() and isupper() useful here.


The code looks like this...

#include <cctype>   // isupper
#include <clocale>
#include <cstring>  // strcpy
static char UtoL[256];
// ----------------------------------------------------------------------------
void InitUtoLMap()  {
    for (int i = 0; i < sizeof(UtoL); i++)  {
        if (isupper(i)) {
            UtoL[i] = (char)(i + 32);
        }   else    {
            UtoL[i] = i;
        }
    }
}
// ----------------------------------------------------------------------------
char *LowerStr(char *szMyStr) {
    char *p = szMyStr;
    // do conversion in-place so as not to require a destination buffer
    while (*p) {        // szMyStr must be null-terminated
        *p = UtoL[(unsigned char)*p];  // index as unsigned char: a negative char would read out of bounds
        p++;
    }
    return szMyStr;
}
// ----------------------------------------------------------------------------
int main() {
    char *Lowered, Upper[128];
    InitUtoLMap();
    strcpy(Upper, "Every GOOD boy does FINE!");

    Lowered = LowerStr(Upper);
    return 0;
}

This approach will, at the same time, allow you to remap any other characters you wish to change.

This approach has one huge advantage when running on modern processors: there are no per-character if tests, so the only branch left in the conversion is the loop condition. This saves the CPU's branch prediction logic for other loops, and tends to prevent pipeline stalls.

Some here may recognize this approach as the same one used to convert EBCDIC to ASCII.

Ambrogino answered 8/1, 2014 at 17:48 Comment(3)
"There is a way to convert upper case to lower WITHOUT doing if tests" ever heard of lookup tables?Dithyramb
Undefined behavior for negative chars.Proliferation
Modern CPUs are bottlenecked in memory not CPU. Benchmarking would be interesting.Unwonted
S
2

Here's a macro technique if you want something simple:

#define STRTOLOWER(x) std::transform (x.begin(), x.end(), x.begin(), ::tolower)
#define STRTOUPPER(x) std::transform (x.begin(), x.end(), x.begin(), ::toupper)
#define STRTOUCFIRST(x) std::transform (x.begin(), x.begin()+1, x.begin(),  ::toupper); std::transform (x.begin()+1, x.end(),   x.begin()+1,::tolower)

Note, however, that @AndreasSpindler's comment on this answer is still an important consideration if you're working with something that isn't just ASCII characters.
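
For comparison, here is a rough sketch of the same three helpers written as ordinary inline functions. The names are hypothetical. Unlike the STRTOUCFIRST macro, which expands to two statements and evaluates x.begin()+1 even for an empty string, the function version is a single call and checks for the empty case:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical function equivalents of the macros above.
inline void str_to_lower(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
}

inline void str_to_upper(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}

// Uppercases the first character and lowercases the rest.
inline void str_to_ucfirst(std::string& s) {
    if (s.empty()) return;  // nothing to do, and s[0] must not be modified
    str_to_lower(s);
    s[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[0])));
}
```

The same non-ASCII caveat applies to these functions as to the macros.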

Spirit answered 30/1, 2016 at 21:2 Comment(7)
I'm downvoting this for giving macros when a perfectly good solution exist -- you even give those solutions.Opec
The macro technique means less typing of code for something that one would commonly use a lot in programming. Why not use that? Otherwise, why have macros at all?Spirit
Macros are a legacy from C that's being worked hard on to get rid of. If you want to reduce the amount of typing, use a function or a lambda. void strtoupper(std::string& x) { std::transform (x.begin(), x.end(), x.begin(), ::toupper); }Opec
@Opec As I want to be a better coder, can you provide me any ANSI doc links where any ANSI C++ committees say something to the effect of, "We need to call a meeting to get rid of macros out of C++"? Or some other roadmap?Spirit
No, I can't. Bjarne's stance on the topic has been made pretty clear on several occasions though. Besides, there are plenty of reasons to not use macros in C as well as C++. x could be a valid expression, that just happens to compile correctly but will give completely bogus results because of the macros.Opec
good macros! @Opec macros help us so much... I expect they never get rid of it.Uncoil
@AquariusPower I disagree. I have yet to see a macro that could not have been done better as a template or a lambda.Opec
G
2

Is there an alternative which works 100% of the time?

No

There are several questions you need to ask yourself before choosing a lowercasing method.

  1. How is the string encoded? Plain ASCII? UTF-8? Some form of extended ASCII legacy encoding?
  2. What do you mean by lower case anyway? Case mapping rules vary between languages! Do you want something that is localised to the user's locale? Do you want something that behaves consistently on all systems your software runs on? Do you just want to lowercase ASCII characters and pass through everything else?
  3. What libraries are available?

Once you have answers to those questions you can start looking for a solution that fits your needs. There is no one-size-fits-all approach that works for everyone everywhere!
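
As one illustration, if your answers are "ASCII or UTF-8", "only lowercase ASCII letters and pass everything else through", and "no extra libraries", a sketch could look like the following. ascii_to_lower is a hypothetical name, and the code assumes an ASCII-compatible execution character set:

```cpp
#include <string>

// Hypothetical helper: lowercases only 'A'..'Z'.
// Every other byte, including the bytes of multi-byte UTF-8 sequences,
// is passed through unchanged, and the result does not depend on the locale.
std::string ascii_to_lower(std::string s) {
    for (char& c : s) {
        if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c - 'A' + 'a');
        }
    }
    return s;
}
```

This behaves the same on every system, but it deliberately leaves characters such as É untouched, so it is not a fit if you need language-aware case mapping.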

Gatian answered 28/1, 2019 at 21:31 Comment(1)
I suggest you look a few answers up, at the one provided by @DevSolar. He explains in very good detail why only the ICU library is capable of doing text well in C++. It is by the very people who invented and support UTF-8 and other Unicode encodings. It is much more complex than most realize.Gravy
W
2

C++ doesn't implement tolower or toupper methods for std::string, but they are available for char. One can easily read each char of the string, convert it to the required case, and put it back into the string. Sample code without using any third-party library:

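#include <cctype>   // std::tolower
#include <iostream>
#include <string>

int main(){
    std::string str = std::string("How ARe You");
    for(char &ch : str){
        // convert to unsigned char first: a negative char passed to std::tolower is undefined
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));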
    }
    std::cout<<str<<std::endl;
    return 0;
}

For character based operation on string : For every character in string

Winepress answered 17/3, 2019 at 14:35 Comment(0)
P
1
// tolower example (C++)
#include <iostream>       // std::cout
#include <string>         // std::string
#include <locale>         // std::locale, std::tolower

int main ()
{
  std::locale loc;
  std::string str="Test String.\n";
  for (std::string::size_type i=0; i<str.length(); ++i)
    std::cout << std::tolower(str[i],loc);
  return 0;
}

For more information: http://www.cplusplus.com/reference/locale/tolower/
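
Note that the example above only prints the lowercased characters; the string itself is not modified. A possible sketch for converting in place with the same locale machinery uses the std::ctype<char> facet, whose range overload of tolower works on a whole buffer. lower_with_locale is a hypothetical name:

```cpp
#include <locale>
#include <string>

// Hypothetical helper: lowercases s in place using the ctype facet of loc.
// By default it uses a copy of the current global locale.
void lower_with_locale(std::string& s, const std::locale& loc = std::locale()) {
    if (s.empty()) return;
    const std::ctype<char>& facet = std::use_facet<std::ctype<char>>(loc);
    // Range overload: converts every character in [begin, end) in place.
    facet.tolower(&s[0], &s[0] + s.size());
}
```

Like the single-character std::tolower(c, loc), this works one char at a time, so it cannot handle multi-byte encodings such as UTF-8 correctly.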

Pomace answered 20/3, 2017 at 5:20 Comment(0)
B
1

An explanation of how this solution works:


string test = "Hello World";
for(auto& c : test)
{
   c = tolower(c);
}

Explanation:

for(auto& c : test) is a range-based for loop of the kind
for ( range_declaration:range_expression)loop_statement:

  1. range_declaration: auto& c
    Here the auto specifier is used for automatic type deduction, so the type is deduced from the variable's initializer.

  2. range_expression: test
    The range in this case is the sequence of characters of the string test.

The characters of the string test are available as a reference inside the for loop through identifier c.
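
The reference is what makes the loop modify the string. As a small sketch of the difference (both function names are made up for this illustration), compare iterating with auto, which copies each character, against auto&, which refers to the character inside the string:

```cpp
#include <cctype>
#include <string>

// Hypothetical demo: with `auto c`, c is a copy, so the string is unchanged.
std::string lower_by_value(std::string s) {
    for (auto c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        (void)c;  // the modified copy is discarded at the end of each iteration
    }
    return s;
}

// With `auto& c`, c refers to the character stored in s, so s is modified.
std::string lower_by_reference(std::string s) {
    for (auto& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}
```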

Burchette answered 17/4, 2018 at 12:20 Comment(1)
I don't see the value of adding this as an answer, or as an edit to the linked answer for that matter. If someone needs an explanation of how the range-for loop works, there are multiple resources for that, e.g. stackoverflow.com/questions/35490236. For this question, I think this explanation is just noise - like adding an explanation of how iterators or standard algorithms work for the answers that use std::transform.Columbine
J
1

Try this function :)

string toLowerCase(string str) {

    int str_len = str.length();

    string final_str = "";

    for(int i=0; i<str_len; i++) {

        char character = str[i];

        if(character>=65 && character<=90) { // ASCII 'A' (65) to 'Z' (90)

            final_str += (character+32);

        } else {

            final_str += character;

        }

    }

    return final_str;

}
Jariah answered 19/3, 2020 at 1:12 Comment(1)
This function is slow, shouldn't be used in real-life projects.Gladiatorial
O
1

Have a look at the excellent C++17 cpp-unicodelib (GitHub). It's single-file and header-only.


#include <exception>
#include <iostream>
#include <codecvt>
#include <locale>   // std::wstring_convert
#include <string>

// cpp-unicodelib, downloaded from GitHub
#include "unicodelib.h"
#include "unicodelib_encodings.h"

using namespace std;
using namespace unicode;

int main() {
    // converter that allows displaying a UTF-32 string as UTF-8
    wstring_convert<codecvt_utf8<char32_t>, char32_t> converter;

    std::u32string  in = U"Je suis là!";
    cout << converter.to_bytes(in) << endl;

    std::u32string  lc = to_lowercase(in);
    cout << converter.to_bytes(lc) << endl;
}

Output

Je suis là!
je suis là!
Obscenity answered 25/4, 2022 at 13:18 Comment(1)
2022, c++17, again and again you have to visit stackoverflow to check for another version of tolowerPushing
F
0

Use fplus::to_lower_case() from fplus library.

Search to_lower_case in fplus API Search

Example:

fplus::to_lower_case(std::string("ABC")) == std::string("abc");
Fungicide answered 8/5, 2017 at 7:21 Comment(0)
I
0

Google's absl library has absl::AsciiStrToLower / absl::AsciiStrToUpper

Insolent answered 27/5, 2022 at 8:43 Comment(0)
D
0

Since you are using std::string, you are using C++. If you are using C++11 or higher, this doesn't need anything fancy. If words is a vector<string>, then:

    for (auto & str : words) {
        for(auto & ch : str)
            ch = tolower(ch);
    }

It doesn't have strange exceptions. You might want to use wchar_t instead, but otherwise this should do it all in place.

Diacritical answered 27/1, 2023 at 19:41 Comment(0)
H
0

For a different perspective, there is a very common use case, which is to perform locale-neutral case folding on Unicode strings. For this case, it is possible to get good case folding performance when you realize that the set of foldable characters is finite and relatively small (< 2000 Unicode code points). This works very well with a generated perfect hash (guaranteed zero collisions), which can be used to convert every input character to its lowercase equivalent.

With UTF-8, you do have to be conscientious of multi-byte characters and iterate accordingly. However, UTF-8 has fairly simple encoding rules that make this operation efficient.
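
To give an idea of what "iterate accordingly" can look like, here is a minimal sketch that walks a UTF-8 string one code point at a time, using the lead byte to determine the sequence length. Both function names are hypothetical, and the decoder assumes well-formed input: it does not reject overlong encodings or check continuation bytes.

```cpp
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence that starts with `lead`.
// Returns 0 for a continuation byte or an invalid lead byte.
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Hypothetical helper: decodes UTF-8 into code points.
// Stops at the first invalid lead byte or truncated sequence.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        const std::size_t len = utf8_sequence_length(lead);
        if (len == 0 || i + len > in.size()) break;
        // Payload bits of the lead byte: all 7 for ASCII, else 5, 4 or 3.
        char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
        for (std::size_t k = 1; k < len; ++k) {
            // Each continuation byte contributes its low 6 bits.
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Each decoded code point could then be passed through a folding table (such as the perfect hash mentioned above) and re-encoded.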

For more details, including links to the relevant parts of the Unicode standard and a perfect hash generator, see my answer here, to the question How to achieve unicode-agnostic case insensitive comparison in C++.

Horal answered 4/4, 2023 at 19:44 Comment(0)
D
-1

Code Snippet

#include <cctype>   // tolower
#include <iostream>
#include <string>
using namespace std;


int main ()
{
    ios::sync_with_stdio(false);

    string str="String Convert\n";

    for(int i=0; i<str.size(); i++)
    {
      str[i] = tolower(str[i]);
    }
    cout<<str<<endl;

    return 0;
}
Directly answered 10/4, 2017 at 19:11 Comment(0)
G
-1

Here are two optional libraries for ASCII string lowercasing. Both are production-level and micro-optimized, and they are expected to be faster than the existing answers here (TODO: add benchmark results).

Facebook's Folly:

void toLowerAscii(char* str, size_t length);

Google's Abseil:

void AsciiStrToLower(std::string* s);
Gladiatorial answered 22/6, 2021 at 9:49 Comment(0)
O
-1

I wrote a templated version that works with any string :

#include <type_traits> // std::decay
#include <cctype>      // std::toupper & std::tolower


template <class T = void> struct farg_t { using type = T; };
template <template<typename ...> class T1, 
class T2> struct farg_t <T1<T2>> { using type = T2*; };
//---------------

template<class T, class T2 = 
typename std::decay< typename farg_t<T>::type >::type>
void ToUpper(T& str) { T2 t = &str[0]; 
for (; *t; ++t) *t = std::toupper(*t); }


template<class T, class T2 = typename std::decay< typename 
farg_t<T>::type >::type>
void Tolower(T& str) { T2 t = &str[0]; 
for (; *t; ++t) *t = std::tolower(*t); }

Tested with gcc compiler:

#include <iostream>
#include "upove_code.h"

int main()
{

    std::string str1 = "hEllo ";
    char str2 [] = "wOrld";

    ToUpper(str1);
    ToUpper(str2);
    std::cout << str1 << str2 << '\n'; 
    Tolower(str1);
    Tolower(str2);
    std::cout << str1 << str2 << '\n'; 
    return 0;
}

output:

HELLO WORLD
hello world
Overcome answered 3/2, 2022 at 10:11 Comment(0)
W
-3

This could be another simple version to convert uppercase to lowercase and vice versa. I used VS2017 community version to compile this source code.

#include <iostream>
#include <string>
using namespace std;

int main()
{
    std::string _input = "lowercasetouppercase";
#if 0
    // My idea is to use the ascii value to convert
    char upperA = 'A';
    char lowerA = 'a';

    cout << (int)upperA << endl; // ASCII value of 'A' -> 65
    cout << (int)lowerA << endl; // ASCII value of 'a' -> 97
    // 97-65 = 32; // Difference of ASCII value of upper and lower a
#endif // 0

    cout << "Input String = " << _input.c_str() << endl;
    for (int i = 0; i < _input.length(); ++i)
    {
        _input[i] -= 32; // To convert lower to upper
#if 0
        _input[i] += 32; // To convert upper to lower
#endif // 0
    }
    cout << "Output String = " << _input.c_str() << endl;

    return 0;
}

Note: if the string contains special characters (anything that is not a letter of the expected case), they need to be handled with a condition check.
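
A minimal sketch of such a condition check, assuming ASCII; swap_ascii_case is a hypothetical name. Only letters are shifted by 32, and everything else is left alone:

```cpp
#include <string>

// Hypothetical helper: swaps the case of ASCII letters only.
// Digits, punctuation, and non-ASCII bytes are left unchanged.
std::string swap_ascii_case(std::string s) {
    for (char& c : s) {
        if (c >= 'a' && c <= 'z') {
            c = static_cast<char>(c - 32);  // lower to upper
        } else if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c + 32);  // upper to lower
        }
    }
    return s;
}
```

For lowercase-only conversion, keep just the second branch.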

Winfordwinfred answered 4/6, 2018 at 2:47 Comment(0)
K
-3

Use this code to change the case of a string in C++.

#include<bits/stdc++.h>

using namespace std;

int main(){
  string a = "sssAAAAAAaaaaDas";
  transform(a.begin(),a.end(),a.begin(),::tolower);
  cout<<a;
}

Kamalakamaria answered 27/5, 2022 at 6:1 Comment(1)
Never recommend using #include <bits/stdc++.h> in an answer on Stack Overflow. You'll get downvoted.Stadler
