Convert a String In C++ To Upper Case
C

31

329

How could one convert a string to upper case? The examples I have found from googling only deal with chars.

Corbett answered 9/4, 2009 at 17:38 Comment(0)
W
220

Boost string algorithms:

#include <boost/algorithm/string.hpp>
#include <string>

std::string str = "Hello World";

boost::to_upper(str);

std::string newstr = boost::to_upper_copy<std::string>("Hello World");
Weinhardt answered 9/4, 2009 at 17:47 Comment(10)
This also has the benefit of i18n, whereas ::toupper most likely assumes ASCII.Caron
Your last line does not compile - you have to change to something like: std::string newstr(boost::to_upper_copy<std::string>("Hello World"));Here
this should not be the accepted answer since it requires boost, or the title should be changed.Unsubstantial
This appears to perform extremely badly with g++ 5.2 -O3, and Boost 1.58 (like 30x worse than calling glibc's toupper in a loop.) There's a dynamic_cast of the locale that doesn't get hoisted out of the per-char loop. See my answer. On the plus side, this may be properly UTF-8 aware, but the slowdown doesn't come from handling UTF-8; it comes from using a dynamic_cast to re-check the locale every character.Abut
yes, i am going to install boost just for to_upper... excellent idea! </sarcasm> :)Combs
Presumably this one is header-only so you don't actually need to build boost, just have the headers.Surcharge
I'm personally ill-disposed towards boost as an answer to "how do I do x in C++?" because boost is not a lightweight solution at all. It seems that you either buy into boost as a framework (or ACE or Qt or Recusion ToolKit++ or ...) or you don't. I'd prefer to see language solutions.Moll
I'd prefer to see language solutions also, but I prefer the more functional way it is done here over the begin/end mess you get with the stl solutions (even though I generally love stl)Elfin
@Moll boost isn't a framework. It's a component-based library, built the same way as the standard components (header-based and isolated as far as possible). Boost is the origin of several components that are now standard. The problem is that this task can't be solved in a platform-independent way (the C++ standard only handles ASCII), so you would either build your own library that works on all platforms you target or use an existing one.Lyle
This question and this answer is a manifest to how poor the STL string class is.Rigorous
A
586
#include <algorithm>
#include <string>

std::string str = "Hello World";
std::transform(str.begin(), str.end(), str.begin(), ::toupper);
Abbey answered 9/4, 2009 at 17:41 Comment(16)
Actually, toupper() can be implemented as a macro. This may cause an issue.Urtication
Good point dirk (unfortunately). Otherwise I think this is certainly the cleanest and clearest way.Supervise
I haven't checked, but doesn't C++ require that these functions are implemented as actual functions, even when C allowed them to be macros?Cockeyed
I believe C required there to be functions also, in case you wanted to take the address of the function or whatever, but I don't have a reference handy.Penance
Updated my post with a quote from the recent draft. This solution has two perils -- so please beware.Urtication
a bind(::toupper, construct<unsigned char>(_1)) with boost.lambda will serve perfectly fine i think.Slowpoke
i've corrected the quotes, thinking that's quite non-controversial.Slowpoke
You can easily guarantee that toupper() won't be called as a macro. Look here, at the end of the 9.1.1 subsection: publications.gbdirect.co.uk/c_book/chapter9/introduction.html.Lizabethlizard
This approach works fine for ASCII, but fails for multi-byte character encodings, or for special casing rules like German 'ß'.Libb
I changed the accepted answer to the one using the boost libraries, because it was faster (in my informal testing), easier to use, and doesn't have the problems associated with this solution. Still a good solution for instances where boost can't be used.Corbett
I can't get why does compiler reject this code without :: qualifier before toupper. Any ideas?Fra
std::toupper is guaranteed not to be a macro (#include <cctype>)Upshot
This code is incorrect, toupper requires an unsigned char as input. See dirkgently's answer below.Tabular
For late readers: Since C++11, we could use a lambda: [](auto c) { return std::toupper(c); } – maybe worth an update of the answer?Carafe
In VS2019 C++ this is giving a warning internally in algorithm's transform template: C4244 '=': conversion from 'int' to 'char', possible loss of data because toupper() returns int and not char.Demount
std::transform(str.begin(), str.end(), str.begin(), ::toupper); should be std::transform(str.begin(), str.end(), str.begin(), [](unsigned char c) { return std::toupper(c); }); ... caveat being this is pretty much an ASCII only solution.Iridescence
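Pulling the comments above together, here is a minimal sketch of the std::transform version with the unsigned char fix applied. The helper name to_upper_ascii is made up for illustration, and the result is only meaningful for single-byte encodings such as ASCII.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not from the answer): returns an upper-cased copy of s.
std::string to_upper_ascii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       // std::toupper takes and returns int. Taking the parameter as
                       // unsigned char means a negative char value never reaches it,
                       // and the explicit cast back avoids the int-to-char warning.
                       return static_cast<char>(std::toupper(c));
                   });
    return s;
}
```

Calling to_upper_ascii("Hello World") in the default "C" locale should give "HELLO WORLD". Using a lambda also sidesteps the macro question, because the lambda (not toupper itself) is what gets passed to the algorithm.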
L
126

Short solution using C++11 and toupper().

for (auto & c: str) c = toupper(c);
Liberec answered 22/7, 2013 at 17:20 Comment(7)
Wouldn't c be of const char type (from auto)? If so, you cannot assign it (because of const part) to what is returned by toupper(c).Lounge
@PolGraphic: Range-based for uses the container's begin() / end() methods to iterate over its contents. std::basic_string has both a const and a mutable iterator (returned by cbegin() and begin() respectively, see std::basic_string::begin), so for(:) uses the one appropriate (cbegin() if str is declared const, with auto =:= const char, begin() otherwise, with auto =:= char).Liberec
See dirkgently's answer below, c needs to be cast to unsigned char for this to be correct.Tabular
boost's to_upper() seems a lot more consistent with c++ STL functions than toupper.Verrucose
Love this - I went with for (auto & c: str) c = (char)toupper(c);Marie
I love this simple solution. It will be quite useful as long as str is not a const.Institutor
@GeePokey: with a C-style cast? NoooooExplain
A
53

This problem is vectorizable with SIMD for the ASCII character set.


Speedup comparisons:

Preliminary testing with x86-64 gcc 5.2 -O3 -march=native on a Core2Duo (Merom). The same string of 120 characters (mixed lowercase and non-lowercase ASCII), converted in a loop 40M times (with no cross-file inlining, so the compiler can't optimize away or hoist any of it out of the loop). Same source and dest buffers, so no malloc overhead or memory/cache effects: data is hot in L1 cache the whole time, and we're purely CPU-bound.

  • boost::to_upper_copy<char*, std::string>(): 198.0s. Yes, Boost 1.58 on Ubuntu 15.10 is really this slow. I profiled and single-stepped the asm in a debugger, and it's really, really bad: there's a dynamic_cast of a locale variable happening per character!!! (dynamic_cast takes multiple calls to strcmp). This happens with LANG=C and with LANG=en_CA.UTF-8.

    I didn't test using a RangeT other than std::string. Maybe the other form of to_upper_copy optimizes better, but I think it will always new/malloc space for the copy, so it's harder to test. Maybe something I did differs from a normal use-case, in a way that stopped g++ from hoisting the locale setup stuff out of the per-character loop. Or maybe this was just always a disaster, at least with that header and GCC version. My loop reading from a std::string and writing to a char dstbuf[4096] makes sense for testing.

  • loop calling glibc toupper: 6.67s (not checking the int result for potential multi-byte UTF-8, though. This matters for some locales, including the common test-case of Turkish.)

  • ASCII-only loop: 8.79s (my baseline version for the results below.) Apparently a table-lookup is faster than a cmov, with the table hot in L1 anyway.

  • ASCII-only auto-vectorized: 2.51s. (120 chars is half way between worst case and best case, see below)

  • ASCII-only manually vectorized: 1.35s

See also this question about toupper() being slow on Windows when a locale is set.


I was shocked that Boost is an order of magnitude slower than the other options. I double-checked that I had -O3 enabled, and even single-stepped the asm to see what it was doing. It's almost exactly the same speed with clang++ 3.8. It has huge overhead inside the per-character loop. The perf record / report result (for the cycles perf event) is:

  32.87%  flipcase-clang-  libstdc++.so.6.0.21   [.] _ZNK10__cxxabiv121__vmi_class_type_info12__do_dyncastElNS_17__class_type_info10__sub_kindEPKS1_PKvS4_S6_RNS1_16
  21.90%  flipcase-clang-  libstdc++.so.6.0.21   [.] __dynamic_cast                                                                                                 
  16.06%  flipcase-clang-  libc-2.21.so          [.] __GI___strcmp_ssse3                                                                                            
   8.16%  flipcase-clang-  libstdc++.so.6.0.21   [.] _ZSt9use_facetISt5ctypeIcEERKT_RKSt6locale                                                                     
   7.84%  flipcase-clang-  flipcase-clang-boost  [.] _Z16strtoupper_boostPcRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE                                   
   2.20%  flipcase-clang-  libstdc++.so.6.0.21   [.] strcmp@plt                                                                                                     
   2.15%  flipcase-clang-  libstdc++.so.6.0.21   [.] __dynamic_cast@plt                                                                                             
   2.14%  flipcase-clang-  libstdc++.so.6.0.21   [.] _ZNKSt6locale2id5_M_idEv                                                                                       
   2.11%  flipcase-clang-  libstdc++.so.6.0.21   [.] _ZNKSt6locale2id5_M_idEv@plt                                                                                   
   2.08%  flipcase-clang-  libstdc++.so.6.0.21   [.] _ZNKSt5ctypeIcE10do_toupperEc                                                                                  
   2.03%  flipcase-clang-  flipcase-clang-boost  [.] _ZSt9use_facetISt5ctypeIcEERKT_RKSt6locale@plt                                                                 
   0.08% ...

Autovectorization

Gcc and clang will only auto-vectorize loops when the iteration count is known ahead of the loop. (i.e. search loops like plain-C implementation of strlen won't autovectorize.)

Thus, for strings small enough to fit in cache, we get a significant speedup for strings ~128 chars long from doing strlen first. This won't be necessary for explicit-length strings (like C++ std::string).

#include <stddef.h>   // size_t
#include <string.h>   // strlen

// char, not int, is essential: otherwise gcc unpacks to vectors of int!  Huge slowdown.
char ascii_toupper_char(char c) {
    return ('a' <= c && c <= 'z') ? c^0x20 : c;    // ^ autovectorizes to PXOR: runs on more ports than paddb
}

// gcc can only auto-vectorize loops when the number of iterations is known before the first iteration.  strlen gives us that
size_t strtoupper_autovec(char *dst, const char *src) {
    size_t len = strlen(src);
    for (size_t i=0 ; i<len ; ++i) {
        dst[i] = ascii_toupper_char(src[i]);  // gcc does the vector range check with psubusb / pcmpeqb instead of pcmpgtb
    }
    return len;
}

Any decent libc will have an efficient strlen that's much faster than looping a byte at a time, so separate vectorized strlen and toupper loops are faster.
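For the explicit-length case mentioned above, a rough sketch might look like the following. The function name strtoupper_inplace is an assumption for illustration, and whether the loop actually auto-vectorizes depends on the compiler and flags; the point is only that s.size() gives the trip count before the loop starts, so no strlen pass is needed.

```cpp
#include <cstddef>
#include <string>

// Same ASCII-only idea as ascii_toupper_char above: char in, char out.
static inline char ascii_toupper_char(char c) {
    return ('a' <= c && c <= 'z') ? static_cast<char>(c ^ 0x20) : c;
}

// Hypothetical in-place variant for std::string (not part of the benchmarked code).
void strtoupper_inplace(std::string& s) {
    char* p = &s[0];
    const std::size_t len = s.size();   // iteration count known up front
    for (std::size_t i = 0; i < len; ++i) {
        p[i] = ascii_toupper_char(p[i]);
    }
}
```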

Baseline: a loop that checks for a terminating 0 on the fly.

Times for 40M iterations, on a Core2 (Merom) 2.4GHz. gcc 5.2 -O3 -march=native. (Ubuntu 15.10). dst != src (so we make a copy), but they don't overlap (and aren't nearby). Both are aligned.

  • 15 char string: baseline: 1.08s. autovec: 1.34s
  • 16 char string: baseline: 1.16s. autovec: 1.52s
  • 127 char string: baseline: 8.91s. autovec: 2.98s // non-vector cleanup has 15 chars to process
  • 128 char string: baseline: 9.00s. autovec: 2.06s
  • 129 char string: baseline: 9.04s. autovec: 2.07s // non-vector cleanup has 1 char to process

Some results are a bit different with clang.

The microbenchmark loop that calls the function is in a separate file. Otherwise it inlines and strlen() gets hoisted out of the loop, and it runs dramatically faster, esp. for 16 char strings (0.187s).

This has the major advantage that gcc can auto-vectorize it for any architecture, but the major disadvantage that it's slower for the usually-common case of small strings.


So there are big speedups, but compiler auto-vectorization doesn't make great code, esp. for cleanup of the last up-to-15 characters.

Manual vectorization with SSE intrinsics:

Based on my case-flip function that inverts the case of every alphabetic character. It takes advantage of the "unsigned compare trick", where you can do low <= a && a <= high with a single unsigned comparison by range shifting, so that any value less than low wraps to a value that's greater than high. (This works if low and high aren't too far apart.)

SSE only has a signed compare-greater, but we can still use the "unsigned compare" trick by range-shifting to the bottom of the signed range: Subtract 'a'+128, so the alphabetic characters range from -128 to -128+25 (-128+'z'-'a')

Note that adding 128 and subtracting 128 are the same thing for 8bit integers. There's nowhere for the carry to go, so it's just xor (carryless add), flipping the high bit.
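In scalar form, the two paragraphs above can be sketched roughly like this. The function names are illustrative only; the second one mimics what the signed SSE compare has to do by flipping the high bit after the range shift.

```cpp
// Unsigned version: one subtraction plus one unsigned compare covers both bounds.
inline bool is_ascii_lower(unsigned char c) {
    // For c < 'a' the subtraction wraps around to a large unsigned value,
    // so it fails the <= test just like values above 'z' do.
    return static_cast<unsigned char>(c - 'a') <=
           static_cast<unsigned char>('z' - 'a');
}

// Signed version: shift the range to the bottom of the signed range
// (xor with 0x80 is the same as adding or subtracting 128 for 8-bit values).
// Assumes the usual two's-complement narrowing to signed char.
inline bool is_ascii_lower_signed(unsigned char c) {
    signed char shifted =
        static_cast<signed char>(static_cast<unsigned char>(c - 'a') ^ 0x80);
    return shifted <= -128 + ('z' - 'a');   // lower case maps to -128 .. -128+25
}
```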

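#include <immintrin.h>
#include <stdint.h>    // uintptr_t, used by strtoupper_sse2 below
#include <strings.h>   // ffs (POSIX), used by strtoupper_sse2 below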

__m128i upcase_si128(__m128i src) {
    // The above 2 paragraphs were comments here
    __m128i rangeshift = _mm_sub_epi8(src, _mm_set1_epi8('a'+128));
    __m128i nomodify   = _mm_cmpgt_epi8(rangeshift, _mm_set1_epi8(-128 + 25));  // 0:lower case   -1:anything else (upper case or non-alphabetic).  25 = 'z' - 'a'

    __m128i flip  = _mm_andnot_si128(nomodify, _mm_set1_epi8(0x20));            // 0x20:lcase    0:non-lcase

    // just mask the XOR-mask so elements are XORed with 0 instead of 0x20
    return          _mm_xor_si128(src, flip);
    // it's easier to xor with 0x20 or 0 than to AND with ~0x20 or 0xFF
}

Given this function that works for one vector, we can call it in a loop to process a whole string. Since we're already targeting SSE2, we can do a vectorized end-of-string check at the same time.

We can also do much better for the "cleanup" of the last up-to-15 bytes left over after doing vectors of 16B: upper-casing is idempotent, so re-processing some input bytes is fine. We do an unaligned load of the last 16B of the source, and store it into the dest buffer overlapping the last 16B store from the loop.

The only time this doesn't work is when the whole string is under 16B: Even when dst=src, non-atomic read-modify-write is not the same thing as not touching some bytes at all, and can break multithreaded code.

We have a scalar loop for that, and also to get src aligned. Since we don't know where the terminating 0 will be, an unaligned load from src might cross into the next page and segfault. If we need any bytes in an aligned 16B chunk, it's always safe to load the whole aligned 16B chunk.

Full source: in a github gist.

// FIXME: doesn't always copy the terminating 0.
// microbenchmarks are for this version of the code (with _mm_store in the loop, instead of storeu, for Merom).
size_t strtoupper_sse2(char *dst, const char *src_begin) {
    const char *src = src_begin;
    // scalar until the src pointer is aligned
    while ( (0xf & (uintptr_t)src) && *src ) {
        *(dst++) = ascii_toupper_char(*(src++));
    }

    if (!*src)
        return src - src_begin;

    // current position (src) is now 16B-aligned, and we're not at the end
    int zero_positions;
    do {
        __m128i sv = _mm_load_si128( (const __m128i*)src );
        // TODO: SSE4.2 PCMPISTRI or PCMPISTRM version to combine the lower-case and '\0' detection?

        __m128i nullcheck = _mm_cmpeq_epi8(_mm_setzero_si128(), sv);
        zero_positions = _mm_movemask_epi8(nullcheck);
        // TODO: unroll so the null-byte check takes less overhead
        if (zero_positions)
            break;

        __m128i upcased = upcase_si128(sv);   // doing this before the loop break lets gcc realize that the constants are still in registers for the unaligned cleanup version.  But it leads to more wasted insns in the early-out case

        _mm_storeu_si128((__m128i*)dst, upcased);
        //_mm_store_si128((__m128i*)dst, upcased);  // for testing on CPUs where storeu is slow
        src += 16;
        dst += 16;
    } while(1);

    // handle the last few bytes.  Options: scalar loop, masked store, or unaligned 16B.
    // rewriting some bytes beyond the end of the string would be easy,
    // but doing a non-atomic read-modify-write outside of the string is not safe.
    // Upcasing is idempotent, so unaligned potentially-overlapping is a good option.

    unsigned int cleanup_bytes = ffs(zero_positions) - 1;  // excluding the trailing null
    const char* last_byte = src + cleanup_bytes;  // points at the terminating '\0'

    // FIXME: copy the terminating 0 when we end at an aligned vector boundary
    // optionally special-case cleanup_bytes == 15: final aligned vector can be used.
    if (cleanup_bytes > 0) {
        if (last_byte - src_begin >= 16) {
            // if src==dest, this load overlaps with the last store:  store-forwarding stall.  Hopefully OOO execution hides it
            __m128i sv = _mm_loadu_si128( (const __m128i*)(last_byte-15) ); // includes the \0
            _mm_storeu_si128((__m128i*)(dst + cleanup_bytes - 15), upcase_si128(sv));
        } else {
            // whole string less than 16B
            // if this is common, try 64b or even 32b cleanup with movq / movd and upcase_si128
#if 1
            for (unsigned int i = 0 ; i <= cleanup_bytes ; ++i) {
                dst[i] = ascii_toupper_char(src[i]);
            }
#else
            // gcc stupidly auto-vectorizes this, resulting in huge code bloat, but no measurable slowdown because it never runs
            for (int i = cleanup_bytes - 1 ;  i >= 0 ; --i) {
                dst[i] = ascii_toupper_char(src[i]);
            }
#endif
        }
    }

    return last_byte - src_begin;
}

Times for 40M iterations, on a Core2 (Merom) 2.4GHz. gcc 5.2 -O3 -march=native. (Ubuntu 15.10). dst != src (so we make a copy), but they don't overlap (and aren't nearby). Both are aligned.

  • 15 char string: baseline: 1.08s. autovec: 1.34s. manual: 1.29s
  • 16 char string: baseline: 1.16s. autovec: 1.52s. manual: 0.335s
  • 31 char string: manual: 0.479s
  • 127 char string: baseline: 8.91s. autovec: 2.98s. manual: 0.925s
  • 128 char string: baseline: 9.00s. autovec: 2.06s. manual: 0.931s
  • 129 char string: baseline: 9.04s. autovec: 2.07s. manual: 1.02s

(Actually timed with _mm_store in the loop, not _mm_storeu, because storeu is slower on Merom even when the address is aligned. It's fine on Nehalem and later. I've also left the code as-is for now, instead of fixing the failure to copy the terminating 0 in some cases, because I don't want to re-time everything.)

So for short strings longer than 16B, this is dramatically faster than auto-vectorized. Lengths one-less-than-a-vector-width don't present a problem. They might be a problem when operating in-place, because of a store-forwarding stall. (But note that it's still fine to process our own output, rather than the original input, because toupper is idempotent).

There's a lot of scope for tuning this for different use-cases, depending on what the surrounding code wants, and the target microarchitecture. Getting the compiler to emit nice code for the cleanup portion is tricky. Using ffs(3) (which compiles to bsf or tzcnt on x86) seems to be good, but obviously that bit needs a re-think since I noticed a bug after writing up most of this answer (see the FIXME comments).

Vector speedups for even smaller strings can be obtained with movq or movd loads/stores. Customize as necessary for your use-case.


UTF-8:

We can detect when our vector has any bytes with the high bit set, and in that case fall back to a scalar utf-8-aware loop for that vector. The dst pointer can advance by a different amount than the src pointer, but once we get back to an aligned src pointer, we'll still just do unaligned vector stores to dst.

For text that's UTF-8, but mostly consists of the ASCII subset of UTF-8, this can be good: high performance in the common case with correct behaviour in all cases. When there's a lot of non-ASCII, it will probably be worse than staying in the scalar UTF-8 aware loop all the time, though.
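The high-bit check itself is cheap. A minimal sketch, assuming SSE2 and a made-up helper name chunk_has_non_ascii, could be:

```cpp
#include <emmintrin.h>   // SSE2 intrinsics

// Hypothetical helper: true if any of the 16 bytes starting at p has its high
// bit set, i.e. the chunk is not pure ASCII and needs the UTF-8-aware path.
inline bool chunk_has_non_ascii(const char* p) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    // _mm_movemask_epi8 gathers the top bit of each byte into a 16-bit mask.
    return _mm_movemask_epi8(v) != 0;
}
```

The caller would run the vector upcase when this returns false and drop to the scalar loop for that chunk otherwise.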

Making English faster at the expense of other languages is not a future-proof decision if the downside is significant.

SIMD optimized UTF-8 validation and ASCII-only special case detection:


Locale-aware:

In some locales, toupper of an ASCII character produces a non-ASCII character. Turkish (tr_TR) is an example of a locale with many of the weird features: the correct result from toupper('i') is 'İ' (U+0130), not 'I' (plain ASCII). See Martin Bonner's comments on a question about tolower() being slow on Windows.

We can also check for an exception-list and fallback to scalar there, like for multi-byte UTF8 input characters.

With this much complexity, SSE4.2 PCMPISTRM or something might be able to do a lot of our checks in one go.

Abut answered 11/5, 2016 at 0:24 Comment(2)
Isn't this an over-engineered, prematurely optimized answer to the OP? :)Lisettelisha
@galinette: Do you have some other suggestion where I should have posted it? The original poster of the question is presumably long gone. I wrote this because it was fun, after answering the x86 asm Q&A How to access a char array and change lower case letters to upper case, and vice versa, and this seemed like a good place to post the results. It's only "premature" optimization if someone copy/pastes this into a code-base that wouldn't otherwise need to check if it had the ASCII subset of UTF-8.Abut
U
31
struct convert {
   void operator()(char& c) { c = toupper((unsigned char)c); }
};

// ... 
string uc_str;
for_each(uc_str.begin(), uc_str.end(), convert());

Note: A couple of problems with the top solution:

21.5 Null-terminated sequence utilities

The contents of these headers shall be the same as the Standard C Library headers <ctype.h>, <wctype.h>, <string.h>, <wchar.h>, and <stdlib.h> [...]

  • Which means that the cctype members may well be macros not suitable for direct consumption in standard algorithms.

  • Another problem with the same example is that it does not cast the argument or verify that this is non-negative; this is especially dangerous for systems where plain char is signed. (The reason being: if this is implemented as a macro it will probably use a lookup table and your argument indexes into that table. A negative index will give you UB.)

Urtication answered 9/4, 2009 at 17:42 Comment(14)
The normal cctype members are macros. I remember reading that they also had to be functions, although I don't have a copy of the C90 standard and don't know if it was explicitly stated or not.Penance
they have to be functions in C++ - even if C allows them to be macros. i agree with your second point about the casting though. the top solution could pass negative values and cause UB with that. that's the reason i didn't vote it up (but i didn't vote it down either) :)Slowpoke
@litb: Can you cite a reference, I couldn't find anything to that effect in the standard.Urtication
standard quote must not be missing: 7.4.2.2/1 (poor litb, that's referencing a C99 TC2 draft only), and C++ 17.4.1.2/6 in the glory c++98 standard.Slowpoke
(note the foot-note to it: "This disallows the common practice of providing a masking macro.... blah blupp .. only way to do it in C++ is to provide a extern inline function.") :)Slowpoke
@litb: Footnotes are not part of the normative text, are they? I have had this confusion :PUrtication
you are right, they are not part of the normative text :) but they describe the intent of their authors of course. which means if my cited text isn't really making sure there must not be macros, another paragraph will make it sure. hold on i'll see whether i find it.Slowpoke
well but even if the note has no backing normative text, then there will still be a ::toupper function (beside the macro), because of that normative text i cited. since ::toupper will not be replaced by that macro (parens for the arguments are missing), it will work nicely, the same as in C :)Slowpoke
hmm, i think i quoted the paragraph wrongly. It seems that when it talks about "Standard C++ Library", it means only those "cname" and "name" headers, but excludes those "name.h" headers, which it refers to by "Standard C Library". so ctype.h is not at all affected by that rule. :)Slowpoke
However, D.5/1 seems to contradict. It says "For compatibility with the Standard C library, the C++ Standard library provides the 18 C headers, as shown in Table 100:" this looks like a defect i think. i'll report it.Slowpoke
@litb: Thanks for taking the trouble. Are you co-consulting with C99?Urtication
yeah, the c99 draft defines it as a to-upper function. however, in some earlier chapter, it says the library is free to define a macro in addition. but &function must still yield a valid function address...Slowpoke
... that's achieved by this trickery: stackoverflow.com/questions/650461/…Slowpoke
Actually, in order to force a function call we need to write (toupper) instead of just toupper in the transformUrtication
U
23
string StringToUpper(string strToConvert)
{
   for (std::string::iterator p = strToConvert.begin(); strToConvert.end() != p; ++p)
       *p = toupper(*p);

   return strToConvert;
}

Or,

string StringToUpper(string strToConvert)
{
    std::transform(strToConvert.begin(), strToConvert.end(), strToConvert.begin(), ::toupper);

    return strToConvert;
}
Ununa answered 7/3, 2011 at 17:23 Comment(3)
if you don't have access to boost the second solution is probably the best you can get. what do the stars ** after the parameters on the first solution do?Pentarchy
I'm pretty sure the ** is a typo left over from trying to use bold font in the code syntax.Telethon
This code invokes undefined behavior when toupper is called with negative numbers.Hawsepipe
R
22

The following works for me.

#include <algorithm>
#include <cctype>
#include <string>
void  toUpperCase(std::string& str)
{
    std::transform(str.begin(), str.end(), str.begin(), ::toupper);
}

int main()
{
   std::string str = "hello";
   toUpperCase(str);
}
Roxana answered 22/1, 2016 at 12:34 Comment(5)
Note that std::transform is defined in <algorithm>Barnaby
Yes. this # include is required, #include <algorithm>Roxana
This code invokes undefined behavior when toupper is called with negative numbers.Hawsepipe
duplicate of answer given by user648545 – -1Monah
@PiotrDobrogost I have no idea about the answer given by user648545. I have not copied that.When I compare two methods the method signature different altogether although both function use library function transform.Roxana
H
21

Do you have ASCII or International characters in strings?

If it's the latter case, "uppercasing" is not that simple, and it depends on the alphabet used. There are bicameral and unicameral alphabets. Only bicameral alphabets have different characters for upper and lower case. Also, there are composite characters, like Latin capital letter 'DZ' (\u01F1 'DZ') which use the so-called title case. This means that only the first character (D) gets changed.

I suggest you look into ICU, and difference between Simple and Full Case Mappings. This might help:

http://userguide.icu-project.org/transforms/casemappings
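To make the limitation concrete, here is a small sketch of byte-wise upper-casing as used in the std::toupper-based answers. The helper name bytewise_upper is made up, and the input is assumed to be UTF-8.

```cpp
#include <cctype>
#include <string>

// Byte-wise upper-casing: each char is handled on its own, so it cannot apply
// a mapping that involves more than one byte or changes the string length.
std::string bytewise_upper(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}
```

In the default "C" locale, bytewise_upper("stra\xC3\x9F" "e") (the UTF-8 bytes of "straße") upper-cases the ASCII letters but leaves the two bytes of 'ß' untouched. A full case mapping of the kind ICU provides would be expected to give "STRASSE", which is one character longer than the input.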

Haar answered 9/4, 2009 at 17:58 Comment(3)
Or the German eszet (sp?), the thing that looks like the Greek letter beta, and means "ss". There is no single German character that means "SS", which is the uppercase equivalent. The German word for "street", when uppercased, gets one character longer.Penance
Another special case is the Greek letter sigma (Σ), which has two lowercase versions, depending on whether it's at the end of a word (ς) or not (σ). And then there are language specific rules, like Turkish having the case mapping I↔ı and İ↔i.Libb
"Uppercasing" is called case folding.Hint
B
16

The faster one if you use only ASCII characters:

for(size_t i=0;str[i]!=0;i++)
  if(str[i]<='z' && str[i]>='a')
    str[i]+='A'-'a';

Please note that this code runs faster but only works on ASCII and is not an "abstract" solution.

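Extended version for other alphabets. This assumes str holds one code point per element (for example a std::u32string, with U'…' literals in place of the plain ones); it does not work byte by byte on UTF-8, where Cyrillic and Greek letters take two bytes each:

...
if(str[i]<='z' && str[i]>='a') //is latin
    str[i]+='A'-'a';
else if(str[i]<='я' && str[i]>='а') //cyrillic
    str[i]+='Я'-'я';
else if(str[i]<='ω' && str[i]>='α') //greek (final sigma 'ς' needs separate handling)
    str[i]+='Ω'-'ω';
//etc...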

If you need full UNICODE solutions or more conventional and abstract solutions, go for other answers and work with methods of C++ strings.
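As a rough illustration of the range-based idea on a fixed-width encoding, here is a sketch over std::u32string. The function name and the choice of ranges are assumptions for the example; it only covers basic Latin and the Cyrillic block U+0430..U+044F, and is nowhere near a full Unicode case mapping (for instance, 'ё' at U+0451 is outside the range).

```cpp
#include <string>

// Hypothetical sketch: upper-case basic Latin and basic Cyrillic code points
// in a UTF-32 string by shifting each lower-case range onto its upper-case range.
std::u32string upper_latin_cyrillic(std::u32string s) {
    for (char32_t& c : s) {
        if (c >= U'a' && c <= U'z') {
            c -= U'a' - U'A';                 // basic Latin, offset 0x20
        } else if (c >= U'\u0430' && c <= U'\u044F') {
            c -= U'\u0430' - U'\u0410';       // Cyrillic U+0430..U+044F -> U+0410..U+042F
        }
    }
    return s;
}
```

The literals are written as \u escapes so the example does not depend on the source file's encoding.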

Blithesome answered 22/8, 2012 at 8:44 Comment(19)
The question is tagged as C++, but you wrote a C answer here. (I'm not one of the downvoters.)Ranjiv
I wrote a C answer AND a C++ answer here because C++ is written to be fully compatible with C sources, so any C solution is also a correct C++ solutionBlithesome
But it is so much better to give an answer which respects C++ way.Lifeblood
The standard c++ way would be to use std::transform with toupper. That is less code and for sure portable. This code rely on the "fact" that the system will use ascii as the character encoding mechanism. Not sure that all the systems are based on this encoding and therefore not sure that this is portable.Coopery
I wrote in bold that it works only in ASCII, and that this is faster, not that this is "standard", didn't I?Blithesome
Why you decided to use ASCII codes instead of characters enclosed in '?Thaumaturge
I would consider using more logical code str[i]+='A'-'a' instead of just 32. Such logic is suitable not only for latinObrian
@Obrian 'A'-'a' in ascii is 32 ok, it would be ok your variant, even better but stating exactly when it works, I cannot correct with a solution that "maybe in some other case works", but not defining when. Can you explicity when it works? for instance is guaranteed in UTF8 and works with special letters with accents?Blithesome
@LucaC.It works with UTF8 in range 0-127. In fact UTF8 is packed UNICODE, so it works for UNICODE. It can be adapted to any alphabet that have upper/lower case. For instance greek 'Ω'-'ω' if range between α and 'ω'. For cyrillic 'Я'-'я' if range between 'а' and 'я' and so onObrian
@Obrian unicode 0-127 is not exactly ascii? I don't remember.. if i change the proposed solution, which state do you suggest instead of "it works only for ascii characters"?Blithesome
0-127 is ascii, 128-255 is extended ascii. But 0-127 is exactly the same in UTF8 and ASCII. The UTF8 range 0-127 is identical to ASCII, no differenceObrian
@Obrian bot so, -=32 work exactly correct, replacing with 'A'-'a' makes no differenceBlithesome
@LucaC. In fact there is a huge difference. First of all, visually -=32 is cryptic, based on magic numbers and meaningless. Any existing clean code convention is against it. But +='A'-'a' (or -='a'-'A') is logical and meaningful. This is a constant expression which compiler detects at compile time and in all the cases generates exactly the same machine code. Also constant 32 is very likely to not work for greek or cyrillic, or any other non latin charset, see my previous comment.Obrian
@Obrian yes, you state it is probably not working, but in fact you does not cite any further sure case in which is working, so no extra value, I was asking for some more case working just evaluate if it functionally give extra value, apart of readability, I could modify the guard >'a' <'z' and amplify the domainBlithesome
@LucaC See my second comment, the exactly way to extend your program for extra casesObrian
@Obrian ok probably I am not enough fluid on UTF8, cyrillic, ecc.. to understand, that variant is working with 'A'-'a' or needs to do, for instance, 'Я'-'я'?Blithesome
@LucaC. A sample if(str[i]<='z' && str[i]>='a') /*is latin*/ str[i]+='A'-'a'; else if if(str[i]<='а' && str[i]>='я') /*is cyrillic*/ str[i]+='Я'-'я' I mean you can use 'А'-'а' for cyrillic but be attentive, that are cyrillyc A. Looks like latin but it is not, and has different codes. And don't forget about extended ASCII, which can have different interpretations.Obrian
@Obrian could it collide since str[i]<='z' && str[i]>='a' and str[i]<='а' && str[i]>='я' are overlapping?Blithesome
Doesn't collide. The second 'а' is not latin, is cyrillic. Is completely different symbol and different code, only accidentally looks identical.Obrian
S
16

Use a lambda.

std::string s("change my case");

std::locale locale;
auto to_upper = [&locale] (char ch) { return std::use_facet<std::ctype<char>>(locale).toupper(ch); };

std::transform(s.begin(), s.end(), s.begin(), to_upper);
Selfsame answered 15/6, 2014 at 2:31 Comment(1)
Byron, don't worry about the other comments. It is quite ok to answer old questions with new (modern) solution as you did.Franciscka
F
15

As long as you are fine with ASCII-only and you can provide a valid pointer to RW memory, there is a simple and very effective one-liner in C:

void strtoupper(char* str)
{
    for (; *str; ++str) *str = toupper((unsigned char)*str);
}

This is especially good for simple strings like ASCII identifiers which you want to normalize into the same character-case. You can then use the buffer to construct a std::string instance.

Finochio answered 8/6, 2011 at 16:4 Comment(2)
One notes that this answer is for a c string rather than a std::stringPerdue
This has an obvious inherent security flaw. I wouldn't do this.Selfsame
C
12
#include <string>
#include <locale>

std::string str = "Hello World!";
auto & f = std::use_facet<std::ctype<char>>(std::locale());
f.toupper(str.data(), str.data() + str.size());

This will perform better than all the answers that use the global toupper function, and is presumably what boost::to_upper is doing underneath.

This is because ::toupper has to look up the locale - because it might've been changed by a different thread - for every invocation, whereas here only the call to locale() has this penalty. And looking up the locale generally involves taking a lock.

This also works with C++98 after you replace the auto with the explicit facet type, cast away the constness of str.data() (the non-const overload is new in C++17), and add a space to break the template closing (">>" to "> >") like this:

const std::ctype<char> & f =
    std::use_facet<std::ctype<char> >(std::locale());
f.toupper(const_cast<char *>(str.data()), str.data() + str.size());
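
The "look the facet up once, then convert the whole range" idea can be packaged as a function. The following is only a sketch: the name facet_upper and the defaulted locale parameter are assumptions made for illustration, and &s[0] is used so that it compiles from C++11 on without the non-const data():

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: returns an upper-cased copy of s.
// The ctype facet is fetched once per call, not once per character.
std::string facet_upper(std::string s, const std::locale& loc = std::locale())
{
    if (!s.empty()) {
        const std::ctype<char>& f = std::use_facet<std::ctype<char> >(loc);
        f.toupper(&s[0], &s[0] + s.size());   // range overload converts in place
    }
    return s;
}

// Small usage check (holds for the default "C" locale).
void facet_upper_example()
{
    assert(facet_upper("Hello World!") == "HELLO WORLD!");
}
```

Taking the string by value keeps the caller's copy intact; pass a named locale as the second argument if the global one is not the one you want.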
Coffey answered 7/10, 2016 at 23:52 Comment(0)
R
10
//works for ASCII -- no clear advantage over what is already posted...

std::string toupper(const std::string & s)
{
    std::string ret(s.size(), char());
    for(unsigned int i = 0; i < s.size(); ++i)
        ret[i] = (s[i] <= 'z' && s[i] >= 'a') ? s[i]-('a'-'A') : s[i];
    return ret;
}
Rodrigo answered 1/8, 2010 at 4:24 Comment(3)
s.size() is of type std::size_t which, AFAIK could very well be unsigned int depending on the implementationWrote
I don't think there are any modern implementations in which the result of std::string::size is signed. Given that, both semantically and practically, there's no such thing as a negative size, I'm going to go with size_t being at least a 32-bit unsigned integer.Publicly
There's no reason not to write for (size_t i = 0 .... There's also no good reason to make it so hard to read. This also copies the string first and then loop over it. @Luke's answer is better in some ways, except for not taking advantage of 'a' character constants.Abut
B
9
std::string str = "STriNg oF mIxID CasE lETteRS";

C++ 11

  • Using for_each

    std::for_each(str.begin(), str.end(), [](char & c){ c = ::toupper(c); });

  • Using transform

    std::transform(str.begin(), str.end(), str.begin(), ::toupper);

C++ (Windows Only)

_strupr_s(&str[0], str.size() + 1);

C++ (Using Boost Library)

std::string upper = boost::to_upper_copy(str);
Brennabrennan answered 15/9, 2020 at 19:39 Comment(0)
D
8
typedef std::string::value_type char_t;

char_t up_char( char_t ch )
{
    return std::use_facet< std::ctype< char_t > >( std::locale() ).toupper( ch );
}

std::string toupper( const std::string &src )
{
    std::string result;
    std::transform( src.begin(), src.end(), std::back_inserter( result ), up_char );
    return result;
}

const std::string src  = "test test TEST";

std::cout << toupper( src );
Dogger answered 9/4, 2009 at 17:55 Comment(4)
wouldnt recommend a back_inserter as you already know the length; use std::string result(src.size()); std::transform( src.begin(), src.end(), result.begin(), up_char );Palpate
Altough I am sure you know this.Palpate
@Viktor Sehr, @bayda: I know this is 2 years old, but why not get the best of both worlds. Use reserve and back_inserter (making so the string is only copied once). inline std::string to_lower(const std::string &s) { std::string result; result.reserve(s.size()); std::transform(s.begin(), s.end(), std::back_inserter( result ), static_cast<int(*)(int)>(std::tolower)); return result; }Blowhard
For those who is looking for conversion from the Win32 console input codepage: use instead of std::locale() this one: std::locale(std::string(".") + std::to_string(GetConsoleCP())). In mine case the executable increased in twice in size under MSVC 2015 Update 3. If try to use only the std::locale() + facet, then executable increases on + ~60KB in Release, in case of std::locale with string constructor - increases on + ~200KB. So be careful with that.Teddy
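
The two suggestions from the comments above (presizing the result, or reserve plus back_inserter) might look as follows. This is a sketch, not the answerer's code: it swaps the locale facet for std::toupper with an unsigned char cast, and the names upper_char, upper_presized and upper_reserved are hypothetical. Note that std::string has no constructor taking only a size, so the presized variant passes a fill character as well:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical per-character helper; the cast keeps negative char values
// away from std::toupper, where they would be undefined behavior.
inline char upper_char(char ch)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
}

// Variant 1: presize the result, then overwrite it.
std::string upper_presized(const std::string& src)
{
    std::string result(src.size(), '\0');
    std::transform(src.begin(), src.end(), result.begin(), upper_char);
    return result;
}

// Variant 2: reserve capacity once, then append through back_inserter.
std::string upper_reserved(const std::string& src)
{
    std::string result;
    result.reserve(src.size());
    std::transform(src.begin(), src.end(), std::back_inserter(result), upper_char);
    return result;
}

// Small usage check.
void upper_copy_example()
{
    assert(upper_presized("test test TEST") == "TEST TEST TEST");
    assert(upper_reserved("test test TEST") == "TEST TEST TEST");
}
```

Both variants allocate once; the first writes every character twice (fill, then overwrite), the second pays for a capacity check on each append.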
R
7

The answer of @dirkgently is very inspiring, but I want to emphasize the following concern:

Like all other functions from <cctype>, the behavior of std::toupper is undefined if the argument's value is neither representable as unsigned char nor equal to EOF. To use these functions safely with plain chars (or signed chars), the argument should first be converted to unsigned char
Reference: std::toupper

As the standard does not specify whether plain char is signed or unsigned, the correct usage of std::toupper should be:

#include <algorithm>
#include <cctype>
#include <iostream>
#include <iterator>
#include <string>

void ToUpper(std::string& input)
{
    std::for_each(std::begin(input), std::end(input), [](char& c) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    });
}

int main()
{
    std::string s{ "Hello world!" };
    std::cout << s << std::endl;
    ::ToUpper(s);
    std::cout << s << std::endl;

    return 0;
}

Output:

Hello world!
HELLO WORLD!
Rosaliarosalie answered 24/1, 2020 at 8:54 Comment(0)
L
3
std::string value;
for (std::string::iterator p = value.begin(); value.end() != p; ++p)
    *p = toupper(*p);
Lifeblood answered 17/12, 2010 at 1:13 Comment(1)
This code invokes undefined behavior when toupper is called with negative numbers.Hawsepipe
S
2
// Since I work on a Mac, and the Windows methods mentioned do not work for me,
// I just built this quick method.

string str;
str = "This String Will Print Out in all CAPS";
int len = str.size();
char b;

for (int i = 0; i < len; i++) {
    b = str[i];
    b = toupper(b);
    // b = tolower(b); // alternatively
    str[i] = b;
}

cout << str;
Save answered 1/6, 2021 at 15:30 Comment(2)
While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value.Pericarditis
So true. Thank you. I will be better at commenting in my code in the future.Save
R
1

Try the toupper() function (#include <ctype.h>). It accepts a single character as its argument; since strings are made up of characters, you'll have to iterate over the string and convert each character individually.

Reta answered 9/4, 2009 at 17:41 Comment(1)
This suggestion invokes undefined behavior when toupper is called with negative numbers. You should have mentioned the necessary cast to unsigned char.Hawsepipe
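
The loop described in this answer, with the cast asked for in the comment, could be sketched like this. The function name upper_in_place is made up for illustration; the conversion follows the current C locale, which by default covers ASCII only:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: upper-cases every character of s in place.
void upper_in_place(std::string& s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // Convert to unsigned char first: toupper is undefined for
        // negative values other than EOF.
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
    }
}

// Small usage check.
void upper_in_place_example()
{
    std::string s = "Hello world!";
    upper_in_place(s);
    assert(s == "HELLO WORLD!");
}
```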
D
1

Using Boost.Text, which will work for Unicode text

boost::text::text t = "Hello World";
boost::text::text uppered;
boost::text::to_upper(t, std::inserter(uppered, uppered.end()));
std::string newstr = uppered.extract();
Detwiler answered 4/10, 2019 at 8:45 Comment(0)
S
1

Based on Kyle_the_hacker's answer, with my extras.

Ubuntu

In terminal List all locales
locale -a

Install all locales
sudo apt-get install -y locales locales-all

Compile main.cpp
$ g++ main.cpp

Run compiled program
$ ./a.out

Results

Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë
Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë
ZOË SALDAÑA PLAYED IN LA MALDICIÓN DEL PADRE CARDONA. ËÈÑ ΑΩ ÓÓCHLOË
ZOË SALDAÑA PLAYED IN LA MALDICIÓN DEL PADRE CARDONA. ËÈÑ ΑΩ ÓÓCHLOË
zoë saldaña played in la maldición del padre cardona. ëèñ αω óóchloë
zoë saldaña played in la maldición del padre cardona. ëèñ αω óóchloë

Windows

In cmd run VCVARS developer tools
"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat"

Compile main.cpp
> cl /EHa main.cpp /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /std:c++17 /DYNAMICBASE "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /MTd

Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

main.cpp
Microsoft (R) Incremental Linker Version 14.27.29111.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:main.exe
main.obj
kernel32.lib
user32.lib
gdi32.lib
winspool.lib
comdlg32.lib
advapi32.lib
shell32.lib
ole32.lib
oleaut32.lib
uuid.lib
odbc32.lib
odbccp32.lib

Run main.exe
>main.exe

Results

Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë
Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë
ZOË SALDAÑA PLAYED IN LA MALDICIÓN DEL PADRE CARDONA. ËÈÑ ΑΩ ÓÓCHLOË
ZOË SALDAÑA PLAYED IN LA MALDICIÓN DEL PADRE CARDONA. ËÈÑ ΑΩ ÓÓCHLOË
zoë saldaña played in la maldición del padre cardona. ëèñ αω óóchloë
zoë saldaña played in la maldición del padre cardona. ëèñ αω óóchloë

The code - main.cpp

This code was only tested on Windows x64 and Ubuntu Linux x64.

/*
 * Filename: c:\Users\x\Cpp\main.cpp
 * Path: c:\Users\x\Cpp
 * Filename: /home/x/Cpp/main.cpp
 * Path: /home/x/Cpp
 * Created Date: Saturday, October 17th 2020, 10:43:31 pm
 * Author: Joma
 *
 * No Copyright 2020
 */


#include <iostream>
#include <set>
#include <string>
#include <locale>

// WINDOWS
#if (_WIN32)
#define NOMINMAX // must come before <Windows.h> to suppress its min/max macros
#include <Windows.h>
#include <conio.h>
#define WINDOWS_PLATFORM 1
#define DLLCALL STDCALL
#define DLLIMPORT _declspec(dllimport)
#define DLLEXPORT _declspec(dllexport)
#define DLLPRIVATE

//EMSCRIPTEN
#elif defined(__EMSCRIPTEN__)
#include <emscripten/emscripten.h>
#include <emscripten/bind.h>
#include <unistd.h>
#include <termios.h>
#define EMSCRIPTEN_PLATFORM 1
#define DLLCALL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))

// LINUX - Ubuntu, Fedora, , Centos, Debian, RedHat
#elif (__LINUX__ || __gnu_linux__ || __linux__ || __linux || linux)
#define LINUX_PLATFORM 1
#include <unistd.h>
#include <termios.h>
#define DLLCALL CDECL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))
#define CoTaskMemAlloc(p) malloc(p)
#define CoTaskMemFree(p) free(p)

//ANDROID
#elif (__ANDROID__ || ANDROID)
#define ANDROID_PLATFORM 1
#define DLLCALL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))

//MACOS
#elif defined(__APPLE__)
#include <unistd.h>
#include <termios.h>
#define DLLCALL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))
#include "TargetConditionals.h"
#if TARGET_OS_IPHONE && TARGET_IPHONE_SIMULATOR
#define IOS_SIMULATOR_PLATFORM 1
#elif TARGET_OS_IPHONE
#define IOS_PLATFORM 1
#elif TARGET_OS_MAC
#define MACOS_PLATFORM 1
#else

#endif

#endif



typedef std::string String;
typedef std::wstring WString;

#define EMPTY_STRING u8""s
#define EMPTY_WSTRING L""s

using namespace std::literals::string_literals;

class Strings
{
public:
    static String WideStringToString(const WString& wstr)
    {
        if (wstr.empty())
        {
            return String();
        }
        size_t pos;
        size_t begin = 0;
        String ret;

#if WINDOWS_PLATFORM
        int size;
        pos = wstr.find(static_cast<wchar_t>(0), begin);
        while (pos != WString::npos && begin < wstr.length())
        {
            WString segment = WString(&wstr[begin], pos - begin);
            size = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), NULL, 0, NULL, NULL);
            String converted = String(size, 0);
            WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.size(), NULL, NULL);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = wstr.find(static_cast<wchar_t>(0), begin);
        }
        if (begin <= wstr.length())
        {
            WString segment = WString(&wstr[begin], wstr.length() - begin);
            size = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), NULL, 0, NULL, NULL);
            String converted = String(size, 0);
            WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.size(), NULL, NULL);
            ret.append(converted);
        }
#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        size_t size;
        pos = wstr.find(static_cast<wchar_t>(0), begin);
        while (pos != WString::npos && begin < wstr.length())
        {
            WString segment = WString(&wstr[begin], pos - begin);
            size = wcstombs(nullptr, segment.c_str(), 0);
            String converted = String(size, 0);
            wcstombs(&converted[0], segment.c_str(), converted.size());
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = wstr.find(static_cast<wchar_t>(0), begin);
        }
        if (begin <= wstr.length())
        {
            WString segment = WString(&wstr[begin], wstr.length() - begin);
            size = wcstombs(nullptr, segment.c_str(), 0);
            String converted = String(size, 0);
            wcstombs(&converted[0], segment.c_str(), converted.size());
            ret.append(converted);
        }
#else
        static_assert(false, "Unknown Platform");
#endif
        return ret;
    }

    static WString StringToWideString(const String& str)
    {
        if (str.empty())
        {
            return WString();
        }

        size_t pos;
        size_t begin = 0;
        WString ret;
#ifdef WINDOWS_PLATFORM
        int size = 0;
        pos = str.find(static_cast<char>(0), begin);
        while (pos != std::string::npos) {
            std::string segment = std::string(&str[begin], pos - begin);
            std::wstring converted = std::wstring(segment.size() + 1, 0);
            size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.length());
            converted.resize(size);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = str.find(static_cast<char>(0), begin);
        }
        if (begin < str.length()) {
            std::string segment = std::string(&str[begin], str.length() - begin);
            std::wstring converted = std::wstring(segment.size() + 1, 0);
            size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, segment.c_str(), segment.size(), &converted[0], converted.length());
            converted.resize(size);
            ret.append(converted);
        }

#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        size_t size;
        pos = str.find(static_cast<char>(0), begin);
        while (pos != String::npos)
        {
            String segment = String(&str[begin], pos - begin);
            WString converted = WString(segment.size(), 0);
            size = mbstowcs(&converted[0], &segment[0], converted.size());
            converted.resize(size);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = str.find(static_cast<char>(0), begin);
        }
        if (begin < str.length())
        {
            String segment = String(&str[begin], str.length() - begin);
            WString converted = WString(segment.size(), 0);
            size = mbstowcs(&converted[0], &segment[0], converted.size());
            converted.resize(size);
            ret.append(converted);
        }
#else
        static_assert(false, "Unknown Platform");
#endif
        return ret;
    }


    static WString ToUpper(const WString& data)
    {
        WString result = data;
        auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());

        f.toupper(&result[0], &result[0] + result.size());
        return result;
    }

    static String  ToUpper(const String& data)
    {
        return WideStringToString(ToUpper(StringToWideString(data)));
    }

    static WString ToLower(const WString& data)
    {
        WString result = data;
        auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());
        f.tolower(&result[0], &result[0] + result.size());
        return result;
    }

    static String ToLower(const String& data)
    {
        return WideStringToString(ToLower(StringToWideString(data)));
    }

};

enum class ConsoleTextStyle
{
    DEFAULT = 0,
    BOLD = 1,
    FAINT = 2,
    ITALIC = 3,
    UNDERLINE = 4,
    SLOW_BLINK = 5,
    RAPID_BLINK = 6,
    REVERSE = 7,
};

enum class ConsoleForeground
{
    DEFAULT = 39,
    BLACK = 30,
    DARK_RED = 31,
    DARK_GREEN = 32,
    DARK_YELLOW = 33,
    DARK_BLUE = 34,
    DARK_MAGENTA = 35,
    DARK_CYAN = 36,
    GRAY = 37,
    DARK_GRAY = 90,
    RED = 91,
    GREEN = 92,
    YELLOW = 93,
    BLUE = 94,
    MAGENTA = 95,
    CYAN = 96,
    WHITE = 97
};

enum class ConsoleBackground
{
    DEFAULT = 49,
    BLACK = 40,
    DARK_RED = 41,
    DARK_GREEN = 42,
    DARK_YELLOW = 43,
    DARK_BLUE = 44,
    DARK_MAGENTA = 45,
    DARK_CYAN = 46,
    GRAY = 47,
    DARK_GRAY = 100,
    RED = 101,
    GREEN = 102,
    YELLOW = 103,
    BLUE = 104,
    MAGENTA = 105,
    CYAN = 106,
    WHITE = 107
};

class Console
{
private:
    static void EnableVirtualTermimalProcessing()
    {
#if defined WINDOWS_PLATFORM
        HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD dwMode = 0;
        GetConsoleMode(hOut, &dwMode);
        if (!(dwMode & ENABLE_VIRTUAL_TERMINAL_PROCESSING))
        {
            dwMode |= ENABLE_VIRTUAL_TERMINAL_PROCESSING;
            SetConsoleMode(hOut, dwMode);
        }
#endif
    }

    static void ResetTerminalFormat()
    {
        std::cout << u8"\033[0m";
    }

    static void SetVirtualTerminalFormat(ConsoleForeground foreground, ConsoleBackground background, std::set<ConsoleTextStyle> styles)
    {
        String format = u8"\033[";
        format.append(std::to_string(static_cast<int>(foreground)));
        format.append(u8";");
        format.append(std::to_string(static_cast<int>(background)));
        if (styles.size() > 0)
        {
            for (auto it = styles.begin(); it != styles.end(); ++it)
            {
                format.append(u8";");
                format.append(std::to_string(static_cast<int>(*it)));
            }
        }
        format.append(u8"m");
        std::cout << format;
    }
public:
    static void Clear()
    {

#ifdef WINDOWS_PLATFORM
        std::system(u8"cls");
#elif LINUX_PLATFORM || defined MACOS_PLATFORM
        std::system(u8"clear");
#elif EMSCRIPTEN_PLATFORM
        emscripten::val::global()["console"].call<void>(u8"clear");
#else
        static_assert(false, "Unknown Platform");
#endif
    }

    static void Write(const String& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
#ifndef EMSCRIPTEN_PLATFORM
        EnableVirtualTermimalProcessing();
        SetVirtualTerminalFormat(foreground, background, styles);
#endif
        String str = s;
#ifdef WINDOWS_PLATFORM
        WString unicode = Strings::StringToWideString(str);
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), static_cast<DWORD>(unicode.length()), nullptr, nullptr);
#elif defined LINUX_PLATFORM || defined MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        std::cout << str;
#else
        static_assert(false, "Unknown Platform");
#endif

#ifndef EMSCRIPTEN_PLATFORM
        ResetTerminalFormat();
#endif
    }

    static void WriteLine(const String& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
        Write(s, foreground, background, styles);
        std::cout << std::endl;
    }

    static void Write(const WString& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
#ifndef EMSCRIPTEN_PLATFORM
        EnableVirtualTermimalProcessing();
        SetVirtualTerminalFormat(foreground, background, styles);
#endif
        WString str = s;

#ifdef WINDOWS_PLATFORM
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), str.c_str(), static_cast<DWORD>(str.length()), nullptr, nullptr);
#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        std::cout << Strings::WideStringToString(str);
#else
        static_assert(false, "Unknown Platform");
#endif

#ifndef EMSCRIPTEN_PLATFORM
        ResetTerminalFormat();
#endif
    }

    static void WriteLine(const WString& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
        Write(s, foreground, background, styles);
        std::cout << std::endl;
    }

    static void WriteLine()
    {
        std::cout << std::endl;
    }

    static void Pause()
    {
        char c;
        do
        {
            c = getchar();
            std::cout << "Press Key " << std::endl;
        } while (c != 64);
        std::cout << "KeyPressed" << std::endl;
    }

    static int PauseAny(bool printWhenPressed = false, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
        int ch;
#ifdef WINDOWS_PLATFORM
        ch = _getch();
#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        struct termios oldt, newt;
        tcgetattr(STDIN_FILENO, &oldt);
        newt = oldt;
        newt.c_lflag &= ~(ICANON | ECHO);
        tcsetattr(STDIN_FILENO, TCSANOW, &newt);
        ch = getchar();
        tcsetattr(STDIN_FILENO, TCSANOW, &oldt);
#else
        static_assert(false, "Unknown Platform");
#endif
        if (printWhenPressed)
        {
            Console::Write(String(1, ch), foreground, background, styles);
        }
        return ch;
    }
};



int main()
{
    std::locale::global(std::locale(u8"en_US.UTF-8"));
    String dataStr = u8"Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë";
    WString dataWStr = L"Zoë Saldaña played in La maldición del padre Cardona. ëèñ αω óóChloë";
    std::string locale = u8"";
    //std::string locale = u8"de_DE.UTF-8";
    //std::string locale = u8"en_US.UTF-8";
    Console::WriteLine(dataStr);
    Console::WriteLine(dataWStr);
    dataStr = Strings::ToUpper(dataStr);
    dataWStr = Strings::ToUpper(dataWStr);
    Console::WriteLine(dataStr);
    Console::WriteLine(dataWStr);
    dataStr = Strings::ToLower(dataStr);
    dataWStr = Strings::ToLower(dataWStr);
    Console::WriteLine(dataStr);
    Console::WriteLine(dataWStr);
    
    
    Console::WriteLine(u8"Press any key to exit"s, ConsoleForeground::DARK_GRAY);
    Console::PauseAny();

    return 0;
}

Sarpedon answered 18/10, 2020 at 6:21 Comment(1)
best laugh I had in a while. Thank you.Ethanol
B
0

Not sure there is a built-in function. Try this:

Include either <ctype.h> or <cctype> (for toupper and tolower), as well as <string>.

string StringToUpper(string strToConvert)
{//change each element of the string to upper case
   for(unsigned int i=0;i<strToConvert.length();i++)
   {
      strToConvert[i] = toupper(strToConvert[i]);
   }
   return strToConvert;//return the converted string
}

string StringToLower(string strToConvert)
{//change each element of the string to lower case
   for(unsigned int i=0;i<strToConvert.length();i++)
   {
      strToConvert[i] = tolower(strToConvert[i]);
   }
   return strToConvert;//return the converted string
}
Brittain answered 9/4, 2009 at 17:43 Comment(2)
.length() is not of type 'unsigned int'Hoelscher
This code invokes undefined behavior when toupper is called with negative numbers.Hawsepipe
S
0

Here is the latest code with C++11

std::string cmd = "Hello World";
for_each(cmd.begin(), cmd.end(), [](char& in){ in = ::toupper(in); });
Shakitashako answered 27/12, 2013 at 6:46 Comment(1)
This code invokes undefined behavior when toupper is called with negative numbers.Hawsepipe
P
0

My solution (clearing 6th bit for alpha):

#include <ctype.h>

inline void toupper(char* str)
{
    int i = 0; // index into the string
    while (str[i]) {
        if (islower(str[i]))
            str[i] &= ~32; // Clear bit 6 as it is what differs (32) between Upper and Lowercases
        i++;
    }
}
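
The bit trick can be checked in isolation. Below is a small sketch, assuming an ASCII-compatible character set (the static_assert enforces that); ascii_upper_bit is a hypothetical name, and the explicit a-z range test stands in for islower so that no negative value ever reaches a <ctype.h> function:

```cpp
#include <cassert>

// In ASCII each lowercase letter differs from its uppercase form only in
// the bit with value 0x20 (32).
static_assert(('a' ^ 'A') == 0x20, "this sketch assumes an ASCII-compatible character set");

// Hypothetical helper: clears that bit for a-z only; other bytes are
// returned unchanged.
constexpr char ascii_upper_bit(char c)
{
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c & ~0x20) : c;
}

// Small usage check.
inline void ascii_upper_bit_example()
{
    assert(ascii_upper_bit('q') == 'Q');
    assert(ascii_upper_bit('{') == '{');   // just past 'z', must not change
}
```

The range test matters: clearing the bit unconditionally would also turn '{' into '[' and corrupt other punctuation.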
Pallas answered 24/1, 2018 at 2:29 Comment(2)
This code invokes undefined behavior when toupper is called with negative numbers.Hawsepipe
No... Please check you are right before downvoting. Islower would only work on non negative values...Pallas
H
0

If you only want to capitalize, try this function.

#include <iostream>
#include <string>


using namespace std;

string upper(string text){
    string upperCase;
    for(int it : text){
        if(it>96&&it<123){
            upperCase += char(it-32);
        }else{
            upperCase += char(it);
        }
    }
    return upperCase;
}

int main() {
    string text = "^_abcdfghopqrvmwxyz{|}";
    cout<<text<<"/";
    text = upper(text);
    cout<<text;
    return 0;
}

Note: range-based for loops need C++11 or later. In C++98 mode the compiler reports: Range-based 'for' loops are not allowed in C++98 mode

Handhold answered 30/11, 2020 at 13:48 Comment(0)
R
-1

Without using any libraries:

std::string YourClass::Uppercase(const std::string & Text)
{
    std::string UppperCaseString;
    UppperCaseString.reserve(Text.size());
    for (std::string::const_iterator it=Text.begin(); it<Text.end(); ++it)
    {
        UppperCaseString.push_back(((0x60 < *it) && (*it < 0x7B)) ? (*it - static_cast<char>(0x20)) : *it);
    }
    return UppperCaseString;
}
Ranjiv answered 18/8, 2013 at 17:23 Comment(1)
The above code works only for ASCII-compatible encodings. Neither the question not your answer mentions this restriction. One of them should.Hawsepipe
W
-1

If you are only concerned with 8-bit characters (which all other answers except Milan Babuškov's assume as well), you can get the fastest speed by generating a look-up table at compile time using metaprogramming. On ideone.com this runs 7x faster than the library function and 3x faster than a hand-written version (http://ideone.com/sb1Rup). It is also customizable through traits with no slowdown.

template<int ...Is>
struct IntVector{
using Type = IntVector<Is...>;
};

template<typename T_Vector, int I_New>
struct PushFront;
template<int ...Is, int I_New>
struct PushFront<IntVector<Is...>,I_New> : IntVector<I_New,Is...>{};

template<int I_Size, typename T_Vector = IntVector<>>
struct Iota : Iota< I_Size-1, typename PushFront<T_Vector,I_Size-1>::Type> {};
template<typename T_Vector>
struct Iota<0,T_Vector> : T_Vector{};

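template<int C_In> // int, so that indices 128..255 are not narrowed to a signed char
struct ToUpperTraits {
    enum { value = (C_In >= 'a' && C_In <='z') ? C_In - ('a'-'A'):C_In };
};

template<typename T>
struct TableToUpper;
template<int ...Is>
struct TableToUpper<IntVector<Is...>>{
    static char at(const char in){
        static const unsigned char table[] = {ToUpperTraits<Is>::value...};
        // index through unsigned char so that negative chars cannot read out of bounds
        return static_cast<char>(table[static_cast<unsigned char>(in)]);
    }
};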

int tableToUpper(const char c){
    using Table = TableToUpper<typename Iota<256>::Type>;
    return Table::at(c);
}

with use case:

std::transform(in.begin(),in.end(),out.begin(),tableToUpper);

For an in-depth (many-page) description of how it works, allow me to shamelessly plug my blog: http://metaporky.blogspot.de/2014/07/part-4-generating-look-up-tables-at.html

Wrote answered 31/7, 2014 at 13:26 Comment(0)
C
-1
template<size_t size>
char* toupper(char (&dst)[size], const char* src) {
    // generate mapping table once
    static unsigned char maptable[256];
    static bool mapped;
    if (!mapped) {
        // int counter: a char can never reach 256, so the loop would not end
        for (int c = 0; c < 256; c++) {
            if (c >= 'a' && c <= 'z')
                maptable[c] = c & 0xdf;
            else
                maptable[c] = c;
        }
        mapped = true;
    }

    // use mapping table to quickly transform text, leaving room for the terminator
    size_t i = 0;
    for (; *src && i + 1 < size; i++) {
        dst[i] = maptable[(unsigned char)*(src++)];
    }
    dst[i] = '\0';
    return dst;
}
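
The same table idea carries over to std::string. The following is a sketch under the same ASCII-only assumption; make_upper_table and table_upper are hypothetical names, and the table is built once at first use via a function-local static:

```cpp
#include <array>
#include <cassert>
#include <string>

// Builds the 256-entry table; only ASCII a-z are remapped.
inline std::array<unsigned char, 256> make_upper_table()
{
    std::array<unsigned char, 256> table{};
    for (int c = 0; c < 256; ++c)
        table[c] = (c >= 'a' && c <= 'z') ? static_cast<unsigned char>(c - ('a' - 'A'))
                                          : static_cast<unsigned char>(c);
    return table;
}

// Hypothetical helper: upper-cases s in place through the table.
void table_upper(std::string& s)
{
    // Function-local static: initialized once, thread-safe since C++11.
    static const std::array<unsigned char, 256> table = make_upper_table();
    for (char& ch : s)
        ch = static_cast<char>(table[static_cast<unsigned char>(ch)]);
}

// Small usage check.
void table_upper_example()
{
    std::string s = "lookup Table 42";
    table_upper(s);
    assert(s == "LOOKUP TABLE 42");
}
```

Indexing through unsigned char is what keeps bytes above 127 from becoming negative subscripts on platforms where plain char is signed.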
Cumbersome answered 16/4, 2015 at 8:51 Comment(0)
L
-1

This c++ function always returns the upper case string...

#include <locale> 
#include <string>
using namespace std; 
string toUpper (string str){
    locale loc; 
    string n; 
    for (string::size_type i=0; i<str.length(); ++i)
        n += toupper(str[i], loc);
    return n;
}
Laina answered 24/7, 2018 at 17:30 Comment(0)
I
-2

ALL of these solutions on this page are harder than they need to be.

Do this

std::string RegName = "SomE StRing That you wAnt ConvErTed";
int NameLength = RegName.size();
for (int forLoop = 0; forLoop < NameLength; ++forLoop)
{
     RegName[forLoop] = tolower(RegName[forLoop]);
}

RegName is your string. Store the string size in a variable first instead of calling size() in the loop test, then use the most basic for loop.

Remember that size() does not count the terminating null character, so use < and not <= in your loop test.

Output will be: some string that you want converted

Insecure answered 14/2, 2012 at 21:37 Comment(2)
I don't see how this is simpler than the boost::toupper solution. Can you elaborate?Knowhow
There are already lots of simple tolower loops, and most of them use standard loop variable names like i, not the weird forLoop.Abut
N
-5

I use this solution. I know you're not supposed to modify that data area.... but I think that's mostly for buffer overrun bugs and null character.... upper casing things isn't the same.

void to_upper(const std::string str) {
    std::string::iterator it;
    int i;
    for ( i=0;i<str.size();++i ) {
        ((char *)(void *)str.data())[i]=toupper(((char *)str.data())[i]);
    }
}
Novah answered 2/5, 2012 at 20:25 Comment(3)
I know you're not supposed to modify that data area - what data area are you not supposed to modify?Aintab
This is late, but what on earth? That crazy line can be replaced with str[i] = toupper(str[i]); perfectly fine (well, not perfectly fine, but it fixes most of the things wrong).Slogan
Let's see. You: 1. Define a void function that takes its argument by value (instead of by reference), making all changes to the string inaccessible outside the function, so that the function is completely useless. 2. Declare an iterator variable that you never use. 3. Cast str.data() to a void* for no good reason. This technically makes it undefined behavior, violating the string aliasing rule.Libb
