C string to uppercase in C and C++
Asked Answered
V

4

21

While I was putting together a to-uppercase function in C++ I noticed that I did not receive the expected output in C.

C++ function

#include <iostream>
#include <cctype>
#include <cstdio>

void strupp(char* beg)
{
    while (*beg++ = std::toupper(*beg));
}

int main(int charc, char* argv[])
{
    char a[] = "foobar";
    strupp(a);
    printf("%s\n", a);
    return 0;
}

Output as expected:

FOOBAR


C function
#include <ctype.h>
#include <stdio.h>
#include <string.h>

void strupp(char* beg)
{
    while (*beg++ = toupper(*beg));
}

int main(int charc, char* argv[])
{
    char a[] = "foobar";
    strupp(a);
    printf("%s\n", a);
    return 0;
}

The output is the expected result with the first character missing

OOBAR

Does anyone know why the result gets truncated while compiling in C?

Volant answered 12/10, 2015 at 16:33 Comment(10)
And if you really wanted to do this in C++: std::transform(a, a + strlen(a), a, std::toupper);Resupinate
Can you explain why you expected this to convert a string to uppercase? Specifically, why did you expect the right side of the = to be evaluated before the left?Rainbolt
I'm thankful to all people who gave feedback and for the valuable info providedVolant
Also add cast: while (*beg = toupper((unsigned char) *beg)) beg++;. toupper() is not well defined for negative char values.Gutbucket
Don't write so complex code if you don't understand sequence points and operation priorities. And even if you understand them - care about your colleagues, write each statement on a new line.Helmick
@Helmick I strongly believe that learning from your mistakes is the most efficient way not to make the same mistake again. Hence as this problem stemmed solely from experimentation and not in a team project environment I don't think it was a mistake to write this code snippet to begin withVolant
@PaulMcKenzie, The docs for std::transform state that "std::transform does not guarantee in-order application of unary_op or binary_op. To apply a function to a sequence in-order or to apply a function that modifies the elements of a sequence, use std::for_each".Warsle
So then something like std::for_each(a, a + strlen(a), [](char &c){ c = std::toupper(c); });?Warsle
@MillieSmith In this case, does it matter the order of which item will be made upper case? Anyway, most, if not all of the classical C++ approaches of mutating a string use std::transform.Resupinate
@AlexKoukoulas i think so about learning from mistakes. What about trying doing experiments with different compilers? It could be interesting :)Helmick
O
30

The problem is that there is no sequence point in

while (*beg++ = toupper(*beg));

So we have undefined behavior. What the compiler is doing in this case is evaluating beg++ before toupper(*beg) In C where in C++ it is doing it the other way.

Obryan answered 12/10, 2015 at 16:37 Comment(4)
Does this mean that the classic c-string copying one-liner while(*s++ = *t++) ; has undefined behavior?Insurgent
@markh No as that is two different variables. This is synonymous with while (*s++ = *s++)Obryan
@SteveJessop Actually, while(*s++ = *t++); has defined behaviour even if s == t. The reason strcpy is undefined with overlapping regions is that strcpy isn't necessarily implemented with while(*s++ = *t++);.Stepup
@immibis: fair point, agreed it's defined. However, even if strcpy is implemented with that loop it still has undefined behavior for one "direction" of overlap. If the destination is greater than the source and they overlap, then the terminating nul gets overwritten before it's read, and eventually the buffer is overrun! The reason overlap is forbidden in both "directions" is for possible other implementations.Delicacy
A
15
while (*beg++ = std::toupper(*beg));

leads to undefined behavior.

Whether *beg++ is sequenced before or after std::toupper(*beg) is unspecified.

The simple fix is to use:

while (*beg = std::toupper(*beg))
   ++beg;
Alic answered 12/10, 2015 at 16:37 Comment(0)
W
10

The line

while (*beg++ = toupper(*beg));

contains a side effect on an entity that's being used twice. You can not know, whether or not beg++ is executed before or after the *beg (inside the toupper). You're just lucky that both implementations show both behaviors, as I'm pretty sure it's the same for C++. (However, there were some rule changes for c++11, which I'm not certain of - still, it's bad style.)

Just move the beg++ out of the condition:

while (*beg = toupper(*beg)) beg++;
Warfare answered 12/10, 2015 at 16:40 Comment(0)
D
3

with respect to above answer, the 'f' is never passed inside function, you should try using this:

     while ((*beg = (char) toupper(*beg))) beg++;
Detect answered 12/10, 2015 at 16:41 Comment(2)
casting to char was important as value of type int may not fit into receiver type charDetect
That cast achieves nothing that wouldn't happen without it anyway.Chill

© 2022 - 2024 — McMap. All rights reserved.