How does one safely static_cast between unsigned int and int?

D

6

16

I have an 8-character string representing a hexadecimal number and I need to convert it to an int. This conversion has to preserve the bit pattern for strings "80000000" and higher, i.e., those numbers should come out negative. Unfortunately, the naive solution:

int hex_str_to_int(const string hexStr)
{    
    stringstream strm;
    strm << hex << hexStr;
    unsigned int val = 0;
    strm >> val;
    return static_cast<int>(val);
}

doesn't work for my compiler if val > INT_MAX (the returned value is 0). Changing the type of val to int also results in a 0 for the larger numbers. I've tried several different solutions from various answers here on SO and haven't been successful yet.

Here's what I do know:

I'm using HP's C++ compiler on OpenVMS (using, I believe, an Itanium processor).
sizeof(int) will be at least 4 on every architecture my code will run on.
Casting from a number > INT_MAX to int is implementation-defined. On my machine, it usually results in a 0 but interestingly casting from long to int results in INT_MAX when the value is too big.

This is surprisingly difficult to do correctly, or at least it has been for me. Does anyone know of a portable solution to this?

Update:

Changing static_cast to reinterpret_cast results in a compiler error. A comment prompted me to try a C-style cast: return (int)val in the code above, and it worked. On this machine. Will that still be safe on other architectures?

Dearly answered 29/9, 2011 at 18:25 Comment(5)

Can't you just use (int)val? However, "Changing the type of val to int also results in a 0...." means the issue might be from >>? (I have no idea really, I don't use C++ ;-) – Kellerman 29/9, 2011 at 18:27

Signed integer overflow isn't implementation-defined, it's undefined. – Bricklaying 29/9, 2011 at 18:29

@derobert, thanks, I wasn't sure. I knew it wasn't good. Updated the question accordingly. – Dearly 29/9, 2011 at 18:32

Converting from unsigned to signed is implementation defined if the unsigned number is not in the range of the signed type. – Hards 29/9, 2011 at 18:53

@Bricklaying : This isn't signed overflow though, this is integral conversion, the result of which is implementation-defined. – Virtu 29/9, 2011 at 18:56

C

13

While there are ways to do this using casts and conversions, most rely on undefined behavior that happens to be well-defined on some machines / with some compilers. Instead of relying on undefined behavior, copy the data:

int signed_val;
std::memcpy (&signed_val, &val, sizeof(int));
return signed_val;
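
For completeness, here is a minimal sketch of how the copy might slot into the question's function. The helper name bits_to_int is made up for illustration; hex_str_to_int and the stream-based parsing come from the question. It assumes the string holds at most 8 hex digits, so the parsed value fits in an unsigned int.

```cpp
#include <cstring>   // std::memcpy
#include <sstream>   // std::istringstream
#include <string>

// Hypothetical helper: copies the object representation of an
// unsigned int into an int, with no value conversion involved.
int bits_to_int(unsigned int val)
{
    int signed_val;
    std::memcpy(&signed_val, &val, sizeof signed_val);
    return signed_val;
}

// Same interface as the question's function, taking the string by
// const reference and parsing it as hexadecimal.
int hex_str_to_int(const std::string& hexStr)
{
    std::istringstream strm(hexStr);
    unsigned int val = 0;
    strm >> std::hex >> val;   // val stays 0 if parsing fails
    return bits_to_int(val);
}
```

On a platform with a 32-bit two's-complement int, hex_str_to_int("FFFFFFFF") yields -1 and hex_str_to_int("80000000") yields INT_MIN.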

Coley answered 29/9, 2011 at 18:52 Comment(9)

Implementation-defined behavior, not undefined. – Virtu 29/9, 2011 at 18:57

@ildjarn: One widely used approach is return *(int*)(&val); This isn't implementation-defined behavior. It is undefined behavior. – Coley 29/9, 2011 at 19:1

Ah, that's equivalent to a reinterpret_cast, which is indeed UB; I assumed you were referring to the static_cast in the OP's question, whose behavior is implementation-defined. – Virtu 29/9, 2011 at 19:3

Why is it undefined behavior? – Sarthe 18/7, 2017 at 23:57

This is surprisingly fast with modern compilers. – Colbycolbye 11/2, 2020 at 1:6

C++20 will have bit_cast which effectively does the same thing: en.cppreference.com/w/cpp/numeric/bit_cast – Colbycolbye 11/2, 2020 at 3:17

@EmileCormier - It's fairly fast even at the lowest optimization level as the call to memcpy does not occur; I tested with multiple compilers. At anything but the lowest optimization level it's extremely fast because the working variable (signed_val in my answer) gets optimized away. I suspect these kinds of optimizations were in place well before I wrote the above answer 9+ years ago. The as-if rule was certainly in place even with the original version of the standard. That is the rule that enables the elimination of the call to memcpy. – Coley 11/2, 2020 at 7:21

@Sarthe - It's undefined because the standard very explicitly says so. Doing so violates C++'s strict aliasing rule, which is even stricter than C's strict aliasing rule. The C-style cast return *(int*)(&val) is undefined behavior even in C. – Coley 11/2, 2020 at 7:38

@DavidHammen It's only recently I've encountered a similar problem to the OP and discovered that compilers can optimize away memcpy. Thanks for pointing out that compilers knew that trick long ago. – Colbycolbye 11/2, 2020 at 17:40

V

16

Quoting the C++03 standard, §4.7/3 (Integral Conversions):

If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.

Because the result is implementation-defined, by definition it is impossible for there to be a truly portable solution.

Virtu answered 29/9, 2011 at 18:50 Comment(0)

S

5

You can negate an unsigned two's-complement number by taking the complement and adding one. So let's do that for negatives:

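if (val < 0x80000000) // non-negative values need no conversion
  return val;
if (val == 0x80000000) // complement-and-add gives 0x80000000 again, which does not fit in an int
  return INT_MIN; // from <climits>; -0x80000000 would still be an unsigned expression
return -(int)(~val + 1); // ~val + 1 is the magnitude, which fits in an int here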

This assumes that your ints use a 32-bit two's-complement representation (or have a similar range). It does not rely on any undefined behavior related to signed integer overflow (note that unsigned integer overflow is well-defined, although that should not happen here either!).

Note that if your ints are not 32-bit, things get more complex. You may need to use something like ~(~0U >> 1) instead of 0x80000000. Further, if your ints are not two's-complement, you may have overflow issues on certain values (for example, on a ones'-complement machine, -0x80000000 cannot be represented in a 32-bit signed integer). However, non-two's-complement machines are very rare today, so this is unlikely to be a problem.
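
One way to avoid the hard-coded 32-bit constants is to lean on the limits from <climits>. The following is only a sketch: the name to_signed is invented here, and it assumes a two's-complement int where UINT_MAX == 2 * INT_MAX + 1. Under that assumption every cast in it operates on a value that fits in an int, so no out-of-range conversion takes place.

```cpp
#include <climits>   // INT_MAX, UINT_MAX

// Hypothetical helper: maps an unsigned value to the int that shares
// its two's-complement bit pattern, using only in-range conversions.
int to_signed(unsigned int val)
{
    if (val <= static_cast<unsigned int>(INT_MAX))
        return static_cast<int>(val);   // value fits, conversion is exact

    // Here val > INT_MAX, so UINT_MAX - val lies in [0, INT_MAX]
    // (assuming UINT_MAX == 2 * INT_MAX + 1) and the cast is exact.
    // The wrapped value is val - (UINT_MAX + 1) == -1 - (UINT_MAX - val).
    return -1 - static_cast<int>(UINT_MAX - val);
}
```

The special case for the most negative value disappears because -1 - INT_MAX is exactly INT_MIN on a two's-complement machine.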

Salvo answered 29/9, 2011 at 18:30 Comment(2)

Yeah I'm pretty sure this code is likely to run in a 64-bit environment someday. Hard-coding bit patterns like that is probably not a good idea. This solution works on this machine though. – Dearly 29/9, 2011 at 18:47

Most 64-bit environments use 32-bit ints. In any case, though, you can use ~(~(unsigned yourinttype)0 >> 1) to find the right value for other unsigned integer types (eg, unsigned long long) – Salvo 29/9, 2011 at 18:53

D

5

Here's another solution that worked for me:

if (val <= INT_MAX) {
    return static_cast<int>(val);
}
else {
    int ret = static_cast<int>(val & ~INT_MIN);
    return ret | INT_MIN;
}

If I mask off the high bit, I avoid overflow when casting. I can then OR it back safely.

Dearly answered 29/9, 2011 at 18:55 Comment(0)

C

5

C++20 adds std::bit_cast, which copies bits verbatim:

#include <bit>
#include <cassert>
#include <iostream>

int main()
{
    int i = -42;
    auto u = std::bit_cast<unsigned>(i);
    // Prints 4294967254 on two's complement platforms where int is 32 bits
    std::cout << u << "\n";

    auto roundtripped = std::bit_cast<int>(u);
    assert(roundtripped == i);
    std::cout << roundtripped << "\n"; // Prints -42

    return 0;
}

cppreference shows an example of how one can implement their own bit_cast in terms of memcpy (under Notes).
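
For compilers without <bit>, a rough C++11 sketch of that memcpy-based idea could look like the following. The name bit_cast_sketch is made up for this post, and unlike the real std::bit_cast it is not constexpr and it requires the destination type to be default-constructible.

```cpp
#include <cstring>       // std::memcpy
#include <type_traits>   // std::is_trivially_copyable

// Hypothetical stand-in for std::bit_cast: copies the bytes of src
// into a fresh object of type To and returns it.
template <class To, class From>
To bit_cast_sketch(const From& src)
{
    static_assert(sizeof(To) == sizeof(From),
                  "source and destination must have the same size");
    static_assert(std::is_trivially_copyable<From>::value,
                  "source type must be trivially copyable");
    static_assert(std::is_trivially_copyable<To>::value,
                  "destination type must be trivially copyable");

    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}
```

It is called the same way as in the example above, e.g. bit_cast_sketch<int>(u) for an unsigned int u.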

While OpenVMS is not likely to gain C++20 support anytime soon, I hope this answer helps someone arriving at the same question via internet search.

Colbycolbye answered 11/2, 2020 at 17:58 Comment(1)

It's worth noting that the memcpy approach matches this answer https://mcmap.net/q/717839/-how-does-one-safely-static_cast-between-unsigned-int-and-int – Dearly 12/2, 2020 at 21:8

V

-2

unsigned int u = ~0U;
int s = *reinterpret_cast<int*>(&u); // -1

Conversely:

int s = -1;
unsigned int u = *reinterpret_cast<unsigned int*>(&s); // all ones

Valve answered 2/7, 2019 at 9:48 Comment(0)
