Bit wise '&' with signed vs unsigned operand

I faced an interesting scenario in which I got different results depending on the right operand type, and I can't really understand the reason for it.

Here is the minimal code:

#include <iostream>
#include <cstdint>

int main()
{
    uint16_t check = 0x8123U;

    uint64_t new_check = (check & 0xFFFF) << 16;

    std::cout << std::hex << new_check << std::endl;

    new_check = (check & 0xFFFFU) << 16;

    std::cout << std::hex << new_check << std::endl;

    return 0;
}

I compiled this code with g++ (gcc version 4.5.2) on 64-bit Linux: g++ -std=c++0x -Wall example.cpp -o example

The output was:

ffffffff81230000

81230000

I can't really understand the reason for the output in the first case.

Why would any of the intermediate results be promoted at some point to a signed 64-bit value (int64_t), resulting in the sign extension?

I would accept a result of '0' in both cases if a 16-bit value were shifted 16 bits left first and only then promoted to a 64-bit value. I would also accept the second output if the compiler first promoted check to uint64_t and then performed the other operations.

But how come & with 0xFFFF (int32_t) vs. 0xFFFFU (uint32_t) produces these two different outputs?

Flue answered 3/8, 2016 at 7:25 Comment(9)
It doesn't seem possible. The fffff... result is typical for sign extension, which would occur for int16_t. But not for uint16_t. Are you sure this is the actual code used to produce the results?Berriman
Unable to reproduce with 2 Windows compilers, using 0xFFFFll (to ensure 64-bit) for the first mask.Berriman
@Cheersandhth.-Alf : Which is indeed expected. Smaller types are promoted to larger types, if you start with the largest type you avoid such promotion.Clodhopping
@Cheersandhth.-Alf: I just reproduced in Visual Studio 2015 on x86. check & 0xFFFF returns 0x00008123, (check & 0xFFFF) << 16 returns 0x81230000 in the immediate window, while (uint64_t)((check & 0xFFFF) << 16) returns 0xffffffff81230000.Graniteware
@AlexLop. : Be careful with your terms. signed in fact names a type, it's shorthand for signed int aka just int. And that is the default type for integral constants such as 0xFFFF.Clodhopping
@Groo: Yes, thanks, I mistakenly tried to emulate 64-bit int. But what happens here is that the source has been compiled with 32-bit int. Where MSB is 1, so, sign extension.Berriman
@Clodhopping you are right... I should have written "signed type"Flue
Unless I'm sleep-deprived, the expression (check & 0xFFFF) is type int. The subsequent << 16 is shifting into the sign bit of that temporary, which is then converted to uint64_t.Hildehildebrand
For extra fun, you get the signed behaviour if you cast 0xFFFFu to uint16_t.Clerihew

That's indeed an interesting corner case. It only occurs here because you use uint16_t for the unsigned type when your architecture uses 32 bits for int.

Here is an extract from Clause 5, Expressions, of draft n4296 for C++14 (emphasis mine):

10 Many binary operators that expect operands of arithmetic or enumeration type cause conversions ... This pattern is called the usual arithmetic conversions, which are defined as follows:
...
(10.5.3) — Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
(10.5.4) — Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.

You are in the 10.5.4 case:

  • uint16_t is only 16 bits while int is 32
  • int can represent all the values of uint16_t

So the uint16_t check = 0x8123U operand is converted to the signed int 0x8123, and the result of the bitwise & is still 0x8123.

But the shift (bitwise, so it happens at the representation level) makes the intermediate result the unsigned value 0x81230000, which, converted to an int, gives a negative value (technically this conversion is implementation-defined, but it is what common implementations do):

5.8 Shift operators [expr.shift]
...
Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value;...

and

4.7 Integral conversions [conv.integral]
...
3 If the destination type is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.

(beware this was true undefined behaviour in C++11...)

So you end up with a conversion of the signed int 0x81230000 to a uint64_t, which as expected gives 0xFFFFFFFF81230000, because

4.7 Integral conversions [conv.integral]
...
2 If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type).

TL/DR: There is no undefined behaviour here; what causes the result is the conversion of a signed 32-bit int to an unsigned 64-bit int. The only delicate part is the shift that overflows into the sign bit, but all common implementations behave the same there, and it is implementation-defined in the C++14 standard.

Of course, if you force the second operand to be unsigned, everything is unsigned and you evidently get the correct 0x81230000 result.
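
For instance, here is a minimal sketch of two equivalent ways to keep the whole computation unsigned (both only assume the fixed-width types and constant macros from <cstdint>):

#include <cstdint>
#include <iostream>

int main()
{
    uint16_t check = 0x8123U;

    // Widen to the unsigned destination type *before* shifting, so the
    // usual arithmetic conversions never produce a signed intermediate.
    uint64_t a = (static_cast<uint64_t>(check) & 0xFFFFU) << 16;

    // Or make the mask itself a 64-bit unsigned constant.
    uint64_t b = (check & UINT64_C(0xFFFF)) << 16;

    std::cout << std::hex << a << '\n' << b << '\n'; // prints 81230000 twice
    return 0;
}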

[EDIT] As explained by MSalters, the result of the shift has only been implementation-defined since C++14; it was indeed undefined behaviour in C++11. The shift operator paragraph said:

...
Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

Euripides answered 3/8, 2016 at 8:18 Comment(2)
Note that the example code is compiled as -std=c++0x, i.e. C++11-draft, not C++14.Clodhopping
+1 Interesting. It is undefined in C11 just as in C++11. Another example where the languages now differ.Outlaw

Let's take a look at

uint64_t new_check = (check & 0xFFFF) << 16;

Here, 0xFFFF is a signed constant, so (check & 0xFFFF) gives us a signed integer by the rules of integer promotion.

In your case, with a 32-bit int type, the most significant bit of this integer after the left shift is 1, so the conversion to 64-bit unsigned performs a sign extension, filling the bits to the left with 1s. Interpreted as two's complement, that is the same negative value.

In the second case, 0xFFFFU is unsigned, so we get unsigned integers and the left shift operator works as expected.

If your toolchain supports __PRETTY_FUNCTION__, a most handy feature, you can quickly determine how the compiler perceives expression types:

#include <iostream>
#include <cstdint>

template<typename T>
void typecheck(T const& t)
{
    std::cout << __PRETTY_FUNCTION__ << '\n';
    std::cout << t << '\n';
}
int main()
{
    uint16_t check = 0x8123U;

    typecheck(0xFFFF);
    typecheck(check & 0xFFFF);
    typecheck((check & 0xFFFF) << 16);

    typecheck(0xFFFFU);
    typecheck(check & 0xFFFFU);
    typecheck((check & 0xFFFFU) << 16);

    return 0;
}

Output

void typecheck(const T &) [T = int]
65535
void typecheck(const T &) [T = int]
33059
void typecheck(const T &) [T = int]
-2128412672
void typecheck(const T &) [T = unsigned int]
65535
void typecheck(const T &) [T = unsigned int]
33059
void typecheck(const T &) [T = unsigned int]
2166554624
Habitat answered 3/8, 2016 at 7:36 Comment(4)
No, the MSB is not 1 for the signed result.Berriman
Sorry. MSB is indeed 1 for a 32-bit signed result. I.e. where int is 32-bit.Berriman
@Cheersandhth.-Alf ... OK but isn't (uint64_t)0x80000000 == 0x0000000080000000ULLFlue
@AlexLop.: With 32-bit int the type of 0x80000000 is unsigned int. ;-)Berriman

The first thing to realize is that binary operators like a&b for built-in types only work if both sides have the same type (with user-defined types and overloads, anything goes). That common type may be reached via implicit conversions.

Now, in your case, there definitely is such a conversion, because there simply isn't a binary operator & that takes a type smaller than int. Both sides are converted to at least int size, but what exact types?

As it happens, on your GCC int is indeed 32 bits. This is important, because it means that all values of uint16_t can be represented as an int. There is no overflow.

Hence, check & 0xFFFF is a simple case. The right side is already an int, the left side promotes to int, so the result is int(0x8123). This is perfectly fine.

Now, the next operation is 0x8123 << 16. Remember, on your system int is 32 bits, and INT_MAX is 0x7FFF'FFFF. In the absence of overflow, 0x8123 << 16 would be 0x81230000, but that is clearly bigger than INT_MAX, so there is in fact overflow.

Signed integer overflow in C++11 is Undefined Behavior. Literally any outcome is correct, including purple or no output at all. At least you got a numerical value, but GCC is known to outright eliminate code paths which unavoidably cause overflow.

[edit] Newer GCC versions support C++14, where this particular form of overflow has become implementation-defined - see Serge's answer.
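
If you would rather have such an overflow diagnosed than silently accepted, a sketch along these lines (assuming a reasonably recent GCC or Clang with UndefinedBehaviorSanitizer) makes the signed shift visible at run time:

// Compile with: g++ -std=c++11 -fsanitize=undefined shift.cpp && ./a.out
#include <cstdint>
#include <iostream>

int main()
{
    uint16_t check = 0x8123U;

    // Signed left shift that overflows a 32-bit int: UBSan flags it with a
    // "cannot be represented in type 'int'" diagnostic.
    uint64_t bad = (check & 0xFFFF) << 16;

    // Unsigned from the start: well defined, nothing to report.
    uint64_t good = (static_cast<uint32_t>(check) & 0xFFFFU) << 16;

    std::cout << std::hex << bad << ' ' << good << '\n';
    return 0;
}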

Clodhopping answered 3/8, 2016 at 7:55 Comment(1)
My reading of the standard is that there is no Undefined Behaviour here, but only an implementation-defined case (see my answer). So purple should not be an acceptable value ;-)Euripides

0xFFFF is a signed int. So after the & operation, we have a 32-bit signed value:

#include <stdint.h>
#include <type_traits>

uint64_t foo(uint16_t a) {
  auto x = (a & 0xFFFF);
  // The & promoted both operands to int, so x is int (int32_t here)...
  static_assert(std::is_same<int32_t, decltype(x)>::value, "not an int32_t");
  // ...and decidedly not uint16_t.
  static_assert(!std::is_same<uint16_t, decltype(x)>::value, "still a uint16_t");
  return x;
}

http://ideone.com/tEQmbP

Your original 16 bits are then left-shifted, which results in a 32-bit value with the high bit set (0x80000000U), so it has a negative value. During the 64-bit conversion, sign extension occurs, populating the upper word with 1s.
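
A minimal sketch of that last step in isolation (assuming 32-bit int32_t and the constant macros from <cstdint>):

#include <cstdint>
#include <iostream>

int main()
{
    int32_t negative = INT32_MIN;               // bit pattern 0x80000000

    // Widening a signed value sign-extends: the upper 32 bits become 1s.
    uint64_t widened = negative;
    std::cout << std::hex << widened << '\n';   // ffffffff80000000

    // Widening an unsigned value zero-extends instead.
    uint64_t zeroed = UINT32_C(0x80000000);
    std::cout << zeroed << '\n';                // 80000000
    return 0;
}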

Saving answered 3/8, 2016 at 7:51 Comment(0)

This is the result of integer promotion. Before the & operation happens, if the operands are "smaller" than an int (for that architecture), the compiler will promote both operands to int, because they both fit into a signed int.

This means that the first expression will be equivalent to (on a 32-bit architecture):

// check is uint16_t, but it fits into int32_t,
// and the int constant 0xFFFF already has the value 0x0000FFFF
((int32_t)check & (int32_t)0x0000FFFF)

while in the other one both operands are converted to unsigned int (int and unsigned int have the same rank, so the signed operand is converted):

// check is promoted to int and then converted to unsigned int,
// matching the unsigned constant
((uint32_t)check & (uint32_t)0x0000FFFFU)

The value of the & is 0x8123 either way; what differs is the signedness of the result, and that is what matters for the left shift that follows.

If you explicitly cast check to an unsigned int, then the result will be the same in both cases (unsigned & signed of the same rank results in unsigned):

((uint32_t)check & 0xFFFF) << 16

will be equal to:

((uint32_t)check & 0xFFFFU) << 16
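
A quick runnable check of that claim, as a sketch (assuming 32-bit int, as in the question):

#include <cstdint>
#include <iostream>

int main()
{
    uint16_t check = 0x8123U;

    // With check widened to unsigned first, the mask's signedness no
    // longer matters: both & results have type unsigned int.
    uint64_t a = ((uint32_t)check & 0xFFFF) << 16;
    uint64_t b = ((uint32_t)check & 0xFFFFU) << 16;

    std::cout << std::hex << a << '\n' << b << '\n'; // 81230000 twice
    return 0;
}
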
Graniteware answered 3/8, 2016 at 7:29 Comment(2)
But uint16_t is unsigned... unsigned X signed should still result in unsigned... shouldn't it?Flue
@AlexLop.: uint16_t is also promoted to signed int because it fits in a signed int.Graniteware

Your platform has a 32-bit int.

Your code is exactly equivalent to

#include <iostream>
#include <cstdint>

int main()
{
    uint16_t check = 0x8123U;
    auto a1 = (check & 0xFFFF) << 16;
    uint64_t new_check = a1;
    std::cout << std::hex << new_check << std::endl;

    auto a2 = (check & 0xFFFFU) << 16;
    new_check = a2;
    std::cout << std::hex << new_check << std::endl;
    return 0;
}

What are the types of a1 and a2?

  • For a2, the result is unsigned int, which is zero-extended when widened to uint64_t.
  • More interestingly, for a1 the result is int, which gets sign-extended as it is widened to uint64_t.

Here's a shorter demonstration, in decimal so that the difference between signed and unsigned types is apparent:

#include <iostream>
#include <cstdint>

int main()
{
    uint16_t check = 0;
    std::cout << check
              << "  " << (int)(check + 0x80000000)
              << "  " << (uint64_t)(int)(check + 0x80000000) << std::endl;
    return 0;
}

On my system (also 32-bit int), I get

0  -2147483648  18446744071562067968

showing where the promotion and sign-extension happens.

Detonation answered 3/8, 2016 at 8:16 Comment(1)
If you're saying "exactly equivalent to" then perhaps auto a1 = (check & 0x0000FFFF) might be appropriate and clarifying.Saving

The & operation has two operands. The first is an unsigned short, which undergoes the usual promotions to become an int. The second is a constant, in one case of type int, in the other of type unsigned int. The result of the & is therefore int in one case and unsigned int in the other. That value is shifted to the left, resulting either in an int with the sign bit set or in an unsigned int. Converting a negative int to uint64_t sign-extends, giving a huge unsigned value.
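
As a compile-time sketch of this chain, decltype can verify the type at each step (assuming a platform with 32-bit int, as in the question):

#include <cstdint>
#include <type_traits>

int main()
{
    uint16_t check = 0x8123U;

    // check is promoted to int; the mask's type then decides the result type.
    static_assert(std::is_same<decltype(check & 0xFFFF), int>::value,
                  "signed mask: & yields int");
    static_assert(std::is_same<decltype(check & 0xFFFFU), unsigned int>::value,
                  "unsigned mask: & yields unsigned int");

    // The shift result keeps the type of its (promoted) left operand.
    static_assert(std::is_same<decltype((check & 0xFFFF) << 16), int>::value,
                  "signed shift result is int");
    static_assert(std::is_same<decltype((check & 0xFFFFU) << 16),
                               unsigned int>::value,
                  "unsigned shift result is unsigned int");
    return 0;
}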

Of course you should always follow the rule: If you do something, and you don't understand the result, then don't do that!

Pericardium answered 3/8, 2016 at 7:52 Comment(0)
