Is masking before unsigned left shift in C/C++ too paranoid?
Asked Answered
Z

5

76

This question is motivated by me implementing cryptographic algorithms (e.g. SHA-1) in C/C++, writing portable platform-agnostic code, and thoroughly avoiding undefined behavior.

Suppose that a standardized crypto algorithm asks you to implement this:

b = (a << 31) & 0xFFFFFFFF

where a and b are unsigned 32-bit integers. Notice that in the result, we discard any bits above the least significant 32 bits.


As a first naive approximation, we might assume that int is 32 bits wide on most platforms, so we would write:

unsigned int a = (...);
unsigned int b = a << 31;

We know this code won't work everywhere because int is 16 bits wide on some systems, 64 bits on others, and possibly even 36 bits. But using stdint.h, we can improve this code with the uint32_t type:

uint32_t a = (...);
uint32_t b = a << 31;

So we are done, right? That's what I thought for years. ... Not quite. Suppose that on a certain platform, we have:

// stdint.h
typedef unsigned short uint32_t;

The rule for performing arithmetic operations in C/C++ is that if the type (such as short) is narrower than int, then it gets widened to int if all values can fit, or unsigned int otherwise.

Let's say that the compiler defines short as 32 bits (signed) and int as 48 bits (signed). Then these lines of code:

uint32_t a = (...);
uint32_t b = a << 31;

will effectively mean:

unsigned short a = (...);
unsigned short b = (unsigned short)((int)a << 31);

Note that a is promoted to int because all of ushort (i.e. uint32) fits into int (i.e. int48).

But now we have a problem: shifting non-zero bits left into the sign bit of a signed integer type is undefined behavior. This problem happened because our uint32 was promoted to int48 - instead of being promoted to uint48 (where left-shifting would be okay).


Here are my questions:

  1. Is my reasoning correct, and is this a legitimate problem in theory?

  2. Is this problem safe to ignore because on every platform the next integer type is double the width?

  3. Is a good idea to correctly defend against this pathological situation by pre-masking the input like this?: b = (a & 1) << 31;. (This will necessarily be correct on every platform. But this could make a speed-critical crypto algorithm slower than necessary.)

Clarifications/edits:

  • I'll accept answers for C or C++ or both. I want to know the answer for at least one of the languages.

  • The pre-masking logic may hurt bit rotation. For example, GCC will compile b = (a << 31) | (a >> 1); to a 32-bit bit-rotation instruction in assembly language. But if we pre-mask the left shift, it is possible that the new logic is not translated into bit rotation, which means now 4 operations are performed instead of 1.

Zoophilia answered 10/10, 2016 at 18:36 Comment(25)
Unclear what you're asking. Also one question per question please.Poppycock
Agree with @πάνταῥεῖ . Also C and C++ are different questions. Pick one of them and never compile C as C++ or vice versa.Colewort
"Is masking before left shift " and (a << 31) & 0xFFFFFFFF does not jibe. The code does a mask after the shift.Explication
re: C/C++: can you choose one or the other? If you want the answer to both, then post two questions. You can't know for sure that the answer is the same for both languages.Pekingese
@chux Consider (a << 31) & 0xFFFFFFFF as the theoretical infinite-width int code in PythonZoophilia
Je ne parle pas Python.Explication
Whatever you do, verify your assumptions of the used integer type by static asserting std::numeric_limits<T>::digits.Canticle
I think this is a realistic problem. A better example would be a 16 bit version instead of 32. My answer is: you can use 31u then a will be promoted to uint48.Seethe
@Jarod42: Where do I miss the point?Colewort
Better to mask before (as the question asks) and use uint32_t a; uint32_t b = (a & 1) << 31; --> no UB nor ID.Explication
@chux: If we are at this, use 1U, not 1.Colewort
@Olaf: You remove/edit the comment assuming that OP used int for 32 bits.Hydrocortisone
@Olaf uint32_t a; uint32_t b = (a & 1) << 31; and uint32_t a; uint32_t b = (a & 1u) << 31; work the same.Explication
@chux: Yes, but it is always good to be clear one wants to use unsigned.Colewort
@Jarod42: So now guess why I removed it ... (I removed it before you commented). Just let's clear this up and remove those context-less commentsColewort
@Nayuki: You may use some typedef as using my_uint_at_least32 = std::conditional_t<(sizeof(std::uint32_t) < sizeof(unsigned)), unsigned, std::uint32_t>;.Hydrocortisone
@Hydrocortisone That is a scary-looking construct, but at a glance it appears to work with no caveats. Please write this up as an answer for C++!Zoophilia
@Nayuki: Would you please at least remove one of the language tags then? Otherwise the answer is not correct.Colewort
You mean masking after a shift right? Masking before (x & 0xffffffff) << 31 makes no sense at all.Pensile
@Pensile In infinite-width arithmetic (e.g. Python), we can mask before or after the left shift. In finite-width arithmetic (e.g. C/C++), we can either mask before the shift or let the bits fall off the left end implicitly. Is this clear?Zoophilia
@Nayuki: But in your code example you are masking after the shift while in your question title you say before. Surely you mean masking after the shift?Pensile
Can't typedef unsigned short uint32_t be dismissed as their own fault if it doesn't work?Leupold
Nayuki, any open issues left on this question - how can I help?Explication
@chux-ReinstateMonica No, I resolved all open questions years ago. I'm satisfied with my understanding of the topics discussed in this thread.Zoophilia
Nayuki, My error, I mistakenly thought there was no accepted answer and was seeking to help resolve it.Explication
Z
15

Taking a clue from this question about possible UB in uint32 * uint32 arithmetic, the following simple approach should work in C and C++:

uint32_t a = (...);
uint32_t b = (uint32_t)((a + 0u) << 31);

The integer constant 0u has type unsigned int. This promotes the addition a + 0u to uint32_t or unsigned int, whichever is wider. Because the type has rank int or higher, no more promotion occurs, and the shift can be applied with the left operand being uint32_t or unsigned int.

The final cast back to uint32_t will just suppress potential warnings about a narrowing conversion (say if int is 64 bits).

A decent C compiler should be able to see that adding zero is a no-op, which is less onerous than seeing that a pre-mask has no effect after an unsigned shift.

Zoophilia answered 11/10, 2016 at 2:21 Comment(0)
S
24

Speaking to the C side of the problem,

  1. Is my reasoning correct, and is this a legitimate problem in theory?

It is a problem that I had not considered before, but I agree with your analysis. C defines the behavior of the << operator in terms of the type of the promoted left operand, and it it conceivable that the integer promotions result in that being (signed) int when the original type of that operand is uint32_t. I don't expect to see that in practice on any modern machine, but I'm all for programming to the actual standard as opposed to my personal expectations.

  1. Is this problem safe to ignore because on every platform the next integer type is double the width?

C does not require such a relationship between integer types, though it is ubiquitous in practice. If you are determined to rely only on the standard, however -- that is, if you are taking pains to write strictly conforming code -- then you cannot rely on such a relationship.

  1. Is a good idea to correctly defend against this pathological situation by pre-masking the input like this?: b = (a & 1) << 31;. (This will necessarily be correct on every platform. But this could make a speed-critical crypto algorithm slower than necessary.)

The type unsigned long is guaranteed to have at least 32 value bits, and it is not subject to promotion to any other type under the integer promotions. On many common platforms it has exactly the same representation as uint32_t, and may even be the same type. Thus, I would be inclined to write the expression like this:

uint32_t a = (...);
uint32_t b = (unsigned long) a << 31;

Or if you need a only as an intermediate value in the computation of b, then declare it as an unsigned long to begin with.

Sermonize answered 10/10, 2016 at 19:13 Comment(5)
Well argued, John! I have one slightly concern though - so long is at least 32 bits. But on many systems nowadays, it will be exactly 64 bits. Would this make the code unnecessarily slow due to the widened arithmetic?Zoophilia
@Nayuki, I have interpreted the question to be about writing code that is universally correct. When one desires to tune code for performance on specific hardware, one generally needs to write code that accounts for the specific characteristics of that hardware. To the extent that such code incorporates assumptions about the implementation -- which is the point -- that code does not strictly conform to the standard. It might be suboptimal on systems other than the intended one, even to the extent of exhibiting UB.Sermonize
@Zoophilia I would handle making that sort of adaptation in the deployment code e.g. ./configure script or makefile, and only if you've found signs it's too slow on the target system(s). ( What counts as "too slow" is up to you, though :) )Colfin
I'd say it's safe to assume any decent compiler on any platform is able to optimize out the masking, realizing that on the particular platform it's on, it's a no-op. Mask on, I say!Feat
@Nayuki: Compilers are really good with arithmetic. When the compiler detects that only the lower 32 bits of your u64 are required, then it's got no reason not to use a 32 bits register for the shift. So, write correct code first, check the generated assembly second.Subtrahend
E
22

Q1: Masking before the shift does prevent undefined behavior that OP has concern.

Q2: "... because on every platform the next integer type is double the width?" --> no. The "next" integer type could be less than 2x or even the same size.

The following is well defined for all compliant C compilers that have uint32_t.

uint32_t a; 
uint32_t b = (a & 1) << 31;

Q3: uint32_t a; uint32_t b = (a & 1) << 31; is not expected to incur code that performs a mask - it is not needed in the executable - just in the source. If a mask does occur, get a better compiler should speed be an issue.

As suggested, better to emphasize the unsigned-ness with these shifts.

uint32_t b = (a & 1U) << 31;

@John Bollinger good answer well details how to handle OP's specific problem.

The general problem is how to form a number that is of at least n bits, a certain sign-ness and not subject to surprising integer promotions - the core of OP's dilemma. The below fulfills this by invoking an unsigned operation that does not change the value - effective a no-op other than type concerns. The product will be at least the width of unsigned or uint32_t. Casting, in general, may narrow the type. Casting needs to be avoided unless narrowing is certain to not occur. An optimization compiler will not create unnecessary code.

uint32_t a;
uint32_t b = (a + 0u) << 31;
uint32_t b = (a*1u) << 31;
Explication answered 10/10, 2016 at 18:59 Comment(2)
I would be tempted to wrap this in a macro with a comment explaining it. Otherwise you are just asking for the next developer who comes along to delete the "no-op".Kagu
@Kagu Perhaps something like #define PROMOTE_AT_LEAST_UNSIGNED(x) ((x) + 0u) or something less verbose like PROMOTE_UNSIGNED ?Explication
Z
15

Taking a clue from this question about possible UB in uint32 * uint32 arithmetic, the following simple approach should work in C and C++:

uint32_t a = (...);
uint32_t b = (uint32_t)((a + 0u) << 31);

The integer constant 0u has type unsigned int. This promotes the addition a + 0u to uint32_t or unsigned int, whichever is wider. Because the type has rank int or higher, no more promotion occurs, and the shift can be applied with the left operand being uint32_t or unsigned int.

The final cast back to uint32_t will just suppress potential warnings about a narrowing conversion (say if int is 64 bits).

A decent C compiler should be able to see that adding zero is a no-op, which is less onerous than seeing that a pre-mask has no effect after an unsigned shift.

Zoophilia answered 11/10, 2016 at 2:21 Comment(0)
H
10

To avoid unwanted promotion, you may use the greater type with some typedef, as

using my_uint_at_least32 = std::conditional_t<(sizeof(std::uint32_t) < sizeof(unsigned)),
                                              unsigned,
                                              std::uint32_t>;
Hydrocortisone answered 10/10, 2016 at 18:55 Comment(2)
Note: To make it clear, this answer is for C++ only.Zoophilia
A C idea to complement this C++ code: #if UINT32_MAX > UINT_MAX && UINT_MAX != -1 typedef uint32_t my_uint_at_least32; #else typedef unsigned my_uint_at_least32; #endif.Explication
S
-1

For this segment of code:

uint32_t a = (...);
uint32_t b = a << 31;

To promote a to a unsigned type instead of signed type, use:

uint32_t b = a << 31u;

When both sides of << operator is an unsigned type, then this line in 6.3.1.8 (C standard draft n1570) applies:

Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.


The problem you are describing is caused you use 31 which is signed int type so another line in 6.3.1.8

Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.

forces a to promoted to a signed type


Update:

This answer is not correct because 6.3.1.1(2) (emphasis mine):

...

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.58) All other types are unchanged by the integer promotions.

and footnote 58 (emphasis mine):

58) The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators, as specified by their respective subclauses.

Since only integer promotion is happening and not common arithmetic conversion, using 31u does not guarantee a to be converted to unsigned int as stated above.

Seethe answered 10/10, 2016 at 18:55 Comment(9)
Shift << using different rules.Explication
"To promote a to a unsigned type instead of signed type, use: uint32_t b = a << 31u;" does not work "The type of the result is that of the promoted left operand." C11 §6.5.7 3Explication
I read the C99 standard and it indirectly seems to support user3528438's answer. Note that integer literals are at least rank int, never lower. Namely, ushort << uint gets converted to the common type uint << uint, which ensures that the computation is performed as an unsigned left shift.Zoophilia
@Zoophilia "Otherwise, if both operands have signed integer types..." is under "Usual arithmetic conversions" which has "Unless explicitly stated otherwise,..." which negates the first half of this answer due to C11 §6.5.7 3 . The problem this post is having is not due to "caused you use 31 which is signed int type". Further the CC++ nature of this post complicates the unusual integer promotions concept.Explication
The operands of the left shift operator are not among those that are subject to the usual arithmetic conversions. Instead, the standard specifies that they are subject to integer promotion, which is only part of the standard arithmetic conversions, and in particular, does not include the guarantee that the type of a promoted unsigned operand is itself unsigned.Sermonize
@chux I check N1570 again and did not see it as a problem. 31u promotes a and the result to unsigned int, the result follows this conversion and is also unsigned int. Then the assignment follows 6.3.1.3(2). I don't see UB happening during this process.Seethe
@user3528438, the OP postulates a system on which int is 48 bits wide. In the course of evaluating the expression a << 31u on such a system, the integer promotions are applied to both operands, resulting in the left operand being promoted to (signed) int when its original type is uint32_t. The type of the promoted left operand is also the type of the result, and if it is a signed type that cannot represent the result value then undefined behavior results. The expression in question can certainly produce such a result under those circumstances.Sermonize
31u does not promote a in uint32_t b = a << 31u;. Try printf("%zu\n", sizeof (5 << 1LL));Explication
@JohnBollinger you and chux are right. I didn't realize the difference between usual arithmetic conversion and integer promotion (6.3.1.1(2) and footnote 58).Seethe

© 2022 - 2024 — McMap. All rights reserved.