Is comparing an underflowed, unsigned integer to -1 well-defined?
Consider the following:

size_t r = 0;
r--;
const bool result = (r == -1);

Does the comparison whose result initialises result have well-defined behaviour?
And is its result true, as I'd expect?


This Q&A was written because I was unsure of two factors in particular.
They may both be identified by use of the term "crucial[ly]" in my answer.

This example is inspired by an approach for loop conditions when the counter is unsigned:
for (size_t r = m.size() - 1; r != -1; r--)

Codex answered 10/12, 2014 at 15:58 Comment(6)
Another quibble: Strictly speaking, the behavior of r-- is not an "underflow", though it's reasonable to refer to it that way informally.Shinshina
It took me a moment to remember what "underflow" does mean (when a number is too small to store as non-zero in a floating-point type). I suppose a better term for this might be "negative overflow".Gehman
@mwfearnley: Maybe, except that's arguably even worse since the standard takes the time (in a note) to explain that unsigned overflow is impossible. Granted, by the same logic unsigned underflow is impossible, but at least it doesn't say so explicitly. ;) Yeah, I've got nothing.Codex
Implementation dependent. The machine might be one's complement.Lympho
Well, I always use: const bool result = (r == (size_t)-1);. Just to be sure the compiler interprets -1 as size_t and doesn't treat r as a variable of some other type. But your option might be correct as well; one would need to check the standard.Swish
@ST3: You mean, like both answerers already did, 18 hours ago? :)Codex
size_t r = 0;
r--;
const bool result = (r == -1);

Strictly speaking, the value of result is implementation-defined. In practice, it's almost certain to be true; I'd be surprised if there were an implementation where it's false.

The value of r after r-- is SIZE_MAX, a macro defined in <stdint.h> / <cstdint>.

For the comparison r == -1, the usual arithmetic conversions are performed on both operands. The first step in the usual arithmetic conversions is to apply the integral promotions to both operands.

r is of type size_t, an implementation-defined unsigned integer type. -1 is an expression of type int.

On most systems, size_t is at least as wide as int. On such systems, the integral promotions cause the value of r either to be converted to unsigned int or to keep its existing type (the former can happen if size_t has the same width as int, but a lower conversion rank). Now the left operand (which is unsigned) has at least the rank of the right operand (which is signed). The right operand is converted to the type of the left operand. This conversion yields the same value as r, and so the equality comparison yields true.

That's the "normal" case.

Suppose we have an implementation where size_t is 16 bits (let's say it's a typedef for unsigned short) and int is 32 bits. So SIZE_MAX == 65535 and INT_MAX == 2147483647. Or we could have a 32-bit size_t and a 64-bit int. I doubt that any such implementation exists, but nothing in the standard forbids it (see below).

Now the left side of the comparison has type size_t and value 65535. Since signed int can represent all the values of type size_t, the integral promotions convert the value to 65535 of type int. Both sides of the == operator have type int, so the usual arithmetic conversions have nothing to do. The expression is equivalent to 65535 == -1, which is clearly false.

As I mentioned, this kind of thing is unlikely to happen with an expression of type size_t -- but it can easily happen with narrower unsigned types. For example, if r is declared as an unsigned short, or an unsigned char, or even a plain char on a system where that type is signed, the value of result will probably be false. (I say probably because short or even unsigned char can have the same width as int, in which case result will be true.)

In practice, you can avoid the potential problem by doing the conversion explicitly rather than relying on the implementation-defined usual arithmetic conversions:

const bool result = (r == (size_t)-1);

or

const bool result = (r == SIZE_MAX);

C++11 standard references:

  • 5.10 [expr.eq] Equality operators
  • 5.9 [expr.rel] Relational operators (specifies that the usual arithmetic conversions are performed)
  • 5 [expr] Expressions, paragraph 9: Usual arithmetic conversions
  • 4.5 [conv.prom] Integral promotions
  • 18.2 [support.types] size_t

18.2 paragraphs 6-7:

6 The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.

7 [ Note: It is recommended that implementations choose types for ptrdiff_t and size_t whose integer conversion ranks (4.13) are no greater than that of signed long int unless a larger size is necessary to contain all the possible values. — end note ]

So there's no prohibition on making size_t narrower than int. I can almost plausibly imagine a system where int is 64 bits, but no single object can be bigger than 2^32 − 1 bytes, so size_t is 32 bits.

Shinshina answered 10/12, 2014 at 16:46 Comment(11)
Mmmm yeah almost I suppose.Codex
64 bit ALU with 32 bits of addressable memory, or a page based memory system with 32 bit pages? A special purpose streaming processing chip of some kind?Samala
@Yakk: Sure -- or a charmingly quirky hardware designer.Shinshina
@KeithThompson: I really wish the C standards committee would specify a means of declaring "wrapping" unsigned types which would not be implicitly convertible nor promotable to signed types [the sum or product of e.g. a wrap32_t and an int would be a wrap32_t regardless of whether int was 16, 32, or 64 bits]. Such types would make it possible to write clean code which was genuinely portable, and ensure that any typecasts which would be required for portability would be required, period. Without such a language feature...Connected
...I don't see any practical way to migrate code which assumes that uint32_t won't be promoted to a signed type [a very common assumption] to any system where int is bigger than 32 bits.Connected
@supercat: And/or the C++ standard committee (this question is tagged C++, not C). I agree, though I haven't thought about how to define it. There are a number of useful integer types: size_t, ptrdiff_t, uintN_t, etc, that ideally should work together in well defined ways. The fact that the integral promotions are defined so strongly in terms of int and unsigned int, which have an unspecified relationship to the other types, is annoying. The idea is that integer arithmetic on types narrower than int needn't be supported (which is convenient for a lot of hardware), but ...Shinshina
... I wish there were a cleaner way to do that.Shinshina
@KeithThompson: Given the code (for an 8- or 16-bit micro) uint16_t new_reading,last_reading; uint32_t total_consumption;, having the statement total_consumption += new_reading-last_reading; evaluate the intermediate result as a uint16_t would be more expensive on some platforms than evaluating it as uint32_t, but the execution time to execute semantically-broken code is irrelevant. If code needs to have intermediate result computed mod 65536, the compiler should add any extra instructions necessary to achieve that.Connected
This machine used to exist: ((size_t)0 - 1) == 0xFFFF, -1 == 0xFFFE.Lympho
@Joshua: 16-bit size_t, one's-complement representation for signed integers, right?Shinshina
@Lympho Regardless of the underlying representation for signed integers, converting to unsigned is always modulo 2^n.Kylakylah

Yes, and the result is what you would expect.

Let's break it down.

What is the value of r at this point? Well, the underflow is well-defined and results in r taking on its maximum value by the time the comparison is run. std::size_t has no specific known bounds, but we can make reasonable assumptions about its range when compared to that of an int:

std::size_t is the unsigned integer type of the result of the sizeof operator. [..] std::size_t can store the maximum size of a theoretically possible object of any type (including array).

And, just to get it out of the way, the expression -1 is unary - applied to the literal 1, and has type int on any system:

[C++11: 2.14.2/2]: The type of an integer literal is the first of the corresponding list in Table 6 in which its value can be represented. [..]

(I won't cite all the text that describes how applying unary - to an int results in an int, but it does.)

It's more than reasonable to suggest that, on the majority of systems, an int is not going to be able to hold std::numeric_limits<std::size_t>::max().

Now, what happens to those operands?

[C++11: 5.10/1]: The == (equal to) and the != (not equal to) operators have the same semantic restrictions, conversions, and result type as the relational operators except for their lower precedence and truth-value result. [..]

[C++11: 5.9/2]: The usual arithmetic conversions are performed on operands of arithmetic or enumeration type. [..]

Let's examine these "usual arithmetic conversions":

[C++11: 5/9]: Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result.

This pattern is called the usual arithmetic conversions, which are defined as follows:

  • If either operand is of scoped enumeration type (7.2), no conversions are performed; if the other operand does not have the same type, the expression is ill-formed.
  • If either operand is of type long double, the other shall be converted to long double.
  • Otherwise, if either operand is double, the other shall be converted to double.
  • Otherwise, if either operand is float, the other shall be converted to float.
  • Otherwise, the integral promotions (4.5) shall be performed on both operands. Then the following rules shall be applied to the promoted operands:
    • If both operands have the same type, no further conversion is needed.
    • Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
    • Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
    • Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
    • Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

The rule that takes effect here is the third sub-bullet: the unsigned operand's rank is greater than or equal to that of the signed operand, so the signed operand is converted to the unsigned operand's type. As for why:

[C++11: 4.13/1]: Every integer type has an integer conversion rank defined as follows

  • [..]
  • The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char.
  • The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type.
  • [..]

All integral types, even the fixed-width ones, are composed of the standard integral types; therefore, on any real-world implementation, std::size_t is unsigned long long, unsigned long, or unsigned int. (The standard also permits extended or narrower unsigned types here, but such implementations are essentially unheard of.)

  • If std::size_t is unsigned long long, or unsigned long, then the rank of std::size_t is greater than the rank of unsigned int and, therefore, also of int.

  • If std::size_t is unsigned int, the rank of std::size_t is equal to the rank of unsigned int and, therefore, also of int.

Either way, per the usual arithmetic conversions, the signed operand is converted to the type of the unsigned operand (and, crucially, not the other way around!). Now, what does this conversion entail?

[C++11: 4.7/2]: If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type). [ Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]

[C++11: 4.7/3]: If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.

This means that std::size_t(-1) is equivalent to std::numeric_limits<std::size_t>::max(); it's crucial that the value n in the above clause relates to the number of bits used to represent the unsigned type, not the source type. Otherwise, we'd be doing std::size_t((unsigned int)-1), which is not the same thing at all — it could be many orders of magnitude smaller than our desired value!

Indeed, now that we know the conversions are all well-defined, we can test this value:

std::cout << (std::size_t(-1) == std::numeric_limits<std::size_t>::max()) << '\n';
// "1"

And, just to illustrate my point from earlier, on my 64-bit system:

std::cout << std::is_same<unsigned long, std::size_t>::value << '\n';
std::cout << std::is_same<unsigned long, unsigned int>::value << '\n';
std::cout << std::hex << std::showbase
          << std::size_t(-1) << ' '
          << std::size_t(static_cast<unsigned int>(-1)) << '\n';
// "1"
// "0"
// "0xffffffffffffffff 0xffffffff"
Codex answered 10/12, 2014 at 15:58 Comment(9)
-1 is not a literal. It's the literal 1 with the unary - operator applied to it. (It's still of type int.)Shinshina
@KeithThompson: True.Codex
"It's more than reasonable to suggest ..." -- I'm not sure that's consistent with the "language-lawyer" tag.Shinshina
Well, I would expect a build failure, because -Wall -Werror or equivalent is the only way to use C++ while staying relatively sane.That
@n.m. -Wall -Wextra -pedantic? Yes. -Werror? mmm, no.Codex
Yes, -Werror. Zero warnings in my build logs, because if I decide that some warnings are OK, I must review every one of them every build, and I have better uses for my copious free time.That
@n.m.: Fine but when third-party headers cause warnings you're stuck. And I have better things to do than hack other peoples' code to work around it. I also have better things to do than fix every single unused parameter warning on a dev branch on every single iteration whilst I'm in the first stages of building a new feature.Codex
@LightnessRacesinOrbit at least gcc and clang allow us to turn off warnings using pragmas and so for third party headers I can easily turn off specific warning and that is indeed what I do and it works well.Jockstrap
Third party libs/headers are rarely a big problem in practice. I fix them, wrap them, file bug reports so that they are fixed, or use -isystem (a dirty gcc hack).That
