Is there a standard binary representation of integer data types in C++20?
I understand that with C++20, sign-magnitude and ones' complement are finally being phased out in favor of standardizing on two's complement (see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r3.html and http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1236r1.html). I was wondering what implications this has for how much we can now assume about the binary representation of integers in C++20. As I read it, a lot of thought has been put into the allowed ranges, but I don't see anything that would really pin down the bit layout or the endianness. I would thus assume that endianness is still an issue, but what about bit layout?

According to the standard, is 0b00000001 == 1 always true for an int8_t? What about 0b11111111 == -1?

I understand that on nearly all practical systems the leftmost bit is the most significant, with significance decreasing toward the rightmost, least significant bit, and all systems I've tested seem to use this representation. But does the standard say anything about this, and what guarantees do we get? Or, if we need to know the underlying representation, would it be safer to use a 256-element lookup table that explicitly maps each value a byte can represent to a specific bit pattern, rather than relying on this? I'd rather not take the performance hit of a lookup if I can use the bytes directly as-is, but I'd also like to make sure my code isn't making too many assumptions, as portability is important.

Alard answered 4/3, 2022 at 16:1 Comment(5)
If the use of two's complement is now compulsory, then what other way can -1 be represented but 0b11111111? Or, put another way, what other value could 0b11111111 represent, if not -1? That's the point of the standardization, as far as I can tell.Arena
Negating a number in two's complement means inverting all bits and then adding 1. So, 00000001 -> 11111110 -> 11111111.Arena
Well, as I mentioned, I'm not sure about the significance of the bits, since the standard doesn't seem to mention them (at least in the open-std docs I read), and I'm also not sure that an implementation HAS to start counting from 0 at 0b00000000 rather than at some offset. Those are my biggest concerns. Really, I suppose I'm asking how the object representation of an int maps to the value representation as far as the C++20 standard is concerned.Alard
Endianness is not an issue.Infatuate
@n.1.8e9-where's-my-sharem. It is in the sense that we can't blindly memcpy raw data from buffers of fundamental types into buffers destined for peripherals. If you bit-shift and mask, of course, it isn't an issue. That's why I only mentioned it briefly here. I wouldn't say it's "not an issue", though, especially when working in embedded code with lots of peripherals that need to communicate, sometimes with each other or with other machines on a network with a different native byte order.Alard
The sign bit is required to be the most significant bit (§[basic.fundamental]/3):

For each value x of a signed integer type, the value of the corresponding unsigned integer type congruent to x modulo 2^N has the same value of corresponding bits in its value representation.

Things only work this way if the sign bit is what would be the MSB in an unsigned.

This also requires that (for example) uint8_t x = -1; will set x to 0b11111111 (since -1 reduced modulo 2^8 is 255). In fact, that's used as an example in the standard:

[Example: The value −1 of a signed integer type has the same representation as the largest value of the corresponding unsigned type. —end example]

As far as an offset representation goes, I believe it's considered impossible. The C++ standard refers to the C standard which requires (§6.2.6.2/1):

If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation;

"using a pure binary representation" is at least normally interpreted as meaning a representation like:

b_(N-1) b_(N-2) ... b_2 b_1 b_0.

I.e., where, if you count bits from 0 through N-1, each bit represents the corresponding power of 2.

Aubry answered 4/3, 2022 at 17:24 Comment(1)
I see, "pure binary representation" seems pretty reassuring that the value representation will be as expected. It seems the C standard included this specifically to rule out things like Gray code, which is the kind of thing that concerned me, according to this post: cs.stackexchange.com/questions/11462/… I'll accept this as the best answer for now, thank you.Alard
The C++20 standard requires that signed integers work as follows:

For each value x of a signed integer type, the value of the corresponding unsigned integer type congruent to x modulo 2^N has the same value of corresponding bits in its value representation.

This is how two's complement is defined (there's even a footnote telling you that's what this means). This does not allow for the sign bit to appear anywhere except the highest bit in the value representation of the signed integer. And this does not allow for conversions to the unsigned equivalent to move that bit anywhere other than the highest bit in the value representation of the unsigned equivalent.

Two's complement means two's complement.

According to the standard, is 0b00000001 == 1 always true for an int8_t? What about 0b11111111 == -1?

In terms of representation, this has been true since C++11. This is because the specific sized signed integer types were always required to be two's complement (even if signed char wasn't). Of course, these types are only optionally supported, so if you wanted maximum portability, you couldn't rely on them.

Dentist answered 4/3, 2022 at 16:50 Comment(1)
I feel that's a helpful insight for understanding this better, but my primary concern here isn't necessarily casting signed to unsigned in the object representation (which I think is what the text you quote is probably referring to), but rather the value representation. It states that the value representation is the same when performing such a cast, but doesn't seem to actually state what that value representation is.Alard