When is uint8_t ≠ unsigned char?

3

68

According to C and C++, CHAR_BIT >= 8.
But whenever CHAR_BIT > 8, uint8_t can't even be represented as 8 bits.
It must be larger, because CHAR_BIT is the minimum number of bits for any data type on the system.

On what kind of a system can uint8_t be legally defined to be a type other than unsigned char?

(If the answer is different for C and C++ then I'd like to know both.)

Alpenglow answered 22/4, 2013 at 1:45 Comment(9)
I wonder if it's legal to have a char with only 7 real bits and 1 padding bit.Chaps
@Mysticial: Nope, I think all chars must have all of their representation bits participate in determining their value.Alpenglow
Or maybe a 16-bit uint8_t where 8 is real and 8 is padding. I'd shoot whoever made such an environment though. :)Chaps
The C++ standard lists it as optional. typedef signed integer type int8_t; // optionalGang
@Mysticial: Not sure that's allowed either, because the width is supposed to be exactly 8 bits. :PAlpenglow
@Mysticial: [u]int*_t is required by the standard to have no padding bits and to be two's complement if signed.Graham
@Mysticial: Wouldn't you lose range if that happened? Meaning, you need all 8 bits to represent some char.Recruitment
@Mysticial: Such environments do exist (it's common for DSP architectures to be unable to address anything smaller than a word); in that case, uint8_t shouldn't exist at all.Pooi
Possible duplicate of uint8_t vs unsigned charHuxham
66

If it exists, uint8_t must always have the same width as unsigned char. However, it need not be the same type; it may be a distinct extended integer type. It also need not have the same representation as unsigned char; for instance, the bits could be interpreted in the opposite order. This is a silly example, but it makes more sense for int8_t, where signed char might be ones' complement or sign-magnitude while int8_t is required to be two's complement.

One further "advantage" of using a non-char extended integer type for uint8_t even on "normal" systems is C's aliasing rules. Character types are allowed to alias anything, which prevents the compiler from heavily optimizing functions that use both character pointers and pointers to other types, unless the restrict keyword has been applied well. However, even if uint8_t has the exact same size and representation as unsigned char, if the implementation made it a distinct, non-character type, the aliasing rules would not apply to it, and the compiler could assume that objects of types uint8_t and int, for example, can never alias.
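To make the aliasing point concrete, here is a minimal sketch, assuming the usual case where uint8_t is a typedef for unsigned char. The function names are hypothetical and only for illustration. In the first version a store through dst may legally modify *count, so the compiler has to reload *count on every iteration; the other two versions show ways of telling it that no such overlap exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero the first *count bytes of dst.
 * If uint8_t is a character type, dst[i] = 0 may alias *count,
 * so *count must be re-read on each pass through the loop. */
void zero_bytes(uint8_t *dst, const size_t *count)
{
    assert(dst != NULL && count != NULL);
    for (size_t i = 0; i < *count; i++)
        dst[i] = 0;
}

/* Variant 1: restrict is the caller's promise that the two
 * pointed-to objects do not overlap. */
void zero_bytes_restrict(uint8_t *restrict dst, const size_t *restrict count)
{
    assert(dst != NULL && count != NULL);
    for (size_t i = 0; i < *count; i++)
        dst[i] = 0;
}

/* Variant 2: copy the bound into a local whose address is never
 * taken, so no store through dst can possibly change it. */
void zero_bytes_local(uint8_t *dst, const size_t *count)
{
    assert(dst != NULL && count != NULL);
    const size_t n = *count;
    for (size_t i = 0; i < n; i++)
        dst[i] = 0;
}
```

All three behave identically when dst and count do not overlap; whether the generated code actually differs depends on the compiler and optimization level.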

Graham answered 22/4, 2013 at 2:17 Comment(9)
If I'm to believe the snippet of draft standard posted in another answer, uint8_t must be defined as a typedef.Omland
typedef __uint8_t uint8_t; is a typedef.Graham
In the interest of humour, perhaps an implementation might decide to be consistent with its naming conventions and, in contrast to long long, it might introduce a short short. Hence, typedef short short int8_t;...Filament
In 2003 ± 2 (not going to go dig it up in the mail archives right now), the GCC team contemplated making [u]int8_t special extended integer types exactly so that it could be optimized more aggressively ... but eventually rejected the notion on the grounds that programmers are very likely to expect them to have the same special aliasing properties as char. (This was around the same time we were getting screamed at by the kernel people for doing type-based alias analysis at all, so we were all a little skittish.)Gilford
@Zack: Thanks for the interesting historical note. It would be nice if gcc still provided those types, but didn't use them by default, so that a feature test macro or similar could switch to them, enabling the more aggressive optimization.Graham
@Zack interesting, well this issue popped up in a question today and I don't see a portable workaround, which is unfortunate. +1 btw.Dutra
@ShafikYaghmour: Nice question. The trivial workaround, however, is to use the restrict keyword, or to copy the pointer to a local variable whose address is never taken so that the compiler does not need to worry about whether the uint8_t objects can alias it.Graham
@R.. thank you for the suggestion, the OP posted a follow-up question and stated that restrict did not work in gcc for them but the other suggestion did.Dutra
Divorcing uint8_t from character types was actually discussed at the GCC bugzilla: see <gcc.gnu.org/bugzilla/show_bug.cgi?id=66110>.Morn
33

On what kind of a system can uint8_t be legally defined to be a type other than unsigned char?

In summary, uint8_t can only be legally defined on systems where CHAR_BIT is 8. It's an addressable unit with exactly 8 value bits and no padding bits.

In detail, CHAR_BIT defines the width of the smallest addressable units, and uint8_t can't have padding bits; it can only exist when the smallest addressable unit is exactly 8 bits wide. Provided CHAR_BIT is 8, uint8_t can be defined by a type definition for any 8-bit unsigned integer type that has no padding bits.
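These constraints can be written down as compile-time checks. This is a sketch only: it relies on the rule that <stdint.h> defines UINT8_MAX exactly when it provides uint8_t, and have_uint8 is a hypothetical helper name.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT, UCHAR_MAX */
#include <stdint.h>

#ifdef UINT8_MAX      /* defined only if the implementation provides uint8_t */
static_assert(CHAR_BIT == 8, "uint8_t cannot exist unless a byte is 8 bits");
static_assert(sizeof(uint8_t) == 1, "uint8_t occupies exactly one byte");
static_assert(UINT8_MAX == UCHAR_MAX, "same width, so same maximum value");
#endif

/* Hypothetical helper: 1 if this implementation provides uint8_t, else 0. */
int have_uint8(void)
{
#ifdef UINT8_MAX
    return 1;
#else
    return 0;
#endif
}
```

On an implementation with CHAR_BIT > 8 the guarded block is simply skipped, because uint8_t and UINT8_MAX are not defined there.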


Here's what the C11 standard draft (n1570.pdf) says:

5.2.4.2.1 Sizes of integer types

1 The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. ... Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

-- number of bits for smallest object that is not a bit-field (byte)
   CHAR_BIT                                            8

Thus the smallest objects must contain exactly CHAR_BIT bits.


6.5.3.4 The sizeof and _Alignof operators

...

4 When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1. ...

Thus, those are (some of) the smallest addressable units. Obviously int8_t and uint8_t may also be considered smallest addressable units, provided they exist.

7.20.1.1 Exact-width integer types

1 The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits.

2 The typedef name uintN_t designates an unsigned integer type with width N and no padding bits. Thus, uint24_t denotes such an unsigned integer type with a width of exactly 24 bits.

3 These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two’s complement representation, it shall define the corresponding typedef names.

The emphasis on "These types are optional" is mine. I hope this was helpful :)
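Because the exact-width types are optional, code that has to build everywhere can fall back to uint_least8_t, which every implementation must provide. A minimal sketch, assuming 8-bit wraparound is what you want; octet and octet_add are hypothetical names, not anything from the standard:

```c
#include <assert.h>
#include <stdint.h>

#ifdef UINT8_MAX
typedef uint8_t octet;        /* exactly 8 bits, no padding bits */
#else
typedef uint_least8_t octet;  /* always available; may be wider than 8 bits */
#endif

/* Hypothetical helper: add two values modulo 256. The explicit mask
 * keeps the result in 0..255 even where octet is wider than 8 bits. */
octet octet_add(octet a, octet b)
{
    assert(a <= 0xFFu && b <= 0xFFu);  /* inputs must already be 8-bit values */
    return (octet)((a + b) & 0xFFu);
}
```

Where uint8_t exists the mask is redundant and should cost nothing; where it does not, the mask is what preserves the 8-bit arithmetic.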

Filament answered 22/4, 2013 at 1:54 Comment(12)
So what's the purpose of uint8_t if it's never different from unsigned char?Alpenglow
@Mehrdad I guess that when you actually need an int8, the code would not compile at all when CHAR_BIT > 8, since int8_t wouldn't even exist. Whereas if you used char and CHAR_BIT > 8, you might get a half-broken build.Chaps
@Mysticial: Weird, couldn't you already just say #if CHAR_BIT > 8... #error ZOMG... #endif if your program is supposed to not work for those systems?Alpenglow
It is different from unsigned char. unsigned char is guaranteed to exist, but is only guaranteed to be 8 bits when CHAR_BIT == 8. uint8_t isn't guaranteed to exist, but is guaranteed to be 8 bits when it does.Filament
There's a subtle difference between char and int8_t, besides the width. A char might use ones' complement, two's complement or sign-and-magnitude representation, whereas an int8_t is required to use a two's complement representation.Filament
I always thought the point of all the specific-size types was so that if something weird was going on, things either kept working or broke right away and told you so. They're also far more readable, when you're not working with chars.Bengaline
It would also be good to say if char is guaranteed to have CHAR_BIT bits and quote the standard for that.Huxham
Hey, if you decide to delete your answers, do ping me so I can repost them and get rep, haha. But I have enough rep for the job market now; I'm just doing this to save the world. What I really want now is to make money.Huxham
@Filament I thought it was not possible to retract the CC BY-SA of one's answers. But you should chill, I'm just joking.Huxham
OK, I see what you mean.Huxham
@Filament char can be unsigned, int8_t is signed.Thereafter
@Thereafter true. I can't edit that in to the comment, but let it be known that I meant to mention the possibility of char being an unsigned type.Filament
8

A possibility that no one has so far mentioned: if CHAR_BIT==8 and unqualified char is unsigned, which it is in some ABIs, then uint8_t could be a typedef for char instead of unsigned char. This matters at least insofar as it affects overload choice (and its evil twin, name mangling), i.e. if you were to have both foo(char) and foo(unsigned char) in scope, calling foo with an argument of type uint8_t would prefer foo(char) on such a system.
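C has no overloading, but C11's _Generic selects on type in much the same way, so it can show which type uint8_t really is on a given implementation. A sketch, assuming a C11 compiler and that uint8_t exists; CHAR_KIND and uint8_kind are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* uint8_t, where it exists, is exactly one byte wide. */
static_assert(sizeof(uint8_t) == sizeof(unsigned char),
              "uint8_t has the same size as unsigned char");

/* Map the type of an expression to a printable name. char, signed char
 * and unsigned char are three distinct types, so all three may be listed. */
#define CHAR_KIND(x) _Generic((x),          \
    char:          "char",                  \
    signed char:   "signed char",           \
    unsigned char: "unsigned char",         \
    default:       "some other type")

/* Hypothetical helper: name the type that uint8_t is a typedef for. */
const char *uint8_kind(void)
{
    uint8_t probe = 0;
    return CHAR_KIND(probe);
}
```

On mainstream implementations I would expect this to report unsigned char, but "char" (where plain char is unsigned) and "some other type" (a distinct extended integer type) are both permitted results.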

Gilford answered 24/4, 2013 at 22:30 Comment(3)
"However, it need not be the same type; it may be a distinct extended integer type." covers that in part, although it's true it might easily be overlooked.Memorialist
@LucDanton char is not an extended integer type.Gilford
"it need not be the same type" is the relevant part. I took the rest to be an example.Memorialist
