What is the advantage of uint8_t over unsigned char?
C

8

295

What is the advantage of using uint8_t over unsigned char in C?

I know that on almost every system uint8_t is just a typedef for unsigned char, so why use it?

Crumby answered 12/11, 2009 at 22:28 Comment(0)
F
287

It documents your intent - you will be storing small numbers, rather than a character.

Also it looks nicer if you're using other typedefs such as uint16_t or int32_t.

Ferneferneau answered 12/11, 2009 at 22:31 Comment(10)
Explicitly using unsigned char or signed char documents the intent too, since unadorned char is what shows you're working with characters.Niall
@caf: If you're lucky enough to get beyond an unadorned 'unsigned' to begin with, which I still see people doing to let the platform pick whether it's int or char by default. But, I think, in this day and age 'unsigned' (alone, or adorned) indicates the intent adequately, otherwise a simple process of elimination explains it :)Floristic
I thought an unadorned unsigned was unsigned int by definition?Ferneferneau
@endolith, using uint8_t for a string isn't necessarily wrong, but it's definitely weird.Ferneferneau
So a uint8_t can hold an integer value from 0 to 255, i.e. 8 binary bits. Just like when the syntax was unsigned char, but with much better grammar.Allometry
@NickSoft, the question wasn't about those other types so I didn't get into that. And unfortunately unsigned char and uint8_t aren't distinct types, see for example ideone.com/GMV0uDFerneferneau
hmm. I'm sorry for giving wrong information... I knew that enum and int are distinct types and I assumed that it's the same for other types that can be auto-casted to int. Or maybe it depends on the compiler...Phenacetin
@endolith, I think I can make a case for uint8_t with UTF8 text. Indeed, char seems to imply a character, whereas in the context of a UTF8 string, it may be just one byte of a multibyte character. Using uint8_t could make it clear that one shouldn't expect a character at every position -- in other words that each element of the string/array is an arbitrary integer that one shouldn't make any semantic assumptions about. Of course all C programmers know this, but it may push beginners to ask the right questions.Sturgeon
I have to say, unsigned char isn't really used to store characters in the first place, so the "intent" issue is moot.Yaekoyael
Well, that's historical. I think we can assume that it was used to store characters "in the first place" (original intent; that char is an abbreviation of character is fairly unambiguous), but indeed doesn't in practice because it was historically the only standard 8-bit datatype until C99 inttypes.h appeared. Now that we have inttypes.h, I feel it's in fact all about intent when comparing the original datatypes and the newer (u)int_(least/fast)N_t datatypes, and about intent and assurance that the code either compiles with exact width or not at all when it comes to (u)intN_t.Sturgeon
A
92

Just to be pedantic, some systems may not have an 8 bit type. According to Wikipedia:

An implementation is required to define exact-width integer types for N = 8, 16, 32, or 64 if and only if it has any type that meets the requirements. It is not required to define them for any other N, even if it supports the appropriate types.

So uint8_t isn't guaranteed to exist, though it will for all platforms where 8 bits = 1 byte. Some embedded platforms may be different, but that's getting very rare. Some systems may define char types to be 16 bits, in which case there probably won't be an 8-bit type of any kind.

Other than that (minor) issue, @Mark Ransom's answer is the best in my opinion. Use the one that most clearly shows what you're using the data for.

Also, I'm assuming you meant uint8_t (the standard typedef from C99 provided in the stdint.h header) rather than uint_8 (not part of any standard).
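
Where uint8_t might be missing, one hedged option is uint_least8_t, which C99 requires every implementation to provide. A minimal sketch, assuming you want 8-bit wraparound arithmetic; the names `octet` and `octet_add` are made up for illustration:

```c
#include <stdint.h>

/* uint_least8_t is mandatory in C99; it has at least 8 bits, possibly more. */
typedef uint_least8_t octet;   /* hypothetical name */

/* Add two values modulo 256, even if octet is wider than 8 bits. */
static octet octet_add(octet a, octet b)
{
    return (octet)((a + b) & 0xFFu);
}
```

The explicit mask is what keeps the result in the 0..255 range on a platform where the underlying type is, say, 16 bits wide; on an 8-bit-char platform a compiler can usually drop it.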

Antitype answered 12/11, 2009 at 22:36 Comment(8)
DSPs with CHAR_BIT > 8 are becoming less rare, not more.Niall
@caf, out of sheer curiosity - can you link to description of some? I know they exist because someone mentioned one (and linked to developer docs for it) in a comp.lang.c++.moderated discussion on whether C/C++ type guarantees are too weak, but I cannot find that thread anymore, and it's always handy to reference that in any similar discussions :)Postglacial
"Some systems may define char types to be 16 bits, in which case there probably won't be an 8-bit type of any kind." - and despite some incorrect objections from me, Pavel has demonstrated in his answer that if char is 16 bits, then even if the compiler does provide an 8 bit type, it must not call it uint8_t (or typedef it to that). This is because the 8bit type would have unused bits in the storage representation, which uint8_t must not have.Resentment
The SHARC architecture has 32-bit words. See en.wikipedia.org/wiki/… for details.Clarke
And TI's C5000 DSPs (which were in OMAP1 and OMAP2) are 16bit. I think for OMAP3 they went to C6000-series, with an 8bit char.Resentment
Oh yes, it was indeed SHARC. Thanks. Looks like a perfect platform for B (the one between BCPL and C) to me :)Postglacial
Digging into N3242 - "Working Draft, Standard for Programming Language C++", section 18.4.1 < cstdint > synopsis says - typedef unsigned integer type uint8_t; // optional So, in essence, a C++ standard conforming library is not needed to define uint8_t at all (see the comment //optional)Goral
In cases where the smallest data type is greater than 8 bits (e.g. TI's C2000 series, where it is 16 bits) I believe one could use uint_least8_t to properly indicate the intent and the fact that the type may not actually be 8 bits.Kalimantan
R
65

The whole point is to write implementation-independent code. unsigned char is not guaranteed to be an 8-bit type. uint8_t is (if available).
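
A small sketch of what that guarantee buys you, assuming a C99 compiler. UINT8_MAX is defined by <stdint.h> only when uint8_t exists, so the #error branch fires on platforms without an exact 8-bit type. The function names `wraps_at_256` and `uchar_wraps_at_256` are illustrative only:

```c
#include <limits.h>
#include <stdint.h>

#ifndef UINT8_MAX
#error "this code needs an exact 8-bit unsigned type (uint8_t)"
#endif

/* Returns 1 if incrementing 255 wraps to 0, which uint8_t guarantees. */
static int wraps_at_256(void)
{
    uint8_t x = UINT8_MAX;  /* 255 */
    x++;                    /* result is converted back to uint8_t, modulo 256 */
    return x == 0;
}

/* An unsigned char only wraps at 256 when CHAR_BIT happens to be 8. */
static int uchar_wraps_at_256(void)
{
    unsigned char c = 255;
    c++;
    return c == 0;
}
```

The first function behaves the same wherever it compiles; the second depends on CHAR_BIT, which is the portability difference in a nutshell.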

Rascally answered 12/11, 2009 at 22:55 Comment(11)
...if it exists on a system, but that's going to be very rare. +1Antitype
well if you really had trouble with your code not compiling on a system because uint8_t didn't exist, you could use find and sed to automatically change all occurrences of uint8_t to unsigned char or something more useful to you.Polycythemia
@Polycythemia - not if you are assuming it is an 8-bit type you can't - for example to unpack data packaged in bytewise fashion by a remote system. The implicit assumption is that the reason for uint8_t to not exist is on a processor where a char is more than 8 bits.Later
throw in assertion assert(sizeof(unsigned char) == 8);Polycythemia
@Polycythemia incorrect assertion, I'm afraid. sizeof(unsigned char) always returns 1, i.e. one byte. And if a system's char and int are the same size (e.g. both 16 bits), then sizeof(int) will also return 1.Kalimantan
OK .. anybody have a solution?Polycythemia
@Polycythemia #if CHAR_BIT == 8 or #ifdef UINT8_MAXSurfeit
@jwd: That's false. uint8_t is guaranteed to be a precisely 8-bit type. What is not guaranteed is whether this type is available. But if it is available, then it is exactly 8 bits wide. It is true that char is not guaranteed to be 8 bits wide, but uint8_t has nothing to do with char.Rascally
Oh, right you are; I learned something (: I'll delete my comment in a bit just so nobody is misled by it by accident.Tiv
much rather have a compiler error explicitly revealing that the unsigned 8-bit integer you were expecting doesn't exist, than have your code choke and die later on... (unless, of course, your code doesn't rely on said chars being 8 bits, in which case of course feel free to call them chars!)Zina
@Zina - This is exactly it. It's perfectly fine to demand things of the platform, this is what uint8_t is explicitly for. If it can't provide a uint8, I want the brakes thrown immediately.Ostwald
S
10

As you said, "almost every system".

char is probably one of the less likely to change, but once you start using uint16_t and friends, using uint8_t blends better, and may even be part of a coding standard.

Stinko answered 12/11, 2009 at 22:31 Comment(0)
P
7

There's little. From a portability viewpoint, char cannot be smaller than 8 bits, and nothing can be smaller than char, so if a given C implementation has an unsigned 8-bit integer type, it's going to be char. Alternatively, it may not have one at all, at which point any typedef tricks are moot.

It could be used to better document your code in the sense that it's clear that you require 8-bit bytes there and nothing else. But in practice it's a reasonable expectation virtually anywhere already (there are DSP platforms on which it's not true, but the chances of your code running there are slim, and you could just as well error out using a static assert at the top of your program on such a platform).
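
Such a check at the top of the program might look like the sketch below. It assumes a C11 compiler for _Static_assert; the #if/#error form is the usual substitute on older compilers, and the message text is arbitrary:

```c
#include <limits.h>

/* C11: refuse to compile on platforms where a byte is not 8 bits. */
_Static_assert(CHAR_BIT == 8, "this program requires 8-bit bytes");

/* Pre-C11 equivalent using the preprocessor. */
#if CHAR_BIT != 8
#error "this program requires 8-bit bytes"
#endif
```

Either form stops the build with a readable message instead of letting the code misbehave at run time.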

Postglacial answered 12/11, 2009 at 22:42 Comment(17)
For the record, you could make an 8-bit type on any platform: typedef struct { unsigned i :8; } uint8_t; but you'd have to use it as uint8_t x; x.i = ... so it'd be a bit more cumbersome.Antitype
I think chars can go as low as 4 bits, below that and things fall apart a bit in the standard (there is a chance I'm wrong though).Hysteroid
@Hysteroid - No, the standard requires unsigned char to be able to hold values between 0 and 255. If you can do that in 4 bits, my hat is off to you.Antitype
"it'd be a bit more cumbersome" - cumbersome in the sense that you'd have to walk (swim, catch a plane, etc) all the way over to where the compiler writer was, slap them in the back of the head, and make them add uint8_t to the implementation. I wonder, do compilers for DSPs with 16bit chars typically implement uint8_t, or not?Resentment
@Steve, no, they don't, since there really isn't any way for them to do that. The bitfield trick does indeed work, but bitfields are very limited (you can't have arrays of them, you can't have pointers to them, etc). There's no requirement in C99 for an implementation to have uint8_t at all - it must have it if and only if it has a corresponding type. It is, however, required to provide uint_least8_t, which is at least 8 bits (but can be larger).Postglacial
By the way, on a second thought, it is perhaps the most straightforward way to say "I really need 8 bits" - #include <stdint.h>, and use uint8_t. If the platform has it, it will give it to you. If the platform doesn't have it, your program will not compile, and the reason will be clear and straightforward.Postglacial
I like the logic that if uint8_t exists at all, it's going to be unsigned char anyway.Niall
"there really isn't any way for them to do that" - well, it depends how the compiler is coded. You know they're able to generate the code to do 8bit unsigned arithmetic, because of bitfields (probably normal arithmetic, plus some masking). Of course you'd have sizeof(uint8_t) == sizeof(char) even though UCHAR_MAX != 255, but that's OK, it's why types don't have to use all their storage bits. By "slap in the back of the head" I of course mean "make an impassioned but polite feature request". They're entitled to turn it down, but how confident are they that you won't resort to violence? ;-)Resentment
As for "straightforward" - it's certainly the least up-front coding effort, but as you say, for true portability you just have to use uint_least8_t and apply the modulo-256 overflow for yourself. I'm guessing you can write it so that on any vaguely optimising compiler where uint_least8_t is 8 bits, all the extra ops are elided.Resentment
"Of course you'd have sizeof(uint8_t) == sizeof(char) even though UCHAR_MAX != 255, but that's OK, it's why types don't have to use all their storage bits." - it's not okay because unsigned char is specifically required to use all storage bits fully by both ISO C and C++. See 6.2.6.1/3 (and the corresponding footnote) for C99, and 3.9.1/1 for C++03.Postglacial
It is OK. unsigned char (which in this example is 16bit) uses all bits, but AFAIK uint8_t doesn't have to. Hence uint8_t can be smaller than unsigned char in range, although obviously not in storage size. So I don't see why it should be difficult for the compiler writer to support uint8_t. It might be monstrously inefficient, but that's a separate issue.Resentment
Still no cigar, sorry: "For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits ... If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation ... The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation."Postglacial
OK, you win :-). 7.18.1.1 conspicuously doesn't say that the unsigned versions have no padding bits. But it's implied by the requirement that if you provide uint8_t then you must provide int8_t, and the lemma: if uint8_t has padding bits, then int8_t has padding bits, since they're the same width and the same storage size.Resentment
Moral of the story: integer types are stupid, albeit fast. If you need arithmetic modulo any particular power of two, either write it yourself or use a POSIX-compliant implementation, where uint8_t is compulsory ;-)Resentment
If you just need arithmetic modulo, unsigned bitfield will do just fine (if inconvenient). It's when you need, say, an array of octets with no padding, that's when you're SOL. Moral of the story is not to code for DSPs, and stick to proper, honest-to-God 8-bit char architectures :)Postglacial
Unfortunately, while the Standard would require that if uint8_t exists, then unsigned char must also be 8 bits, it would not forbid an implementation from making uint8_t an 8-bit extended integer type. It would be genuinely useful to have an 8-bit unsigned type which doesn't receive the special aliasing treatment given to unsigned char, and nothing would forbid an implementation from making uint8_t be such a type [IMHO, the proper way to define such a type would be to give it a special name which could be aliased to uint8_t on implementations that support the latter...Plyler
...but don't have a non-aliasing 8-bit type].Plyler
B
7

In my experience there are two places where we want to use uint8_t to mean 8 bits (and uint16_t, etc) and where we can have fields smaller than 8 bits. Both places are where space matters and we often need to look at a raw dump of the data when debugging and need to be able to quickly determine what it represents.

The first is in RF protocols, especially in narrow-band systems. In this environment we may need to pack as much information as we can into a single message. The second is in flash storage where we may have very limited space (such as in embedded systems). In both cases we can use a packed data structure in which the compiler will take care of the packing and unpacking for us:

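#include <stdint.h>

#pragma pack(1)           /* pragma-style packing (MSVC, also accepted by GCC/Clang) */
typedef struct {
  uint8_t    flag1:1;     /* bit-fields of type uint8_t are implementation-defined */
  uint8_t    flag2:1;
  uint8_t    reserved:6;  /* not necessary but makes this struct more readable */
  uint32_t   sequence_no;
  uint8_t    data[8];
  uint32_t   crc32;
} __attribute__((packed)) s_mypacket;  /* attribute-style packing (GCC/Clang) */
#pragma pack()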

Which method you use depends on your compiler. You may also need to support several different compilers with the same header files. This happens in embedded systems where devices and servers can be completely different - for example you may have an ARM device that communicates with an x86 Linux server.

There are a few caveats with using packed structures. The biggest gotcha is that you must avoid taking the address of a member and dereferencing it through an ordinary pointer. On systems with multibyte-aligned words, this can result in a misaligned access exception - and a coredump.

Some folks will also worry about performance and argue that using these packed structures will slow down your system. It is true that, behind the scenes, the compiler adds code to access the unaligned data members. You can see that by looking at the assembly code in your IDE.

But since packed structures are most useful for communication and data storage, the data can be extracted into a non-packed representation when working with it in memory. Normally we do not need to work with the entire data packet in memory anyway.
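
One hedged way to do that extraction without relying on any packing pragma is to read the wire bytes explicitly. The sketch below assumes a 17-byte layout matching the struct above (one flags byte, then sequence_no, data, crc32), little-endian multi-byte fields, and flag1 in bit 0. Those layout choices, and the names `mypacket_unpacked`, `load_le32`, and `unpack_mypacket`, are assumptions for illustration, not something the packed struct guarantees:

```c
#include <stdint.h>
#include <string.h>

#define MYPACKET_WIRE_SIZE 17u  /* 1 + 4 + 8 + 4 bytes, assumed layout */

/* Ordinary (unpacked) in-memory form; the compiler may pad it freely. */
typedef struct {
    uint8_t  flag1;
    uint8_t  flag2;
    uint32_t sequence_no;
    uint8_t  data[8];
    uint32_t crc32;
} mypacket_unpacked;

/* Assemble a little-endian 32-bit value one byte at a time,
   so the source pointer needs no particular alignment. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Copy the fields of a raw wire buffer into the unpacked struct. */
static void unpack_mypacket(const uint8_t wire[MYPACKET_WIRE_SIZE],
                            mypacket_unpacked *out)
{
    out->flag1       = wire[0] & 1u;          /* assumed: flag1 is bit 0 */
    out->flag2       = (wire[0] >> 1) & 1u;   /* assumed: flag2 is bit 1 */
    out->sequence_no = load_le32(&wire[1]);
    memcpy(out->data, &wire[5], sizeof out->data);
    out->crc32       = load_le32(&wire[13]);
}
```

Because every access goes through a uint8_t pointer or memcpy, nothing here can trigger a misaligned access, and the rest of the program works on the naturally aligned struct.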

Here is some relevant discussion:

pragma pack(1) nor __attribute__ ((aligned (1))) works

Is gcc's __attribute__((packed)) / #pragma pack unsafe?

http://solidsmoke.blogspot.ca/2010/07/woes-of-structure-packing-pragma-pack.html

Bushweller answered 3/3, 2014 at 16:20 Comment(0)
F
4

That is really important, for example, when you are writing a network analyzer. Packet headers are defined by the protocol specification, not by the way a particular platform's C compiler works.
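
As an illustration, a UDP header is four 16-bit big-endian fields (source port, destination port, length, checksum). A minimal sketch of decoding it from raw bytes, independent of how the compiler lays out structs; the names `udp_header`, `load_be16`, and `parse_udp_header` are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;
    uint16_t checksum;
} udp_header;

/* Read a 16-bit value stored in network (big-endian) byte order. */
static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if fewer than the 8 header bytes are available. */
static int parse_udp_header(const uint8_t *buf, size_t len, udp_header *out)
{
    if (len < 8) {
        return -1;
    }
    out->src_port = load_be16(buf);
    out->dst_port = load_be16(buf + 2);
    out->length   = load_be16(buf + 4);
    out->checksum = load_be16(buf + 6);
    return 0;
}
```

Using uint8_t for the buffer states the protocol's assumption directly: each element is one octet off the wire.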

Ferritin answered 1/3, 2010 at 18:49 Comment(1)
Back when I asked this I was defining a simple protocol for communication over serial.Crumby
D
2

On almost every system I've met uint8_t == unsigned char, but this is not guaranteed by the C standard. If you are trying to write portable code and it matters exactly what size the memory is, use uint8_t. Otherwise use unsigned char.
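
On a C11 compiler you can check at compile time which case you are in. A sketch, assuming _Generic is available; the macro name `UINT8_IS_UCHAR` and the wrapper function are made up for illustration:

```c
#include <stdint.h>

/* Evaluates to 1 if uint8_t is the very same type as unsigned char, else 0.
   C11 _Generic selects on the type of its controlling expression. */
#define UINT8_IS_UCHAR _Generic((uint8_t)0, unsigned char: 1, default: 0)

static int uint8_is_unsigned_char(void)
{
    return UINT8_IS_UCHAR;
}
```

Mainstream desktop compilers typedef uint8_t to unsigned char, so the function is expected to return 1 there; a 0 would indicate an implementation that uses a separate 8-bit extended integer type.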

Downtrend answered 12/11, 2009 at 22:32 Comment(3)
uint8_t always matches range and size of unsigned char and padding (none) when unsigned char is 8-bit. When unsigned char is not 8-bit, uint8_t does not exist.Surfeit
@chux, Do you have a reference to the exact place in the standard where it says that? If unsigned char is 8-bit, is uint8_t guaranteed to be a typedef thereof and not a typedef of an extended unsigned integer type?Boorman
@Boorman "exact place in the standard where it says that?" --> No - yet look to 7.20.1.1. It is readily deduced as unsigned char/signed char/char are the smallest type - no smaller than 8 bits. unsigned char has no padding. For uint8_t to be, it must be 8-bits, no padding, exist because of an implementation provided integer type: matching the minimal requirements of unsigned char. As to "... guaranteed to be a typedef..." looks like a good question to post.Surfeit
