Is there any reason not to use fixed width integer types (e.g. uint8_t)?

Assuming you're using a compiler that supports C99 (or even just stdint.h), is there any reason not to use fixed-width integer types such as uint8_t?

One reason that I'm aware of is that it makes much more sense to use chars when dealing with characters instead of using (u)int8_ts, as mentioned in this question.

But if you are planning on storing a number, when would you want to use a type whose size you don't know? That is, in what situation would you want to store a number in an unsigned short without knowing whether it is 16, 32, or even 64 bits wide, instead of using a uint16_t?

Following on from this, is it considered better practice to use fixed-width integers, or to use the normal integer types and just never assume anything and use sizeof wherever you need to know how many bytes they are using?

Duisburg answered 16/11, 2012 at 9:13 Comment(2)
possible duplicate of Reasons to use (or not) stdint – Alarum
Also see this question: Exotic architectures the standard committee cares about – Gambetta

It's actually quite common to store a number without needing to know the exact size of the type. There are plenty of quantities in my programs that I can reasonably assume won't exceed 2 billion, or enforce that they don't. But that doesn't mean I need an exact 32 bit type to store them, any type that can count to at least 2 billion is fine by me.

If you're trying to write very portable code, you must bear in mind that the fixed-width types are all optional.

On a C99 implementation where CHAR_BIT is greater than 8, there is no int8_t. The standard forbids it from existing because it would have to have padding bits, and intN_t types are defined to have no padding bits (7.18.1.1/1). uint8_t is therefore also forbidden, because (thanks, ouah) an implementation is not permitted to define uint8_t without int8_t.

So, in very portable code, if you need a signed type capable of holding values up to 127, then you should use one of signed char, int, int_least8_t or int_fast8_t, depending on whether you want to ask the compiler to make it:

  • work in C89 (signed char or int)
  • avoid surprising integer promotions in arithmetic expressions (int)
  • small (int_least8_t or signed char)
  • fast (int_fast8_t or int)

The same goes for an unsigned type up to 255, with unsigned char, unsigned int, uint_least8_t and uint_fast8_t.
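
For illustration, here is a minimal sketch of what those choices look like in code; the variable names are invented, and note that the least and fast types are required by C99 even when int8_t itself is absent (the unsigned variants mirror these):

  #include <stdint.h>

  int_least8_t tiny_counter;   /* smallest signed type with at least 8 bits */
  int_fast8_t  loop_index;     /* "fastest" signed type with at least 8 bits */
  signed char  c89_counter;    /* the C89-compatible small option */
  int          promoted_sum;   /* avoids surprising promotions in arithmetic */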

If you need modulo-256 arithmetic in very portable code, then you can either take the modulus yourself, mask bits, or play games with bitfields.
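
For example, masking (or an explicit modulus) keeps the arithmetic modulo 256 without assuming uint8_t exists; a minimal sketch, with an invented function name:

  unsigned int add_mod_256(unsigned int a, unsigned int b) {
      return (a + b) & 0xFFu;   /* equivalently: (a + b) % 256u */
  }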

In practice, most people never need to write code that portable. At the moment CHAR_BIT > 8 only comes up on special-purpose hardware, and your general-purpose code won't get used on it. Of course that could change in future, but if it does I suspect that there is so much code that makes assumptions about Posix and/or Windows (both of which guarantee CHAR_BIT == 8), that dealing with your code's non-portability will be one small part of a big effort to port code to that new platform. Any such implementation is probably going to have to worry about how to connect to the internet (which deals in octets), long before it worries how to get your code up and running :-)

If you're assuming that CHAR_BIT == 8 anyway then I don't think there's any particular reason to avoid (u)int8_t other than if you want the code to work in C89. Even in C89 it's not that difficult to find or write a version of stdint.h for a particular implementation. But if you can easily write your code to only require that the type can hold 255, rather than requiring that it can't hold 256, then you might as well avoid the dependency on CHAR_BIT == 8.

Stamper answered 16/11, 2012 at 9:38 Comment(14)
The omission of uint8_t for (CHAR_BIT > 8) is an awakening. Never knew that. – Wodge
@WhozCraig: Actually I've revised that. I just checked the standard, and the text forbidding padding bits is for intN_t rather than uintN_t. But the paragraph that says that if there's a type with the right characteristics then uintN_t must be defined says it's only required to exist if there's an 8-bit unsigned type with no padding bits (which there isn't). So what I said is slightly wrong; I think that if CHAR_BIT > 8 then int8_t is forbidden but uint8_t is still optional. – Stamper
@SteveJessop int8_t is forbidden but uint8_t is still optional. But if uint8_t is defined then int8_t is also required to be defined, according to C99 7.18.1p1. – Theoretical
That would make more sense. Expulsion of int8_t makes perfect sense, since you've officially ditched native processor capabilities for sign-bit support when it's sitting in the middle of a minimal register. – Wodge
@SteveJessop I'm confused: how can uint8_t exist if the implementation does not provide an 8-bit integer? (if it's required to be exactly 8 bits) – Kaylyn
@mux: well, uint8_t is an 8-bit integer, so if it exists then the implementation does provide an 8-bit integer. Sorry for any confusion I caused by changing my story as to when it's permitted to exist. Conversely, an implementation with CHAR_BIT > 8 could provide an 8-bit integer as an extended type; it's just not allowed to call it uint8_t. – Stamper
@SteveJessop but CHAR_BIT is the number of bits in the smallest object that is not a bit-field, so does this mean it would have to use a bit-field to implement it? Or is it allowed by the standard to define an extended type that is smaller than CHAR_BIT? – Kaylyn
@mux: size and width are different things for integer types. This hypothetical type would still occupy 1 char of storage (size), but only 8 of its bits (the width) would actually be used for the value; the rest would be "padding bits". – Stamper
@SteveJessop "There are plenty of quantities in my programs that I can reasonably assume won't exceed 2 billion" But int is only guaranteed by the standard to be at least 32 bits, so you can't use it reliably. And this is worse than missing exact-width types because the code will compile but fail at runtime. – Keyte
@eush77: actually int is only guaranteed at least 16 bits by the C standard; long is at least 32. But I don't understand what point you're making. If I have a quantity that cannot exceed 2 billion, and a signed type guaranteed to be at least 32 bits, then I can use it reliably. As long as I'm not doing bitwise operations (and sometimes even then), it remains correct whether it's exactly 32 bits or actually bigger. – Stamper
Oh hang on, is this an issue of long vs short billion? I am British and I was born after 1970, so when I say one billion I mean 10 to the power of 9, not 10 to the power of 12. – Stamper
@SteveJessop Sorry, that's a typo. I meant that int is guaranteed to be at least 16 bits, so it can't be used reliably for quantities that can exceed 32767, and storing 2 billion in an int could overflow on some platforms. – Keyte
@eush77: but why do you mention this? I didn't say it could. – Stamper
@SteveJessop Indeed. My bad. – Keyte

One issue that hasn't yet been mentioned: while using fixed-size integer types means the sizes of your variables won't change if compilers use different sizes for int, long, and so forth, it won't necessarily guarantee that code behaves identically on machines with different integer sizes, even when the sizes are defined.

For example, given the declaration uint32_t i;, the behavior of the expression (i-1) > 5 when i is zero will vary depending upon whether uint32_t is smaller than int. On systems where e.g. int is 64 bits (and uint32_t is something like long short), the variable i would get promoted to int; the subtraction and comparison would be performed as signed (-1 is less than 5). On systems where int is 32 bits, the subtraction and comparison would be performed as unsigned int (the subtraction would yield a really big number, which is greater than five).
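
A self-contained sketch of that difference (the messages printed are mine, not from any standard):

  #include <stdint.h>
  #include <stdio.h>

  int main(void) {
      uint32_t i = 0;
      if ((i - 1) > 5)
          /* uint32_t does not promote to int (e.g. int is 32 bits):
             i - 1 wraps to 0xFFFFFFFF, so the comparison is true */
          puts("unsigned wrap-around");
      else
          /* int is wider than 32 bits: i promotes to int, i - 1 is -1,
             so the comparison is false */
          puts("signed arithmetic");
      return 0;
  }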

I don't know how much code relies upon the fact that intermediate results of expressions involving unsigned types are required to wrap even in the absence of casts (IMHO, if wrapping behavior was desired, the programmer should have written a cast, as in (uint32_t)(i-1) > 5), but the standard presently allows no leeway. I wonder what problems would be posed by a rule that at least permitted a compiler to promote operands to a longer integer type in the absence of casts or type coercions [e.g. given uint32_t i, j, an assignment like j = (i += 1) >> 1; would be required to chop off the overflow, as would j = (uint32_t)(i + 1) >> 1;, but j = (i + 1) >> 1; would not]? Or, for that matter, how hard would it be for compiler manufacturers to guarantee that any integral-type expression whose intermediate results could all fit within the largest signed type, and which didn't involve right shifts by non-constant amounts, would yield the same results as if all calculations were performed on that type?

It seems rather icky to me that on a machine where int is 32 bits:

  uint64_t a,b,c;
  ...
  a &= ~0x40000000;
  b &= ~0x80000000;
  c &= ~0x100000000;

clears one bit each of a and c, but clears the top 33 bits of b; most compilers will give no hint that anything is 'different' about the second expression.

Patriarchate answered 16/11, 2012 at 20:49 Comment(0)

It is true that the width of a standard integer type may change from one platform to another, but not its minimum width.

For example, the C Standard specifies that an int is at least 16 bits wide and a long is at least 32 bits wide.

If you don't have a size constraint when storing your objects, you can leave this to the implementation. For example, if your maximum signed value fits in 16 bits you can just use an int. You then let the implementation have the final word on what the natural int width is for the architecture it is targeting.
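
A small sketch of that distinction (the variable names are made up):

  /* Known never to exceed 32767: plain int is guaranteed to be wide enough,
     and the implementation picks whatever width is natural for the target. */
  int line_count = 0;

  /* Guaranteed to hold values up to at least 2147483647 everywhere. */
  long file_offset = 0;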

Theoretical answered 16/11, 2012 at 9:38 Comment(0)

You should only use the fixed width types when you make an assumption about the width.

uint8_t and unsigned char are the same on most platforms, but not on all. Using uint8_t emphasizes that you assume an architecture with an 8-bit char, and the code won't compile on others, so this is a feature.

Otherwise I'd use the "semantic" typedefs such as size_t, uintptr_t and ptrdiff_t, because they reflect much better what you have in mind with the data. I almost never use the base types directly: int only for error returns, and I don't remember ever having used short.
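
A small sketch of that style (the function is invented for illustration):

  #include <stddef.h>

  /* size_t for object counts, ptrdiff_t for positions/differences,
     int-style -1 only as an error return, as described above. */
  ptrdiff_t find_byte(const char *buf, size_t len, char wanted) {
      for (size_t i = 0; i < len; ++i)
          if (buf[i] == wanted)
              return (ptrdiff_t)i;
      return -1;
  }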

Edit: After careful reading of C11 I conclude that uint8_t, if it exists, must be unsigned char and can't be plain char, even if that type is unsigned. This comes from the requirement in 7.20.1 p1 that all intN_t and uintN_t must be the corresponding signed and unsigned types. The only such pair of character types is signed char and unsigned char.

Certain answered 16/11, 2012 at 9:40 Comment(5)
I think on platforms where uint8_t is defined, uint8_t is always unsigned char (or char, assuming char is unsigned). – Theoretical
A system where char is bigger than 8 bits won't define uint8_t. And uint8_t could be either unsigned char or plain char (if plain char is unsigned and CHAR_BIT == 8). – Hummer
@KeithThompson, exactly (for the first part). So platforms that wouldn't define uint8_t wouldn't compile such code; that is what I said. I view this as a feature, if there is actually an assumption in the code that needs CHAR_BIT to be 8. For the second part, do you have examples of crazy enough architectures that typedef uint8_t to char? – Certain
@JensGustedt: Right, you already said that; I didn't read carefully enough. I don't know of any implementations that define uint8_t as plain char (or int8_t, if plain char is signed). It's simpler to use unsigned char and signed char, but using plain char would be perfectly legal. – Hummer
@KeithThompson, actually no, that wouldn't be legal, please see my edit. – Certain

The code should reveal to the casual reader (and to the programmer him/herself) what is important. Is it just some integer, an unsigned integer, or even a signed integer? The same goes for size. Is it really important to the algorithm that some variable is, by default, 16 bits? Or is that just unnecessary micromanagement and a failed attempt to optimize?

This is what makes programming an art -- to show what's important.

Pyelitis answered 16/11, 2012 at 9:48 Comment(0)

There are many reasons why one would want to use the, let's call them, semantic types like int or char over the fixed-width types like uint8_t:

Matching existing API

The Standard C lib uses char* everywhere. Why confuse users (and introduce possible bugs?) by using a different type when talking to that API?

Similarly, printf() format strings are defined in terms of these semantic types. If you want to print a fixed-size type, you need macros like PRIu64 etc. from inttypes.h to splice the right conversion specifier for it into an old-style printf format string.
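
For instance, a minimal sketch (the value printed is made up):

  #include <inttypes.h>
  #include <stdio.h>

  int main(void) {
      uint64_t bytes_written = 123456789;
      /* PRIu64 expands to this implementation's conversion specifier for uint64_t */
      printf("wrote %" PRIu64 " bytes\n", bytes_written);
      return 0;
  }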

Speed

The semantic types are usually chosen so that they work best for the current CPU's performance characteristics. They may be bumped to slightly larger sizes than what you would have picked because that's the register size on your CPU and will save you unnecessary conversion etc.

These days this is a bit of a contested answer ... that was the original intent, but because stdint.h wasn't available in early C/C++, many platforms (like 32-bit Windows or Mac OS X) simply guaranteed the sizes of int and long instead. So during the 64-bit move, some of those sizes stayed the same (leading to fun new types like long long, among other things). That's why we later got the least and fast types.
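
For example, the least and fast variants let you state "at least 32 bits" without pinning the exact width (a sketch; the variable names are invented):

  #include <stdint.h>

  int_fast32_t  accumulator  = 0;   /* at least 32 bits, whatever is fastest here */
  int_least32_t stored_value = 0;   /* the smallest type with at least 32 bits */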

Code Portability

The semantic types may be larger on a 64-bit platform than on a 32-bit platform (e.g. to permit array indexes to fill all of the memory). So if you're running on different platforms, using a semantic type (which by my definition would include size_t where available) instead of a fixed one means you're taking advantage of the better hardware and not adding arbitrary limits.

Of course, this only makes your algorithm portable. If you need to serialize data to bytes and exchange it between different platforms, this may make your code portable, but not your network packets or output files. So for that case, you would actually want to stick to the fixed types so the data stays portable, at the cost of your code running unbearably slowly or not compiling on certain platforms.
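
A sketch of that serialization case (the function name is invented):

  #include <stdint.h>

  /* Write a 32-bit value as 4 bytes, big-endian, so the on-disk or on-wire
     layout is the same on every platform regardless of int's width. */
  void put_u32_be(unsigned char out[4], uint32_t v) {
      out[0] = (unsigned char)(v >> 24);
      out[1] = (unsigned char)(v >> 16);
      out[2] = (unsigned char)(v >> 8);
      out[3] = (unsigned char)(v);
  }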

Comment: Don't ask me why they didn't introduce format strings for int64_t, int32_t etc. Maybe they were out of letters? Maybe too many code-bases defined their own format strings and would have broken?

Tedda answered 15/2, 2020 at 11:21 Comment(3)
There are format strings for fixed-width types: en.cppreference.com/w/c/types/integer For example, int64_t is PRId64 or PRIi64. – Seng
No, those are constants you can insert into your string that map to the format strings of types of equivalent size to these types. They still map to the same format strings you'd use with int or double etc. in the end. They also make format strings very hard to read, because everything ends up like "this: " PRId32 " was an int32_t." instead of just "This: %32j was an int32_t." or so. – Tedda
I mentioned those above BTW, 3rd paragraph. – Tedda
