Why is bit endianness an issue in bitfields?

Any portable code that uses bitfields seems to distinguish between little- and big-endian platforms. See the declaration of struct iphdr in the Linux kernel for an example of such code. I fail to understand why bit endianness is an issue at all.

As far as I understand, bitfields are purely compiler constructs, used to facilitate bit level manipulations.

For instance, consider the following bitfield:

struct ParsedInt {
    unsigned int f1:1;
    unsigned int f2:3;
    unsigned int f3:4;
};
uint8_t i;
struct ParsedInt *d = (struct ParsedInt *)&i;
Here, writing d->f2 is simply a compact and readable way of saying (i >> 1) & 0x7.

However, bit operations are well-defined and work regardless of the architecture. So, how come bitfields are not portable?

Vinitavinn answered 18/5, 2011 at 10:50 Comment(3)
As long as you read and write the bits yourself, there is no problem. The issue is another machine writing the bits, or their position being prescribed by a standard like IP. The C standard doesn't even fix the size of a byte. The odds that you'll actually have a problem are not that high.Divergency
Your assumption that d->f2 is the same as that shift-and-mask expression is wrong. It is completely compiler-dependent. See the answers below.Kamalakamaria
How Endianness Effects Bitfield Packing: mjfrazer.org/mjfrazer/bitfieldsExterritorial

By the C standard, the compiler is free to store the bit field in pretty much any way it wants. You can never make any assumptions about where the bits are allocated. Here are just a few bit-field-related things that are not specified by the C standard:

Unspecified behavior

  • The alignment of the addressable storage unit allocated to hold a bit-field (6.7.2.1).

Implementation-defined behavior

  • Whether a bit-field can straddle a storage-unit boundary (6.7.2.1).
  • The order of allocation of bit-fields within a unit (6.7.2.1).

Big/little endian is, of course, also implementation-defined. This means that your struct could be allocated in any of the following ways (assuming 16-bit ints):

PADDING : 8
f1 : 1
f2 : 3
f3 : 4

or

PADDING : 8
f3 : 4
f2 : 3
f1 : 1

or

f1 : 1
f2 : 3
f3 : 4
PADDING : 8

or

f3 : 4
f2 : 3
f1 : 1
PADDING : 8

Which one applies? Take a guess, or read the in-depth backend documentation of your compiler. Add to this the complexity of 32-bit integers, in big or little endian. Then add the fact that the compiler is allowed to add any number of padding bytes anywhere inside your bit field, because it is treated as a struct (it can't add padding at the very beginning of the struct, but everywhere else).

And then I haven't even mentioned what happens if you use plain "int" as the bit-field type (implementation-defined behavior), or if you use any type other than (unsigned) int (also implementation-defined behavior).

So, to answer the question: there is no such thing as portable bit-field code, because the C standard is extremely vague about how bit fields should be implemented. The only thing bit-fields can be trusted with is to be chunks of boolean values, where the programmer isn't concerned with the location of the bits in memory.

The only portable solution is to use the bit-wise operators instead of bit fields. The generated machine code will be exactly the same, but deterministic. Bit-wise operators are 100% portable on any C compiler for any system.

Woodland answered 18/5, 2011 at 11:51 Comment(11)
At the same time, bitfields are often used with a pragma to tell the compiler not to use padding (even if that is inefficient w.r.t. the CPU's required alignment), and the compiler's behavior is not stupid. For both of those reasons, there are only 2 cases left: one for big endian machines and one for little endian. That's why you get only 2 versions in a low-level header file.Regain
@Regain But why would you want two versions of a completely non-portable file, when you could have one version of a 100% portable file? Either case results in the same machine code.Woodland
@Lundin, you are right. It's a question of focus. Compare struct iphdr s; s.version = 2; s.ihl = 3; to uint8_t s[]; s[0] = (uint8_t)((3<<3)|(2<<0));. The former is obvious, both from the code writer and the code consumer, the later is fully opaque because the code consumer must know the memory layout (did you spot the bug ?). Sure you can write a function that'll set either of these field (or both). But you'll have to write a lot of code, that will likely never be used and is error prone, ending in (useless) code bloat and complexity (if the interface is too large to remember)Regain
@Regain The problem with your code is not the bit-wise operators but the use of "magic numbers". It should have been written as s[0] = VERSION | IHL;. In theory bit-fields are a good idea, but the C standard completely fails to support them. In my experience, code using bit fields is far more bug prone, because the programmer using them always makes a lot of implicit assumptions about the bit field, which are not at all guaranteed in practice.Woodland
@Woodland IHL might not be a fixed number (it can be a 6-bit-wide value), then you'll have to remember the "shift amount" somehow (yes, it can be a macro "IHL_SHIFT"). That "solves" storing, but reading would have to be done with a mask & shift, and this is complex (IMHO, much more than accessing s.ihl directly). If you have to do this once or twice in your lifetime, you can accept the effort. If you have to use the structure every day, one out of many others, using a direct member is easier; it just has to be written once, correctly.Regain
@Regain On the contrary, If you do this every day, like I do working with embedded programming, bit manipulations becomes really trivial stuff. You could solve your case by s[0] = VERSION | IHL_SET(val); where IHL_SET is a simple macro: #define IHL_SET(x) ((x << IHL_OFFSET) & IHL_MASK). (Mask is optional). Took me 10 seconds to write, no effort involved.Woodland
@SamGinrich Until you need to port to a different compiler for the same target... or write portable code in general.Woodland
@SamGinrich No, the aspect discussed is "Why bit endianness is an issue in bitfields?" hence the title of the question. As for compiler and platform, the gcc documentation (lacking as always) likes to not document most of it but instead refers to the ABI. Even in situations on small microcontrollers and the like where gcc states the ABI itself. Furthermore, plenty of architectures support both little and big endian at least in theory.Woodland
@SamGinrich Furthermore, neither the compiler nor the architecture has anything to do with the situation where a struct is used to represent a data protocol with a certain network endianness, which is documented in the data protocol but obviously not by the compiler.Woodland
@Woodland talk to yourself ;) I hinted you at a wrong statement and you insist with lots of empty proseJase
@SamGinrich Why? Did the discussion turn too tough as soon as actual technical arguments were brought in? Then maybe you should have refrained from starting it in the first place...Woodland

As far as I understand, bitfields are purely compiler constructs

And that's part of the problem. If the use of bit-fields was restricted to what the compiler 'owned', then how the compiler packed bits or ordered them would be of pretty much no concern to anyone.

However, bit-fields are probably used far more often to model constructs that are external to the compiler's domain - hardware registers, the 'wire' protocol for communications, or file format layouts. These things have strict requirements on how the bits have to be laid out, and using bit-fields to model them means that you have to rely on implementation-defined and - even worse - unspecified behavior of how the compiler will lay out the bit-field.

In short, bit-fields are not specified well enough to make them useful for the situations they seem to be most commonly used for.

Harri answered 18/5, 2011 at 14:37 Comment(0)

ISO/IEC 9899: 6.7.2.1 / 10

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

It is safer to use bit shift operations instead of making any assumptions on bit field ordering or alignment when trying to write portable code, regardless of system endianness or bitness.

Also see EXP11-C. Do not apply operators expecting one type to data of an incompatible type.

Convince answered 18/5, 2011 at 12:8 Comment(0)

Bit field accesses are implemented in terms of operations on the underlying type. In the example, unsigned int. So if you have something like:

struct x {
    unsigned int a : 4;
    unsigned int b : 8;
    unsigned int c : 4;
};

When you access field b, the compiler accesses an entire unsigned int and then shifts and masks the appropriate bit range. (Well, it doesn't have to, but we can pretend that it does.)

On big endian, layout will be something like this (most significant bit first):

AAAABBBB BBBBCCCC

On little endian, layout will be like this:

BBBBAAAA CCCCBBBB

If you want to access the big endian layout from little endian or vice versa, you'll have to do some extra work. This increase in portability has a performance penalty, and since struct layout is already non-portable, language implementors went with the faster version.

This makes a lot of assumptions. Also note that sizeof(struct x) == 4 on most platforms.

Diakinesis answered 18/5, 2011 at 10:56 Comment(6)
As I wrote in the comment above, that's exactly what I don't understand. If I read this memory location into a variable of type unsigned int, its value would always be AAAABBBBBBBBCCCC, whatever the endianness is, right? Then, if I wanted to cut the field c from it, I would do i & 0xf and it would still be portable. Why are bitfields not the same?Vinitavinn
This is not true; neither the endianness nor the bit order of a bit field is specified by the C standard. The compiler is free to allocate those bits wherever it wants.Woodland
It sounds like you have a different expectation of portability from unsigned int and from bit fields. In both cases, in-memory structures are efficient but cannot be copied to other systems without doing some byte swapping operations.Diakinesis
@Lundin: I'm not talking about the C standard, I'm talking about implementations of the C standard.Diakinesis
could you elaborate on how you came up with BBBBAAAA CCCCBBBB ?Jonson
Note this is implementation-defined behaviour, some implementations do AAAAPPPP BBBBBBBB CCCCPPPPJemie

The bit fields will be stored in a different order depending on the endianness of the machine. This may not matter in some cases, but in others it does. Say, for example, that your ParsedInt struct represents flags in a packet sent over a network: a little endian machine and a big endian machine will read those flags in a different order from the transmitted byte, which is obviously a problem.

Director answered 18/5, 2011 at 11:0 Comment(2)
That's exactly what I fail to understand. Consider the IP header example I gave a link to. The first 4 bits, counting from the lsb, are the version, while bits 5-8 are the length. After the NIC has decoded the frame and placed it into memory, if I read the whole byte, I will always get the same result, right? Then, if I use bit shifts and bitwise ANDs to cut the byte into nibbles, I will still get the same results, whatever the platform is. So why are bitfields not the same?Vinitavinn
@Leonid, the short answer is: because the Standard doesn't guarantee it to be the same.Convince

To echo the most salient points: If you are using this on a single compiler/HW platform as a software only construct, then endianness will not be an issue. If you are using code or data across multiple platforms OR need to match hardware bit layouts, then it IS an issue. And a lot of professional software is cross-platform, hence it has to care.

Here's the simplest example: I have code that stores numbers in binary format to disk. If I do not write and read this data to disk myself explicitly byte by byte, then it will not be the same value if read from an opposite endian system.

Concrete example:

int16_t s = 4096; // a signed 16-bit number...

Let's say my program ships with some data on the disk that I want to read in. Say I want to load it as 4096 in this case...

fread(&s, sizeof s, 1, fp); // reading it from disk as binary...

Here I read it as a 16-bit value, not as explicit bytes. That means that if my system matches the endianness stored on disk, I get 4096, and if it doesn't, I get 16 (4096 is 0x1000; byte-swapped, that is 0x0010, i.e. 16)!

So the most common approach to endianness is to bulk-load binary numbers and then do a byte swap if the stored order doesn't match the host. In the past, we'd store data on disk as big endian, because Intel was the odd man out and provided high-speed instructions to swap the bytes. Nowadays Intel is so common that formats often make little endian the default and swap on big endian systems.

A slower, but endian neutral approach is to do ALL I/O by bytes, i.e.:

uint8_t ubyte;
int8_t sbyte;
int16_t s; // read s in an endian-neutral way

// Let's choose little endian as our chosen byte order:

fread(&ubyte, 1, 1, fp); // low byte first
fread(&sbyte, 1, 1, fp); // then the high byte

// Reconstruct s

s = ubyte | (sbyte << 8);

Note that this is identical to the code you'd write to do an endian swap, but you no longer need to check the endianness. And you can use macros to make this less painful.

I used the example of stored data used by a program. The other main application mentioned is to write hardware registers, where those registers have an absolute ordering. One VERY COMMON place this comes up is with graphics. Get the endianness wrong and your red and blue color channels get reversed! Again, the issue is one of portability - you could simply adapt to a given hardware platform and graphics card, but if you want your same code to work on different machines, you must test.

Here's a classic test:

typedef union { uint16_t s; uint8_t b[2]; } EndianTest_t;

EndianTest_t test = { .s = 4096 }; // 4096 == 0x1000

if (test.b[0] == 0x10) printf("Big Endian Detected!\n");

Note that bitfield issues exist as well but are orthogonal to endianness issues.

Polemoniaceous answered 2/4, 2018 at 18:8 Comment(0)

Endianness is important when you need to communicate a structure containing bit-fields with an entity over which you don't have control, e.g. network communication, or when you need to implement some part of an OSI layer... Then you need to follow some agreed-upon protocol that specifies in which order (transmission order) the bits are transported and what they mean.

In that sense, I don't understand all the fuss above about the bit-field layout not being standardized and the conclusion that you therefore should not use them; I tried to answer this in another related question and gave an example of how I use and assert bit-fields. Rolling your own bit flags is error prone and makes the code more 'fuzzy' or 'distracting from your semantics' (for lack of a better term). You can find the example here.

Trefoil answered 8/2 at 7:50 Comment(0)

Just to point out - we've mostly been discussing byte endianness here, not bit endianness or endianness within bitfields, which crosses into the other issue:

If you are writing cross-platform code, never just write out a struct as a binary object. Besides the byte endianness issues described above, there can be all kinds of packing and formatting differences between compilers. The language places no restrictions on how a compiler may lay out structs or bitfields in actual memory, so when saving to disk, you must write each data member of a struct one at a time, preferably in a byte-neutral way.

This packing impacts "bit endianness" in bitfields because different compilers might store the bitfields in a different direction, and the bit endianness impacts how they'd be extracted.

So bear in mind BOTH levels of the problem - the byte endianness impacts a computer's ability to read a single scalar value, e.g., a float, while the compiler (and build arguments) impact a program's ability to read in an aggregate structure.

What I have done in the past is to save and load a file in a neutral way and store meta-data about the way the data is laid out in memory. This allows me to use the "fast and easy" binary load path where compatible.

Polemoniaceous answered 7/3, 2019 at 16:7 Comment(1)
This looks like it should be an edit to your existing answer to add a new section. I don't think this looks like a separate answer to the question.Tare
