Is reading one byte at a time endianness agnostic regardless of value size?

Asked 30/5, 2019 at 20:27 Answered 30/5, 2019 at 23:1

Say I am reading and writing uint32_t values to and from a stream. If I read/write one byte at a time to/from a stream and shift each byte like the below examples, will the results be consistent regardless of machine endianness?

In the examples here the stream is a buffer in memory called p.

static uint32_t s_read_uint32(uint8_t** p)
{
    uint32_t value;
    value  = (*p)[0];
    value |= (((uint32_t)((*p)[1])) << 8);
    value |= (((uint32_t)((*p)[2])) << 16);
    value |= (((uint32_t)((*p)[3])) << 24);
    *p += 4;
    return value;
}

static void s_write_uint32(uint8_t** p, uint32_t value)
{
    (*p)[0] = value & 0xFF;
    (*p)[1] = (value >> 8 ) & 0xFF;
    (*p)[2] = (value >> 16) & 0xFF;
    (*p)[3] = value >> 24;
    *p += 4;
}

I don't currently have access to a big-endian machine to test this out, but the idea is if each byte is written one at a time each individual byte can be independently written or read from the stream. Then the CPU can handle endianness by hiding these details behind the shifting operations. Is this true, and if not could anyone please explain why not?

Solis answered 30/5, 2019 at 20:27 Comment(6)

Your intuition is sound. – Under 30/5, 2019 at 20:30

Is it? It seems to me that (*p)[0] would refer to the least significant byte on a little endian machine but to the most significant byte on a big endian machine. – Echt 30/5, 2019 at 20:32

@500-InternalServerError That's perfectly fine. Since he's not reading p's value as a 32-bit integer, it doesn't matter whether the implementation considers it the least significant byte or most significant byte if the value were read as a 32-bit integer. – Arcane 30/5, 2019 at 20:33

Bits 0...7 of a 32-bit value always contain the 8 least sig bits of its value. Endianness refers to how those 32 bits are stored in memory. – Methodize 30/5, 2019 at 20:38

Yes, reading a byte at a time is byte-order agnostic. However your code is not byte-order agnostic, the bytes of the integer are stored and read in strictly little-endian order. – Departure 30/5, 2019 at 20:58

The endianness is resolved by any operator like >> or |. – Bouleversement 30/5, 2019 at 21:57

If I read/write one byte at a time to/from a stream and shift each byte like the below examples, will the results be consistent regardless of machine endianness?

Yes. Your s_write_uint32() function stores the bytes of the input value in order from least significant to most significant, regardless of their order in the native representation of that value. Your s_read_uint32() correctly reverses this process, regardless of the underlying representation of uint32_t. These work because

the behavior of the shift operators (<<, >>) is defined in terms of the value of the left operand, not its representation
the & 0xff masks off all bits of the left operand but those of its least-significant byte, regardless of the value's representation (because 0xff has a matching representation), and
the |= operations just put the bytes into the result; the positions are selected, appropriately, by the preceding left shift. This might be more clear if += were used instead, but the result would be no different.
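A minimal sketch of the points above, reusing the question's two functions unchanged in logic. The check_roundtrip() helper is hypothetical and exists only for illustration. Because the shifts and masks operate on values, the buffer layout asserted here should be the same on little-endian and big-endian hosts:

```c
#include <assert.h>
#include <stdint.h>

/* Same logic as the question's writer: least significant byte first. */
static void s_write_uint32(uint8_t **p, uint32_t value)
{
    (*p)[0] = (uint8_t)(value & 0xFF);
    (*p)[1] = (uint8_t)((value >> 8) & 0xFF);
    (*p)[2] = (uint8_t)((value >> 16) & 0xFF);
    (*p)[3] = (uint8_t)(value >> 24);
    *p += 4;
}

/* Same logic as the question's reader: reassemble from the four bytes. */
static uint32_t s_read_uint32(uint8_t **p)
{
    uint32_t value;
    value  = (*p)[0];
    value |= ((uint32_t)(*p)[1]) << 8;
    value |= ((uint32_t)(*p)[2]) << 16;
    value |= ((uint32_t)(*p)[3]) << 24;
    *p += 4;
    return value;
}

/* Hypothetical self-check: the stream layout is fixed by the shifts,
   not by how the host stores a uint32_t in memory. */
static void check_roundtrip(void)
{
    uint8_t buf[4];
    uint8_t *w = buf;
    s_write_uint32(&w, 0x11223344u);
    assert(buf[0] == 0x44 && buf[1] == 0x33);
    assert(buf[2] == 0x22 && buf[3] == 0x11);

    uint8_t *r = buf;
    assert(s_read_uint32(&r) == 0x11223344u);
    assert(r == buf + 4); /* the cursor advanced by four bytes */
}
```

Note that the stream itself ends up in little-endian order, as one of the comments on the question points out; what is host-independent is that this order never changes.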

Note, however, that to some extent you are reinventing the wheel. POSIX defines a function pair htonl() and ntohl() -- supported also on many non-POSIX systems -- for dealing with byte-order issues in four-byte numbers. The idea is that when sending, everyone uses htonl() to convert from host byte order (whatever that is) to network byte order (big endian) and sends the resulting four-byte buffer. On receipt, everyone accepts four bytes into one number, then uses ntohl() to convert from network to host byte order.
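As a rough sketch of that approach, assuming a POSIX system that provides <arpa/inet.h>: the helper names net_write_uint32(), net_read_uint32() and net_check() are hypothetical and only mirror the shape of the question's functions. Keep in mind that network byte order is big endian, so the bytes land in the buffer in the opposite order from the question's code:

```c
#include <arpa/inet.h> /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert to network order, then copy the raw bytes. */
static void net_write_uint32(uint8_t **p, uint32_t value)
{
    uint32_t be = htonl(value);
    memcpy(*p, &be, sizeof be);
    *p += sizeof be;
}

/* Hypothetical helper: copy the raw bytes, then convert to host order. */
static uint32_t net_read_uint32(uint8_t **p)
{
    uint32_t be;
    memcpy(&be, *p, sizeof be);
    *p += sizeof be;
    return ntohl(be);
}

/* Hypothetical self-check: most significant byte comes first in the stream. */
static void net_check(void)
{
    uint8_t buf[4];
    uint8_t *w = buf;
    net_write_uint32(&w, 0x11223344u);
    assert(buf[0] == 0x11 && buf[3] == 0x44);

    uint8_t *r = buf;
    assert(net_read_uint32(&r) == 0x11223344u);
}
```

Using memcpy() rather than a pointer cast avoids alignment and aliasing problems when the buffer position is not suitably aligned for a uint32_t.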

Winfrid answered 30/5, 2019 at 20:51 Comment(0)

It'll work but a memcpy followed by a conditional byteswap will give you much better codegen for the write function.

#include <stdint.h>
#include <string.h>

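#define LE (((char*)&(uint_least32_t){1})[0]) // nonzero on a little-endian host, 0 on a big-endian one
void byteswap(void*,size_t); // reverses that many bytes in place; void* so that &value and *p both convert implicitly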

uint32_t s2_read_uint32(uint8_t** p)
{
    uint32_t value;
    memcpy(&value,*p,sizeof(value));
    if(!LE) byteswap(&value,4);
    return *p+=4, value;
}

void s2_write_uint32(uint8_t** p, uint32_t value)
{
    memcpy(*p,&value,sizeof(value));
    if(!LE) byteswap(*p,4);
    *p+=4;
}
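The byteswap() helper is only declared above, never defined. One possible definition is sketched here; it is an assumption, not the answerer's code. It takes a void * parameter (also an assumption) so that it can be called with either &value or *p without a cast:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical definition: reverse the n bytes starting at buf, in place. */
void byteswap(void *buf, size_t n)
{
    unsigned char *b = buf;
    for (size_t i = 0; i < n / 2; ++i) {
        unsigned char tmp = b[i];
        b[i] = b[n - 1 - i];
        b[n - 1 - i] = tmp;
    }
}

/* Hypothetical self-check: four bytes come back in reverse order. */
static void byteswap_check(void)
{
    unsigned char a[4] = {1, 2, 3, 4};
    byteswap(a, sizeof a);
    assert(a[0] == 4 && a[1] == 3 && a[2] == 2 && a[3] == 1);
}
```

For the fixed four-byte case a compiler builtin or a hand-written shift sequence may produce tighter code; the generic loop is just the simplest form to read.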

GCC since the 8 series (but not Clang) can eliminate these shifts on little-endian platforms, but you should help it by restrict-qualifying the doubly-indirect pointer to the destination. Otherwise it might assume that a write to (*p)[0] can invalidate *p (uint8_t is in practice a character type and is therefore permitted to alias anything).

void s_write_uint32(uint8_t** restrict p, uint32_t value)
{
    (*p)[0] = value & 0xFF;
    (*p)[1] = (value >> 8 ) & 0xFF;
    (*p)[2] = (value >> 16) & 0xFF;
    (*p)[3] = value >> 24;
    *p += 4;
}

Peddling answered 30/5, 2019 at 23:1 Comment(2)

plus one for build-time definition of little-endian vs big-endian machine. That really works huh – Dissoluble 28/7 at 17:10

Thanks. Yup, with optimization on,((*(char*)&(int){1})) is a de-facto compile-time constant on modern compilers (for conditionals anyway). :) – Peddling 28/7 at 19:20
