Macro for endian-independent conversion to big endian
C

3

6

Is it possible to write a macro in C that takes a uint32_t and converts it to big-endian representation, regardless of whether the target system is little or big endian, such that the macro can be evaluated at compile time for a constant?

I found this question: "endianness conversion, regardless of endianness". However, the answers there only provide functions. I would like a compile-time expression so that I can write something like this:

const uint32_t magic_number = BIGENDIAN32(0x12345678);
Cartilage answered 17/9, 2019 at 13:52 Comment(3)
Do you want the macro to output an integer that is always stored the same way in memory, i.e. 0x12 at the lowest address, then 0x34 at address+1, etc.?Stole
If you need a magic number (so something that has no numerical meaning), why not declare an array of 4 bytes ?Stole
@GuillaumePetitjean Yes, that is correct. For the storage, I could of course declare it as an array, but I would also need to compare the value by an integer comparison. Besides, I started wondering if this is possible as described.Cartilage
4

You can use a union, which is endianness-dependent, together with bit shifts, which don't depend on endianness. Run-time version:

#include <stddef.h>
#include <stdint.h>

uint32_t big_endian (uint32_t n)
{
  union
  {
    uint32_t u32;
    uint8_t  u8 [4];
  } be;

  for(size_t i=0; i<4; i++)
  {
    size_t shift = (4-1-i) * 8;
    be.u8[i] = (n >> shift) & 0xFFu;
  }
  return be.u32;
}

u8[0] will always contain the MS byte on big endian machines. However, n >> shift will grab the relevant byte portably. Notably the whole function is just overhead bloat when running on a big endian machine.
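One way to check this is to look at the bytes as they sit in memory rather than at the integer value. The following is a minimal sketch, not part of the original answer: `stored_as` and `demo_big_endian` are hypothetical helper names, and the function body is the run-time version from above with an explicit cast added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Run-time conversion from the answer above. */
static uint32_t big_endian(uint32_t n)
{
  union
  {
    uint32_t u32;
    uint8_t  u8[4];
  } be;

  for (size_t i = 0; i < 4; i++)
  {
    size_t shift = (4 - 1 - i) * 8;            /* 24, 16, 8, 0 */
    be.u8[i] = (uint8_t)((n >> shift) & 0xFFu);
  }
  return be.u32;
}

/* Hypothetical helper: returns 1 if v is stored in memory as b0 b1 b2 b3,
   lowest address first. */
static int stored_as(uint32_t v, uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
  const uint8_t expected[4] = { b0, b1, b2, b3 };
  return memcmp(&v, expected, sizeof expected) == 0;
}

/* Hypothetical demo: the MS byte ends up at the lowest address
   on both little- and big-endian hosts. */
static void demo_big_endian(void)
{
  uint32_t v = big_endian(0x12345678u);
  assert(stored_as(v, 0x12, 0x34, 0x56, 0x78));
}
```

Calling `demo_big_endian()` from `main` should pass on either kind of host, since the check compares memory layout and never the numeric value.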

Converting this to an ugly compile-time macro would be something like this:

typedef union
{
  uint32_t u32;
  uint8_t  u8 [4];
} be_t;


#define BIG_ENDIAN(n) ( _Generic((n), uint32_t: (void)0), \
  (be_t){ .u8 = { ((n) >> 24)&0xFFu,                      \
                  ((n) >> 16)&0xFFu,                      \
                  ((n) >>  8)&0xFFu,                      \
                   (n)&0xFFu } }.u32)

The _Generic check + , operator is just for type safety and can be removed if stuck with non-standard C. The macro uses a temporary union in the form of a compound literal (outer {}), initializes the u8 array (inner {}) then returns a uint32_t value.
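A small usage sketch for the macro, under some assumptions. It is renamed to the hypothetical `TO_BE32` because some C libraries already define a `BIG_ENDIAN` macro in their own headers. It is used at block scope only, since the comments on this answer note that the compound literal is not accepted as a file-scope initializer. Because the `_Generic` check has no default case, the argument must have exactly the type `uint32_t`, so a plain literal is cast first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef union
{
  uint32_t u32;
  uint8_t  u8[4];
} be_t;

/* Same macro as above, under the hypothetical name TO_BE32. */
#define TO_BE32(n) ( _Generic((n), uint32_t: (void)0),  \
  (be_t){ .u8 = { ((n) >> 24) & 0xFFu,                  \
                  ((n) >> 16) & 0xFFu,                  \
                  ((n) >>  8) & 0xFFu,                  \
                   (n)        & 0xFFu } }.u32 )

/* Hypothetical helper: returns 1 if the converted constant is stored
   MS byte first in memory. */
static int magic_is_msb_first(void)
{
  /* Block scope: the initializer does not have to be a constant expression. */
  const uint32_t magic = TO_BE32((uint32_t)0x12345678u);

  uint8_t raw[4];
  memcpy(raw, &magic, sizeof raw);
  return raw[0] == 0x12 && raw[1] == 0x34 &&
         raw[2] == 0x56 && raw[3] == 0x78;
}
```

Writing `TO_BE32(0x12345678)` without the cast is rejected at compile time, because the literal has type `int` and there is no matching association. That rejection is the type safety the `_Generic` check is there to provide.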

Trying BIG_ENDIAN(0x12345678) on little endian x86 and disassembling, I get:

mov     esi, 2018915346

2018915346 dec = 0x78563412

Martelle answered 17/9, 2019 at 14:20 Comment(10)
(gcc -O3 for PowerPC gives me hogwash though, no idea why. I only have access to older gcc ports < 5.0. Specifically, it spits out lis 9,0x1234 + ori 9,9,22136 which seems senseless 22136=0x5678.)Martelle
It is a nice solution, I can build it as C++, but if I build it as C, I get an error "error: initializer element is not constant" (even if I remove the _Generic check). I cannot really figure out why the compiler says that, but it does.Cartilage
@Cartilage You asked for a C solution in C11 so that would be why. This won't work at all in C++ since that language does not support union type punning.Martelle
Right, but I get the error message from gcc when building with -std=c11.Cartilage
@Cartilage Compiles just fine here, actually tried some 10 different compilers since I was curious about the strange machine code.Martelle
@phuclv Any idea why some targets load the constant in 2x16 bit chunks? Is it something about the PowerPC ISA I don't get? It should have 32 bit instructions... And same with ARM.Martelle
@Martelle those architectures have fixed 32-bit instructions, so obviously you can't store a 32-bit immediate inside a single instruction. MIPS and PPC must load the upper and lower 16 bits separately. Older ARM can only load a 12-bit immediate and therefore many compilers prefer to use a constant pool to load 32-bit immediates. Sparc and RISC-V use a 12-bit and a 20-bit immediate I thinkHymnody
@lundin It seems to be a problem with the gcc I am using for an embedded target. My bad luck, but your solution is good. Thanks.Cartilage
@Cartilage The problem might be that you place the variable at file scope. I don't think the union/compound literal go-between is regarded as a constant expression. The macro might have to be rewritten if you intend to use it as a file scope variable initializer.Martelle
@Martelle You are right, that is the problem. Thanks, it really had me puzzled.Cartilage
1

Consider a compound literal of a union.

#include <stdint.h>

#define BIGENDIAN32(x) (((union { uint8_t  u8[4]; uint32_t u32; }) \
    { {((uint32_t)(x)>>24)&255, ((uint32_t)(x)>>16)&255, \
       ((uint32_t)(x)>> 8)&255,  (uint32_t)(x)&255} }).u32)
//    MSbyte first                                 LSByte last

int main(void) {
  const uint32_t magic_number = BIGENDIAN32(0x12345678u);
  return (int) magic_number;
}
Pathological answered 17/9, 2019 at 13:52 Comment(7)
@Martelle Hmm Certainly similar. The wrapping in _Generic and creation of be_t seems unnecessary and for me obfuscated that your answer used a compound literal. Delete this now wiki answer if you see too much commonality.Pathological
Yeah well it's because I wrote the run-time version first. The _Generic is quite needed however, you don't wanna be right shifting signed int or everything will break upon arithmetic right shift.Martelle
@Martelle True, coping with a non-unsigned x makes sense. A simple cast to uint32_t would handle the concern.Pathological
But then you create a macro which silently accepts accidental wrong types. Arrays, pointers etc.Martelle
When compared with bit-shifting, using union seems costly to me. See by yourself: godbolt.org/z/WKa98esd4.Lula
@Lula SWAP(num) needs endian detection (big, little, mod, etc.), otherwise it produces an incorrect result. I did not see optimization enabled in your example, which makes the comparison moot. In any case, both have the same O(). Endian performance is important with arrays of data, not this snippet. Feel free to add your idea as an answer.Pathological
@chux-ReinstateMonica You're right. My solution doesn't really solve the problem. Honestly, I'm not sure how to test with optimization enabled. I agree with you that the use case would probably be arrays of data.Lula
1

If you are using GCC you can do something like:

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define BIGENDIAN32(_a_)    __builtin_bswap32 (_a_)
#else
#define BIGENDIAN32(_a_)    (_a_)
#endif

Note: this doesn't take PDP endianness into account, but you get the idea.

If you want the code to be portable to other compilers you have to replace the line:

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__

by one of the macros suggested by the smart contributors of this thread, and define a macro for the byte-inversion builtins.
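As a sketch of what such a replacement for the builtin might look like: the hypothetical `BSWAP32_CONST` below uses only casts, masks, and shifts, so its result is an integer constant expression whenever the argument is one. That also makes it usable as a file-scope initializer. The byte-order test still assumes the GCC/Clang predefined macros `__BYTE_ORDER__`, `__ORDER_LITTLE_ENDIAN__`, and `__ORDER_BIG_ENDIAN__`. Any other byte order (PDP included) stops the build instead of silently producing a wrong value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper macro: byte swap written with masks and shifts only. */
#define BSWAP32_CONST(x) ((uint32_t)(                 \
    (((uint32_t)(x) & 0x000000FFu) << 24) |           \
    (((uint32_t)(x) & 0x0000FF00u) <<  8) |           \
    (((uint32_t)(x) & 0x00FF0000u) >>  8) |           \
    (((uint32_t)(x) & 0xFF000000u) >> 24)))

/* Assumes the GCC/Clang predefined byte-order macros. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define BIGENDIAN32(x) BSWAP32_CONST(x)
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define BIGENDIAN32(x) ((uint32_t)(x))
#else
#error "Unknown byte order: provide BIGENDIAN32 for this target"
#endif

/* File-scope initializer: accepted because the macro expands to an
   integer constant expression. */
static const uint32_t magic_number = BIGENDIAN32(0x12345678);

/* Hypothetical helper: returns 1 if magic_number is stored MS byte first. */
static int magic_number_is_msb_first(void)
{
  const uint8_t expected[4] = { 0x12, 0x34, 0x56, 0x78 };
  return memcmp(&magic_number, expected, sizeof expected) == 0;
}
```

The cast inside the macro means it accepts any integer argument, so it gives up the strict type checking that a `_Generic`-based version provides.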

Stole answered 17/9, 2019 at 14:31 Comment(1)
It is a good idea, though in the linked thread I am not sure anyone actually managed to determine endianness at compile-time (even though they thought they did :-) ).Cartilage
