How do you write (portably) reverse network byte order?

Background

When designing binary file formats, it's generally recommended to write integers in network byte order. For that, there are macros like htonl(). But some formats, such as WAV, actually use little endian.

Question

How do you portably write little endian values, regardless of whether the CPU your code runs on is a big endian or little endian architecture? (Ideas: can the standard macros ntohl() and htonl() be used "in reverse" somehow? Or should the code just test at runtime whether it's running on a little or big endian CPU and choose the appropriate code path?)

So the question is not really about file formats; file formats were just an example. It could be any kind of serialization where little endian "on the wire" is required, such as a (heretical) network protocol.

Skell answered 15/4, 2013 at 6:27 Comment(3)
You have three questions there, but only two stand out.Ayotte
@user2096041, re-arranged, is it clearer?Skell
In order to find the endianness of the host system you can follow this link: #1001807Nerta

Warning: This only works on unsigned integers, because right shift of a signed value is implementation-defined and can lead to vulnerabilities (https://mcmap.net/q/459977/-right-shift-and-signed-integer).

C already provides an abstraction over the host's endianness: the numbers† themselves (int, uint32_t, and so on).

Producing output in a given endianness can be done portably by not trying to be clever: simply interpret the numbers as numbers and use bit shifts to extract each byte:

uint32_t value;                       // the value to serialize (assumed to be set elsewhere)
uint8_t lolo = (value >>  0) & 0xFF;  // least significant byte
uint8_t lohi = (value >>  8) & 0xFF;
uint8_t hilo = (value >> 16) & 0xFF;
uint8_t hihi = (value >> 24) & 0xFF;  // most significant byte

Then you just write the bytes in whatever order you desire.
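
For example, a minimal sketch of such a writer could look like this (write_u32_le is a made-up helper name, and stdio output is assumed; the point is only that the byte order of the output is fixed by the code, not by the host):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes value to fp as little endian, i.e. least
   significant byte first, regardless of the host's byte order. */
static int write_u32_le(uint32_t value, FILE *fp)
{
    uint8_t bytes[4];
    bytes[0] = (value >>  0) & 0xFF;  /* lolo */
    bytes[1] = (value >>  8) & 0xFF;  /* lohi */
    bytes[2] = (value >> 16) & 0xFF;  /* hilo */
    bytes[3] = (value >> 24) & 0xFF;  /* hihi */
    return fwrite(bytes, 1, sizeof bytes, fp) == sizeof bytes ? 0 : -1;
}

For big endian ("network order") output, just store the bytes in the opposite order.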

When you take byte sequences with a given endianness as input, you can reconstruct the values on a host of any endianness by again building the numbers with bit operations:

// Cast each byte to uint32_t before shifting; otherwise hihi would be promoted
// to a (signed) int and shifting into its sign bit is undefined behaviour.
uint32_t value = ((uint32_t)hihi << 24)
               | ((uint32_t)hilo << 16)
               | ((uint32_t)lohi <<  8)
               | ((uint32_t)lolo <<  0);
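
Going the other way, a sketch of a matching reader (read_u32_le is again a made-up name) that consumes four little-endian bytes and rebuilds the value on a host of any endianness:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reads 4 little-endian bytes from fp into *out.
   Returns 0 on success, -1 on a short read. */
static int read_u32_le(uint32_t *out, FILE *fp)
{
    uint8_t bytes[4];
    if (fread(bytes, 1, sizeof bytes, fp) != sizeof bytes)
        return -1;
    *out = ((uint32_t)bytes[3] << 24)   /* hihi */
         | ((uint32_t)bytes[2] << 16)   /* hilo */
         | ((uint32_t)bytes[1] <<  8)   /* lohi */
         | ((uint32_t)bytes[0] <<  0);  /* lolo */
    return 0;
}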

† Only the representations of numbers as byte sequences have endianness; numbers (i.e. quantities) don't.

Matthieu answered 15/4, 2013 at 6:32 Comment(6)
hihi, lolo, those are the best names I've seen. What do you mean quantities?Skell
@AmigableClarkKant the abstract idea of a number, as opposed to its representation.Matthieu
Ok, maybe you should link to some kind of C++ ref for number instead. Anyway, how does it help with serialization and endianness? I understand (and approve of) the rest of the answer.Skell
It basically means that C++ makes knowing the host's endianness irrelevant. You don't need to detect it, you just need to write out the bytes in the desired order.Matthieu
How would I do that with Numbers? Not that I care really, I would accept your answer for the hihi, lolo thing, but I can't accept an answer I don't understand 100%.Skell
ints, uint32_t, etc, are numbers (they don't have endianness).Matthieu

Here's a template-based version:

#include <iostream>
#include <iomanip>
#include <cstdint>   // uint16_t, uint32_t

enum endianness_t {
  BIG,         // 0x44332211  => 0x44 0x33 0x22 0x11
  LITTLE,      // 0x44332211  => 0x11 0x22 0x33 0x44
  UNKNOWN
};

const uint32_t test_value    = 0x44332211;
const bool is_little_endian  = (((char *)&test_value)[0] == 0x11) && (((char *)&test_value)[1] == 0x22);
const bool is_big_endian     = (((char *)&test_value)[0] == 0x44) && (((char *)&test_value)[1] == 0x33);

const endianness_t endianness =
  is_big_endian ? BIG :
  (is_little_endian ? LITTLE : UNKNOWN);


template <typename T>
T identity(T v){
  return v;
}

// 16-bit values ------

uint16_t swap_(uint16_t v){
  return ((v & 0xFF) << 8) | ((v & 0xFF00) >> 8);
}

// 32-bit values ------

uint32_t swap_(uint32_t v){
  return ((v & 0xFF) << 24) | ((v & 0xFF00) << 8) | ((v & 0xFF0000) >> 8) | ((v & 0xFF000000) >> 24);
}

template <typename T, endianness_t HOST, endianness_t REMOTE>
struct en_swap{
  static T conv(T v){
    return swap_(v);
  }
};

template <typename T>
struct en_swap<T, BIG, BIG>{
  static T conv(T v){
    return v;
  }
};

template <typename T>
struct en_swap<T, LITTLE, LITTLE> {
  static T conv(T v){
    return v;
  }
};

template <typename T>
T to_big(T v) {
  switch (endianness){
  case LITTLE :
    return en_swap<T,LITTLE,BIG>::conv(v);
  case BIG :
    return en_swap<T,BIG,BIG>::conv(v);
  default :   // UNKNOWN: return the value unchanged instead of falling off the end
    return v;
  }
}

template <typename T>
T to_little(T v) {
  switch (endianness){
  case LITTLE :
    return en_swap<T,LITTLE,LITTLE>::conv(v);
  case BIG :
    return en_swap<T,BIG,LITTLE>::conv(v);
  default :   // UNKNOWN: return the value unchanged instead of falling off the end
    return v;
  }
}


int main(){

  using namespace std;

  uint32_t x = 0x0ABCDEF0;
  uint32_t y = to_big(x);
  uint32_t z = to_little(x);

  cout << hex << setfill('0')
       << setw(8) << x << " "
       << setw(8) << y << " "
       << setw(8) << z << endl;

}
Autotomy answered 15/4, 2013 at 7:3 Comment(0)

In fact, as documented on MSDN, the functions ntohl() and htonl() are the inverse of each other:

The htonl function converts a u_long from host to TCP/IP network byte order (which is big-endian).

The ntohl function converts a u_long from TCP/IP network order to host byte order (which is little-endian on Intel processors).
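
In other words, applying one after the other round-trips the value on any host. A minimal check (assuming a POSIX system, where the functions are declared in <arpa/inet.h>; on Windows they come from <winsock2.h>):

#include <arpa/inet.h>  /* htonl, ntohl */
#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint32_t host_value = 0x0ABCDEF0u;
    uint32_t wire_value = htonl(host_value);  /* host order -> big-endian network order */
    assert(ntohl(wire_value) == host_value);  /* ntohl undoes htonl on any host */
    return 0;
}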

Yes, detecting endianness at runtime is a very sane thing to do, and it is basically what any ready-to-use macro/function would have to do at some point anyway.
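
For instance, one common way to detect it is to inspect the byte representation of a known constant and only swap when needed (the helper names below, host_is_little_endian and to_little_endian_bits, are made up for illustration; the converted value is meant to be written out verbatim, e.g. with fwrite):

#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 0x0102;
    uint8_t first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 0x02;
}

/* Returns a value whose in-memory bytes are in little-endian wire order,
   so it can be written out directly, e.g. fwrite(&v, sizeof v, 1, fp). */
static uint32_t to_little_endian_bits(uint32_t v)
{
    if (host_is_little_endian())
        return v;                        /* already stored least significant byte first */
    return ((v & 0x000000FFu) << 24)     /* otherwise swap the byte order */
         | ((v & 0x0000FF00u) <<  8)
         | ((v & 0x00FF0000u) >>  8)
         | ((v & 0xFF000000u) >> 24);
}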

And if you want to do little/big endian conversions yourself, see the answer by @R-Martinho-Fernandes.

Ayotte answered 15/4, 2013 at 6:38 Comment(3)
I disagree that detecting endianness is a very sane thing to do. You normally shouldn't need to know. Unless you're writing performance-critical code, writing endian-independent code is a better idea.Waw
@Waw when writing a cross-platform process, the same code may run on either little-endian or big-endian hardware. When the process communicates via network or files, you'd want them to talk the same language. Code can check endianness at runtime, and accommodate the process's output/input accordingly.Ayotte
That doesn't make sense. If you're writing proper endian-agnostic code, your code is still cross-platform and doesn't need to do any runtime (or compile-time) checks. For example, if you do: fputc((x >> 24) & 0xFF, fp) where x is a 32-bit integer, you will always be writing out the most significant byte regardless of the platform. See R. Martinho Fernandes' answer.Waw
