C/C++ endianness and TCP sockets

I have a general conceptual question about endianness and how it affects TCP socket communication in C/C++. Here's an example:

You have two servers communicating over TCP sockets; one is big-endian and the other little-endian. If you send an integer over the socket from one server to the other, I understand that the byte ordering is reversed and the integer will not print as expected. Correct? I saw somewhere (I can't find where anymore) that if you send a char over the socket, endianness doesn't change the value and it prints as expected. Is this correct? If so, why? I feel like I've done this before in the past, but I could be delusional.

Could anybody clear this up for me?

Thanks.

Edit: Is it because char is only 1 byte?

Mcneely asked 13/3, 2014 at 0:30 Comment(0)

Think about the size of each data type.

An integer is typically four bytes, which you can think of as four individual bytes side by side. The endianness of an architecture determines whether the most significant byte is the first of those four bytes or the last. As I understand it, endianness does not affect the order of the bits within each byte (see the image on Wikipedia's page on Endianness).

A char, however, is only one byte, so there is no alternative order (assuming I am correct that bit order is not affected by endianness).

If you send a char over a socket, it will be one byte on both machines. If you send an int over a socket, since it's four bytes, it's possible that one machine will interpret the bytes in a different order than the other, according to the endianness. You should set up a simple way to test this and get back with some results!
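
A quick way to see this on a single machine is to look at the bytes of an int directly. Here is a minimal C sketch (illustrative, with an arbitrary test value) that prints the in-memory byte order of a 32-bit integer:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t value = 0x01020304;  /* arbitrary test pattern */
        unsigned char *bytes = (unsigned char *)&value;

        /* A big-endian machine prints 01 02 03 04;
           a little-endian machine prints 04 03 02 01. */
        for (int i = 0; i < 4; i++)
            printf("%02x ", bytes[i]);
        printf("\n");
        return 0;
    }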

Pekingese answered 13/3, 2014 at 0:38 Comment(8)
Thanks. That was exactly what I was thinking when I made the edit above. – Mcneely
I wish I could add more specific information, but I'm not sure how sockets handle endianness, if they do at all. – Pekingese
I'm certain ints get messed up, but I've just been using chars and casting them to ints afterwards. And people in my class think I'm performing some voodoo instead of manually reversing bytes. – Mcneely
Also, bit order within a byte is independent of the architecture. Two compilers on the same machine can order bit-fields either way, so without documentation you are left to just try it. – Haga
That works, but it'll be a hassle to deal with numbers greater than 255. One simple solution I can think of is to have one machine set an int to 1 and send it to the other. The other can then respond with a byte set to 1 if the int is interpreted as 1 (same endianness), or zero otherwise. – Pekingese
What about breaking whatever number you want into separate chars, like one char for 111 and one for 222, then just combining them at the other end of the socket as a string to get 111222 and casting to int? – Mcneely
That still doesn't handle endianness: if the two machines differ, one will expect the most significant byte first, while the other expects it last. – Pekingese
The typical solution is to use the htonl/htons and ntohl/ntohs functions: convert to network byte order before sending, and convert back on the other side. If you always do this for multi-byte types you should be fine. – Invitatory

The only thing you can send over a TCP socket is bytes. You cannot send an integer over a TCP socket without first creating some byte representation for that integer. The C/C++ int type can be stored in memory in whatever way the platform likes. If that happens to be the form in which you need to send it over the TCP socket, fine. But if it's not, then you have to convert it into the form the protocol requires before you send it, and back into your native format after you receive it.

As a bit of a sloppy analogy, consider the way I communicate with you. My native language might be Spanish and who knows what goes on in my brain. Internally, I might represent the number three as "tres" or some weird pattern of neurons. Who knows? But when I communicate with you, I must represent the number three as "3" or "three" because that's the protocol you and I have agreed to, the English language. So unless I'm a terrible English speaker, how I internally store the number three won't affect my communication with you.

Since this group requires me to produce streams of English characters to talk to you, I must convert my internal number representations to streams of English characters. Unless I'm terrible at doing that, how I store numbers internally will not affect the streams of English characters I produce.

So unless you do foolish things, this will never matter. Since you will be sending and receiving bytes over the TCP socket, the memory format of the integer type won't matter: you will be exchanging logical integers, not instances of the C/C++ integer type.

For example, if the protocol specification for the data you are sending over TCP says that you need to send a four-byte integer in little-endian format, then you should write code to do that. If the code takes your platform's endianness into consideration, that would be purely as an optimization that should not affect code behavior.
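
For instance, a minimal sketch of that idea (illustrative helper names, error handling omitted) writes the value byte by byte with shifts, so the host's own memory layout never enters into it:

    #include <stdint.h>

    /* Serialize a 32-bit value into a 4-byte little-endian buffer.
       The shifts operate on values, not on memory layout, so this
       behaves the same on big- and little-endian hosts. */
    void put_u32_le(uint32_t v, unsigned char out[4]) {
        out[0] = (unsigned char)(v & 0xFF);
        out[1] = (unsigned char)((v >> 8) & 0xFF);
        out[2] = (unsigned char)((v >> 16) & 0xFF);
        out[3] = (unsigned char)((v >> 24) & 0xFF);
    }

    /* Parse the 4-byte little-endian buffer back into a value. */
    uint32_t get_u32_le(const unsigned char in[4]) {
        return (uint32_t)in[0]
             | ((uint32_t)in[1] << 8)
             | ((uint32_t)in[2] << 16)
             | ((uint32_t)in[3] << 24);
    }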

Disciple answered 13/3, 2014 at 1:30 Comment(0)

You have two servers communicating over TCP sockets; one is big-endian and the other little-endian. If you send an integer over the socket from one server to the other, I understand that the byte ordering is reversed and the integer will not print as expected.

This is a very well-known problem in network communication protocols. The correct answer is to never send your raw, in-memory integer at all.

You define the protocol very precisely to contain, for example, a 32-bit signed integer stored in big-endian order. Big-endian happens to be the byte order most commonly used in network protocols.

Inside the computers, you use, say, a signed long. The C standard only defines a minimum range for long; the actual storage may be very different. It will be at least 32 bits but could be more.

On the platform where you compile your code there will be functions or macros that translate between the "internal" integer and the 32-bit signed big-endian integer on the network. Examples are htonl() and ntohl(). These expand to different code depending on which platform you are compiling for.
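
As a rough sketch of how that looks around send() and recv() (assuming sock is an already-connected TCP socket descriptor, with error and short-read handling omitted for brevity):

    #include <stdint.h>
    #include <arpa/inet.h>   /* htonl(), ntohl() (POSIX) */
    #include <sys/socket.h>  /* send(), recv() */

    /* Sender: convert from host byte order to network (big-endian)
       order before the bytes hit the wire. */
    void send_u32(int sock, uint32_t value) {
        uint32_t wire = htonl(value);
        send(sock, &wire, sizeof wire, 0);
    }

    /* Receiver: convert back from network order to host order. */
    uint32_t recv_u32(int sock) {
        uint32_t wire = 0;
        recv(sock, &wire, sizeof wire, 0);
        return ntohl(wire);
    }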

Racoon answered 28/11, 2017 at 21:28 Comment(0)

Byte endianness refers to the order of the individual bytes in a data type wider than 1 byte (such as short, int, or long).

So your assumption is correct for int (since it must be at least 16 bits, and is usually more nowadays). It is also usually correct for char, since chars are typically 1 byte. But you could have chars with more than 8 bits, in which case endianness matters.
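
If you want to check what your own platform does, a tiny sketch like this (illustrative only) prints the relevant widths:

    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        /* CHAR_BIT is the number of bits in a byte; the C standard
           guarantees at least 8, and it is exactly 8 on virtually
           all modern platforms. */
        printf("bits per byte: %d\n", CHAR_BIT);
        printf("sizeof(char)=%zu, sizeof(int)=%zu\n",
               sizeof(char), sizeof(int));
        return 0;
    }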

Showiness answered 13/3, 2014 at 0:42 Comment(3)
Multibyte character encodings (such as UTF-8) encode a single character as a sequence of bytes, but those bytes are still encoded independently. – Manvil
That may be the case for UTF-8, but the issue remains with other encodings such as UTF-16 (see en.wikipedia.org/wiki/UTF-16#Byte_order_encoding_schemes). – Showiness
@Showiness UTF-16 is the only variable-length encoding that involves words greater than one byte. "Multibyte" is applied to it simply by analogy with other encodings. UCS-2 (which is limited to the BMP) and UCS-4 are not "multibyte". You have implied a relationship between multibyte-ness and endianness which does not exist. – Manvil

It does not matter as long as you are transferring only bytes, and you should be transferring only bytes in standard networking.

Eoin answered 4/3, 2018 at 16:43 Comment(1)
Consider adding a more detailed answer. If you don't have one, consider commenting. – Laevorotatory
