Converting 4 bytes in little endian order into an unsigned integer

Asked 29/7, 2013 at 5:4 Answered 23/6, 2015 at 1:16

I have a string of 256*4 bytes of data. These 256*4 bytes need to be converted into 256 unsigned integers. They arrive in little-endian order, i.e. the first four bytes in the string are the little-endian representation of the first integer, the next 4 bytes are the little-endian representation of the next integer, and so on.

What is the best way to parse through this data and merge these bytes into unsigned integers? I know I have to use bitshift operators but I don't know in what way.

Imtiaz answered 29/7, 2013 at 5:4 Comment(13)

"but i don't know in what way" - you read up on how shifting operators work and hopefully you will instantly know how. – Krissie 29/7, 2013 at 5:9

The string is just passed through via a redirected file. The first 256*4 bytes are the little-endian encodings of 256 unsigned integers. I need to convert each group of 4 bytes into an unsigned integer and store it in an array. What I don't know how to do is merge each set of 4 bytes into an unsigned int. – Imtiaz 29/7, 2013 at 5:10

@Imtiaz byte0 | (byte1 << CHAR_BIT) | (byte2 << 2 * CHAR_BIT) | (byte3 << 3 * CHAR_BIT)... – Krissie 29/7, 2013 at 5:10

@H2CO3 - I have read far and wide on google about how the bitshifting operators work, including the & and | operators. I am still extremely confused on how to merge 4 bytes into an unsigned int – Imtiaz 29/7, 2013 at 5:10

@Imtiaz Just like my comment above ^^ explains it. – Krissie 29/7, 2013 at 5:11

@H2CO3 - Can you explain that code a little bit? I am pretty confused by it, what is CHAR_BIT? – Imtiaz 29/7, 2013 at 5:11

@Imtiaz Googled it? (Nah...) It's a macro from <climits> (<limits.h> in C) which expands to the number of bits in a byte on your platform. – Krissie 29/7, 2013 at 5:12

@H2CO3 - sorry i am still kind of confused. Can you explain your code a little more? I really appreciate the input. How does 'or'ing all four bytes together give me an unsigned integer? – Imtiaz 29/7, 2013 at 5:14

@H2CO3 -- shifting by CHAR_BIT is incorrect. On some systems CHAR_BIT is 16. Here, the OP specifically said they were bytes, not chars. And he wants them in 4-byte multiples, that is, 32 bits. The correct shift is 8. – Instructive 29/7, 2013 at 5:16

@Imtiaz It doesn't OR all four bytes together. It ORs the first byte, the second byte shifted to the left 8 (or whatever) places, etc. Write it down on a piece of paper and you'll see why this works. – Krissie 29/7, 2013 at 5:17

@NitzanShaked A char is always a byte. It's just that they need not be 8 bits long. You are confusing "byte" with "octet". – Krissie 29/7, 2013 at 5:18

Ahh, I see, but when you shift the second, third, and fourth bytes by that much, why wouldn't the data just fall off? I thought shifting maintains the number of bits. – Imtiaz 29/7, 2013 at 5:18

@Imtiaz Ah, I see what you mean... Indeed, because of the "usual arithmetic conversions" (or whatever it's called in the Standard exactly), in an expression unsigned char << int, the char is promoted (implicitly converted) to unsigned int (or is it int? Somebody who speaks C++ better, please confirm this!), so you will be getting the expected result. – Krissie 29/7, 2013 at 5:22
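To make the shift-and-OR idea from this comment thread concrete, here is a minimal sketch in C. The helper name load_le32 is made up for illustration, and the sketch assumes 8-bit bytes and an unsigned int that is at least 32 bits wide. Each byte is widened to unsigned int before it is shifted, so the shift happens in the wider type and no bits fall off.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: combine 4 little-endian bytes into one unsigned int.
   Assumes 8-bit bytes and an unsigned int of at least 32 bits. */
unsigned int load_le32(const unsigned char *p)
{
    assert(p != NULL);

    /* Widen first: each value now occupies the low 8 bits of an unsigned int. */
    unsigned int b0 = p[0];  /* least significant byte comes first */
    unsigned int b1 = p[1];
    unsigned int b2 = p[2];
    unsigned int b3 = p[3];  /* most significant byte comes last */

    /* Move each byte to its position, then OR the four pieces together. */
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
}
```

For the input bytes 0x78, 0x56, 0x34, 0x12 the pieces are 0x00000078, 0x00005600, 0x00340000 and 0x12000000, which OR together to 0x12345678.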

Hope this helps you

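unsigned int arr[256];
unsigned char ch[256*4] = "your string";  // unsigned char, so bytes >= 0x80 are not sign-extended
for (int i = 0, k = 0; i < 256*4; i += 4, k++)
{
    // widen each byte to unsigned int before shifting, so << 24 never touches a sign bit
    arr[k] = (unsigned int)ch[i]
           | (unsigned int)ch[i+1] << 8
           | (unsigned int)ch[i+2] << 16
           | (unsigned int)ch[i+3] << 24;
}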

Steddman answered 29/7, 2013 at 5:22 Comment(12)

what if host system is big endian? – Repentant 29/7, 2013 at 5:23

@thomas This approach is endian-agnostic. It does not matter what the host system's endianness is. – Milquetoast 29/7, 2013 at 5:27

@Steddman Don't believe the false positive ;) – Krissie 29/7, 2013 at 5:28

@thomas If the host system is big-endian, then this will work just like it would work on a little-endian system. (And not doing what OP wanted, and invoking undefined behavior by shifting stuff into the sign bit of the resulting integer...) – Krissie 29/7, 2013 at 5:28

This worked perfectly! and i believe it is hardware independent. Thanks so much! – Imtiaz 29/7, 2013 at 5:36

@Saksham, #1001807 – Repentant 29/7, 2013 at 5:36

@Imtiaz No, it doesn't "work perfectly". You wanted unsigned integers, this gives you signed integers, and in addition, it invokes undefined behavior. – Krissie 29/7, 2013 at 5:37
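A related pitfall shows up when the buffer's element type is signed: a byte of 0x80 or above becomes a negative int when promoted, and its sign bits spill into the upper bytes of the result. The sketch below is only an illustration under stated assumptions (8-bit bytes, 32-bit unsigned int); the function names merge_sign_extended and merge_masked are hypothetical. The first function converts the sign-extended int to unsigned before shifting so that the demo itself stays free of undefined behavior.

```c
#include <assert.h>
#include <stddef.h>

/* What goes wrong: every signed byte is sign-extended to int first,
   so a byte such as 0x80 turns into 0xFFFFFF80 on a 32-bit int. */
unsigned int merge_sign_extended(const signed char *p)
{
    assert(p != NULL);
    return (unsigned int)(int)p[0]
         | (unsigned int)(int)p[1] << 8
         | (unsigned int)(int)p[2] << 16
         | (unsigned int)(int)p[3] << 24;
}

/* The fix: go through unsigned char so only the low 8 bits survive. */
unsigned int merge_masked(const signed char *p)
{
    assert(p != NULL);
    return (unsigned int)(unsigned char)p[0]
         | (unsigned int)(unsigned char)p[1] << 8
         | (unsigned int)(unsigned char)p[2] << 16
         | (unsigned int)(unsigned char)p[3] << 24;
}
```

With the bytes -128, 0, 0, 0 (that is, 0x80 followed by three zero bytes), merge_sign_extended yields 0xFFFFFF80 while merge_masked yields the intended 0x00000080.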

@Steddman This solution is almost correct, you should just change int to unsigned int. (Edit: done by OP, +1.) – Krissie 29/7, 2013 at 5:39

@H2CO3, then do we need to assume that integer to char conversion is done in a similar bit shift operation? – Repentant 29/7, 2013 at 6:4

@thomas Sorry, I don't understand, what do you mean by that? Here we don't need to assume anything - as you can see in the code, the conversion is done using the required bitwise operations. – Krissie 29/7, 2013 at 6:41

@H2CO3, on the sender side, we have a contiguous array of unsigned integers. If, during conversion to char pointers, we just cast the integer array's starting location to char * and send this char array, then the bit-shift conversion may not work on the receiver side, because endianness information is not handled in such a case. – Repentant 29/7, 2013 at 6:47

@thomas Huh? I think you should read the question more carefully, or maybe I am miserably misinterpreting it. "I have a string of 256*4 bytes of data. These 256* 4 bytes need to be converted into 256 unsigned integers. The order in which they come is little endian." (emphasis mine) – Krissie 29/7, 2013 at 6:49

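Alternatively, we can use a C/C++ cast to interpret the char buffer as an array of unsigned int. This avoids the shifting, but the result then depends on the host's byte order: it yields the intended values only on a little-endian host. The cast also assumes the buffer is suitably aligned for unsigned int, and accessing a char array through an unsigned int pointer violates the strict-aliasing rules, so treat it as a shortcut rather than portable code.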

#include <stdio.h>
int main()
{
    char buf[256*4] = "abcd";
    unsigned int *p_int = ( unsigned int * )buf;
    unsigned short idx = 0;
    unsigned int val = 0;
    for( idx = 0; idx < 256; idx++ )
    {
        val = *p_int++;
        printf( "idx = %d, val = %u \n", idx, val );
    }
}

This would print out 256 values, the first one is idx = 0, val = 1684234849 (and all remaining numbers = 0).

As a side note, "abcd" converts to 1684234849 because the code was run on x86 (little-endian), where "abcd" is read as 0x64636261 ('a' is 0x61 and 'd' is 0x64; in little-endian, the least significant byte sits at the lowest address). And 0x64636261 = 1684234849.
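The host dependence of that value can be checked with a small sketch. It assumes a 4-byte unsigned int, and both function names are hypothetical. memcpy is used in place of the pointer cast because it gives the same host-order reinterpretation without relying on alignment or aliasing.

```c
#include <string.h>

/* Reinterpret the first sizeof(unsigned int) bytes of p in host byte order,
   which is what reading through a casted pointer would do. */
unsigned int load_host_order(const char *p)
{
    unsigned int v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns 1 when the least significant byte of an unsigned int is stored
   at the lowest address (little-endian), 0 otherwise. */
int host_is_little_endian(void)
{
    unsigned int one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

On a little-endian host load_host_order("abcd") gives 0x64636261 (1684234849); on a big-endian host with the same int size the same call gives 0x61626364 instead.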

Note also, if using C++, reinterpret_cast should be used in this case:

const char *p_buf = "abcd";
const unsigned int *p_int = reinterpret_cast< const unsigned int * >( p_buf );

Hydrolysis answered 23/6, 2015 at 1:16 Comment(0)

Read 4 bytes at a time, shift each one into place, and OR them into an unsigned int. This works regardless of the host's endianness, because shifts operate on values rather than on memory layout.

unsigned char bytes[4] = "....";
unsigned int i = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | ((unsigned int)bytes[3] << 24);

On a big-endian host the same expression still gives the right value, so nothing needs to be reversed. Changing the indexes of bytes[] from 0-3 to 3-0 is only needed when the input data itself is big-endian.

But you don't even need to do that if your PC is little-endian and unsigned int is 4 bytes: just copy the whole char array into the int array.

#include <string.h>  // memcpy

#define LEN 256
char bytes[LEN*4] = "blahblahblah";
unsigned int uint[LEN];
memcpy(uint, bytes, sizeof bytes);
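If the bulk copy should also behave on a big-endian host, one option is to copy first and then swap the bytes of each element when needed. The following is a rough sketch under assumptions, not a definitive implementation: it uses uint32_t from <stdint.h> in place of unsigned int, the names swap32 and copy_le32_array are invented for illustration, and it assumes the host is either plain little-endian or plain big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of one 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (uint32_t)((v >> 24)
                    | ((v >> 8) & 0x0000FF00u)
                    | ((v << 8) & 0x00FF0000u)
                    | (v << 24));
}

/* Hypothetical helper: copy count little-endian 32-bit values from src
   into dst, fixing up the byte order only on a big-endian host. */
void copy_le32_array(const unsigned char *src, uint32_t *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    memcpy(dst, src, count * sizeof *dst);

    /* Probe the host: is the low byte of 1 stored first? */
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);

    if (first != 1) {  /* big-endian host: each element needs reversing */
        for (size_t k = 0; k < count; k++)
            dst[k] = swap32(dst[k]);
    }
}
```

On a little-endian host the loop never runs, so the cost is the same as the plain memcpy above.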

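That said, the best way is to avoid copying at all and use the same array for both types. This has the same little-endian-host requirement as the memcpy above; reading the other union member is well-defined in C, but in C++ it is undefined behavior.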

union
{
    char bytes[LEN*4];
    unsigned int uint[LEN];
} myArrays;

// copy data to myArrays.bytes[], do something with those bytes if necessary
// after populating myArrays.bytes[], get the ints by myArrays.uint[i]

Heterogony answered 29/7, 2013 at 6:34 Comment(0)
