C/C++ packing signed char into int

Asked 13/3, 2010 at 4:41 Answered 13/3, 2010 at 18:42

I have need to pack four signed bytes into 32-bit integral type. this is what I came up to:

int32_t byte(int8_t c) { return (unsigned char)c; }

int pack(char c0, char c1, ...) {
  return byte(c0) | byte(c1) << 8 | ...;
}

is this a good solution? Is it portable (not in communication sense)? is there a ready-made solution, perhaps boost?

issue I am mostly concerned about is bit order when converting of negative bits from char to int. I do not know what the correct behavior should be.

Thanks

Triune answered 13/3, 2010 at 4:41 Comment(1)

How to convert a sequence of 32 char (0/1) to 32 bits (uint32_t)?, What's the fastest way to pack 32 0/1 values into the bits of a single 32-bit variable?, How to efficiently convert an 8-bit bitmap to array of 0/1 integers with x86 SIMD, How to create a byte out of 8 bool values (and vice versa)? – Klarrisa 3/12, 2018 at 4:3

I liked Joey Adam's answer except for the fact that it is written with macros (which cause a real pain in many situations) and the compiler will not give you a warning if 'char' isn't 1 byte wide. This is my solution (based off Joey's).

inline uint32_t PACK(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3) {
    return (c0 << 24) | (c1 << 16) | (c2 << 8) | c3;
}

inline uint32_t PACK(sint8_t c0, sint8_t c1, sint8_t c2, sint8_t c3) {
    return PACK((uint8_t)c0, (uint8_t)c1, (uint8_t)c2, (uint8_t)c3);
}

I've omitted casting c0->c3 to a uint32_t as the compiler should handle this for you when shifting and I used c-style casts as they will work for either c or c++ (the OP tagged as both).

Giarla answered 13/3, 2010 at 5:12 Comment(5)

In C and C++, a char always has a size of one byte. A byte may not be eight bits. – Buckskin 13/3, 2010 at 5:13

+1: For breaking up into two functions and using function instead of macro. This is an ideal case for having an inline function. – Lynlyncean 13/3, 2010 at 5:17

I'm curious why you embrace one C++ feature (inline functions) yet condemn another (improved cast operators).... C-Style casts are deprecated for a reason. Otherwise +1. – Empress 13/3, 2010 at 6:0

inline functions are also a C feature. – Yogi 13/3, 2010 at 7:51

Grant Peters: The reason that Joey Adams is explicitly casting c0 through c3 into uint32_t is that if int is capable of representing all of the values of uint8_t, the automatic promotion will be to signed int. If you want to maintain the unsigned, you need to ask for it. – Yogi 13/3, 2010 at 7:59

char isn't guaranteed to be signed or unsigned (on PowerPC Linux, char defaults to unsigned). Spread the word!

What you want is something like this macro:

#include <stdint.h> /* Needed for uint32_t and uint8_t */

#define PACK(c0, c1, c2, c3) \
    (((uint32_t)(uint8_t)(c0) << 24) | \
    ((uint32_t)(uint8_t)(c1) << 16) | \
    ((uint32_t)(uint8_t)(c2) << 8) | \
    ((uint32_t)(uint8_t)(c3)))

It's ugly mainly because it doesn't play well with C's order of operations. Also, the backslash-returns are there so this macro doesn't have to be one big long line.

Also, the reason we cast to uint8_t before casting to uint32_t is to prevent unwanted sign extension.

Runion answered 13/3, 2010 at 4:53 Comment(2)

Why are you packing c1 3 times? You could take care of the order of operations problem by putting the arguments of the macro in parenthesis.... – Empress 13/3, 2010 at 4:55

Ahh -- I see you fixed it. +1 then. – Empress 13/3, 2010 at 4:58

inline uint32_t PACK(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3) {
    return (c0 << 24) | (c1 << 16) | (c2 << 8) | c3;
}

inline uint32_t PACK(sint8_t c0, sint8_t c1, sint8_t c2, sint8_t c3) {
    return PACK((uint8_t)c0, (uint8_t)c1, (uint8_t)c2, (uint8_t)c3);
}

I've omitted casting c0->c3 to a uint32_t as the compiler should handle this for you when shifting and I used c-style casts as they will work for either c or c++ (the OP tagged as both).

Giarla answered 13/3, 2010 at 5:12 Comment(5)

In C and C++, a char always has a size of one byte. A byte may not be eight bits. – Buckskin 13/3, 2010 at 5:13

+1: For breaking up into two functions and using function instead of macro. This is an ideal case for having an inline function. – Lynlyncean 13/3, 2010 at 5:17

I'm curious why you embrace one C++ feature (inline functions) yet condemn another (improved cast operators).... C-Style casts are deprecated for a reason. Otherwise +1. – Empress 13/3, 2010 at 6:0

inline functions are also a C feature. – Yogi 13/3, 2010 at 7:51

You can avoid casts with implicit conversions:

uint32_t pack_helper(uint32_t c0, uint32_t c1, uint32_t c2, uint32_t c3) {
    return c0 | (c1 << 8) | (c2 << 16) | (c3 << 24);
}

uint32_t pack(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3) {
    return pack_helper(c0, c1, c2, c3);
}

The idea is that you see "convert all the parameters correctly. Shift and combine them", rather than "for each parameter, convert it correctly, shift and combine it". Not much in it, though.

Then:

template <int N>
uint8_t unpack_u(uint32_t packed) {
    // cast to avoid potential warnings for implicit narrowing conversion
    return static_cast<uint8_t>(packed >> (N*8));
}

template <int N>
int8_t unpack_s(uint32_t packed) {
    uint8_t r = unpack_u<N>(packed);
    return (r <= 127 ? r : r - 256); // thanks to caf
}

int main() {
    uint32_t x = pack(4,5,6,-7);
    std::cout << (int)unpack_u<0>(x) << "\n";
    std::cout << (int)unpack_s<1>(x) << "\n";
    std::cout << (int)unpack_u<3>(x) << "\n";
    std::cout << (int)unpack_s<3>(x) << "\n";
}

Output:

This is as portable as the uint32_t, uint8_t and int8_t types. None of them is required in C99, and the header stdint.h isn't defined in C++ or C89. If the types exist and meet the C99 requirements, though, the code will work. Of course in C the unpack functions would need a function parameter instead of a template parameter. You might prefer that in C++ too if you want to write short loops for unpacking.

To address the fact that the types are optional, you could use uint_least32_t, which is required in C99. Similarly uint_least8_t and int_least8_t. You would have to change the code of pack_helper and unpack_u:

uint_least32_t mask(uint_least32_t x) { return x & 0xFF; }

uint_least32_t pack_helper(uint_least32_t c0, uint_least32_t c1, uint_least32_t c2, uint_least32_t c3) {
    return mask(c0) | (mask(c1) << 8) | (mask(c2) << 16) | (mask(c3) << 24);
}

template <int N>
uint_least8_t unpack_u(uint_least32_t packed) {
    // cast to avoid potential warnings for implicit narrowing conversion
    return static_cast<uint_least8_t>(mask(packed >> (N*8)));
}

To be honest this is unlikely to be worth it - chances are the rest of your application is written on the assumption that int8_t etc do exist. It's a rare implementation that doesn't have an 8 bit and a 32 bit 2's complement type.

Analogy answered 13/3, 2010 at 13:52 Comment(0)

"Goodness"
IMHO, this is the best solution you're going to get for this. EDIT: though I'd use static_cast<unsigned int> instead of the C-style cast, and I'd probably not use a separate method to hide the cast....

Portability:
There is going to be no portable way to do this because nothing says char has to be eight bits, and nothing says unsigned int needs to be 4 bytes wide.

Furthermore, you're relying on endianness and therefore data pack'd on one architecture will not be usable on one with the opposite endianness.

is there a ready-made solution, perhaps boost?
Not of which I am aware.

Empress answered 13/3, 2010 at 4:45 Comment(5)

In both C and C++, a char is guaranteed to have a size of one byte. – Buckskin 13/3, 2010 at 4:58

@James McNellis: No it isn't. See here -> #882394 – Empress 13/3, 2010 at 5:0

Endianness aside, assuming chars are 8 bits and unsigned int is 32 bits, I think the union trick is fine even when strict aliasing rules are considered (since char can alias anything). I may be wrong. – Runion 13/3, 2010 at 5:1

-1 the behavior of the union is undefined in this case. Some compilers will not interpret this correctly and 'result' will not be correctly returned. You can not read from a different variable in a union to the last one that was written and be sure that it will work correctly on all platforms/compilers. THIS IS NOT PORTABLE. – Giarla 13/3, 2010 at 5:3

@BillyONeal: A char is not guaranteed to be 8 bits in size. It is guaranteed to be one byte in size, though. The sizeof operator returns the size of its operand in bytes; sizeof(char) == 1 is guaranteed by both the C and C++ standards. – Buckskin 13/3, 2010 at 5:3

This is based on Grant Peters and Joey Adams' answers, extended to show how to unpack the signed values (the unpack functions rely on the modulo rules of unsigned values in C):

(As Steve Jessop noted in comments, there is no need for separate pack_s and pack_u functions).

inline uint32_t pack(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3)
{
    return ((uint32_t)c0 << 24) | ((uint32_t)c1 << 16) |
        ((uint32_t)c2 << 8) | (uint32_t)c3;
}

inline uint8_t unpack_c3_u(uint32_t p)
{
    return p >> 24;
}

inline uint8_t unpack_c2_u(uint32_t p)
{
    return p >> 16;
}

inline uint8_t unpack_c1_u(uint32_t p)
{
    return p >> 8;
}

inline uint8_t unpack_c0_u(uint32_t p)
{
    return p;
}

inline uint8_t unpack_c3_s(uint32_t p)
{
    int t = unpack_c3_u(p);
    return t <= 127 ? t : t - 256;
}

inline uint8_t unpack_c2_s(uint32_t p)
{
    int t = unpack_c2_u(p);
    return t <= 127 ? t : t - 256;
}

inline uint8_t unpack_c1_s(uint32_t p)
{
    int t = unpack_c1_u(p);
    return t <= 127 ? t : t - 256;
}

inline uint8_t unpack_c0_s(uint32_t p)
{
    int t = unpack_c0_u(p);
    return t <= 127 ? t : t - 256;
}

(These are necessary rather than simply casting back to int8_t, because the latter may cause an implementation-defined signal to be raised if the value is over 127, so it's not strictly portable).

Yogi answered 13/3, 2010 at 8:15 Comment(2)

Isn't pack_s redundant? If you call pack_u with argument expressions of type int8_t, they'll be converted automatically. – Analogy 13/3, 2010 at 13:52

And <= in the unpacks, I think. – Analogy 13/3, 2010 at 13:59

-2

You could also let the compiler do the work for you.

union packedchars {
  struct {
    char v1,v2,v3,v4;
  }
  int data;
};

packedchars value;
value.data = 0;
value.v1 = 'a';
value.v2 = 'b;

Etc.

Leukorrhea answered 13/3, 2010 at 18:42 Comment(1)

there is a comment by Mr. Peters why this approach is nonportable in this thread. – Triune 13/3, 2010 at 19:4

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags