Reverse the order of bits in a bit array

Asked 25/11, 2015 at 7:6 Answered 25/11, 2015 at 14:37

I have a long sequence of bits stored in an array of unsigned long integers, like this

struct bit_array
{
    int size; /* nr of bits */
    unsigned long *array; /* the container that stores bits */
}

I am trying to design an algorithm to reverse the order of bits in *array. Problems:

size can be anything, i.e. not necessarily a multiple of 8 or 32 etc, so the first bit in the input array can end up at any position within the unsigned long in the output array;
the algorithm should be platform-independent, i.e. work for any sizeof(unsigned long).

Code, pseudocode, algo description etc. -- anything better than bruteforce ("bit by bit") approach is welcome.

Faraway answered 25/11, 2015 at 7:6 Comment(12)

"[T]he first bit in the input array can end up at any position within the unsigned long in the output array"? I'm not sure I understand. Wouldn't the first bit be at the first position in the first long? Don't you mean the last bit? – Locust 25/11, 2015 at 7:12

If the number of bits isn't a multiple of sizeof(long), what are the extra bits in the last long used for? Are they garbage (and can therefore be inverted), or do they have to be preserved as is? – Locust 25/11, 2015 at 7:15

I think the issue is that if there are 57 bits in the bit array, the bit number 0 needs to swap with bit number 56. However, before we can do anything, we need to know whether bit 0 of the array is stored in the MSB or LSB of element 0 of the array (or, if element 0 isn't where bit 0 is, we need to understand where bit 0 is stored). – Celloidin 25/11, 2015 at 7:17

@Thomas Padron-McCarthy, I'm sorry for bad description. Let's take an example: a long is 32 bits and we need to store, say, 49 bits. That means we need an array of 2 longs (the last 15 bits of the second long are wasted). After we reverse the bits, the first bit (position 0 in array[0]) will become the last (position 16 in array[0]). – Faraway 25/11, 2015 at 7:18

@Jonathan and Eques: Aha, it's about reversing the order! I thought it was just inverting each bit. Sorry for the misunderstanding. – Locust 25/11, 2015 at 7:18

Yes, it's about reversing the order. I will edit the question to make it clear. – Faraway 25/11, 2015 at 7:19

Why not just add an extra two fields to the structure to define the direction and number of bits to skip? Then create the procedures to access it to depend on the direction? – Bandurria 25/11, 2015 at 7:24

@Ed Heal, because that means that all other algorithms (bitwise comparison, logical operations etc) will have to be redesigned. – Faraway 25/11, 2015 at 7:29

Do you have control over the definition of this structure? And if you do, why store bits in unsigned long *, not in uint8_t *. That would factor out the platform-dependent issues. – Cornstalk 25/11, 2015 at 11:26

@DmitryGrigoryev: yes but it would probably slow down other more common operations on bit slices such as CLR, SET, OR, AND, SET, XOR... as proper alignment for the array member could no longer be assumed. – Milkman 26/11, 2015 at 14:12

Well you could pick uint32_t * for performance, but then you'll have to deal with little vs big endianness. Performance and portability don't really go together. – Cornstalk 26/11, 2015 at 15:58

<why store bits in unsigned long > Because in terms of "bits per CPU operation" this is probably the fastest choice. – Faraway 27/11, 2015 at 6:40

My favorite solution is to fill a lookup-table that does bit-reversal on a single byte (hence 256 byte entries).

You apply the table to 1 to 4 bytes of the input operand, with a swap. If the size isn't a multiple of 8, you will need to adjust by a final right shift.

This scales well to larger integers.

Example:

11 10010011 00001010 -> 01010000 11001001 11000000 -> 01 01000011 00100111

To split the number into bytes portably, you need to use bitwise masking/shifts; mapping of a struct or array of bytes onto the integer can make it more efficient.

For brute performance, you can think of mapping up to 16 bits at a time, but this doesn't look quite reasonable.

Hoe answered 25/11, 2015 at 7:15 Comment(13)

The question seems be a bit concerned with efficiency ("anything better than bruteforce" etc), and would this lookup table really be as efficient as bit operations on longs? I don't really know one way or the other, but until I see measurements I suspect that this could be very much slower. – Locust 25/11, 2015 at 7:17

Lookup in a data structure, that at best is in the highest-level cache, compared to bit operations in a register. – Locust 25/11, 2015 at 7:20

@ThomasPadron-McCarthy: Depends on the target, of course. On an 8 bit CPU the table will be much faster. Because thes often don't have single-cycle shifters for arbitrary counts and work on 8 bit data anyway. For 16 bit it depends. For 32 bit it will likely be slower, much slower if the CPU has bit-reverse instructions. – Under 25/11, 2015 at 7:20

@Olaf: Yes, probably very architecture-dependent. My mind always seems to be stuck in modern Intel desktop processors. – Locust 25/11, 2015 at 7:21

@ThomasPadron-McCarthy: There are still some other "modern" processors. x86 actually is a minority. – Under 25/11, 2015 at 7:23

@Olaf: Not in my desktop computer... But yeah, I need to get out more. – Locust 25/11, 2015 at 7:24

@ThomasPadron-McCarthy: if you provide an alternative, we can benchmark comparatively. – Hoe 25/11, 2015 at 7:29

@ThomasPadron-McCarthy: If you have a recent x86 CPU chances are you have an ARM core inside. But just look at your smartphone, router, washing machine, etc. – Under 25/11, 2015 at 7:29

@Olaf: ARM inside Intel ?? – Hoe 25/11, 2015 at 7:32

@YvesDaoust: I just wrote x86; not Intel. AMD uses an ARM code for the security system, not sure if Intel also uses ARM cores, I had to research on that, but it is well possible, as they have the licenses. – Under 25/11, 2015 at 7:36

Regardless of benchmarks it's clear to say that bitwise operations will be faster than lookups. A shift operation is 1 or 2 ops whereas lookup might be > 30. – Holiness 25/11, 2015 at 14:46

@erip: could be even more if the data is input from the keyboard. – Hoe 25/11, 2015 at 16:16

@erip: 30 cycles sounds very pessimistic for latency to L1 cache. And even then the latency might be irrelevant; an optimized loop will likely be limited primarily by bandwidth. (or maybe even by compute) – Xeroderma 25/11, 2015 at 16:47

I like the idea of lookup table. Still it's also a typical task for log(n) group bit tricks that may be very fast. Like:

unsigned long reverseOne(unsigned long x) {
  x = ((x & 0xFFFFFFFF00000000) >> 32) | ((x & 0x00000000FFFFFFFF) << 32);
  x = ((x & 0xFFFF0000FFFF0000) >> 16) | ((x & 0x0000FFFF0000FFFF) << 16);
  x = ((x & 0xFF00FF00FF00FF00) >> 8)  | ((x & 0x00FF00FF00FF00FF) << 8);
  x = ((x & 0xF0F0F0F0F0F0F0F0) >> 4)  | ((x & 0x0F0F0F0F0F0F0F0F) << 4);
  x = ((x & 0xCCCCCCCCCCCCCCCC) >> 2)  | ((x & 0x3333333333333333) << 2);
  x = ((x & 0xAAAAAAAAAAAAAAAA) >> 1)  | ((x & 0x5555555555555555) << 1);
  return x;
}

The underlying idea is that when we aim to reverse the order of some sequence we may swap the head and tail halves of this sequence and then separately reverse each of halves (which is done here by applying the same procedure recursively to each half).

Here is a more portable version supporting unsigned long widths of 4,8,16 or 32 bytes.

#include <limits.h>

#define ones32 0xFFFFFFFFUL
#if (ULONG_MAX >> 128)
#define fill32(x) (x|(x<<32)|(x<<64)|(x<<96)|(x<<128)|(x<<160)|(x<<192)|(x<<224))
#define patt128 (ones32|(ones32<<32)|(ones32<<64) |(ones32<<96))
#define patt64  (ones32|(ones32<<32)|(ones32<<128)|(ones32<<160))
#define patt32  (ones32|(ones32<<64)|(ones32<<128)|(ones32<<192))
#else
#if (ULONG_MAX >> 64)
#define fill32(x) (x|(x<<32)|(x<<64)|(x<<96))
#define patt64  (ones32|(ones32<<32))
#define patt32  (ones32|(ones32<<64))
#else
#if (ULONG_MAX >> 32)
#define fill32(x) (x|(x<<32))
#define patt32  (ones32)
#else
#define fill32(x) (x)
#endif
#endif
#endif

unsigned long reverseOne(unsigned long x) {
#if (ULONG_MAX >> 32)
#if (ULONG_MAX >> 64)
#if (ULONG_MAX >> 128)
  x = ((x & ~patt128) >> 128) | ((x & patt128) << 128);
#endif
  x = ((x & ~patt64) >> 64) | ((x & patt64) << 64);
#endif
  x = ((x & ~patt32) >> 32) | ((x & patt32) << 32);
#endif
  x = ((x & fill32(0xffff0000UL)) >> 16) | ((x & fill32(0x0000ffffUL)) << 16);
  x = ((x & fill32(0xff00ff00UL)) >> 8)  | ((x & fill32(0x00ff00ffUL)) << 8);
  x = ((x & fill32(0xf0f0f0f0UL)) >> 4)  | ((x & fill32(0x0f0f0f0fUL)) << 4);
  x = ((x & fill32(0xccccccccUL)) >> 2)  | ((x & fill32(0x33333333UL)) << 2);
  x = ((x & fill32(0xaaaaaaaaUL)) >> 1)  | ((x & fill32(0x55555555UL)) << 1);
  return x;
}

Gonfalon answered 25/11, 2015 at 12:8 Comment(3)

Nice one. But the size of unsigned long can vary. – Gear 25/11, 2015 at 14:5

This code is not portable and has great potential for subtle integer type bugs. You need to change unsigned long to uint64_t and also ensure that the type of all of the integer literals is uint64_t by using const uint64_t variables, instead of using magic numbers. For example, the literal 0x00000000FFFFFFFF is a 32 bit unsigned int, not a 64 bit signed int as you might have assumed, because adding lots of zeroes in the beginning of a literal will not make it a 64 bit type. > – Otology 25/11, 2015 at 15:48

Now imagine if int is 64 bit on the given system: suddenly there will integer promotion of this literal and suddenly it is now a signed type. Why do use signed literals in the first place? You almost never want signed types when doing "bit fiddling" operations. The algorithm proposed here is pretty good, but C code like this is a ticking bomb. – Otology 25/11, 2015 at 15:48

In a collection of related topics which can be found here, the bits of an individual array entry could be reversed as follows.

unsigned int v;     // input bits to be reversed
unsigned int r = v; // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end

for (v >>= 1; v; v >>= 1)
{   
  r <<= 1;
  r |= v & 1;
  s--;
}
r <<= s; // shift when v's highest bits are zero

The reversal of the entire array could be done afterwards by rearranging the individual positions.

Colis answered 25/11, 2015 at 7:14 Comment(3)

The variable names "input", "output" and "size"/"count" would make more sense. Don't just copy/paste solutions from that site, it is notorious for poor code readability. – Otology 25/11, 2015 at 7:30

Thanks for the comment; basically I just did copy & paste in order not to introduce errors. However, perhaps a selfmade implementation would have been better. – Colis 25/11, 2015 at 7:47

The final reversal is not so easy to do. – Milkman 25/11, 2015 at 8:11

You must define what is the order of bits in an unsigned long. You might assume that bit n is corresponds to array[x] & (1 << n) but this needs to be specified. If so, you need to handle the byte ordering (little or big endian) if you are going to use access the array as bytes instead of unsigned long.

I would definitely implement brute force first and measure whether the speed is an issue. No need to waste time trying to optimize this if it is not used a lot on large arrays. An optimized version can be tricky to implement correctly. If you end up trying anyway, the brute force version can be used to verify correctness on test values and benchmark the speed of the optimized version.

Milkman answered 25/11, 2015 at 8:7 Comment(0)

The fact that the size is not multiple of sizeof(long) is the hardest part of the problem. This can result in a lot of bit shifting.

But, you don't have to do that if you can introduce new struct member:

struct bit_array
{
    int size; /* nr of bits */
    int offset; /* First bit position */
    unsigned long *array; /* the container that stores bits */
}

Offset would tell you how many bits to ignore at the beginning of the array.

Then you only only have to do following steps:

Reverse array elements.
Swap bits of each element. There are many hacks for in the other answers, but your compiler might also provide intrisic functions to do it in fewer instructions (like RBIT instruction on some ARM cores).
Calculate new starting offset. This is equal to unused bits the last element had.

Timoshenko answered 25/11, 2015 at 8:47 Comment(4)

Why not use an uint8_t* for the array instead? – Otology 25/11, 2015 at 15:39

@Otology OP stated that size is not multiples of 8 either, so I don't think that helps. In fact, I think the best data type for array element should be chosen depending on the size of the CPU register on target architecture. – Timoshenko 26/11, 2015 at 8:18

You can still access the data by chunks of "CPU register size" even if the type is uint8_t. You can't do the opposite though, so on small CPUs the unsigned long will be very burdensome. – Otology 26/11, 2015 at 8:38

@Otology True, I don't think that unsigned long is an optimal data type. But with uint8_t on 32-bit CPU (for example) you'd have to take care of the alignment manually and treat last element specially if array size is not divisible by four, if you wanted to work on 4 byte chunks. – Timoshenko 26/11, 2015 at 9:2

I would split the problem into two parts.

First, I would ignore the fact that the number of used bits is not a multiple of 32. I would use one of the given methods to swap around the whole array like that.

pseudocode:

for half the longs in the array:
    take the first longword;
    take the last longword;
    swap the bits in the first longword
    swap the bits in the last longword;

    store the swapped first longword into the last location;
    store the swapped last longword into the first location;

and then fix up the fact that the first few bits (call than number n) are actually garbage bits from the end of the longs:

for all of the longs in the array:
    split the value in the leftmost n bits and the rest;
    store the leftmost n bits into the righthand part of the previous word;
    shift the rest bits to the left over n positions (making the rightmost n bits zero);
    store them back;

You could try to fold that into one pass over the whole array of course. Something like this:

for half the longs in the array:
    take the first longword;
    take the last longword;
    swap the bits in the first longword
    swap the bits in the last longword;

    split both value in the leftmost n bits and the rest;

    for the new first longword:
        store the leftmost n bits into the righthand side of the previous word;
        store the remaining bits into the first longword, shifted left;

    for the new last longword:
        remember the leftmost n bits for the next iteration;
        store the remembered leftmost n bits, combined with the remaining bits, into the last longword;

    store the swapped first longword into the last location;
    store the swapped last longword into the first location;

I'm abstracting from the edge cases here (first and last longword), and you may need to reverse the shifting direction depending on how the bits are ordered inside each longword.

Alleyway answered 25/11, 2015 at 14:37 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags