In C/C++ what's the simplest way to reverse the order of bits in a byte?

Asked 8/4, 2010 at 19:32 Answered 29/5, 2023 at 21:16

157

While there are multiple ways to reverse bit order in a byte, I'm curious as to what is the "simplest" for a developer to implement. And by reversing I mean:

1110 -> 0111
0010 -> 0100

This is similar to, but not a duplicate of this PHP question.

This is similar to, but not a duplicate of this C question. This question is asking for the easiest method to implement by a developer. The "Best Algorithm" is concerned with memory and cpu performance.

Tumid answered 8/4, 2010 at 19:32 Comment(5)

Use inline assembly. Better, put the function into a separate translation unit. Have one assembly language module for each target platform. Let build process choose the modules. – Cummins 8/4, 2010 at 19:47

@Andreas Simplest implementation – Tumid 8/4, 2010 at 20:12

Related: codegolf.stackexchange.com/questions/36213/… – Superpose 16/6, 2020 at 10:10

Fun fact: ARM / AArch64 have an instruction for this, rbit. I don't know of a standard way to use it, though, not even a GCC or clang intrinsic. See How can I elegantly take advantage of ARM instructions like REV and RBIT when writing C code? in case future answers have a better way. – Halvorson 10/1, 2023 at 8:24

A lookup table would probably be the simplest. – Defray 30/5, 2023 at 1:31

107

If you are talking about a single byte, a table-lookup is probably the best bet, unless for some reason you don't have 256 bytes available.

Lentissimo answered 8/4, 2010 at 19:34 Comment(15)

If we're talking about something that's simple to implement without copying a ready-made solution, creating the lookup table does still require another solution. (Of course one might do it by hand, but that's error-prone and time-consuming…) – Joerg 8/4, 2010 at 19:52

You can squeeze the array into somewhat fewer than 256 bytes if you ignore palindromes. – Selectman 8/4, 2010 at 19:56

@Joerg then write a script :p it's less time consuming than writing a solution in C++ which solves the problem at runtime. – Selectman 8/4, 2010 at 20:0

@Selectman - you'd need a table to know which ones are the palindromes. – Nupercaine 8/4, 2010 at 20:1

@wilhelmtell: Well, to write the script one still needs another solution, which was my point – a lookup table is simple to use but not simple to create. (Except by copying a ready-made lookup table, but then one might just as well copy any solution.) For example, if the “simplest” solution is considered one that could be written on paper in an exam or interview, I would not start making a lookup table by hand and making the program to do it would already include a different solution (which would be simpler alone than the one including both it and the table). – Joerg 8/4, 2010 at 20:18

Fast and easy to implement. – Duality 8/4, 2010 at 21:18

@Joerg what I meant is write a script which outputs the table of the first 256 bytes and their reverse mapping. Yes, you're back to writing the reverse function, but now in your favourite scripting language, and it can be as nasty as you want -- you're going to throw it away as soon as it's done and you ran it once. Have the script's output as C code, even: unsigned int rtable[] = {0x800, 0x4000, ...};. Then throw away the script and forget you ever had it. It's much faster to write than the equivalent C++ code, and it will only ever run once, so you get O(1) runtime in your C++ code. – Selectman 8/4, 2010 at 21:32

I would agree with wilhelmtell, here. You can even create the table in Excel. I work with embedded software, so I use Excel to create such tables all the time. Worst case, you leave a really ugly script running overnight, and then copy/paste your freshly minted table in the morning. – Lentissimo 8/4, 2010 at 22:6

@e.James: Then I would argue that the solution is in Excel or the scripting language and not in C or C++ as per the question. And if the table was generated in C, then the solution used to generate it would obviously be simpler than the solution that includes both the generator and the look-up. That's all I'm saying… – Joerg 8/4, 2010 at 22:10

@Arkku: I think I see where you are coming from. In fact, without knowing exactly what the OP meant when he asked for the "simplest implementation", I fear we could debate this point for days. I stand by my own answer, but I have upvoted yours (and several of the others) since they all provide working solutions. I'll leave it up to the OP to decide which is simplest by way of the green checkmark :) – Lentissimo 8/4, 2010 at 23:52

I like it. Simple to implement. Thanks. – Tumid 9/4, 2010 at 14:9

You can halve the size of the lookup table with a single if and a couple of subtractions, e.g. if(n < 128) { return lookup[n]; } else { return 255 - lookup[255-n]; } // assumes unsigned bytes – Giverin 9/4, 2010 at 17:24

@Grant Peters: You can cut it even smaller (down to 16 bytes) with Caspin's answer. – Lentissimo 9/4, 2010 at 17:58

@e.James, didn't see that and his would probably run faster as well – Giverin 11/4, 2010 at 7:51

@Lentissimo combine it with htonl and works for bigger numbers too. – Thyrse 30/8, 2017 at 19:48

300

This should work:

unsigned char reverse(unsigned char b) {
   b = (b & 0xF0) >> 4 | (b & 0x0F) << 4;
   b = (b & 0xCC) >> 2 | (b & 0x33) << 2;
   b = (b & 0xAA) >> 1 | (b & 0x55) << 1;
   return b;
}

First the left four bits are swapped with the right four bits. Then all adjacent pairs are swapped and then all adjacent single bits. This results in a reversed order.

Rambort answered 8/4, 2010 at 19:40 Comment(13)

Reasonably short and quick, but not simple. – Nupercaine 8/4, 2010 at 19:58

This approach also cleanly generalizes to perform byte swapping for endianness. – Commendam 8/4, 2010 at 20:25

Not the simplest approach, but I like it +1. – Tumid 9/4, 2010 at 14:10

Yes, it is simple. It's a kind of divide and conquer algorithm. Excellent! – Nemathelminth 11/5, 2010 at 4:25

Is it faster than the method suggested by @Arkku below? – Fostoria 1/11, 2013 at 16:30

This solution is excellent! I come up with swapping, but not so talent. – Atypical 16/6, 2016 at 7:49

Clever solution, but you should point out that this approach works only on targets where CHAR_BIT is an exact power of 2. – Dube 31/5, 2017 at 16:26

I wonder if a compiler would be able to recognize what that does and replace it with single assembler instruction if such is available. – Chemaram 12/3, 2019 at 19:10

This beautifully scales to word sizes of any power of two bits (i.e. 16-bit words, 32-bit words, 64-bit words, etc.), with only one additional b= line for each higher power of 2. – Demibastion 14/9, 2021 at 0:48

@KevinH.Patterson ...and also need to change the size of the hex literals (masks) according to the target word size. – Wavemeter 26/11, 2021 at 10:6

Or you could simply have a table of answers and you result would be a simple table lookup. – Defray 30/5, 2023 at 1:32

I like this solution. It is clever, but not so clever that I cannot understand how it works. – Hydrated 20/11, 2023 at 17:56

@PiotrSiupa: Yes, Clang 13 and later recognizes this algorithms SWAR divide-and-conquer as an rbit idiom for ARM and AArch64: godbolt.org/z/14MM3cExW . Also for bit-at-a-time loops, although -O3 is required for 32-bit integers, but -O2 handles 8-bit (fewer iterations to unroll and analyze.) – Halvorson 24/11, 2023 at 0:7

156

I think a lookup table has to be one of the simplest methods. However, you don't need a full lookup table.

//Index 1==0b0001 => 0b1000
//Index 7==0b0111 => 0b1110
//etc
static unsigned char lookup[16] = {
0x0, 0x8, 0x4, 0xc, 0x2, 0xa, 0x6, 0xe,
0x1, 0x9, 0x5, 0xd, 0x3, 0xb, 0x7, 0xf, };

uint8_t reverse(uint8_t n) {
   // Reverse the top and bottom nibble then swap them.
   return (lookup[n&0b1111] << 4) | lookup[n>>4];
}

// Detailed breakdown of the math
//  + lookup reverse of bottom nibble
//  |       + grab bottom nibble
//  |       |        + move bottom result into top nibble
//  |       |        |     + combine the bottom and top results 
//  |       |        |     | + lookup reverse of top nibble
//  |       |        |     | |       + grab top nibble
//  V       V        V     V V       V
// (lookup[n&0b1111] << 4) | lookup[n>>4]

This fairly simple to code and verify visually.
Ultimately this might even be faster than a full table. The bit arith is cheap and the table easily fits on a cache line.

Alluvial answered 8/4, 2010 at 20:33 Comment(12)

Nice, but will give you a cache miss. – Petiolule 3/11, 2010 at 17:42

@kotlinski: what will cause a cache miss? I think the small table version may be more cache efficient than the large one. On my Core2 a cache line is 64 bytes wide, the full table would span multiple lines, whereas the smaller table easily fits one a single line. – Alluvial 1/10, 2011 at 14:20

@deft_code: Reading from the table would likely cause a cache miss, as code and global data are not likely to be that close to each other in memory. – Petiolule 1/10, 2011 at 15:42

@kotlinski, I suppose that could be a problem, the table is small, but if it's called infrequently it'll have to cache load the table every time. An alternative implementation would be to use a switch to encode the table instead an array. Either way I don't think it will measurably effect performance. – Alluvial 1/10, 2011 at 18:31

@kotlinski: Temporal locality is more important for cache hits or replacement strategies, than address locality – Adipocere 25/9, 2013 at 16:39

What is the reasoning behind this strategy? Say I heard this question for the first time, how could I ever arrive at this curtailed version of lookup table? – Panama 2/1, 2014 at 19:43

@Harshdeep: Consider the binary encoded indexes of the table entries. index b0000(0) -> b0000(0x0) boring; b0001(1) -> b1000(0x8), b0010(2) -> b0100(0x4), b1010(10) -> b0101(0x5). See the pattern? It is simple enough that you can calculate it in your head (if you can read binary, otherwise you'll need paper to work it out). As for the leap that reversing an 8 bit integer is the same as reversing 4 bit parts then swapping them; I claim experience and intuition (or magic). – Alluvial 2/1, 2014 at 20:48

@Alluvial and your magic works :) thanks for the elaborate explanation. – Panama 6/1, 2014 at 11:58

this is more suited to embedded applications since there's no cache and memory is limited – Attack 7/5, 2014 at 12:47

Agreed @LưuVĩnhPhúc, this may be the solution with the smallest footprint. – Voracious 18/6, 2014 at 0:55

The lookup table could use a const ;) – Dinesen 18/7, 2017 at 17:5

How do you (1) generate this table (like this), and (2) how do you apply this same table-based approach to a 16-bit unsigned integer? – Formication 20/4, 2022 at 4:40

107

If you are talking about a single byte, a table-lookup is probably the best bet, unless for some reason you don't have 256 bytes available.

Lentissimo answered 8/4, 2010 at 19:34 Comment(15)

You can squeeze the array into somewhat fewer than 256 bytes if you ignore palindromes. – Selectman 8/4, 2010 at 19:56

@Joerg then write a script :p it's less time consuming than writing a solution in C++ which solves the problem at runtime. – Selectman 8/4, 2010 at 20:0

@Selectman - you'd need a table to know which ones are the palindromes. – Nupercaine 8/4, 2010 at 20:1

Fast and easy to implement. – Duality 8/4, 2010 at 21:18

I like it. Simple to implement. Thanks. – Tumid 9/4, 2010 at 14:9

@Grant Peters: You can cut it even smaller (down to 16 bytes) with Caspin's answer. – Lentissimo 9/4, 2010 at 17:58

@e.James, didn't see that and his would probably run faster as well – Giverin 11/4, 2010 at 7:51

@Lentissimo combine it with htonl and works for bigger numbers too. – Thyrse 30/8, 2017 at 19:48

Since nobody posted a complete table lookup solution, here is mine:

unsigned char reverse_byte(unsigned char x)
{
    static const unsigned char table[] = {
        0x00, 0x80, 0x40, 0xc0, 0x20, 0xa0, 0x60, 0xe0,
        0x10, 0x90, 0x50, 0xd0, 0x30, 0xb0, 0x70, 0xf0,
        0x08, 0x88, 0x48, 0xc8, 0x28, 0xa8, 0x68, 0xe8,
        0x18, 0x98, 0x58, 0xd8, 0x38, 0xb8, 0x78, 0xf8,
        0x04, 0x84, 0x44, 0xc4, 0x24, 0xa4, 0x64, 0xe4,
        0x14, 0x94, 0x54, 0xd4, 0x34, 0xb4, 0x74, 0xf4,
        0x0c, 0x8c, 0x4c, 0xcc, 0x2c, 0xac, 0x6c, 0xec,
        0x1c, 0x9c, 0x5c, 0xdc, 0x3c, 0xbc, 0x7c, 0xfc,
        0x02, 0x82, 0x42, 0xc2, 0x22, 0xa2, 0x62, 0xe2,
        0x12, 0x92, 0x52, 0xd2, 0x32, 0xb2, 0x72, 0xf2,
        0x0a, 0x8a, 0x4a, 0xca, 0x2a, 0xaa, 0x6a, 0xea,
        0x1a, 0x9a, 0x5a, 0xda, 0x3a, 0xba, 0x7a, 0xfa,
        0x06, 0x86, 0x46, 0xc6, 0x26, 0xa6, 0x66, 0xe6,
        0x16, 0x96, 0x56, 0xd6, 0x36, 0xb6, 0x76, 0xf6,
        0x0e, 0x8e, 0x4e, 0xce, 0x2e, 0xae, 0x6e, 0xee,
        0x1e, 0x9e, 0x5e, 0xde, 0x3e, 0xbe, 0x7e, 0xfe,
        0x01, 0x81, 0x41, 0xc1, 0x21, 0xa1, 0x61, 0xe1,
        0x11, 0x91, 0x51, 0xd1, 0x31, 0xb1, 0x71, 0xf1,
        0x09, 0x89, 0x49, 0xc9, 0x29, 0xa9, 0x69, 0xe9,
        0x19, 0x99, 0x59, 0xd9, 0x39, 0xb9, 0x79, 0xf9,
        0x05, 0x85, 0x45, 0xc5, 0x25, 0xa5, 0x65, 0xe5,
        0x15, 0x95, 0x55, 0xd5, 0x35, 0xb5, 0x75, 0xf5,
        0x0d, 0x8d, 0x4d, 0xcd, 0x2d, 0xad, 0x6d, 0xed,
        0x1d, 0x9d, 0x5d, 0xdd, 0x3d, 0xbd, 0x7d, 0xfd,
        0x03, 0x83, 0x43, 0xc3, 0x23, 0xa3, 0x63, 0xe3,
        0x13, 0x93, 0x53, 0xd3, 0x33, 0xb3, 0x73, 0xf3,
        0x0b, 0x8b, 0x4b, 0xcb, 0x2b, 0xab, 0x6b, 0xeb,
        0x1b, 0x9b, 0x5b, 0xdb, 0x3b, 0xbb, 0x7b, 0xfb,
        0x07, 0x87, 0x47, 0xc7, 0x27, 0xa7, 0x67, 0xe7,
        0x17, 0x97, 0x57, 0xd7, 0x37, 0xb7, 0x77, 0xf7,
        0x0f, 0x8f, 0x4f, 0xcf, 0x2f, 0xaf, 0x6f, 0xef,
        0x1f, 0x9f, 0x5f, 0xdf, 0x3f, 0xbf, 0x7f, 0xff,
    };
    return table[x];
}

Casia answered 9/4, 2010 at 9:27 Comment(4)

Useful, thanks. Seems that my slower shifting method was limiting performance in an embedded app. Placed table in ROM on a PIC (with addition of rom keyword). – Revelation 23/4, 2012 at 10:2

A simpler method: graphics.stanford.edu/~seander/bithacks.html#BitReverseTable – Handicap 14/6, 2014 at 1:30

How did you get this table? What is the algorithm to generate this table? – Formication 20/4, 2022 at 4:42

@Formication Just count from 0 to 255 and reverse each byte by any known method. For example, isolate each bit, move it into its destination, and then combine the moved bits back together. That's a classic bitwise-and, bit-shift, bitwise-or application. – Casia 24/4, 2022 at 9:29

See the bit twiddling hacks for many solutions. Copypasting from there is obviously simple to implement. =)

For example (on a 32-bit CPU):

uint8_t b = byte_to_reverse;
b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;

If by “simple to implement” one means something that can be done without a reference in an exam or job interview, then the safest bet is probably the inefficient copying of bits one by one into another variable in reverse order (already shown in other answers).

Joerg answered 8/4, 2010 at 19:37 Comment(3)

From your URL: 32 bit CPU: b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16; – Rikkiriksdag 8/4, 2010 at 20:13

@Joshua: That's my personal favourite as well. The caveat (as stated on the linked page) is that it needs to be assigned or cast into an uint8_t or there will be garbage in the upper bits. – Joerg 8/4, 2010 at 20:20

The x86 instructions tzcnt and bzhi can be used to first compute floor(log_2(x + 2)) and zero the bits above the MSB computed from tzcnt respectively, so a cast isn't too expensive in the one-off case. On AMD Ryzen Family 17h, tzcnt and bzhi are both one cycle. They are supported on both AMD and Intel platforms that have the x86 BMI extensions released in 2013 CPUs. – Byrnes 17/7, 2021 at 1:27

There are many ways to reverse bits depending on what you mean the "simplest way".

Reverse by Rotation

Probably the most logical, consists in rotating the byte while applying a mask on the first bit (n & 1):

unsigned char reverse_bits(unsigned char b)
{
    unsigned char   r = 0;
    unsigned        byte_len = 8;

    while (byte_len--) {
        r = (r << 1) | (b & 1);
        b >>= 1;
    }
    return r;
}

As the length of an unsigner char is 1 byte, which is equal to 8 bits, it means we will scan each bit while (byte_len--)
We first check if b as a bit on the extreme right with (b & 1); if so we set bit 1 on r with | and move it just 1 bit to the left by multiplying r by 2 with (r << 1)
Then we divide our unsigned char b by 2 with b >>=1 to erase the bit located at the extreme right of variable b. As a reminder, b >>= 1; is equivalent to b /= 2;

Reverse in One Line

This solution is attributed to Rich Schroeppel in the Programming Hacks section

unsigned char reverse_bits3(unsigned char b)
{
    return (b * 0x0202020202ULL & 0x010884422010ULL) % 0x3ff;
}

The multiply operation (b * 0x0202020202ULL) creates five separate copies of the 8-bit byte pattern to fan-out into a 64-bit value.
The AND operation (& 0x010884422010ULL) selects the bits that are in the correct (reversed) positions, relative to each 10-bit groups of bits.
Together the multiply and the AND operations copy the bits from the original byte so they each appear in only one of the 10-bit sets. The reversed positions of the bits from the original byte coincide with their relative positions within any 10-bit set.
The last step (% 0x3ff), which involves modulus division by 2^10 - 1 has the effect of merging together each set of 10 bits (from positions 0-9, 10-19, 20-29, ...) in the 64-bit value. They do not overlap, so the addition steps underlying the modulus division behave like OR operations.

Divide and Conquer Solution

unsigned char reverse(unsigned char b) {
   b = (b & 0xF0) >> 4 | (b & 0x0F) << 4;
   b = (b & 0xCC) >> 2 | (b & 0x33) << 2;
   b = (b & 0xAA) >> 1 | (b & 0x55) << 1;
   return b;
}

This is the most upvoted answer and despite some explanations, I think that for most people it feels difficult to visualize whats 0xF0, 0xCC, 0xAA, 0x0F, 0x33 and 0x55 truly means.

It does not take advantage of '0b' which is a GCC extension and is included since the C++14 standard, release in December 2014, so a while after this answer dating from April 2010

Integer constants can be written as binary constants, consisting of a sequence of ‘0’ and ‘1’ digits, prefixed by ‘0b’ or ‘0B’. This is particularly useful in environments that operate a lot on the bit level (like microcontrollers).

Please check below code snippets to remember and understand even better this solution where we move half by half:

unsigned char reverse(unsigned char b) {
   b = (b & 0b11110000) >> 4 | (b & 0b00001111) << 4;
   b = (b & 0b11001100) >> 2 | (b & 0b00110011) << 2;
   b = (b & 0b10101010) >> 1 | (b & 0b01010101) << 1;
   return b;
}

NB: The >> 4 is because there are 8 bits in 1 byte, which is an unsigned char so we want to take the other half, and so on.

We could easily apply this solution to 4 bytes with only two additional lines and following the same logic. Since both mask complement each other we can even use ~ in order to switch bits and saving some ink:

uint32_t reverse_integer_bits(uint32_t b) {
   uint32_t mask = 0b11111111111111110000000000000000;
   b = (b & mask) >> 16 | (b & ~mask) << 16;
   mask = 0b11111111000000001111111100000000;
   b = (b & mask) >> 8 | (b & ~mask) << 8;
   mask = 0b11110000111100001111000011110000;
   b = (b & mask) >> 4 | (b & ~mask) << 4;
   mask = 0b11001100110011001100110011001100;
   b = (b & mask) >> 2 | (b & ~mask) << 2;
   mask = 0b10101010101010101010101010101010;
   b = (b & mask) >> 1 | (b & ~mask) << 1;
   return b;
}

[C++ Only] Reverse Any Unsigned (Template)

The above logic can be summarized with a loop that would work on any type of unsigned:

template <class T>
T reverse_bits(T n) {
    short bits = sizeof(n) * 8; 
    T mask = ~T(0); // equivalent to uint32_t mask = 0b11111111111111111111111111111111;
    
    while (bits >>= 1) {
        mask ^= mask << (bits); // will convert mask to 0b00000000000000001111111111111111;
        n = (n & ~mask) >> bits | (n & mask) << bits; // divide and conquer
    }

    return n;
}

C++ 17 only

You may use a table that store the reverse value of each byte with (i * 0x0202020202ULL & 0x010884422010ULL) % 0x3ff, initialized through a lambda (you will need to compile it with g++ -std=c++1z since it only works since C++17), and then return the value in the table will give you the accordingly reversed bit:

#include <cstdint>
#include <array>

uint8_t reverse_bits(uint8_t n) {
        static constexpr array<uint8_t, 256> table{[]() constexpr{
                constexpr size_t SIZE = 256;
                array<uint8_t, SIZE> result{};

                for (size_t i = 0; i < SIZE; ++i)
                    result[i] = (i * 0x0202020202ULL & 0x010884422010ULL) % 0x3ff;
                return result;
        }()};

    return table[n];
}

main.cpp

Try it yourself with inclusion of above function:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

template <class T>
void print_binary(T n)
{   T mask = 1ULL << ((sizeof(n) * 8) - 1);  // will set the most significant bit
    for(; mask != 0; mask >>= 1) putchar('0' | !!(n & mask));
    putchar('\n');
}

int main() {
    uint32_t n = 12;
    print_binary(n);
    n = reverse_bits(n); 
    print_binary(n);
    unsigned char c = 'a';
    print_binary(c);
    c = reverse_bits(c);
    print_binary(c);
    uint16_t s = 12;
    print_binary(s);
    s = reverse_bits(s);
    print_binary(s);
    uint64_t l = 12;
    print_binary(l);
    l = reverse_bits(l);
    print_binary(l);
    return 0;
}

Reverse with asm volatile

Last but not least, if simplest means fewer lines, why not give a try to inline assembly?

You can test below code snippet by adding -masm=intel when compiling:

unsigned char reverse_bits(unsigned char c) {
    __asm__ __volatile__ (R"(
        mov cx, 8       
    daloop:                   
        ror di          
        adc ax, ax      
        dec cx          
        jnz short daloop  
    ;)");
}

Explanations line by line:

        mov cx, 8       ; we will reverse the 8 bits contained in one byte
    daloop:             ; while loop
        shr di          ; Shift Register `di` (containing value of the first argument of callee function) to the Right
        rcl ax          ; Rotate Carry Left: rotate ax left and add the carry from shr di, the carry is equal to 1 if one bit was "lost" from previous operation 
        dec cl          ; Decrement cx
        jnz short daloop; Jump if cx register is Not equal to Zero, else end loop and return value contained in ax register

Spivey answered 8/4, 2020 at 21:2 Comment(3)

Section [C++ Only] Reverse Any Unsigned (Template): Note that if the type template parameter T is signed, mask will represent a negative number and applying << on a negative signed integer yields undefined-behavior. Even if T is unsigned the result of the first shift operation on mask is implementation-defined – Yardarm 25/8, 2020 at 10:2

Your x86-64 inline asm is unsafe; missing clobbers and input/output operands to tell the compiler which registers are inputs and outputs. Written that way, you need __attribute__((naked)), and a ret instruction at the bottom. And BTW, this is x86-64 as for the AMD64 System V calling convention. You just said "assembly". In ARM or AArch64 assembly you'd use rbit. Clang 13 and later does recognize some algorithms (loop or SWAR divide-and-conquer) as rbit idioms: godbolt.org/z/14MM3cExW at least with -O3 for 32-bit. – Halvorson 24/11, 2023 at 0:5

It's strange to use 16-bit operand size for inline asm that looks to be assuming the x86-64 System V calling convention. shr edx, 1 is a slightly more efficient way to shift a bit into CF; ror has weird partial-FLAGS semantics, and adc eax, eax would be better than 16-bit. And dec ecx/jnz is more compact than dec cx/jnz. For performance, you might want to xor eax,eax before the loop to avoid a false dependency, or to be able to return a zero-extended 32-bit value. (An actual unsigned char ret val can have garbage in the high 24 / 56 bits of a reg, that's not a bug.) – Halvorson 24/11, 2023 at 0:16

template <typename T>
T reverse(T n, size_t b = sizeof(T) * CHAR_BIT)
{
    assert(b <= std::numeric_limits<T>::digits);

    T rv = 0;

    for (size_t i = 0; i < b; ++i, n >>= 1) {
        rv = (rv << 1) | (n & 0x01);
    }

    return rv;
}

EDIT:

Converted it to a template with the optional bitcount

Grow answered 8/4, 2010 at 19:38 Comment(6)

@nvl - fixed. I started building it as a template but decided halfway through to not do so... too many &gt &lt – Grow 8/4, 2010 at 19:42

For extra pedenatry, replace sizeof(T)*8 with sizeof(T)*CHAR_BITS. – Silicle 8/4, 2010 at 20:26

@Silicle - Sure, why not... it's not like anybody could actually be too pedantic, now could they? – Grow 9/4, 2010 at 1:1

@Grow For extra extra pendantry, replace sizeof(T)*CHAR_BIT by std::numeric_limits<T>::digits (almost 4 years of pedantry later). – Satyriasis 25/2, 2014 at 22:9

It should be CHAR_BIT, not CHAR_BITS. – Covey 8/11, 2016 at 20:36

it should be rv = (rv << 1) | (n & 0x01); – Sunshade 13/3, 2017 at 23:20

Two lines:

for(i=0;i<8;i++)
     reversed |= ((original>>i) & 0b1)<<(7-i);

or in case you have issues with the "0b1" part:

for(i=0;i<8;i++)
     reversed |= ((original>>i) & 1)<<(7-i);

"original" is the byte you want to reverse. "reversed" is the result, initialized to 0.

Preshrunk answered 28/1, 2013 at 15:40 Comment(0)

Although probably not portable, I would use assembly language.
Many assembly languages have instructions to rotate a bit into the carry flag and to rotate the carry flag into the word (or byte).

The algorithm is:

for each bit in the data type:
  rotate bit into carry flag
  rotate carry flag into destination.
end-for

The high level language code for this is much more complicated, because C and C++ do not support rotating to carry and rotating from carry. The carry flag has to modeled.

Edit: Assembly language for example

;  Enter with value to reverse in R0.
;  Assume 8 bits per byte and byte is the native processor type.
   LODI, R2  8       ; Set up the bit counter
Loop:
   RRC, R0           ; Rotate R0 right into the carry bit.
   RLC, R1           ; Rotate R1 left, then append carry bit.
   DJNZ, R2  Loop    ; Decrement R2 and jump if non-zero to "loop"
   LODR, R0  R1      ; Move result into R0.

Cummins answered 8/4, 2010 at 19:46 Comment(3)

I think this answer is the opposite of simple. Non-portable, assembly, and complex enough to be written in pseudo-code instead of the actual assembly. – Alluvial 8/4, 2010 at 20:59

It is quite simple. I put it into pseudo-code because assembly mnemonics are specific to a breed of processor and there are a lot of breeds out there. If you would like, I can edit this to show the simple assembly language. – Cummins 9/4, 2010 at 16:53

One could see if a compiler optimization simplifies into a suitable assembly instruction. – Dustcloth 7/8, 2019 at 23:30

I find the following solution simpler than the other bit fiddling algorithms I've seen in here.

unsigned char reverse_byte(char a)
{
  return ((a & 0x1)  << 7) | ((a & 0x2)  << 5) |
         ((a & 0x4)  << 3) | ((a & 0x8)  << 1) |
         ((a & 0x10) >> 1) | ((a & 0x20) >> 3) |
         ((a & 0x40) >> 5) | ((a & 0x80) >> 7);
}

It gets every bit in the byte, and shifts it accordingly, starting from the first to the last.

Explanation:

  ((a & 0x1) << 7) //get first bit on the right and shift it into the first left position 
| ((a & 0x2) << 5) //add it to the second bit and shift it into the second left position
      //and so on

Goodoh answered 2/5, 2015 at 12:57 Comment(2)

Beautiful! My favorite so far. – End 22/11, 2016 at 8:15

This is certainly simple, but it should be pointed out that the execution time is O(n) rather than O(log₂ n), where n is the number of bits (8, 16, 32, 64, etc.). – Hildebrandt 10/4, 2020 at 21:31

The simplest way is probably to iterate over the bit positions in a loop:

unsigned char reverse(unsigned char c) {
   int shift;
   unsigned char result = 0;
   for (shift = 0; shift < CHAR_BIT; shift++) {
      if (c & (0x01 << shift))
         result |= (0x80 >> shift);
   }
   return result;
}

Rambort answered 9/4, 2010 at 1:13 Comment(2)

it's CHAR_BIT, without an 's' – Dupont 22/6, 2018 at 12:25

Why use CHAR_BIT when you assume char to have 8 bits? – Damian 4/5, 2019 at 19:48

For the very limited case of constant, 8-bit input, this method costs no memory or CPU at run-time:

#define MSB2LSB(b) (((b)&1?128:0)|((b)&2?64:0)|((b)&4?32:0)|((b)&8?16:0)|((b)&16?8:0)|((b)&32?4:0)|((b)&64?2:0)|((b)&128?1:0))

I used this for ARINC-429 where the bit order (endianness) of the label is opposite the rest of the word. The label is often a constant, and conventionally in octal.

Here's how I used it to define a constant, because the spec defines this label as big-endian 205 octal.

#define LABEL_HF_COMM MSB2LSB(0205)

More examples:

assert(0b00000000 == MSB2LSB(0b00000000));
assert(0b10000000 == MSB2LSB(0b00000001));
assert(0b11000000 == MSB2LSB(0b00000011));
assert(0b11100000 == MSB2LSB(0b00000111));
assert(0b11110000 == MSB2LSB(0b00001111));
assert(0b11111000 == MSB2LSB(0b00011111));
assert(0b11111100 == MSB2LSB(0b00111111));
assert(0b11111110 == MSB2LSB(0b01111111));
assert(0b11111111 == MSB2LSB(0b11111111));
assert(0b10101010 == MSB2LSB(0b01010101));

Voracious answered 18/6, 2014 at 0:43 Comment(0)

You may be interested in std::vector<bool> (that is bit-packed) and std::bitset

It should be the simplest as requested.

#include <iostream>
#include <bitset>
using namespace std;
int main() {
  bitset<8> bs = 5;
  bitset<8> rev;
  for(int ii=0; ii!= bs.size(); ++ii)
    rev[bs.size()-ii-1] = bs[ii];
  cerr << bs << " " << rev << endl;
}

Other options may be faster.

EDIT: I owe you a solution using std::vector<bool>

#include <algorithm>
#include <iterator>
#include <iostream>
#include <vector>
using namespace std;
int main() {
  vector<bool> b{0,0,0,0,0,1,0,1};
  reverse(b.begin(), b.end());
  copy(b.begin(), b.end(), ostream_iterator<int>(cerr));
  cerr << endl;
}

The second example requires c++0x extension (to initialize the array with {...}). The advantage of using a bitset or a std::vector<bool> (or a boost::dynamic_bitset) is that you are not limited to bytes or words but can reverse an arbitrary number of bits.

HTH

Splanchnic answered 8/4, 2010 at 19:37 Comment(5)

How is bitset any simpler than a pod here? Show the code, or it isn't. – Selectman 8/4, 2010 at 19:53

Actu ally, I think that code will reverse the bitset, and then reverse it back to its original. Change ii != size(); to ii < size()/2; and it'll do a better job =) – Posterior 8/4, 2010 at 19:58

(@viktor-sehr no, it will not, rev is different from bs). Anyway I don't like the answer myself: I think this is a case where binary arithmetic and shift operators are better suited. It still remains the simplest to understand. – Splanchnic 8/4, 2010 at 20:7

How about std::vector<bool> b = { ... }; std::vector<bool> rb ( b.rbegin(), b.rend()); - using reverse iterators directly? – Bomber 30/8, 2017 at 21:19

@Bomber I like the immutability of it. – Splanchnic 1/9, 2017 at 18:6

Table lookup or

uint8_t rev_byte(uint8_t x) {
    uint8_t y;
    uint8_t m = 1;
    while (m) {
       y >>= 1;
       if (m&x) {
          y |= 0x80;
       }
       m <<=1;
    }
    return y;
}

edit

Look here for other solutions that might work better for you

Dissatisfy answered 8/4, 2010 at 19:39 Comment(0)

a slower but simpler implementation:

static int swap_bit(unsigned char unit)
{
    /*
     * swap bit[7] and bit[0]
     */
    unit = (((((unit & 0x80) >> 7) ^ (unit & 0x01)) << 7) | (unit & 0x7f));
    unit = (((((unit & 0x80) >> 7) ^ (unit & 0x01))) | (unit & 0xfe));
    unit = (((((unit & 0x80) >> 7) ^ (unit & 0x01)) << 7) | (unit & 0x7f));

    /*
     * swap bit[6] and bit[1]
     */
    unit = (((((unit & 0x40) >> 5) ^ (unit & 0x02)) << 5) | (unit & 0xbf));
    unit = (((((unit & 0x40) >> 5) ^ (unit & 0x02))) | (unit & 0xfd));
    unit = (((((unit & 0x40) >> 5) ^ (unit & 0x02)) << 5) | (unit & 0xbf));

    /*
     * swap bit[5] and bit[2]
     */
    unit = (((((unit & 0x20) >> 3) ^ (unit & 0x04)) << 3) | (unit & 0xdf));
    unit = (((((unit & 0x20) >> 3) ^ (unit & 0x04))) | (unit & 0xfb));
    unit = (((((unit & 0x20) >> 3) ^ (unit & 0x04)) << 3) | (unit & 0xdf));

    /*
     * swap bit[4] and bit[3]
     */
    unit = (((((unit & 0x10) >> 1) ^ (unit & 0x08)) << 1) | (unit & 0xef));
    unit = (((((unit & 0x10) >> 1) ^ (unit & 0x08))) | (unit & 0xf7));
    unit = (((((unit & 0x10) >> 1) ^ (unit & 0x08)) << 1) | (unit & 0xef));

    return unit;
}

Sutton answered 9/4, 2010 at 7:3 Comment(0)

Can this be fast solution?

int byte_to_be_reversed = 
    ((byte_to_be_reversed>>7)&0x01)|((byte_to_be_reversed>>5)&0x02)|      
    ((byte_to_be_reversed>>3)&0x04)|((byte_to_be_reversed>>1)&0x08)| 
    ((byte_to_be_reversed<<7)&0x80)|((byte_to_be_reversed<<5)&0x40)|
    ((byte_to_be_reversed<<3)&0x20)|((byte_to_be_reversed<<1)&0x10);

Gets rid of the hustle of using a for loop! but experts please tell me if this is efficient and faster?

Foothill answered 6/5, 2014 at 17:18 Comment(1)

This has execution time is O(n) rather than O(log₂ n), where n is the number of bits (8, 16, 32, 64, etc.). See elsewhere for answers that execute in O(log₂ n) time. – Hildebrandt 10/4, 2020 at 21:39

Before implementing any algorithmic solution, check the assembly language for whatever CPU architecture you are using. Your architecture may include instructions which handle bitwise manipulations like this (and what could be simpler than a single assembly instruction?).

If such an instruction is not available, then I would suggest going with the lookup table route. You can write a script/program to generate the table for you, and the lookup operations would be faster than any of the bit-reversing algorithms here (at the cost of having to store the lookup table somewhere).

Dentilingual answered 8/4, 2010 at 21:3 Comment(1)

Notably, modern ARM and AArch64 have rbit. (And yes, if supported, it's efficient, not microcoded). x86 doesn't, although you can use SSSE3 pshufb as a lookup table for nibbles, bit-reversing up to 16 bytes in parallel. I forget if RISC-V extension-b (bit-manipulation) has bit-reverse. – Halvorson 28/11, 2021 at 12:0

This simple function uses a mask to test each bit in the input byte and transfer it into a shifting output:

char Reverse_Bits(char input)
{    
    char output = 0;

    for (unsigned char mask = 1; mask > 0; mask <<= 1)
    {
        output <<= 1;

        if (input & mask)
            output |= 1;
    }

    return output;
}

Adlay answered 17/2, 2018 at 5:47 Comment(1)

Mask should be unsigned sorry. – Adlay 17/2, 2018 at 8:45

Assuming that your compiler allows unsigned long long:

unsigned char reverse(unsigned char b) {
  return (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;
}

Discovered here

Thomasinathomasine answered 1/11, 2019 at 8:32 Comment(0)

This is a similar method to sth's excellent answer, but with optimizations, support for up to 64-bit integers, and other small improvements.

I utilize a C++ template function reverse_bits() to let the compiler optimize for various word sizes of integers which might be passed to the function. The function should work correctly with any word size that is a multiple of 8 bits, up to a maximum of 64 bits. If your compiler supports words longer than 64 bits, the method is straightforward to extend.

This a complete, ready-to-compile example with the requisite headers. There is a convenient template function to_binary_str() for creating a std::string representation of binary numbers, along with a few calls with various word sizes to demonstrate everything.

If you remove the comments and blank lines, the function is quite compact and visually pleasing.

You can try out it on labstack here.

// this is the only header used by the reverse_bits() function
#include <type_traits>

// these headers are only used by demonstration code
#include <string>
#include <iostream>
#include <cstdint>


template<typename T>
T reverse_bits( T n ) {
    // we force the passed-in type to its unsigned equivalent, because C++ may
    // perform arithmetic right shift instead of logical right shift, depending
    // on the compiler implementation.
    typedef typename std::make_unsigned<T>::type unsigned_T;
    unsigned_T v = (unsigned_T)n;

    // swap every bit with its neighbor
    v = ((v & 0xAAAAAAAAAAAAAAAA) >> 1)  | ((v & 0x5555555555555555) << 1);

    // swap every pair of bits
    v = ((v & 0xCCCCCCCCCCCCCCCC) >> 2)  | ((v & 0x3333333333333333) << 2);

    // swap every nybble
    v = ((v & 0xF0F0F0F0F0F0F0F0) >> 4)  | ((v & 0x0F0F0F0F0F0F0F0F) << 4);
    // bail out if we've covered the word size already
    if( sizeof(T) == 1 ) return v;

    // swap every byte
    v = ((v & 0xFF00FF00FF00FF00) >> 8)  | ((v & 0x00FF00FF00FF00FF) << 8);
    if( sizeof(T) == 2 ) return v;

    // etc...
    v = ((v & 0xFFFF0000FFFF0000) >> 16) | ((v & 0x0000FFFF0000FFFF) << 16);
    if( sizeof(T) <= 4 ) return v;

    v = ((v & 0xFFFFFFFF00000000) >> 32) | ((v & 0x00000000FFFFFFFF) << 32);

    // explictly cast back to the original type just to be pedantic
    return (T)v;
}


template<typename T>
std::string to_binary_str( T n ) {
    const unsigned int bit_count = sizeof(T)*8;
    char s[bit_count+1];
    typedef typename std::make_unsigned<T>::type unsigned_T;
    unsigned_T v = (unsigned_T)n;
    for( int i = bit_count - 1; i >= 0; --i ) {
        if( v & 1 )
            s[i] = '1';
        else
            s[i] = '0';

        v >>= 1;
    }
    s[bit_count] = 0;  // string null terminator
    return s;
}


int main() {
    {
        char x = 0xBA;
        std::cout << to_binary_str( x ) << std::endl;

        char y = reverse_bits( x );
        std::cout << to_binary_str( y ) << std::endl;
    }
    {
        short x = 0xAB94;
        std::cout << to_binary_str( x ) << std::endl;

        short y = reverse_bits( x );
        std::cout << to_binary_str( y ) << std::endl;
    }
    {
        uint64_t x = 0xFEDCBA9876543210;
        std::cout << to_binary_str( x ) << std::endl;

        uint64_t y = reverse_bits( x );
        std::cout << to_binary_str( y ) << std::endl;
    }
    return 0;
}

Demibastion answered 14/9, 2021 at 4:7 Comment(0)

If there is one wondering how to get uint16_t or uint32_t bit reversed by using the solution from Rich Schroeppel in the Programming Hacks section

unsigned char reverse_bits3(unsigned char b)
{
    return (b * 0x0202020202ULL & 0x010884422010ULL) % 0x3ff;
}

It can be done by following divide-and-conquer code on X86/X64 platform (little-endian):

uint16_t reverse_bits3(uint16_t s)
{
    uint8_t &b0 = ((uint8_t*)&s)[0];
    uint8_t &b1 = ((uint8_t*)&s)[1];
    return (((uint16_t)reverse_bits3(b0))<<8) + reverse_bits3(b1);
}

uint32_t reverse_bits3(uint32_t u)
{
    uint16_t &s0 = ((uint16_t*)&u)[0];
    uint16_t &s1 = ((uint16_t*)&u)[1];
    return (((uint16_t)reverse_bits3(s0))<<16) + reverse_bits3(s1);
}

Sakhuja answered 29/1, 2023 at 8:10 Comment(0)

This one is based on the one BobStein-VisiBone provided

#define reverse_1byte(b)    ( ((uint8_t)b & 0b00000001) ? 0b10000000 : 0 ) | \
                            ( ((uint8_t)b & 0b00000010) ? 0b01000000 : 0 ) | \
                            ( ((uint8_t)b & 0b00000100) ? 0b00100000 : 0 ) | \
                            ( ((uint8_t)b & 0b00001000) ? 0b00010000 : 0 ) | \
                            ( ((uint8_t)b & 0b00010000) ? 0b00001000 : 0 ) | \
                            ( ((uint8_t)b & 0b00100000) ? 0b00000100 : 0 ) | \
                            ( ((uint8_t)b & 0b01000000) ? 0b00000010 : 0 ) | \
                            ( ((uint8_t)b & 0b10000000) ? 0b00000001 : 0 )

I really like this one a lot because the compiler automatically handle the work for you, thus require no further resources.

this can also be extended to 16-Bits...

#define reverse_2byte(b)    ( ((uint16_t)b & 0b0000000000000001) ? 0b1000000000000000 : 0 ) | \
                            ( ((uint16_t)b & 0b0000000000000010) ? 0b0100000000000000 : 0 ) | \
                            ( ((uint16_t)b & 0b0000000000000100) ? 0b0010000000000000 : 0 ) | \
                            ( ((uint16_t)b & 0b0000000000001000) ? 0b0001000000000000 : 0 ) | \
                            ( ((uint16_t)b & 0b0000000000010000) ? 0b0000100000000000 : 0 ) | \
                            ( ((uint16_t)b & 0b0000000000100000) ? 0b0000010000000000 : 0 ) | \
                            ( ((uint16_t)b & 0b0000000001000000) ? 0b0000001000000000 : 0 ) | \
                            ( ((uint16_t)b & 0b0000000010000000) ? 0b0000000100000000 : 0 ) | \
                            ( ((uint16_t)b & 0b0000000100000000) ? 0b0000000010000000 : 0 ) | \
                            ( ((uint16_t)b & 0b0000001000000000) ? 0b0000000001000000 : 0 ) | \
                            ( ((uint16_t)b & 0b0000010000000000) ? 0b0000000000100000 : 0 ) | \
                            ( ((uint16_t)b & 0b0000100000000000) ? 0b0000000000010000 : 0 ) | \
                            ( ((uint16_t)b & 0b0001000000000000) ? 0b0000000000001000 : 0 ) | \
                            ( ((uint16_t)b & 0b0010000000000000) ? 0b0000000000000100 : 0 ) | \
                            ( ((uint16_t)b & 0b0100000000000000) ? 0b0000000000000010 : 0 ) | \
                            ( ((uint16_t)b & 0b1000000000000000) ? 0b0000000000000001 : 0 )

Spinous answered 28/2, 2017 at 22:31 Comment(1)

I would put the b in parentheses in case it is a more complex expression than a single number, and perhaps also rename the macro to REVERSE_BYTE as a hint that you probably don't want to have a more complex (runtime) expression there. Or make it an inline function. (But overall I like this as being simple enough that you could do it from memory easily with very little chance of error.) – Joerg 8/8, 2018 at 14:55

Here is a simple and readable solution, portable to all conformant platforms, including those with sizeof(char) == sizeof(int):

#include <limits.h>

unsigned char reverse(unsigned char c) {
    int shift;
    unsigned char result = 0;

    for (shift = 0; shift < CHAR_BIT; shift++) {
        result <<= 1;
        result |= c & 1;
        c >>= 1;
    }
    return result;
}

Damian answered 4/5, 2019 at 19:51 Comment(0)

If you using small microcontroller and need high speed solution with small footprint, this could be solutions. It is possible to use it for C project, but you need to add this file as assembler file *.asm, to your C project. Instructions: In C project add this declaration:

extern uint8_t byte_mirror(uint8_t);

Call this function from C

byteOutput= byte_mirror(byteInput);

This is the code, it is only suitable for 8051 core. In the CPU register r0 is data from byteInput. Code rotate right r0 cross carry and then rotate carry left to r1. Repeat this procedure 8 times, for every bit. Then the register r1 is returned to c function as byteOutput. In 8051 core is only posibble to rotate acumulator a.

NAME     BYTE_MIRROR
RSEG     RCODE
PUBLIC   byte_mirror              //8051 core        

byte_mirror
    mov r3,#8;
loop:   
    mov a,r0;
    rrc a;
    mov r0,a;    
    mov a,r1;
    rlc a;   
    mov r1,a;
    djnz r3,loop
    mov r0,a
    ret

PROS: It is small footprint, it is high speed CONS: It is not reusable code, it is only for 8051

011101101->carry

101101110<-carry

Kimsey answered 16/6, 2020 at 10:5 Comment(1)

While this code may answer the question, it would be better to include some context, explaining how it works and when to use it. Code-only answers are not useful in the long run. – Receivable 16/6, 2020 at 10:52

It is simple and fast:

unsigned char reverse(unsigned char rv)
{
unsigned char tmp=0;
if( rv&0x01 ) tmp = 0x80;
if( rv&0x02 ) tmp |= 0x40;
if( rv&0x04 ) tmp |= 0x20;
if( rv&0x08 ) tmp |= 0x10;
if( rv&0x10 ) tmp |= 0x08;
if( rv&0x20 ) tmp |= 0x04;
if( rv&0x40 ) tmp |= 0x02;
if( rv&0x80 ) tmp |= 0x01;
return tmp;
}

Glabrate answered 25/4, 2021 at 8:32 Comment(7)

This is slower and less elegant than the answer that received the most up-votes. – Dorso 25/4, 2021 at 9:51

Really unfortunate. Your information seems to be superficial as you commented on the code I wrote. If you had a little experience with code execution time and how code is assembled, you would know that my code is faster. The fastest code to use is the Look Up Table (LUT). But the problem is the increase in code size. The next step is to use assembly language (shift right/left w/cry) but it depends on the machine and it will be harder and more specific to write. On the other hand using the loop will increase the execution time of the code, but the size of the code in memory will be small. – Glabrate 26/4, 2021 at 11:28

If you want to participate in the discussion, first go and do some research on the assembly and check to see how many CPU clock cycle takes to execute for these 2 programs (You could compile through assembly). I just became a member, it will be interesting to see how much information/experience other members of the StackOverflow site have. – Glabrate 26/4, 2021 at 11:30

Welcome to SO. If you think it was me who down-voted your answer, you're wrong. The lookup table is fast only if it is already in the cache, preferably L1. The solution I mention is easily extensible to entities longer than a byte. The execution time depends on many factors, including the target architecture, the state of instruction pipeline and the cache. In terms of the number of assembly instructions, here we go: godbolt.org/z/sadrhjn9a See also aldeid.com/wiki/X86-assembly/Instructions/rol and benchmark: quick-bench.com/q/Hs10lka2Xj1Y9SyQkgwSkeJ1M44 – Dorso 26/4, 2021 at 12:48

Mr. zikoz In your link when optimization is on my code has twice speed!! link Wouldn't it be better if you tested the code first and then commented? – Glabrate 26/4, 2021 at 13:45

My benchmark has had the optimization turned on. You only removed benchmark::DoNotOptimize, which is essential. Microbenchmarking is not an easy stuff and cannot be done in 5 minutes. Your answer would benefit a lot if you added to it an elaborated benchmark analysis. It is also instructive to look at the assembly generated by the benchmark to see the actual optimization level. Also, pls. move the volatile declaration outside the loop in both cases (I carelessly put one outside and one inside). Also, don't forget: stackoverflow.com/conduct – Dorso 26/4, 2021 at 15:8

@MehrdadRahaaei There is no reason to insult others. BTW, your reasoning is wrong. When you remove benchmark::DoNotOptimize, then the benchmark no longer makes sense. Zkoza's benchmark shows your solution is actually slower. It seems you are the one who does not want to admit this. I even created a benchmark on my machine and your solution took 1.43x longer time than zkoza's. – Fuse 5/5, 2022 at 8:43

With the help of various online resources, i jotted these for myself (not sure if they're 100% accurate) :

#                 octal       hex

# bit-orig    : 01234567    01234567:89ABCDEF
# bit-invert  : 76543210    FEDCBA98:76543210
#
# clz         : 32110000    43221111:00000000
# clo/ffs     : 00001123    00000000:11112234

bit-reverse : [ 0 4 2 6 1 5 3 7 ] [ 0 8 4 C 2 A 6 E 1 9 5 D 3 B 7 F ]

# cto         : 01020103    01020103:01020104
# ctz         : 30102010    40102010:30102010

but this is mostly only convenient if your input is already either hex or octal.

In both formats (8 or 16), you'll notice that after the bit-reflections, all the even number indices are all on the first half. I've also highlighted the same 0-7 on the hex side to help with the visualization of it.

in fact, one doesn't even have to do a double substring. The lookup string can be either used as seeking the letter needed, or simply use it as an index lookup. this is how i reflect the CRC32 polynomial myself :

(z is the input polynomial (or just any hex string)

xn = 0 ^ (x = length(z));          # initialize to numeric 0,
                                   # foo^bar in awk means 
                                   # foo-to-bar-th-power. 
                                   # same as foo**bar in other langs

 y = substr(_REF_bitREV_hex, 2);   # by pre-trimming the lookup str,
                                   # it allows skipping the + 1 at            
                                   # every cycle of the loop
 do { 
     xn *= 16
     xn += index(y, substr(z,x,1)) # keep in mind that this is awk syntax, 
                                   # where strings start at index-1, not zero.
 } while ( 1 < x—- );

One advantage of using a hex- or octal- based approach is that it allows for inputs of any length, enabling arbitrary precision operation without having to use a proper BigInteger or BigFloat library. For that, you'll have to substring out the new digit/letter and do string concats instead of simply adding each time.

Tucana answered 5/10, 2021 at 15:33 Comment(0)

typedef struct
{
    uint8_t b0:1;
    uint8_t b1:1;
    uint8_t b2:1;
    uint8_t b3:1;
    uint8_t b4:1;
    uint8_t b5:1;
    uint8_t b6:1;
    uint8_t b7:1;
} bits_t;

uint8_t reverse_bits(uint8_t src)
{
    uint8_t dst = 0x0;
    bits_t *src_bits = (bits_t *)&src;
    bits_t *dst_bits = (bits_t *)&dst;

    dst_bits->b0 = src_bits->b7;
    dst_bits->b1 = src_bits->b6;
    dst_bits->b2 = src_bits->b5;
    dst_bits->b3 = src_bits->b4;
    dst_bits->b4 = src_bits->b3;
    dst_bits->b5 = src_bits->b2;
    dst_bits->b6 = src_bits->b1;
    dst_bits->b7 = src_bits->b0;

    return dst;
}

Orgeat answered 23/6, 2017 at 19:12 Comment(1)

As a stylistic note, I find the use of uint8_t for the 1-bit fields a bit ugly, since it seems to first say that it will take 8 bits but then at the end of the line defines it as only a single bit. I would use unsigned b0:1 etc. – Joerg 8/8, 2018 at 14:44

I'll chip in my solution, since i can't find anything like this in the answers so far. It is a bit overengineered maybe, but it generates the lookup table using C++14 std::index_sequence in compile time.

#include <array>
#include <utility>

constexpr unsigned long reverse(uint8_t value) {
    uint8_t result = 0;
    for (std::size_t i = 0, j = 7; i < 8; ++i, --j) {
        result |= ((value & (1 << j)) >> j) << i;
    }
    return result;
}

template<size_t... I>
constexpr auto make_lookup_table(std::index_sequence<I...>)
{
    return std::array<uint8_t, sizeof...(I)>{reverse(I)...};   
}

template<typename Indices = std::make_index_sequence<256>>
constexpr auto bit_reverse_lookup_table()
{
    return make_lookup_table(Indices{});
}

constexpr auto lookup = bit_reverse_lookup_table();

int main(int argc)
{
    return lookup[argc];
}

https://godbolt.org/z/cSuWhF

Romain answered 15/2, 2019 at 12:39 Comment(0)

I know that this question is dated but I still think that the topic is relevant for some purposes, and here is a version that works very well and is readable. I can not say that it is the fastest or the most efficient, but it ought to be one of the cleanest. I have also included a helper function for easily displaying the bit patterns. This function uses some of the standard library functions instead of writing your own bit manipulator.

#include <algorithm>
#include <bitset>
#include <exception>
#include <iostream>
#include <limits>
#include <string>

// helper lambda function template
template<typename T>
auto getBits = [](T value) {
    return std::bitset<sizeof(T) * CHAR_BIT>{value};
};

// Function template to flip the bits
// This will work on integral types such as int, unsigned int,
// std::uint8_t, 16_t etc. I did not test this with floating
// point types. I chose to use the `bitset` here to convert
// from T to string as I find it easier to use than some of the
// string to type or type to string conversion functions,
// especially when the bitset has a function to return a string. 
template<typename T>
T reverseBits(T& value) {
    static constexpr std::uint16_t bit_count = sizeof(T) * CHAR_BIT;

    // Do not use the helper function in this function!
    auto bits = std::bitset<bit_count>{value};
    auto str = bits.to_string();
    std::reverse(str.begin(), str.end());
    bits = std::bitset<bit_count>(str);
    return static_cast<T>( bits.to_ullong() );
}

// main program
int main() {
    try {
        std::uint8_t value = 0xE0; // 1110 0000;
        std::cout << +value << '\n'; // don't forget to promote unsigned char
        // Here is where I use the helper function to display the bit pattern
        auto bits = getBits<std::uint8_t>(value);
        std::cout << bits.to_string() << '\n';

        value = reverseBits(value);
        std::cout << +value << '\n'; // + for integer promotion

        // using helper function again...
        bits = getBits<std::uint8_t>(value);
        std::cout << bits.to_string() << '\n';

    } catch(const std::exception& e) {  
        std::cerr << e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

And it gives the following output.

Tamikotamil answered 5/10, 2019 at 8:56 Comment(0)

This one helped me with 8x8 dot matrix set of arrays.

uint8_t mirror_bits(uint8_t var)
{
    uint8_t temp = 0;
    if ((var & 0x01))temp |= 0x80;
    if ((var & 0x02))temp |= 0x40;
    if ((var & 0x04))temp |= 0x20;
    if ((var & 0x08))temp |= 0x10;

    if ((var & 0x10))temp |= 0x08;
    if ((var & 0x20))temp |= 0x04;
    if ((var & 0x40))temp |= 0x02;
    if ((var & 0x80))temp |= 0x01;

    return temp;
}

Veloz answered 13/1, 2020 at 22:30 Comment(2)

This function doesn't actually work, the reverse of 0b11001111 should be 0b11110011, but fails with this function. The same testing method works for many of the other functions listed here. – Canvasback 22/2, 2020 at 3:3

Yep, thanks I corrected my answer. Thanks for the letting me know about my mistake :) – Veloz 23/2, 2020 at 17:3

regarding the bit-reflected lookup table for all 256 bytes, with just a few loops, you can generate it from scratch on the fly very quickly (the mapping from hex to bytes should be trivial) :

 #  gawk profile,created Tue Jul 26 22:22:18 2022

 #  BEGIN rule(s)
    BEGIN {
 1      print initREF()
    }
 # Functions,listed alphabetically
 1  function initREF(_,__,___,____,_____,______,_______)
    {
 1     ______=(_+=_^=_<_)^++_-(_=\
           __=(+(___=____="."))(_~_))

 1     gsub(___,"&\\&",_)
 1     _____[_<_]
       _____[ +_]

 7     do {
 7         gsub(___,_,__)
 7         ___=___""____
       } while (—______)

 1     gsub("....","=&", __)
 1     _+=_^=_<_;_______=__;

 2     for(______ in _____) { ______*=_*_*_
 4        for(____ in _____) {  ____*=_+_
 8           for(___ in _____) { ___*= +_
16              for(__ in _____) {

16                    gsub("=" (_<______)(_<____) (_~___)__,
                   sprintf("%X", __+___+____+______),_______)
        } } } }

 1      __=_______
 1       _="[^ ]+[ ]"
 1      gsub(".",_,_)

 1      gsub("..","0x&, ",__)
 1      gsub((_)_,  "&\n",__)
 1      sub("[\1-@]+$","",__)
 1            gsub(" ","",__)

 1      return __
    }

0x00,0x80,0x40,0xC0,0x20,0xA0,0x60,0xE0,0x10,0x90,0x50,0xD0,0x30,0xB0,0x70,0xF0,
0x08,0x88,0x48,0xC8,0x28,0xA8,0x68,0xE8,0x18,0x98,0x58,0xD8,0x38,0xB8,0x78,0xF8,
0x04,0x84,0x44,0xC4,0x24,0xA4,0x64,0xE4,0x14,0x94,0x54,0xD4,0x34,0xB4,0x74,0xF4,
0x0C,0x8C,0x4C,0xCC,0x2C,0xAC,0x6C,0xEC,0x1C,0x9C,0x5C,0xDC,0x3C,0xBC,0x7C,0xFC,
0x02,0x82,0x42,0xC2,0x22,0xA2,0x62,0xE2,0x12,0x92,0x52,0xD2,0x32,0xB2,0x72,0xF2,
0x0A,0x8A,0x4A,0xCA,0x2A,0xAA,0x6A,0xEA,0x1A,0x9A,0x5A,0xDA,0x3A,0xBA,0x7A,0xFA,
0x06,0x86,0x46,0xC6,0x26,0xA6,0x66,0xE6,0x16,0x96,0x56,0xD6,0x36,0xB6,0x76,0xF6,
0x0E,0x8E,0x4E,0xCE,0x2E,0xAE,0x6E,0xEE,0x1E,0x9E,0x5E,0xDE,0x3E,0xBE,0x7E,0xFE,
0x01,0x81,0x41,0xC1,0x21,0xA1,0x61,0xE1,0x11,0x91,0x51,0xD1,0x31,0xB1,0x71,0xF1,
0x09,0x89,0x49,0xC9,0x29,0xA9,0x69,0xE9,0x19,0x99,0x59,0xD9,0x39,0xB9,0x79,0xF9,
0x05,0x85,0x45,0xC5,0x25,0xA5,0x65,0xE5,0x15,0x95,0x55,0xD5,0x35,0xB5,0x75,0xF5,
0x0D,0x8D,0x4D,0xCD,0x2D,0xAD,0x6D,0xED,0x1D,0x9D,0x5D,0xDD,0x3D,0xBD,0x7D,0xFD,
0x03,0x83,0x43,0xC3,0x23,0xA3,0x63,0xE3,0x13,0x93,0x53,0xD3,0x33,0xB3,0x73,0xF3,
0x0B,0x8B,0x4B,0xCB,0x2B,0xAB,0x6B,0xEB,0x1B,0x9B,0x5B,0xDB,0x3B,0xBB,0x7B,0xFB,
0x07,0x87,0x47,0xC7,0x27,0xA7,0x67,0xE7,0x17,0x97,0x57,0xD7,0x37,0xB7,0x77,0xF7,
0x0F,0x8F,0x4F,0xCF,0x2F,0xAF,0x6F,0xEF,0x1F,0x9F,0x5F,0xDF,0x3F,0xBF,0x7F,0xFF

Tucana answered 27/7, 2022 at 1:54 Comment(0)

This is the easiest approach to remember to me as a developer:

unsigned char   reverse_bits(unsigned char octet)
    {
        return  (((octet >> 0) & 1) << 7) | \
                (((octet >> 1) & 1) << 6) | \
                (((octet >> 2) & 1) << 5) | \
                (((octet >> 3) & 1) << 4) | \
                (((octet >> 4) & 1) << 3) | \
                (((octet >> 5) & 1) << 2) | \
                (((octet >> 6) & 1) << 1) | \
                (((octet >> 7) & 1) << 0);
    }

Lumpen answered 10/9, 2022 at 21:24 Comment(0)

The easiest and fast way is with while

#define REVERSE_NUMBER(n, res) while (n) \
    { res = (res << 1) + (n % 2); n >>= 1; }

unsigned int res; //res is result
unsigned int n = 0x135F; // n = 1001101011111

REVERSE_NUMBER(n, res);  // res = 1111101011001 (1F59)

or with function

unsigned int reverse_number(unsigned int n) {
    unsigned int res;
    while (n) {
        res = (res << 1) + (n % 2);
        n >>= 1;
    }
    return res;
}

even easier way and effective

// shift << and >> equivalent to multiplication and division by 2
// and bitwise & 1 equivalent mod by 2
unsigned int reverse_number(unsigned int n) {
    unsigned int res = 0;
    size_t i = sizeof(unsigned int) * 8; // 32 bit value
    while (i--) {
        res = (res << 1) + (n & 1); n >>= 1;
    }
    return res;
}

Villein answered 29/5, 2023 at 21:16 Comment(4)

This leaves the leading zeroes alone, while the question seems to ask for reversing all bits of a byte. – Thickhead 29/5, 2023 at 21:29

@Thickhead but you can use counter bits int i = 32; while(i--) { /*...*/ } instead of n. – Villein 29/5, 2023 at 22:17

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – Server 30/5, 2023 at 7:24

This question already has 41 other answers. What does this answer add that isn't covered in the others? This looks very similar to the fifth or so most upvoted answer. – Marcie 31/5, 2023 at 3:33

-1

#include <stdio.h>
#include <stdlib.h>

int main()
{
    int i;
    unsigned char rev = 0x70 ; // 0b01110000
    unsigned char tmp = 0;

    for(i=0;i<8;i++)
    {
    tmp |= ( ((rev & (1<<i))?1:0) << (7-i));
    }
    rev = tmp;

    printf("%x", rev);       //0b00001110 binary value of given number
    return 0;
}

Tiertza answered 5/4, 2016 at 12:28 Comment(1)

Please add some explanation. – Shoshone 5/4, 2016 at 12:33

-1

#define BITS_SIZE 8

int
reverseBits ( int a )
{
  int rev = 0;
  int i;

  /* scans each bit of the input number*/
  for ( i = 0; i < BITS_SIZE - 1; i++ )
  {
    /* checks if the bit is 1 */
    if ( a & ( 1 << i ) )
    {
      /* shifts the bit 1, starting from the MSB to LSB
       * to build the reverse number 
      */
      rev |= 1 << ( BITS_SIZE - 1 ) - i;
    }
  }

  return rev;
}

Modulator answered 2/2, 2017 at 5:14 Comment(0)

-1

  xor ax,ax
  xor bx,bx
  mov cx,8
  mov al,original_byte!
cycle:   shr al,1
  jnc not_inc
  inc bl
not_inc: test cx,cx
  jz,end_cycle
  shl bl,1
  loop cycle
end_cycle:

reversed byte will be at bl register

Evite answered 31/5, 2017 at 16:9 Comment(1)

In an other context that may be a fair answer but the question was about C or C++, not asm ... – Telpher 31/5, 2017 at 16:35

-1

I think this is simple enough

uint8_t reverse(uint8_t a)
{
  unsigned w = ((a << 7) & 0x0880) | ((a << 5) & 0x0440) | ((a << 3) & 0x0220) | ((a << 1) & 0x0110);
  return static_cast<uint8_t>(w | (w>>8));
}

uint8_t reverse(uint8_t a)
{
  unsigned w = ((a & 0x11) << 7) | ((a & 0x22) << 5) | ((a & 0x44) << 3) | ((a & 0x88) << 1);
  return static_cast<uint8_t>(w | (w>>8));
}

Makeshift answered 15/10, 2018 at 4:28 Comment(0)

-1

unsigned char c ; // the original
unsigned char u = // the reversed
c>>7&0b00000001 |
c<<7&0b10000000 |
c>>5&0b00000010 |
c<<5&0b01000000 |
c>>3&0b00000100 |
c<<3&0b00100000 |
c>>1&0b00001000 |
c<<1&0b00010000 ;

Explanation: exchanged bits as per the arrows below.
01234567
<------>
#<---->#
##<-->##
###<>###

L answered 11/11, 2018 at 22:11 Comment(0)

-2

#include <stdio.h>
#include <stdlib.h>

#define BIT0 (0x01)
#define BIT1 (0x02)
#define BIT2 (0x04)
#define BIT3 (0x08)
#define BIT4 (0x10)
#define BIT5 (0x20)
#define BIT6 (0x40)
#define BIT7 (0x80)

#define BYTE_TO_BINARY_PATTERN "%c%c%c%c%c%c%c%c\n"

#define BITETOBINARY(byte) \
(byte & BIT7 ? '1' : '0'), \
(byte & BIT6 ? '1' : '0'), \
(byte & BIT5 ? '1' : '0'), \
(byte & BIT4 ? '1' : '0'), \
(byte & BIT3 ? '1' : '0'), \
(byte & BIT2 ? '1' : '0'), \
(byte & BIT1 ? '1' : '0'), \
(byte & BIT0 ? '1' : '0') \

#define BITETOBINARYREVERSE(byte) \
(byte & BIT0 ? '1' : '0'), \
(byte & BIT1 ? '1' : '0'), \
(byte & BIT2 ? '1' : '0'), \
(byte & BIT3 ? '1' : '0'), \
(byte & BIT4 ? '1' : '0'), \
(byte & BIT5 ? '1' : '0'), \
(byte & BIT6 ? '1' : '0'), \
(byte & BIT7 ? '1' : '0') \



int main()
{

    int i,j,c;

    i |= BIT2|BIT7;

    printf("0x%02X\n",i);    

    printf(BYTE_TO_BINARY_PATTERN,BITETOBINARY(i));

    printf("Reverse");

    printf(BYTE_TO_BINARY_PATTERN,BITETOBINARYREVERSE(i));

   return 0;
}

Diminish answered 19/12, 2018 at 9:25 Comment(1)

this just prints the string in reverse and has nothing with reversing the bits in a byte – Attack 14/4, 2020 at 0:41

-3

This is an old question, but nobody seems to have shown the clear easy way (the closest was edW). I used C# to test this, but there's nothing in this example that couldn't be done easily in C.

void PrintBinary(string prompt, int num, int pad = 8)
{
    Debug.WriteLine($"{prompt}: {Convert.ToString(num, 2).PadLeft(pad, '0')}");
}

int ReverseBits(int num)
{
    int result = 0;
    int saveBits = num;
    for (int i = 1; i <= 8; i++)
    {
        // Move the result one bit to the left
        result = result << 1;

        //PrintBinary("saveBits", saveBits);

        // Extract the right-most bit
        var nextBit = saveBits & 1;

        //PrintBinary("nextBit", nextBit, 1);

        // Add our extracted bit to the result
        result = result | nextBit;

        //PrintBinary("result", result);

        // We're done with that bit, rotate it off the right
        saveBits = saveBits >> 1;

        //Debug.WriteLine("");
    }

    return result;
}

void PrintTest(int nextNumber)
{
    var result = ReverseBits(nextNumber);

    Debug.WriteLine("---------------------------------------");
    PrintBinary("Original", nextNumber);
    PrintBinary("Reverse", result);
}

void Main()
{
    // Calculate the reverse for each number between 1 and 255
    for (int x = 250; x < 256; x++)
        PrintTest(x);
}

Malapropos answered 1/2, 2017 at 9:39 Comment(4)

There are a couple of other answers (in C, no less) that implement the simple approach of iterating over the bits. I guess the point here was to advertise the online service to run C# code, but as an answer to this question this fails by being in a different language and including too much irrelevant stuff. (Even your test loop doesn't match the comment: the comment says "between 1 and 255" but your loop goes 250 to 255.) – Joerg 8/8, 2018 at 15:1

hi, thanks for the effort I don't know who's putting down votes here ! we are here to learn not to put down votes !! – Veloz 27/3, 2021 at 15:27

@Veloz You have clearly misunderstood the purpose of this website. It is not a place for you to practice your skills, it's a place to create a repository of high quality answers. OP posting a useless answer just creates noise that everyone has to filter through when searching for the answer to the original question. Those answers are marked by downvotes so they don't get in the way of helpful answers. – Lesleylesli 20/7, 2023 at 13:3

@Lesleylesli Absolutely, after getting 6 months ban several times I got the point. So the questions or the answers should be a very professional ones. Also I posted a very simple questions and after even simplifying it more, got more popular. – Veloz 27/7, 2023 at 11:8

-5

How about this one...

int value = 0xFACE;

value = ((0xFF & value << 8) | (val >> 8);

Allhallowtide answered 25/9, 2013 at 16:11 Comment(1)

This reverses the bytes in a (16-bit) word, but doesn't alter the order of the bits within the byte, so not really a solution. – Embarkation 25/9, 2013 at 16:38

-8

How about just XOR the byte with 0xFF.

unsigned char reverse(unsigned char b) { b ^= 0xFF; return b; }

Plasterwork answered 7/3, 2013 at 22:22 Comment(1)

xor flips bits, not reverses them. – Noellenoellyn 29/1, 2020 at 3:50

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++