Redistribute least significant bits from a 4-byte array to a nibble
Asked Answered
S

3

1

I wish to move bits 0,8,16,24 of a 32-bit value to bits 0,1,2,3 respectively. All other bits in the input and output will be zero.

Obviously I can do that like this:

c = c>>21 + c>>14 + c>>7 + c;
c &= 0xF;

But is there a faster (fewer instructions) way?

Substantive answered 10/1, 2012 at 11:56 Comment(8)
First, you code doesn't do what you ask it to, as there are other bits in c that will included in the addition. Secondly, you are counting the bits the wrong way around. The rightmost (least valued) bit is numbered 0.Embryotomy
Thanks, I've changed the order of the bits.Substantive
And I've clarified the constraints, so I think the code works now :)Substantive
@Djikstra, that code still doesn't do what you want it to do. You need to mask off the bits you are combining together.Homogony
@MSN, I'm sorry I don't understand where it goes wrong. Note that 28 of the bits in the input are guaranteed to be zero.Substantive
@Djikstra, given an int 0xffffffff, you will get the wrong result.Homogony
@MSN, that's ok, I assert that won't happen :)Substantive
duplicate problems, but working on 8 bytes at once instead of 4: How to create a byte out of 8 bool values (and vice versa)?, Intel x86 assembly optimization techniques in a sample problem, What's the fastest way to pack 32 0/1 values into the bits of a single 32-bit variable?. Your code won't work if there's a carry from the higher digit. For example 0x81818181 will produce incorrect input. You need to use a bitwise or instead of addAmundsen
J
2
c = (((c&BITS_0_8_16_24) * BITS_0_7_14_21) >> 21) & 0xF;

Or wait for Intel Haswell processor, doing all this in exactly one instruction (pext).

Update

Taking into account clarified constraints and assuming 32-bit unsigned values, the code may be simplified to this:

c = (c * BITS_7_14_21_28) >> 28;
Johnsten answered 10/1, 2012 at 12:26 Comment(1)
the magic number would be 0b00010000001000000100000010000000 or 0x10204080 => (c * 0x10204080) >> 28. I used this technique hereAmundsen
R
1

If you don't care about portability, and can use SSE instructions, look at the PMOVMSKB instruction and its compiler intrinsic. [I noticed that your bit positions are most significant (sign) bits of the 4 bytes comprising the 32-bit word.]

Rivers answered 10/1, 2012 at 12:27 Comment(1)
if you don't care about portability and you have new enough x86 CPU then you can also use PEXT in BMI2Amundsen
R
0

Instead of writing some obfuscated one-line goo, the below code is what I would write, for maximum portability and maintainability. I would let the optimizer worry about whether or not it is the most effective code.

#include <stdint.h>
#include <limits.h>
#include <stdio.h>

#define BITS_TO_MOVE  4

static const uint32_t OLD_MASK [BITS_TO_MOVE] =
{
  0x0008u,
  0x0080u,
  0x0800u,
  0x8000u
};

static const uint32_t NEW_MASK [BITS_TO_MOVE] =
{
  0x1000u,
  0x2000u,
  0x4000u,
  0x8000u
};


int main()
{
  uint32_t  c     = 0xAAAAu;
  uint32_t  new_c = 0;
  uint8_t   i;

  printf("%.4X\n", c);


  for(i=0; i<BITS_TO_MOVE; i++)
  {
    if ( (c & OLD_MASK[i]) > 0 )
    {
      new_c |= NEW_MASK[i];
    }
  }


  printf("%.4X\n", new_c);
  getchar();

  return 0;
}
Rinker answered 10/1, 2012 at 12:47 Comment(3)
Optimizers are smart, but not smart enough to replace bit extraction code with a single instruction. "Portability" is a moot point: you don't have to bother yourself with it unless you KNOW upfront that the code has to run on multiple CPU platforms.Rivers
@Rivers You never reuse old code you have written in other projects? Also, the same can be said about performance, you don't have to bother with it unless you know that it is needed. I reckon the above code will be "fast enough", perhaps not a single instruction but not worse than maybe 3-4 either. Depending on CPU type of course.Rinker
Reuse? It depends. The OP specifically asked about a faster way than his example, and you gave him something longer and probably slower.Rivers

© 2022 - 2024 — McMap. All rights reserved.