Is this the most optimal way? C bitfields

Asked 25/1, 2011 at 23:58 Answered 26/1, 2011 at 1:16

I made a function to set or clear a specific number of bits in a DWORD. My function works. I don't need help making it work. However, I am wondering if the method I've chosen to do it is the fastest possible way.

It's rather hard for me to explain how this works. There are two arrays containing DWORDs that are filled with bits on the left and right side of the DWORD (with all binary 1's). It makes a mask with all the bits filled except for the ones I want to set or clear, and then sets them with bitwise operators based on that mask. It seems rather complicated for such a simple task, but it seems like the fastest way I could come up with. It's much faster than setting them bit by bit.

static DWORD __dwFilledBitsRight[] = {
        0x0, 0x1, 0x3, 0x7, 0xF, 0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF, 0x1FFF, 0x3FFF,    0x7FFF, 0xFFFF, 0x1FFFF, 0x3FFFF, 0x7FFFF, 0xFFFFF, 0x1FFFFF, 0x3FFFFF, 0x7FFFFF, 0xFFFFFF, 0x1FFFFFF, 0x3FFFFFF, 0x7FFFFFF, 0xFFFFFFF, 0x1FFFFFFF, 0x3FFFFFFF, 0x7FFFFFFF, 0xFFFFFFFF
    };

static DWORD __dwFilledBitsLeft[] = {
        0x0, 0x80000000, 0xC0000000, 0xE0000000, 0xF0000000, 0xF8000000, 0xFC000000, 0xFE000000, 0xFF000000, 0xFF800000, 0xFFC00000, 0xFFE00000, 0xFFF00000, 0xFFF80000, 0xFFFC0000, 0xFFFE0000,    0xFFFF0000, 0xFFFF8000, 0xFFFFC000, 0xFFFFE000, 0xFFFFF000, 0xFFFFF800, 0xFFFFFC00, 0xFFFFFE00, 0xFFFFFF00, 0xFFFFFF80, 0xFFFFFFC0, 0xFFFFFFE0, 
        0xFFFFFFF0, 0xFFFFFFF8, 0xFFFFFFFC, 0xFFFFFFFE, 0xFFFFFFFF
    };

    // nStartBitFromLeft must be between 1 and 32... 
    // 1 is the bit farthest to the left (actual bit 31)
    // 32 is the bit farthest to the right (actual bit 0)
    inline void __FillDWORDBits(DWORD *p, int nStartBitFromLeft, int nBits, BOOL bSet)
    {
        DWORD dwLeftMask = __dwFilledBitsLeft[nStartBitFromLeft - 1]; // Mask for data on the left of the bits we want
        DWORD dwRightMask = __dwFilledBitsRight[33 - (nStartBitFromLeft + nBits)]; // Mask for data on the right of the bits we want
        DWORD dwBitMask = ~(dwLeftMask | dwRightMask); // Mask for the bits we want
        DWORD dwOriginal = *p;
        if(bSet) *p = (dwOriginal & dwLeftMask) | (dwOriginal & dwRightMask) | (0xFFFFFFFF & dwBitMask);
        else *p = (dwOriginal & dwLeftMask) | (dwOriginal & dwRightMask) | 0;

    }

Yellowtail answered 25/1, 2011 at 23:58 Comment(6)

That's what I was thinking... but my target platform is i386+ – Yellowtail 26/1, 2011 at 0:5

personally when I feel something is hard to explain to somebody else I take that as a warning sign. :-) You probably should benchmark your code and compare with setting bit by bit if you haven't already. Have you tried using a union instead? – Heterogony 26/1, 2011 at 0:6

yeah, I don't trust my benchmarks using the clock ticks in windows... but based on the number of instructions, setting bit by bit had more instructions over the complete iteration. – Yellowtail 26/1, 2011 at 0:14

As a side-note, your __Fi... name gives undefined behavior (names starting with an underscore followed by either another underscore or a capital letter are reserved for the implementation). Personally, I'd separate this into two functions, fill_bits and clear_bits. Passing a bool as a parameter is usually a mistake -- if you insist on this structure, use something like change_bits and pass an enum { CLEAR, FILL};. as the parameter -- but separate functions would be better, IMO. – Vegetate 26/1, 2011 at 0:19

Interesting... I usually use two underscores for functions like these so I can find them faster in my function list... I'm trying to grasp what you're saying here... I don't like the unix style naming conventions (i.e. lower_lower()). Personally I think they make my programs less readable and more cryptic. – Yellowtail 26/1, 2011 at 0:24

looks correct and must run faster on any i386 like processor – Paddy 29/1, 2011 at 12:26

How about:

// Create mask of correct length, and shift to the correct position
DWORD mask = ((1ULL << nBits) - 1) << pos;
// Apply mask (or its inverse)
if (bSet)
{
    *p |= mask;
}
else
{
    *p &= ~mask;
}

It's pretty likely that simple bitwise operations will be faster than table lookup on any modern processor.

Note: Depending on the relationship between DWORD and long long on this platform, you may need special handling for the case where nBits == sizeof(DWORD)*8. Or if nBits==0 is not a possibility, you could just do DWORD mask = ((2ULL << (nBits - 1)) - 1) << pos;.

Update: It's been mentioned that the if could potentially be slow, which is true. Here's a replacement for it, but you'd need to measure to see if it's actually any faster in practice.

// A bit hacky, but the aim is to get 0x00000000 or 0xFFFFFFFF
// (relies on two's-complement representation)
DWORD blanket = bSet - 1;
// Use the blanket to override one or other masking operation
*p |=  (blanket | mask);
*p &= ~(blanket & mask);

Skipbomb answered 26/1, 2011 at 0:6 Comment(17)

You should be careful that 1L << nBits is undefined is nBits is equal to the length of the word, 32 in this case. If 32 (or 64) is a valid parameter, a special case must be written. – Feuillant 26/1, 2011 at 0:8

The if would be slow. It would be nice to avoid it. – Whisper 26/1, 2011 at 0:9

Awesome... that is the code that I was thinking of in my head to begin with, but I just couldn't code it, so I ended up with that lookup table monster... thanks a bunch dude :) – Yellowtail 26/1, 2011 at 0:10

@Giuseppe: Indeed, if sizeof(long) <= sizeof(DWORD). It just so happens I've already updated my answer to mention this! – Skipbomb 26/1, 2011 at 0:10

@PaulB: It's not just whether the assembler is smaller. It's also the memory-access time for performing lookups. – Skipbomb 26/1, 2011 at 0:18

@ruslisk: It's not clear if that if-statement would actually slow things down, especially on modern pipelined CPUs. Those suggested bit manipulation tricks outlined must all be executed in order (since the results of each operation are passed to the next one), whereas in a simple conditional jump both paths may be executed in parallel and one result being discarded. Some compilers, like GNU provide an extension to specify, which path is the un-/likely one to be taken. My gut tells me, that on a modern CPU the if-variant is likely to be the more performant one. – Nervy 26/1, 2011 at 0:43

@datenwolf: At some point, though, a conditional operation must occur, in order to populate the result with one or other intermediate result. Thus a conditional branch must occur. – Skipbomb 26/1, 2011 at 0:46

Yeah, this was the problem I had before when trying this algorithm. The number of bits in "mask" needs to be calculated before shifting. For example, let's say you wanted to fill 4 bits starting at bit 32, it would make a mask that is 4 bits and shift it 32 left, which would cut off most of the mask and put it into the wrong position. I could use a BSF instruction to count the leading bits, but I'm not sure how to implement it yet; i'm working on it now – Yellowtail 26/1, 2011 at 0:55

@old: "4 bits starting at bit 32"? But the highest bit is 31, no? – Skipbomb 26/1, 2011 at 9:39

My mistake, but the same concept applies... if you shift 4 bits left 31 (the bit position), it will cut off 3 bits. – Yellowtail 26/1, 2011 at 12:19

@old: well, yes! It doesn't make sense to set 4 bits starting at bit 31! – Skipbomb 26/1, 2011 at 12:22

I need to make a mask of n bits, and put the leftmost bit at bit b, not the rightmost bit – Yellowtail 26/1, 2011 at 12:23

@old: Well, when setting up mask, just shift by pos-nBits-1 rather than by pos. – Skipbomb 26/1, 2011 at 12:25

I will try that out... thanks... I finally settled on making a FFFFFFFF mask and shifting right, then flipping, but I will try out your algorithm because it seems simpler – Yellowtail 26/1, 2011 at 12:27

Are lookup tables really that slow? I thought that wasn't an issue any more with the newer processors. – Yellowtail 26/1, 2011 at 12:32

@old: It depends on how the access pattern interacts with the cache. It could potentially be very fast, or very slow. – Skipbomb 26/1, 2011 at 12:39

@Oli Checking the assembly isn't just for the size. It's also to validate the assumptions about loads and stores. Optimizers will often produce surprising results – Anesthetize 26/1, 2011 at 15:39

This is the way I'd do it. I'd break it into two functions, setbits() and clearbits(). Steps broken out for clarity, and I'm sure it can be far more optimized.

This version is dependent on 32-bit code as it stands. Also, in my world, bit 0 is the rightmost bit. Your mileage may vary.

setbits( DWORD *p , int offset , int len )
{
  // offset must be 0-31, len must be 0-31, len+offset must be 0-32
  int   right_shift = ( !len ? 0 : 32 - (len+offset) ) ;
  int   left_shift  = offset ;
  DWORD right_mask  = 0xFFFFFFFF >> right_shift  ;
  DWORD left_mask   = 0xFFFFFFFF << left_shift   ;
  DWORD mask        = left_mask & right_mask     ;

  *p |= mask ;

  return ;
}

clearbits( DWORD *p , int offset , int len )
{
  // offset must be 0-31, len must be 0-31, len+offset must be 0-32
  int   right_shift = ( !len ? 0 : 32 - (len+offset) ) ;
  int   left_shift  = offset ;
  DWORD right_mask  = 0xFFFFFFFF >> right_shift   ;
  DWORD left_mask   = 0xFFFFFFFF << left_shift    ;
  DWORD mask        = ~( left_mask & right_mask ) ;

  *p &= mask ;

  return ;
}

I stumbled across this improved version whilst looking for something else today. Courtesy of Sean Anderson's Bit Twiddling Hacks at Stanford University:

// uncomment #define to get the super scalar CPU version.
// #define SUPER_SCALAR_CPU
void setbits( unsigned int *p , int offset , int len , int flag )
{
  unsigned int mask = ( ( 1 << len ) - 1 ) << offset ;

#if !defined( SUPER_SCALAR_CPU )
  *p ^= ( - flag ^ *p ) & mask ;
#else
  // supposed to be some 16% faster on a Intel Core 2 Duo than the non-super-scalar version above
  *p = (*p & ~ mask ) | ( - flag & mask ) ;
#endif

  return ;

}

Much depends on your compiler, though.

Cesium answered 26/1, 2011 at 1:16 Comment(2)

yeah, I like this... I refer to the bits in the same order as you, but in the use of this function, it is filling bits in a bit mask by coordinate (x ascends going right) – Yellowtail 26/1, 2011 at 1:19

It's very nice, but depends on context. When bool bSet is a compile time constant, explicit despatching is the best way. – Whisper 26/1, 2011 at 3:8

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags