If you want to bit shift a float (or a double) in a way that behaves like shifting an integer, i.e. multiplying or dividing by a power of two, you'll need to operate directly on the parts of the floating-point format. The examples below assume little-endian (if you're not sure what that is, that's probably what you're using, so don't worry about it yet).
Short answer:
This function will shift the float as if it were an int, and give you the value you expected in your example:
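// Multiplies inputData by 2^inputShift by adding inputShift to the exponent field.
// Assumes IEEE 754 floats and a compiler that allocates bit fields from the least
// significant bit (the common little-endian ABIs do). Zero, subnormals, infinity,
// NaN and exponent overflow/underflow are NOT handled.
void bitshiftFloating(float &inputData, int inputShift){
    union FloatingParts{
        float input;
        struct {
            unsigned int mantissa : 23;
            unsigned int exponent : 8;
            unsigned int sign : 1;
        } parts;
    } U;
    U.input = inputData;
    U.parts.exponent += inputShift; // shifting by n multiplies by 2^n
    inputData = U.input;
}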
Usage:
float x = 32.32f;
bitshiftFloating(x, -1); // a shift of -1 halves the value, so x becomes 16.16f
Long answer:
For normal numbers, a float evaluates to $(-1)^{\text{sign}} \times 2^{\text{exponent}-127} \times 1.\text{mantissa}$, where 1.mantissa is a binary fraction with an implicit leading 1. This means that if you want to "bit shift" it, you only need to adjust the exponent. The easiest way is to put the variable in a union and single out the exponent as its own bit field. Looking at the floating-point format, you'll see the fraction/mantissa is 23 bits and the exponent is 8 bits, followed by a sign bit (which we don't really need for this situation). Write the struct and its bit fields as shown below, in the same order, and make sure they all use the same 32-bit type (unsigned int). Bit-field allocation order is implementation-defined, but on the common little-endian compilers the first member takes the least significant bits, so this lines the exponent field up with bits 23 to 30 of the float.
(Note: There are other ways to do this, but this is the easiest. You'll have to do your own testing to see if it's really faster than standard division for your use case.)
This is what the union should look like:
union FloatingParts{
    float input; // container for the whole variable
    struct { // bit fields; float = (-1)^sign * 2^(exponent-127) * 1.(mantissa)
        unsigned int mantissa : 23; // fraction bits; for 1.(mantissa), put them in a dummy union with exponent 127, or compute 1 + bit1*2^-1 + ... + bit23*2^-23
        unsigned int exponent : 8;  // biased: subtract 127 to get the real exponent
        unsigned int sign : 1;      // sign bit
    } parts;
} floatUnion;
Then initialize it by setting floatUnion.input to your desired value. You can then access floatUnion.parts.exponent and add or subtract the number you would bit shift by, since it is an exponent of 2:
floatUnion.input = 32.32f;
floatUnion.parts.exponent += -1;
The result here is 16.16f.
Keep in mind that reading a union member other than the one last written (type punning) is technically undefined behavior in C++, even though most compilers have no issue with it (it is allowed in C). The union also duplicates the data, since it is a separate variable; to avoid the copy, you'd have to create the float within the union in the first place.
Edit: Thanks to Alexis for pointing out the importance of endianness here.