c++ bit shifting a float

Asked 4/6, 2013 at 0:37 Answered 6/4, 2024 at 18:2

float length =  32.32f;
long  i = *(long*)&length ; // casting a float pointer to a long pointer...
i >>= 1; // right shift i but 1 ... or div by 2
length = *(float*)&i; // is this necessary?

Printing length gives: 0.0

The end result should be: 16.16;

This idea is from http://en.wikipedia.org/wiki/Fast_inverse_square_root. Im trying to understand the section of the code were the long in a float is taken out and bitwise operations are performed on it. Im guessing the point is to improve performance by avoiding branching?

The above code fails. Can anyone tell me what im doing wrong? I was under the impression that it was as simple as getting the long storage in a float and manipulation it, is this wrong?

i found something interesting, if i change the code to:

 float length =  32.32f;
 long  i = *(long*)&length ;
 i  = 0x5f3759df - ( i >> 1 ); 
 length = *(float*)&i;

adding this number (0x5f3759df) to the mix.

Printing out length*100 gives : 17.0538 // approximation of 16.16
trying it with different length gives the same results.

eg: length = 100; result is : 10.3299?? // almost...

Mccreery answered 4/6, 2013 at 0:37 Comment(9)

Why are you not operating on the float directly? Why do you think anything changes if attempting to operate on the same variable (through a different type)? – Hawsepiece 4/6, 2013 at 0:40

well bitwise operations are faster right? the reason doesnt matter, im just trying to understand how to do it. – Mccreery 4/6, 2013 at 0:41

My question's intention was to make you think about it. Why not directly length >>= 1? – Hawsepiece 4/6, 2013 at 0:44

This is undefined behaviour. The compiler is within its rights to generate unexpected output. (FYI, even if it did do the "expected" thing, the result would certainly not be 16.16...) – Hazen 4/6, 2013 at 0:44

From the code snippet you posted, it seems like you are trying to divide the float number, but it won't work like this. Floats use different format to store their bits. – Yordan 4/6, 2013 at 0:47

@LuchianGrigore :) i see what your saying. The compiler wont allow it, i guess for good reason. – Mccreery 4/6, 2013 at 0:47

@OliCharlesworth How is this undefined behavior? – Hypostyle 4/6, 2013 at 0:53

@0x499602D2: Reinterpreting one type as another? I suppose maybe it's implementaion-defined rather than undefined, I don't remember... – Hazen 4/6, 2013 at 0:54

"above code fails. Can anyone tell me what im doing wrong?..." The program has undefined behavior. – Uppity 27/1, 2024 at 3:54

The binary of 32.32f is 01000010000000010100011110101110.

After shift: 00100001000000001010001111010111

You can see how float numbers is stored from Wikipedia.

sign:0

exp:01000010=66

mantissa:00000001010001111010111=2^-7(around)

So the result = 2^(66-127)*(1+2^-7)=2^-59(around)

It's not zero. If you use printf("%.40f\n",x2); then you will see it.

If you feel strange about why that magic numbers works: you can read the wiki page carefully or read this paper.

Killing answered 4/6, 2013 at 0:59 Comment(0)

If you want to bitshift a float (or a double) in a way that behaves like an integer, you'll need to operate directly on the parts of the floating-point format. The examples below are in Little Endian (if you're not sure what that is, that's probably what you're using so don't worry about it yet).

Short answer: This function will shift the float as if it were an int, and give you the value you expected in your example:

void bitshiftFloating(float &inputData, char inputShift){
    union FloatingParts{
        float input;
        struct {
            unsigned int mantissa : 23;
            unsigned int exponent : 8;
        } parts;
    } U;
    U.input = inputData;
    U.parts.exponent += inputShift;
    inputData = U.input;
    return;
}

Implementation:

float x = 32.32f;
bitshiftFloating(x,-1); //a shift of -1 will divide it in half, giving you 16.16

Long answer:

The evaluation of a float is -1^sign * 2^(exponent-127) * 1.(mantissa). This means if you want to "bit shift" it, you only need to adjust the exponent. Easiest way is to throw the variable into a union and single out the exponent in a bit field as its own variable. By looking at the floating-point format, you'll see the fraction/mantissa is 23 bits, and the exponent is 8, followed by a sign bit (which we don't really need for this situation). Write the struct and member bit fields as shown below, in the same order (make sure they're all the same 32-bit type). This will ensure the exponent is properly stored.

(Note: There are other ways to do this, but this is the easiest. You'll have to do your own testing to see if it's really faster than standard division for your use case.)

This is what the union should look like:

union FloatingParts{
            float input;    //container for whole variable
            struct {    //struct for bit-field array; float = -1^sign * 2^(exponent-127) * 1.(mantissa)
                unsigned int mantissa : 23; //fraction data; extract to a dummy union to get real value, or manually: 1+(bit1*2^-1,...bit23*2^-23)
                unsigned int exponent : 8;  //biased, so -127 to get real value.
                unsigned int sign : 1;  //sign bit.
            } parts;
        }floatUnion;

And then initialize it by setting floatUnion.input to your desired value. You can then call floatUnion.parts.exponent, and add or subtract the number you would bit shift by, as it's an exponent of 2:

floatUnion.input = 32.32f;
floatUnion.parts.exponent += -1;

The result here is 16.16f.

Keep in mind that unions in C++ are technically UB, even though most compilers have no issue with them. It also causes duplication, since the union is its own variable. You'd have to create the float within the union in the first place in order to avoid duplication.

Edit: Thanks to Alexis for pointing out the importance of endianness here.

Polygraph answered 27/1, 2024 at 2:25 Comment(4)

Note that one reason for UB is that the bit fields are endian sensitive. I would make a note of that in my answer. – Aport 27/1, 2024 at 2:46

Good point! I always forget about that nonsense. – Polygraph 27/1, 2024 at 2:51

In C++, it is undefined behavior to read a member of a union that is not the one that was written most recently. Your suggested code only works by chance, but not by design, and may break with tomorrow's compiler. – Intuit 27/1, 2024 at 9:1

Considering this is how they work in C, and that functionality is carried over by the compilers, it would only break if C++ added its own implementation of unions, which is extremely unlikely. As long as the programmer is aware that it's UB, I don't think it's an issue to use them in C++. – Polygraph 27/1, 2024 at 15:18

-1

I am a new developer and wanted to challenge myself with this, event though the topic is to do this in C/C++, I came up with this rust code and wanted to share it (one can even adapt this to C since rust is also a system level language).

Here's the code :

/* 
This basically does the same as a bitshift for an integer.
The direction of the shift is determined by the sign of i :
    i negative -> n >> |i| ,
    i positive -> n << i ,
So, what this does is dividing by 2^i if i is negative, 
or multiplying by 2^i if i is positive.
*/ 
fn bitshift_float(f: &mut f64, i: i8) {
    unsafe {
        // Pointer to the float number
        let p: *mut u64 = std::mem::transmute(f);
        // bit manip on the exponent
        *p = (*p & 0x800FFFFFFFFFFFFF) | 
            ((*p as i64 + ((i as i64) << 52)) as u64 & 0x7FF0000000000000);
    }
}

fn main(){
    let mut f: f64 = 0.5;
    let mut e: f64 = 0.5;
    bitshift_float(&mut f, 4);
    println!("{}", f);
    bitshift_float(&mut e, -4);
    println!("{}", e);
}

I tried with other values than 4 and -4, it seems to work well ! Also, "as type" shouldn't be adding more operations to the code, well at least according to what chatGPT told me :

In Rust, the 'as' keyword is used for type casting, which converts a value from one type to another. It's primarily a compile-time operation, meaning it instructs the compiler to treat a value as a different type for the purposes of type checking and code generation.

When you use 'as' to cast a value from one type to another, Rust will attempt to convert the value to the target type if such a conversion is defined and allowed. If the conversion is not possible, you might encounter a compiler error.

I don't know if we can get to optimize this code even further, and it would be exciting if this were faster than using traditional formulas such as "f/2^i" or "f*2^i" !

Java answered 6/4, 2024 at 18:2 Comment(1)

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – Whoopee 6/4, 2024 at 20:59

Recommended topics

Hot tags