float bits and strict aliasing [duplicate]

I am trying to extract the bits from a float without invoking undefined behavior. Here is my first attempt:

unsigned foo(float x)
{
    unsigned* u = (unsigned*)&x;
    return *u;
}

As I understand it, this is not guaranteed to work due to strict aliasing rules, right? Does it work if I take an intermediate step with a character pointer?

unsigned bar(float x)
{
    char* c = (char*)&x;
    unsigned* u = (unsigned*)c;
    return *u;
}

Or do I have to extract the individual bytes myself?

unsigned baz(float x)
{
    unsigned char* c = (unsigned char*)&x;
    // c[3] promotes to int, so widen it first: c[3] << 24 could otherwise overflow int.
    return c[0] | c[1] << 8 | c[2] << 16 | (unsigned)c[3] << 24;
}

Of course this has the disadvantage of depending on endianness, but I could live with that.
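A minimal sketch of how that endianness dependence shows up, assuming 32-bit `unsigned` and IEEE 754 binary32 `float` (where `1.0f` has the bit pattern `0x3F800000`). `assemble_le` and `host_is_little_endian` are hypothetical names; reading the float through `unsigned char` is permitted by the aliasing rules, so the only open question is byte order.

```cpp
// Assembles the bytes of x in little-endian order, as baz does. Each byte is
// widened to unsigned before shifting so that the top byte cannot overflow int.
unsigned assemble_le(float x)
{
    static_assert(sizeof(float) == 4 && sizeof(unsigned) == 4, "assumes 32-bit types");
    const unsigned char* c = reinterpret_cast<const unsigned char*>(&x);
    return unsigned(c[0]) | unsigned(c[1]) << 8 | unsigned(c[2]) << 16 | unsigned(c[3]) << 24;
}

// Hypothetical helper: true if the least significant byte of an unsigned
// is stored at the lowest address.
bool host_is_little_endian()
{
    const unsigned one = 1;
    return *reinterpret_cast<const unsigned char*>(&one) == 1;
}
```

On a little-endian host `assemble_le(1.0f)` yields `0x3F800000`; on a big-endian host the same bytes come out as `0x0000803F`.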

The union hack is definitely undefined behavior, right?

unsigned uni(float x)
{
    union { float f; unsigned u; };
    f = x;
    return u;
}

Just for completeness, here is a reference version of foo. Also undefined behavior, right?

unsigned ref(float x)
{
    return (unsigned&)x;
}

So, is it possible to extract the bits from a float (assuming both are 32 bits wide, of course)?
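For readers on newer toolchains: C++20 added `std::bit_cast` in `<bit>`, which performs exactly this conversion without undefined behavior. The sketch below uses it when the library defines the feature-test macro `__cpp_lib_bit_cast` and otherwise falls back to `memcpy`; `float_bits` is a hypothetical name, and the expected values assume IEEE 754 binary32.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__has_include)
#  if __has_include(<bit>)
#    include <bit>
#  endif
#endif

// Returns the object representation of x as a 32-bit unsigned integer.
std::uint32_t float_bits(float x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<std::uint32_t>(x);  // C++20: well-defined and constexpr-capable
#else
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);           // pre-C++20: copy the bytes
    return u;
#endif
}
```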


EDIT: And here is the memcpy version as proposed by Goz. Since many compilers do not support static_assert yet, I have replaced static_assert with some template metaprogramming:

#include <cstring> // for memcpy

template <bool, typename T>
struct requirement;

template <typename T>
struct requirement<true, T>
{
    typedef T type;
};

unsigned bits(float x)
{
    requirement<sizeof(unsigned)==sizeof(float), unsigned>::type u;
    memcpy(&u, &x, sizeof u);
    return u;
}
Flay answered 1/12, 2010 at 19:40 Comment(7)
I don't really see a problem with the very first approach - you don't even have two pointers pointing to the same object. You should be fine, although you may want a compile-time assert that sizeof(float)==sizeof(unsigned). I also don't see a problem with the union hack (although I would again verify the size). But I'm sure there are some obscure rules that I'm not aware of. Let's sit back and wait for people to prove me wrong!Illogical
@Ebomike: The first method falls foul of the strict aliasing rules. Have a read of this: cellperformance.beyond3d.com/articles/2006/06/…Palmitate
Thanks, I knew someone would prove me wrong :)Illogical
@Johannes: How is undefined behavior the safest bet? :) Writing to one union member and then reading from another is undefined.Flay
@FredOverflow well, even if it's UB, I don't think the compiler will go out of its way and sue you for doing it. Anyway, see below for a version that doesn't have the problem. GCC's aggressive optimizations are documented (in its manpage) to allow you to do the union cast. That allows a necessary evil (it's sometimes not desirable to use library functions or rely on compiler intrinsics to optimize particular uses of memcpy).Joshia
IIRC, the struct hack is defined in C. That may give compilers some incentive to handle it intuitively in C++.Fishy
@Aprogrammer: You mean the union hack, right? The struct hack has to do with arrays of unknown size as the last member of a struct.Flay
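The `memcpy` technique from the question's edit works in the other direction too. A minimal sketch with hypothetical names `to_bits` and `from_bits`, assuming, as the question does, that `unsigned` and `float` have the same size:

```cpp
#include <cstring>

// Assumes sizeof(unsigned) == sizeof(float), as in the question.
static_assert(sizeof(unsigned) == sizeof(float), "size mismatch");

// float -> bit pattern
unsigned to_bits(float x)
{
    unsigned u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// bit pattern -> float
float from_bits(unsigned u)
{
    float x;
    std::memcpy(&x, &u, sizeof x);
    return x;
}
```

Round-tripping a value through both functions returns the original value, and with IEEE 754 binary32 `from_bits(0x3F800000u)` is `1.0f`.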

About the only way to truly avoid any issues is to memcpy.

#include <cstring> // for memcpy

unsigned int FloatToInt( float f )
{
   static_assert( sizeof( float ) == sizeof( unsigned int ), "Sizes must match" );
   unsigned int ret;
   memcpy( &ret, &f, sizeof( float ) );
   return ret;
}

Because you are memcpying a fixed, compile-time-known number of bytes, the compiler will typically optimise the call away.

That said the union method is VERY widely supported.

Palmitate answered 1/12, 2010 at 19:45 Comment(7)
I would go so far as to say I'd actually file a bug on any compiler that didn't support the union method. Yes, it's technically not part of the standard, but it is so widely used throughout embedded programming that a compiler which doesn't support it isn't very useful.Hintze
@FredOverflow ... typo ;) Fixed.Palmitate
@Crashworks: You'd be fine reporting a bug ... it doesn't mean the compiler writer has to give a monkeys though ;) Their compiler could still be perfectly compliant.Palmitate
Compliant, and not bought by us!Hintze
@Crashworks, hehehe. Personally though, I use the memcpy trick. It is VERY obvious exactly what you are doing to others :)Palmitate
While this might avoid issues, it violates strict aliasing rules. You're casting a float pointer to void * and then into a byte array (depending on how memcpy() interprets it). Is that really better than invoking the undefined but widely supported union workaround?Resplendence
@Goz: According to POSIX (pubs.opengroup.org/onlinepubs/9699919799/functions/memcpy.html) and ISO C standards, it's void *. How the data is interpreted internally is left to the implementation. gcc translates memcpys into loops that transfer one basic machine unit per go, then the remainder using shorter loads/stores, for example.Resplendence

The union hack is definitely undefined behavior, right?

Yes and no. According to the standard, it is definitely undefined behavior. But it is such a commonly used trick that GCC, MSVC and, as far as I know, every other popular compiler explicitly guarantee that it is safe and will work as expected.

Chunchung answered 1/12, 2010 at 21:34 Comment(5)
Out of interest - which part of it is undefined behavior? (other than you're misinterpreting a float as an integer)Illogical
just that it's not allowed. Only one member of a union is "active" at a time. If you write to a member of a union, then you are only allowed to read from that same member. The result of reading any other member is undefined.Chunchung
@Illogical "other than" .. that's exactly what is UB. It's an aliasing violation to read from a member that is not aliasing compatible with the active member of the union. The following is fine for example: union A { int a; unsigned char b; }; A x = { 10 }; return x.b;, because you are allowed to access an int by an lvalue of type unsigned char.Joshia
The spec currently has no notion to forbid union A { int a; float b; }; A x = { 0 }; float *b = &x.b; *b = 0.f; return x.b;. The active member in this case is switched to float by writing through the float pointer, but when that write happens in a separate function, this becomes problematic (the compiler basically cannot apply the aliasing rule as it was intended by the Standard). See open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#636Joshia
@JohannesSchaub-litb: It seems the simple common sense answer would be to say that taking the address of a union member should allow the object to be used via the resulting pointer, or pointers derived from it, until the next time code accesses some other union member (via pointer not derived from the aforementioned one), crosses the start of a loop where that occurs, or enters a function where that occurs. Should be simple and practical to implement without hurting many actually-useful optimizations, while handling the common use cases for union-member pointers.Grounds
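The rule Johannes relies on is that any object may be inspected through an lvalue of type `unsigned char`. A minimal sketch of the same idea using a pointer instead of a union member, so no active-member question arises; `lowest_address_byte` is a hypothetical name, and which byte it returns depends on the host's byte order.

```cpp
// Returns the byte stored at the lowest address of value.
// Reading any object through unsigned char is explicitly permitted by the
// aliasing rules, so no union and no "active member" is involved.
unsigned char lowest_address_byte(const int& value)
{
    return *reinterpret_cast<const unsigned char*>(&value);
}
```

For `10`, a little-endian host returns `10` and a big-endian host returns `0`.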

The following does not violate the aliasing rule, because it never uses an lvalue of one type to access an object of a different type:

template<typename B, typename A>
B noalias_cast(A a) { 
  union N { 
    A a; 
    B b; 
    N(A a):a(a) { }
  };
  return N(a).b;
}

unsigned bar(float x) {
  return noalias_cast<unsigned>(x);
}
Joshia answered 8/2, 2011 at 16:37 Comment(2)
This proves the standard is broken. It is ridiculous that temporary.member is not a lvalue. I suppose the std guys got confused by the terms "rvalue" (as in value) and "rvalue" (a temporary). lolAdverbial
@Johannes: Is this reasoning still true? Accessing b is accessing a non-active member of a union.Derive
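If the active-member question raised in that last comment is a concern, a `memcpy`-based template offers the same interface without reading any union member. A sketch under stated assumptions: `copy_cast` is a hypothetical name, both types must be trivially copyable and the same size, and `To` must be default-constructible.

```cpp
#include <cstring>
#include <type_traits>

// Copies the object representation of from into a new To.
// No union member is read, so the "active member" question never arises.
template <typename To, typename From>
To copy_cast(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "memcpy requires trivially copyable types");
    To to;
    std::memcpy(&to, &from, sizeof to);
    return to;
}
```

With 32-bit `unsigned` and IEEE 754 floats, `copy_cast<unsigned>(1.0f)` gives `0x3F800000`, mirroring `noalias_cast<unsigned>(x)` above.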

If you really want to be agnostic about the size of the float type and just return the raw bits, do something like this:

#include <cstring> // for memcpy

void float_to_bytes(char *buffer, float f) {
    union {
        float x;
        char b[sizeof(float)];
    };

    x = f;
    memcpy(buffer, b, sizeof(float));
}

Then call it like so:

float a = 12345.6789;
char buffer[sizeof(float)];

float_to_bytes(buffer, a);

This technique will, of course, produce output specific to your machine's byte ordering.

Chuckle answered 1/12, 2010 at 20:48 Comment(0)
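If the byte-ordering dependence in that last answer is a problem (for example, when writing a file format), one option is to extract the bits as an integer and emit the bytes in a fixed order with shifts. A minimal sketch, assuming a 32-bit IEEE 754 `float` that shares the host integer byte order (true on mainstream platforms); `float_to_be_bytes` is a hypothetical name.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Returns the bytes of f in big-endian order, regardless of host byte order.
std::array<unsigned char, 4> float_to_be_bytes(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);      // bit pattern as an integer
    return {{
        static_cast<unsigned char>(bits >> 24),  // most significant byte first
        static_cast<unsigned char>(bits >> 16),
        static_cast<unsigned char>(bits >> 8),
        static_cast<unsigned char>(bits)
    }};
}
```

Shifts operate on values, not memory layout, so `float_to_be_bytes(1.0f)` is `{0x3F, 0x80, 0x00, 0x00}` on any such host.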

© 2022 - 2024 — McMap. All rights reserved.