Dereferencing type-punned pointer will break strict-aliasing rules
Asked Answered
S

7

45

I used the following piece of code to read data from files as part of a larger program.

double data_read(FILE *stream,int code) {
        char data[8];
        switch(code) {
        case 0x08:
            return (unsigned char)fgetc(stream);
        case 0x09:
            return (signed char)fgetc(stream);
        case 0x0b:
            data[1] = fgetc(stream);
            data[0] = fgetc(stream);
            return *(short*)data;
        case 0x0c:
            for(int i=3;i>=0;i--)
                data[i] = fgetc(stream);
            return *(int*)data;
        case 0x0d:
            for(int i=3;i>=0;i--)
                data[i] = fgetc(stream);
            return *(float*)data;
        case 0x0e:
            for(int i=7;i>=0;i--)
                data[i] = fgetc(stream);
            return *(double*)data;
        }
        die("data read failed");
        return 1;
    }

Now I am told to use -O2 and I get following gcc warning: warning: dereferencing type-punned pointer will break strict-aliasing rules

Googleing I found two orthogonal answers:

vs

In the end I don't want to ignore the warnings. What would you recommend?

[update] I substituted the toy example with the real function.

Skin answered 14/7, 2010 at 12:48 Comment(5)
Your function is returning a double, but you cast your return to an int. Why not cast to double?Fleam
My reading of the supplied links: bytes.com link appears to be mostly wrong (actually things have changed since GCC 4.x released), while SO link seem to be OK. See C99, "6.5 Expressions", clause 7.Hoem
I'm a bit confused by the error message because I thought aliasing rules excluded char types (i.e. a char pointer is always allowed to alias other pointers unless it's restricted.) Maybe you have to make it unsigned char for that to apply..? I'd be interested in seeing the correct answer.Pillory
@R A char * can alias anything but not the other way around. He is casting and dereferencing char to short,int,float and double in the above code.Mogul
Does this answer your question? "dereferencing type-punned pointer will break strict-aliasing rules" warningChromogen
U
26

It looks a lot as if you really want to use fread:

int data;
fread(&data, sizeof(data), 1, stream);

That said, if you do want to go the route of reading chars, then reinterpreting them as an int, the safe way to do it in C (but not in C++) is to use a union:

union
{
    char theChars[4];
    int theInt;
} myunion;

for(int i=0; i<4; i++)
    myunion.theChars[i] = fgetc(stream);
return myunion.theInt;

I'm not sure why the length of data in your original code is 3. I assume you wanted 4 bytes; at least I don't know of any systems where an int is 3 bytes.

Note that both your code and mine are highly non-portable.

Edit: If you want to read ints of various lengths from a file, portably, try something like this:

unsigned result=0;
for(int i=0; i<4; i++)
    result = (result << 8) | fgetc(stream);

(Note: In a real program, you would additionally want to test the return value of fgetc() against EOF.)

This reads a 4-byte unsigned from the file in little-endian format, regardless of what the endianness of the system is. It should work on just about any system where an unsigned is at least 4 bytes.

If you want to be endian-neutral, don't use pointers or unions; use bit-shifts instead.

Unessential answered 14/7, 2010 at 13:1 Comment(8)
+1. To stress again: an union is an official way to keep the code strict aliasing compliant. This is not gcc specific, it's just gcc's optimizer is more broken in the respect. The warnings shouldn't be ignored: either disable explicitly -fstrict-aliasing optimization or fix the code.Hoem
@Framester: Depends on what you want to port to. Most desktop systems and kin mean the same thing by a 32-bit int, but some are big-endian and some are small-endian, meaning the order of the bytes in the int can vary.Apocopate
@David: Just to pick a nit: The usual term is "little-endian".Unessential
@Hoem "an union is an official way to keep the code strict aliasing compliant." According to who?Tetrode
@Hoem - yes broken and annoying, if you have hundreds of different type of packets that come from a file descriptor (network for example), you need to have a gigantic union of opaque type just not to have to turn off the strict-aliasing altogether.Furry
C11 standart says "The value of at most one of the members can be stored in a union object at any time." ($6.7.2.1), so this would mean using union this way would be undefined behaviour (it was already pointed to me in another post because I used it myself).Incest
@Incest see §6.2.6.1 ¶7: the bytes ... that do not correspond to that member but do correspond to other members take unspecified values, which implies that bytes can be reinterpreted by reading through a different member. Also, this was the subject of a correction in ISO C99 TC3 (DR283)Courtnay
What is the safe way to do it in C++?Worldshaking
P
41

The problem occurs because you access a char-array through a double*:

char data[8];
...
return *(double*)data;

But gcc assumes that your program will never access variables though pointers of different type. This assumption is called strict-aliasing and allows the compiler to make some optimizations:

If the compiler knows that your *(double*) can in no way overlap with data[], it's allowed to all sorts of things like reordering your code into:

return *(double*)data;
for(int i=7;i>=0;i--)
    data[i] = fgetc(stream);

The loop is most likely optimized away and you end up with just:

return *(double*)data;

Which leaves your data[] uninitialized. In this particular case the compiler might be able to see that your pointers overlap, but if you had declared it char* data, it could have given bugs.

But, the strict-aliasing rule says that a char* and void* can point at any type. So you can rewrite it into:

double data;
...
*(((char*)&data) + i) = fgetc(stream);
...
return data;

Strict aliasing warnings are really important to understand or fix. They cause the kinds of bugs that are impossible to reproduce in-house because they occur only on one particular compiler on one particular operating system on one particular machine and only on full-moon and once a year, etc.

Pelotas answered 12/10, 2012 at 14:49 Comment(0)
U
26

It looks a lot as if you really want to use fread:

int data;
fread(&data, sizeof(data), 1, stream);

That said, if you do want to go the route of reading chars, then reinterpreting them as an int, the safe way to do it in C (but not in C++) is to use a union:

union
{
    char theChars[4];
    int theInt;
} myunion;

for(int i=0; i<4; i++)
    myunion.theChars[i] = fgetc(stream);
return myunion.theInt;

I'm not sure why the length of data in your original code is 3. I assume you wanted 4 bytes; at least I don't know of any systems where an int is 3 bytes.

Note that both your code and mine are highly non-portable.

Edit: If you want to read ints of various lengths from a file, portably, try something like this:

unsigned result=0;
for(int i=0; i<4; i++)
    result = (result << 8) | fgetc(stream);

(Note: In a real program, you would additionally want to test the return value of fgetc() against EOF.)

This reads a 4-byte unsigned from the file in little-endian format, regardless of what the endianness of the system is. It should work on just about any system where an unsigned is at least 4 bytes.

If you want to be endian-neutral, don't use pointers or unions; use bit-shifts instead.

Unessential answered 14/7, 2010 at 13:1 Comment(8)
+1. To stress again: an union is an official way to keep the code strict aliasing compliant. This is not gcc specific, it's just gcc's optimizer is more broken in the respect. The warnings shouldn't be ignored: either disable explicitly -fstrict-aliasing optimization or fix the code.Hoem
@Framester: Depends on what you want to port to. Most desktop systems and kin mean the same thing by a 32-bit int, but some are big-endian and some are small-endian, meaning the order of the bytes in the int can vary.Apocopate
@David: Just to pick a nit: The usual term is "little-endian".Unessential
@Hoem "an union is an official way to keep the code strict aliasing compliant." According to who?Tetrode
@Hoem - yes broken and annoying, if you have hundreds of different type of packets that come from a file descriptor (network for example), you need to have a gigantic union of opaque type just not to have to turn off the strict-aliasing altogether.Furry
C11 standart says "The value of at most one of the members can be stored in a union object at any time." ($6.7.2.1), so this would mean using union this way would be undefined behaviour (it was already pointed to me in another post because I used it myself).Incest
@Incest see §6.2.6.1 ¶7: the bytes ... that do not correspond to that member but do correspond to other members take unspecified values, which implies that bytes can be reinterpreted by reading through a different member. Also, this was the subject of a correction in ISO C99 TC3 (DR283)Courtnay
What is the safe way to do it in C++?Worldshaking
F
7

Using a union is not the correct thing to do here. Reading from an unwritten member of the union is undefined - i.e. the compiler is free to perform optimisations that will break your code (like optimising away the write).

Filmore answered 22/12, 2010 at 18:31 Comment(2)
"from an unwritten member of the union is undefined" In this simple case: union U { int i; short s; } u; u.s=1; return u.i;, yes. In general, it depends.Tetrode
In C the union is well-defined behaviour; in C++ it is undefined behaviour.Monotheism
A
7

This doc summarizes the situation: http://dbp-consulting.com/tutorials/StrictAliasing.html

There are several different solutions there, but the most portable/safe one is to use memcpy(). (The function calls may be optimized out, so it's not as inefficient as it appears.) For example, replace this:

return *(short*)data;

With this:

short temp;
memcpy(&temp, data, sizeof(temp));
return temp;
Aureus answered 28/4, 2016 at 16:42 Comment(1)
this is the best answer.Milburt
S
3

Basically you can read gcc's message as guy you are looking for trouble, don't say I didn't warn ya.

Casting a three byte character array to an int is one of the worst things I have seen, ever. Normally your int has at least 4 bytes. So for the fourth (and maybe more if int is wider) you get random data. And then you cast all of this to a double.

Just do none of that. The aliasing problem that gcc warns about is innocent compared to what you are doing.

Scrivenor answered 14/7, 2010 at 14:11 Comment(1)
Hi, I substituted the toy example with the real function. And the int with 3 bytes was just a typo from me.Skin
K
0

The authors of the C Standard wanted to let compiler writers generate efficient code in circumstances where it would be theoretically possible but unlikely that a global variable might have its value accessed using a seemingly-unrelated pointer. The idea wasn't to forbid type punning by casting and dereferencing a pointer in a single expression, but rather to say that given something like:

int x;
int foo(double *d)
{
  x++;
  *d=1234;
  return x;
}

a compiler would be entitled to assume that the write to *d won't affect x. The authors of the Standard wanted to list situations where a function like the above that received a pointer from an unknown source would have to assume that it might alias a seemingly-unrelated global, without requiring that types perfectly match. Unfortunately, while the rationale strongly suggests that authors of the Standard intended to describe a standard for minimum conformance in cases where a compiler would otherwise have no reason to believe that things might alias, the rule fails to require that compilers recognize aliasing in cases where it is obvious and the authors of gcc have decided that they'd rather generate the smallest program it can while conforming to the poorly-written language of the Standard, than generate code which is actually useful, and instead of recognizing aliasing in cases where it's obvious (while still being able to assume that things that don't look like they'll alias, won't) they'd rather require that programmers use memcpy, thus requiring a compiler to allow for the possibility that pointers of unknown origin might alias just about anything, thus impeding optimization.

Kurtzig answered 13/4, 2016 at 21:4 Comment(0)
O
-4

Apparently the standard allows sizeof(char*) to be different from sizeof(int*) so gcc complains when you try a direct cast. void* is a little special in that everything can be converted back and forth to and from void*. In practice I don't know many architecture/compiler where a pointer is not always the same for all types but gcc is right to emit a warning even if it is annoying.

I think the safe way would be

int i, *p = &i;
char *q = (char*)&p[0];

or

char *q = (char*)(void*)p;

You can also try this and see what you get:

char *q = reinterpret_cast<char*>(p);
Osset answered 16/8, 2010 at 8:25 Comment(2)
reinterpret_cast is C++. This is C.Inwardness
"the standard allows sizeof(char*) to be different from sizeof(int*)" or they could have the same size but different reprsentation, but anyway this has nothing to do with the problem here. This question is about type-punning, not pointer representation. "char *q = (char*)&p[0]" the problem is not how get two pointers of different types to point to the same address. This question is about type-punning, not pointer casts.Tetrode

© 2022 - 2024 — McMap. All rights reserved.