Compiler optimization of bitwise not operation

Asked 6/9, 2019 at 13:32 Answered 6/9, 2019 at 18:50

I have a simple function testing if two arrays are each others inverse. They are seemingly identical, except for a tmp variable. One works the other doesn't. I can't for the life of me figure out why the compiler would optimize this out - if it indeed is an optimization problem (my compiler is IAR Workbench v4.30.1). Here's my code:

// this works as expected
uint8 verifyInverseBuffer(uint8 *buf, uint8 *bufi, uint32 len)
{
  uint8 tmp;
  for (uint32 i = 0; i < len; i++)
  {
    tmp = ~bufi[i];
    if (buf[i] != tmp)
    {
      return 0;
    }
  }
  return 1;  
}

// this does NOT work as expected (I only removed the tmp!)
uint8 verifyInverseBuffer(uint8 *buf, uint8 *bufi, uint32 len)
{
  for (uint32 i = 0; i < len; i++)
  {
    if (buf[i] != (~bufi[i]))
    {
      return 0;
    }
  }
  return 1;  
}

The first version of the code works, the second does not. Can anyone figure out why? Or come with some tests to probe what is wrong?

Troytroyer answered 6/9, 2019 at 13:32 Comment(12)

What is the input? – Fantasm 6/9, 2019 at 13:35

Did you consider using debug print statements to evaluate the equality yourself? – Lotze 6/9, 2019 at 13:36

@TonyTannous the input is two buffers with verified inverse bytes (4 bytes long). – Troytroyer 6/9, 2019 at 13:39

@Troytroyer I understand, what are the numbers? – Fantasm 6/9, 2019 at 13:39

Godbolts compiler explorer could be useful here. It allows you to see the generated assembly code together with the C or C++ source code, and let you make direct comparisons with different compiler flags (like different optimizer levels). – Suicide 6/9, 2019 at 13:40

@Lotze It's running on an embedded system, so i need to set-up a channel all the way into the function. I suspect that the if statement is somehow skipped. – Troytroyer 6/9, 2019 at 13:41

buf = [0xC0, 0xA8, 0x03, 0x87] bufi = [0x3F, 0x57, 0xFC, 0x78] – Troytroyer 6/9, 2019 at 13:42

In case anyone stumbe upon this, the problem was verified to be implicit type promotion. Solve by either casting to specified type after bitwise operation or store in temp variable of the same type. – Troytroyer 6/9, 2019 at 14:38

@Troytroyer If an answer addressed your question, you should accept it. – Pompous 6/9, 2019 at 16:57

There are other tests that would work too. For example: if (buf[i] != bufi[i] ^ 0xff), or if (buf[i] ^ bufi[i] != 0xff), or if (buf[i] ^ bufi[i] ^ 0xff). – Leontineleontyne 6/9, 2019 at 17:3

The title of this question is poor: you have made an (incorrect) assumption about what is happening, then asked about that assumption - (an X-Y problem). You question should just be "Why does this not work?". You would also do well to explain how it does not work - the result, and the input that generated the result. – Petrifaction 7/9, 2019 at 10:35

Define "works". What happens? Exactly? – Norahnorbert 7/9, 2019 at 20:43

What you see happening is a result of the rules of integer promotions. Anytime a variable smaller than an int is used in an expression the value is promoted to type int.

Suppose bufi[i] contains the value 255. The hex representation of this is 0xFF. This value is then operand of the ~ operator. So the value will first be promoted to int which (assuming it is 32 bit) will have the value 0x000000FF, and applying ~ to this gives you 0xFFFFFF00. You then compare this value with buf[i] which is of type uint8_t. The value 0xFFFFFF00 is outside of this range so the comparison will always be false.

If you assign the result of the ~ back to a variable of type uint8_t, the value 0xFFFFFF00 is converted to 0x00. It is this converted value that is then compared against buf[i].

So the behavior you see is not the result of an optimization but the rules of the language. Using a temp variable as you are is one way to address this issue. You could also cast the result to uint8:

if(buf[i] != (uint8)(~bufi[i]))

Or mask out all but the lowest order byte:

if(buf[i] != (~bufi[i] & 0xff))

Pompous answered 6/9, 2019 at 13:43 Comment(1)

I think buf[i] != (uint8)(~bufi[i]) would also do it, instead of using the temp variable. – Mindamindanao 6/9, 2019 at 13:49

The problem is integer promotion. The ~ operator is very dangerous!

In case of ~bufi[i], the operand of ~ gets promoted according to the integer promotions. Making the code equivalent to ~(int)bufi[i].

So in the second case buf[i] != (~bufi[i]) you get something like 0xXX != 0xFFFFFFFFYY, where "XX" and "YY" are the actual values you wish to compare and 0xFFFF is unintended crap placed there by taking the bitwise complement of an int. This will always evaluate to true so the compiler might optimize away parts of the code, creating a very subtle bug.

In case of tmp = ~bufi[i]; you dodge this bug by truncating 0xFFFFFFFFYY into "YY", the value you are interested in.

See Implicit type promotion rules for details. Also consider adopting MISRA-C to dodge subtle bugs like this.

Romberg answered 6/9, 2019 at 13:44 Comment(5)

MISRA-C unfortunately seems to have been written on the assumption that the Standard would adopt integer promotion rules that differ from what C89 ended up using. The range of expressions for which expr > (u16a-u16b) would be accepted, for example, would only really make sense if that expression was equivalent to expr > (uint16_t)(u16a-u16b), which of course it isn't. Adding a tiny bit more complexity to the type system by having separate "wrapping unsigned algebraic ring" versus "unsigned numeric value" types could have made it possible to write code with genuinely-portable... – Rebarebah 7/9, 2019 at 19:15

...16-bit types (the difference between two 16-bit unsigned-algebraic-ring values would be another value of that same type, regardless of the size of int, while operations on a "16-bit unsigned numeric value" type would behave as though it was promoted to a signed type capable of handling the full range--again regardless of the size of int. Had such a feature been included, MISRA could have required its use, thus ensuring that MISRA-conforming code would behave consistently on all machines. No such thing has yet happened, however. – Rebarebah 7/9, 2019 at 19:18

@Rebarebah The very problem is that expr > (u16a-u16b) is not equivalent to expr > (uint16_t)(u16a-u16b) and it isn't obvious at all, because of C's dangerous and irrational integer promotion rules. MISRA-C only seeks to prevent bugs caused by the broken type promotion system, nothing else. In the best of worlds C would have no implicit promotions at all, instead trusting that programmers are competent enough to consider value ranges when they write code. And since there's no such thing as "undefined integer overflow" on any mainstream CPU, they could fix that part too. – Romberg 8/9, 2019 at 9:42

@Lundin: My point was that the rules of MISRA-C assume those expressions are equivalent, even though their behavior isn't. As for integer promotion rules, C was originally designed on the assumption that all integer operations will behave as though performed with a single "largest" integer type, and floating-point values will likewise behave as though performed with a single "largest" floating-point type. That's a great behavioral model if operations on the largest type are as fast as operations on smaller types, and the largest integer type is usable as a "bag of bits". – Rebarebah 9/9, 2019 at 15:4

@Lundin: That behavioral model is also the safest default one for code which is dealing with values as numbers rather than bags of bits. The problem is that when C added long and was ported to platforms where those original assumptions ceased to apply, it didn't add distinct "number" and "bag of bits" types, but instead used ad hoc rules to make some things act as numbers and others as bags of bits. – Rebarebah 9/9, 2019 at 15:8

As already noted by Lundin and dbush, the comparison in the second version always fails because the opposite of any uint8 value promoted to int is different from all uint8 values. In other words, the second version is equivalent to:

// this does NOT work as expected (I only removed the tmp!)
uint8 verifyInverseBuffer(uint8 *buf, uint8 *bufi, uint32 len) {
    if (len) return 0;
    return 1;
}

As can be seen on Godbolt's compiler explorer, both gcc and clang detect this and optimize the code out completely:

verifyInverseBuffer:
    test    edx, edx
    sete    al
    ret

gcc produces a rather cryptic warning, pointing a suspicious signed/unsigned comparison issue which is not the real problem... Close but no banana.

<source>: In function 'verifyInverseBuffer':
<source>:8:16: warning: comparison of promoted bitwise complement of an unsigned value with unsigned [-Wsign-compare]
    8 |     if (buf[i] != (~bufi[i]))
      |                ^~
Compiler returned: 0

Metaphosphate answered 6/9, 2019 at 18:50 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags