gcc -O0 still optimizes out "unused" code that should raise an FP exception. Is there a compile flag to change that?

As I brought up in this question, gcc is removing (yes, with -O0) a line of code _mm_div_ss(s1, s2); presumably because the result is not saved. However, this should trigger a floating point exception and raise SIGFPE, which can't happen if the call is removed.

Question: Is there a flag, or multiple flags, to pass to gcc so that code is compiled as-is? I'm thinking something like fno-remove-unused but I'm not seeing anything like that. Ideally this would be a compiler flag instead of having to change my source code, but if that isn't supported is there some gcc attribute/pragma to use instead?

Things I've tried:

$ gcc --help=optimizers | grep -i remove

no results.

$ gcc --help=optimizers | grep -i unused

no results.

And explicitly disabling all dead code/elimination flags -- note that there is no warning about unused code:

$ gcc -O0 -msse2 -Wall -Wextra -pedantic -Winline \
     -fno-dce -fno-dse -fno-tree-dce \
     -fno-tree-dse -fno-tree-fre -fno-compare-elim -fno-gcse  \
     -fno-gcse-after-reload -fno-gcse-las -fno-rerun-cse-after-loop \
     -fno-tree-builtin-call-dce -fno-tree-cselim a.c
a.c: In function ‘main’:
a.c:25:5: warning: ISO C90 forbids mixed declarations and code [-Wpedantic]
     __m128 s1, s2;
     ^
$

Source program

#include <stdio.h>
#include <signal.h>
#include <string.h>
#include <stdlib.h>
#include <xmmintrin.h>
static void sigaction_sfpe(int signal, siginfo_t *si, void *arg)
{
    printf("%d,%d,%d\n", signal, si!=NULL?1:0, arg!=NULL?1:0);
    printf("inside SIGFPE handler\nexit now.\n");
    exit(1);
}

int main()
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = sigaction_sfpe;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGFPE, &sa, NULL);

    _mm_setcsr(0x00001D80);  /* unmask the divide-by-zero exception (clear MXCSR bit 9) */

    __m128 s1, s2;
    s1 = _mm_set_ps(1.0, 1.0, 1.0, 1.0);
    s2 = _mm_set_ps(0.0, 0.0, 0.0, 0.0);
    _mm_div_ss(s1, s2);

    printf("done (no error).\n");

    return 0;
}

Compiling the above program gives

$ ./a.out
done (no error).

Changing the line

_mm_div_ss(s1, s2);

to

s2 = _mm_div_ss(s1, s2); // add "s2 = "

produces the expected result:

$ ./a.out
inside SIGFPE handler
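
A note on the constant 0x00001D80 passed to _mm_setcsr in the source program: it can also be built from the named masks in <xmmintrin.h>, which makes the intent easier to read. The sketch below is only an illustration; the helper name mxcsr_unmask_div_zero is hypothetical and not part of the original program.

```c
#include <xmmintrin.h>

/* Hypothetical helper: builds the MXCSR value that unmasks only the
   divide-by-zero exception and leaves every other exception masked.
   _MM_MASK_MASK is 0x1F80 (all six mask bits set) and
   _MM_MASK_DIV_ZERO is 0x0200 (bit 9). */
static unsigned int mxcsr_unmask_div_zero(void)
{
    return _MM_MASK_MASK & ~_MM_MASK_DIV_ZERO;
}
```

The returned value has the rounding-mode, flush-to-zero and denormals-are-zero bits cleared, just like the literal in the question. If only the mask bits should change, _MM_SET_EXCEPTION_MASK(_MM_MASK_MASK & ~_MM_MASK_DIV_ZERO) leaves the rest of the register alone.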

Edit with more details.

This appears to be related to the __always_inline__ attribute on the _mm_div_ss definition.

$ cat t.c
int
div(int b)
{
    return 1/b;
}

int main()
{
    div(0);
    return 0;
}


$ gcc -O0 -Wall -Wextra -pedantic -Winline t.c -o t.out
$  

(no warnings or errors)

$ ./t.out
Floating point exception
$

vs below (same except for function attributes)

$ cat t.c
__inline int __attribute__((__always_inline__))
div(int b)
{
    return 1/b;
}

int main()
{
    div(0);
    return 0;
}

$ gcc -O0 -Wall -Wextra -pedantic -Winline t.c -o t.out
$   

(no warnings or errors)

$ ./t.out
$

Adding the function attribute __warn_unused_result__ at least gives a helpful message:

$ gcc -O0 -Wall -Wextra -pedantic -Winline t.c -o t.out
t.c: In function ‘main’:
t.c:9:5: warning: ignoring return value of ‘div’, declared with attribute warn_unused_result [-Wunused-result]
     div(0);
     ^
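
The source with the attribute added is not shown above. A minimal sketch of how such a declaration might look follows, assuming both attributes are simply combined in one attribute list. The names int_div and half_of are hypothetical; the function is renamed here to avoid clashing with div() from <stdlib.h>.

```c
/* Hypothetical variant of div() from the question. With
   __warn_unused_result__, gcc warns whenever a caller ignores the
   return value, which is exactly the case where the inlined body
   would otherwise vanish without any diagnostic. */
static inline int __attribute__((__always_inline__, __warn_unused_result__))
int_div(int a, int b)
{
    return a / b;
}

/* The result is used here, so there is no warning and the division
   is emitted. */
static int half_of(int n)
{
    int q = int_div(n, 2);
    return q;
}
```

Writing int_div(n, 2); as a bare statement would bring the -Wunused-result warning back.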

edit:

Some discussion on the gcc mailing list. Ultimately, I think everything is working as intended.

Israelisraeli answered 3/10, 2016 at 13:34 Comment(2)
Try using __attribute__((used)) with the variables involved.Flabellum
Maybe declaring s1 and s2 as volatile helps...Foreshore

GCC doesn't "optimize out" anything here. It just doesn't generate useless code. It seems to be a very common illusion that there's some pure form of code that the compiler should generate and that any change to it is an "optimization". There is no such thing.

The compiler creates some data structure that represents what the code means, applies some transformations to that data structure, and from it generates assembly that is then assembled into machine instructions. Compiling without "optimizations" just means that the compiler makes the least effort possible to generate code.

In this case, the whole statement is useless: it computes a value that is thrown away immediately. After expanding the inline functions and the builtins, it is equivalent to writing a/b;. The difference is that writing a/b; emits a warning about a statement with no effect, while the builtins probably aren't covered by the same warning. This is not an optimization. The compiler would actually have to expend extra effort to invent meaning for a meaningless statement, then fake a temporary variable to store the result, only to throw it away.

What you're looking for is not flags to disable optimizations, but pessimization flags. I don't think any compiler developers waste time implementing such flags, other than maybe as an April Fools' joke.

Fresh answered 3/10, 2016 at 14:16 Comment(5)
How do I show warnings about statements with no effect? Because -Wall -Wextra -pedantic isn't showing anything.Israelisraeli
@Israelisraeli I couldn't get it to produce a warning either. That's probably a bug in gcc. Send them a bug report.Fresh
The misconception here is actually that raising SIGFPE from a division by zero is a defined effect and therefore that optimizing out the call removes observable defined behavior.Annapurna
@Random832: I would expect that there are implementations which do define that as an effect, though the documentation may be sloppy as to what is or isn't guaranteed (saying that a signal might be raised would be meaningless if the implementation might behave in arbitrary other ways as well). It would probably be most logical for an implementation to specify that a divide-by-zero will have no side effects other than the issuance of a signal, and will not yield an observable value without a signal, but need not raise a signal in cases where the quotient is not used in any observable way.Atli
I'm accepting this answer because I think this is actually a bug, or at least unintended behavior of gcc (removing a statement without warning). I'm going to do a bit more research first though; people keep getting caught up on the divide-by-zero, which is irrelevant as far as I can tell. It was just the easiest way to produce a low-level side effect.Israelisraeli

Why does gcc not emit the specified instruction?

A compiler produces code that must have the observable behavior specified by the Standard. Anything that is not observable can be changed (and optimized) at will, as it does not change the behavior of the program (as specified).

How can you beat it into submission?

The trick is to make the compiler believe that the behavior of the particular piece of code is actually observable.

Since this is a problem frequently encountered in micro-benchmarks, I advise you to look at how (for example) Google-Benchmark addresses it. From benchmark_api.h we get:

template <class Tp>
inline void DoNotOptimize(Tp const& value) {
    asm volatile("" : : "g"(value) : "memory");
}

The details of this syntax are boring; for our purpose we only need to know:

  • "g"(value) tells the compiler that value is used as an input to the statement
  • "memory" is a compile-time read/write barrier

So, we can change the code to:

asm volatile("" : : : "memory");

__m128 result = _mm_div_ss(s1, s2);

asm volatile("" : : "g"(result) : );

Which:

  • forces the compiler to consider that s1 and s2 may have been modified between their initialization and use
  • forces the compiler to consider that the result of the operation is used

There is no need for any flag, and it should work at any level of optimization (I tested it on https://gcc.godbolt.org/ at -O3).
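
DoNotOptimize above is a C++ template, while the question is in C. A rough C counterpart might look like the sketch below. The names keep_m128 and div_ss_flags are hypothetical, and the "x" constraint (any SSE register) is specific to x86. Instead of installing a SIGFPE handler, the sketch reads the sticky exception flags in MXCSR, which are set by the division even while the exception is masked, to check that the instruction really ran.

```c
#include <xmmintrin.h>

/* Hypothetical C counterpart of DoNotOptimize for one __m128 value:
   the empty asm claims to read `value`, so the compiler has to
   compute it. */
static inline void keep_m128(__m128 value)
{
    __asm__ volatile("" : : "x"(value) : "memory");
}

/* Hypothetical helper: divides the low lanes and returns the MXCSR
   exception flags left behind by the division. */
static unsigned int div_ss_flags(float num, float den)
{
    __m128 s1 = _mm_set_ss(num);
    __m128 s2 = _mm_set_ss(den);

    _MM_SET_EXCEPTION_STATE(0);                 /* clear the sticky flags */
    __asm__ volatile("" : "+x"(s1), "+x"(s2));  /* operands are now opaque */
    keep_m128(_mm_div_ss(s1, s2));              /* result counts as used */
    return _MM_GET_EXCEPTION_STATE();
}
```

With the default MXCSR (all exceptions masked), div_ss_flags(1.0f, 0.0f) should report _MM_EXCEPT_DIV_ZERO, while div_ss_flags(1.0f, 2.0f) should not.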

Incredible answered 3/10, 2016 at 15:41 Comment(3)
Do you have any source for what gcc decides is observed behavior or not? I'm trying to narrow down if removing this statement without warning is a bug, unintended, or intentional.Israelisraeli
@BurnsBA: The C++ Standard is the reference for what is or is not observable behavior. It's also kinda hard to read and full of corner cases... In general, for single-threaded programs, observable behavior is anything that affects the output (I/O) of the program. For multi-threaded programs things get more complicated, as many interleavings are possible.Incredible
The DoNotOptimize implementation is one of the sneakiest tricks I've seen in a long, long time. Wow.Decision

I'm not an expert on gcc internals, but it seems that your problem is not with dead code being removed by some optimization pass. Most likely the compiler is not even considering generating this code in the first place.

Let's reduce your example from compiler specific intrinsics to a plain old addition:

int foo(int num) {
    num + 77;
    return num + 15;
}

No code for + 77 generated:

foo(int):
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        mov     eax, DWORD PTR [rbp-4]
        add     eax, 15
        pop     rbp
        ret

When one of the operands has side effects, only that operand gets evaluated. Still, no addition in the assembly.

But saving this result into an (even unused) variable forces the compiler to generate code for addition:

int foo(int num) {
  int baz = num + 77;
  return num + 15;
}

Assembly:

foo(int):
    push    rbp
    mov     rbp, rsp
    mov     DWORD PTR [rbp-20], edi
    mov     eax, DWORD PTR [rbp-20]
    add     eax, 77
    mov     DWORD PTR [rbp-4], eax
    mov     eax, DWORD PTR [rbp-20]
    add     eax, 15
    pop     rbp
    ret

The following is just speculation, but from my experience with compiler construction, it is more natural not to generate code for unused expressions in the first place than to eliminate that code later.

My recommendation is to be explicit about your intentions and put the result of the expression into a volatile (and, hence, non-removable by the optimizer) variable.

@Matthieu M pointed out that it is not sufficient to prevent precomputing the value. So for something more than playing with signals, you should use documented ways to perform the exact instruction you want (probably, volatile inline assembly).
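
As an illustration of the volatile approach, here is a minimal sketch that passes both the operands and the result through volatile objects, so the compiler can neither precompute the quotient nor discard it. The name div_ss_flags_volatile is hypothetical. The sketch inspects the sticky MXCSR flags instead of relying on SIGFPE, and it is x86-specific.

```c
#include <xmmintrin.h>

/* Hypothetical helper: performs the scalar division with operands
   read from volatile objects and the result stored to a volatile
   object, then returns the MXCSR exception flags that were raised. */
static unsigned int div_ss_flags_volatile(float num, float den)
{
    volatile float vnum = num;
    volatile float vden = den;
    volatile float sink;
    float a, b;

    _MM_SET_EXCEPTION_STATE(0);   /* clear the sticky flags */
    a = vnum;                     /* volatile reads: values unknown at compile time */
    b = vden;
    sink = _mm_cvtss_f32(_mm_div_ss(_mm_set_ss(a), _mm_set_ss(b)));
    (void)sink;                   /* silence set-but-unused warnings */
    return _MM_GET_EXCEPTION_STATE();
}
```

Whether this is enough in every situation is exactly what the comments below debate; an asm volatile barrier is the more robust tool.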

Turaco answered 3/10, 2016 at 14:17 Comment(4)
Unfortunately, simply putting the result in volatile is insufficient to prevent the compiler from pre-computing it without actually emitting the desired instruction since all parameters are present at compile-time.Incredible
Indeed; shoving the zero into a volatile and reading it out again will shut up the optimizer.Averett
@MatthieuM. I totally agree that it is not always sufficient, but in this case it is (the last link). Also, compiler must prefer faster code to 'emitting the desired instruction' in the common case, to be useful. That is the case where inline assembly should be used.Turaco
@deniss: Unfortunately, from experience, I am wary of believing in "sufficient". Compilers, being the finicky beasts they are, have a tendency to change their behavior at the slightest alteration of their inputs.Incredible
