Clang's __restrict is inconsistent?

I was working on highly vectorizable code and noticed that, for the C++ __restrict keyword/extension, Clang behaves differently from GCC, and less usefully, even in a simple case.

For compiler-generated code, the slowdown is about 15x (in my specific case, not the example below).

Here is the code (also available at https://godbolt.org/z/sdGd43x75):

struct Param {
    int *x;
};

int foo(int *a, int *b) {
    *a = 5;
    *b = 6;
    // No significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

int foo(Param a, Param b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, as expected (for clang/gcc)
    return *a.x + *b.x;
}

/////////////////////

struct ParamR {
    // "Restricted pointers assert that members point to disjoint storage"
    // https://en.cppreference.com/w/c/language/restrict
    // Can C's interpretation of restrict be used in C++ (and for __restrict too)?
    int *__restrict x;
};

int rfoo(int *__restrict a, int *__restrict b) {
    *a = 5;
    *b = 6;
    // Significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

int rfoo(ParamR a, ParamR b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a.x + *b.x;
}

int rfoo(ParamR *__restrict a, ParamR *__restrict b) {
    *a->x = 5;
    *b->x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a->x + *b->x;
}

This happens for both C++ (__restrict) and C code (using the standard restrict).

How can I make Clang understand that the pointers will always point to disjoint storage?

Broadspectrum answered 13/12, 2021 at 18:29 Comment(6)
Does this answer your question? Why does clang ignore __restrict__? - Lilian
It appears the bug still exists, or perhaps it's a new variation of the same one. - Lilian
It's somewhat of a duplicate; both are about a bug in Clang's TBAA (it seems). In my case I use __restrict on a member variable, which Clang does not notice; in #50365641, Clang fails for an even simpler case (__restrict on a function argument). If it's a variation of the same bug, it has been 3 years since it was publicly noticed and it's still not fixed. - Broadspectrum
It looks like LLVM just doesn't care; TBAA is designed to help memcpy, not to handle this. noalias seems to be the way they really implement it, but that doesn't apply to member variables. - Lilian
Yet another Clang bug/missed opportunity: godbolt.org/z/s8qzr3P3v, even though this recent LLVM dev meeting talk (youtube.com/watch?v=08XwXB3GHck) says that Clang should see through these simple cases. - Broadspectrum
I find the same difference between GCC and Clang: https://mcmap.net/q/1917452/-tricks-to-avoid-pointer-aliasing-in-generic-code - Passbook

It appears to be a bug. Well, I don't know if I should call it a bug, since the program still behaves correctly; let's say it is a missed opportunity in the optimizer.

I tried a few workarounds, and the only thing that worked was to always pass the pointer as a __restrict function parameter, like so:

int rfoo(int *__restrict a, int *__restrict b) {
    *a = 5;
    *b = 6;
    // Significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

// change this:
int rfoo(ParamR a, ParamR b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a.x + *b.x;
}

// to this:
int rfoo2(ParamR a, ParamR b) {
    return rfoo(a.x, b.x);
}

Output from clang 12.0.0:

rfoo(ParamR, ParamR):                       # @rfoo(ParamR, ParamR)
        mov     dword ptr [rdi], 5
        mov     dword ptr [rsi], 6
        mov     eax, dword ptr [rdi]
        add     eax, 6
        ret
rfoo2(ParamR, ParamR):                      # @rfoo2(ParamR, ParamR)
        mov     dword ptr [rdi], 5
        mov     dword ptr [rsi], 6
        mov     eax, 11
        ret

Now, this is terribly inconvenient, especially for more complex code, but if the performance difference is that large and important and you can't switch to GCC, it might be worth considering.
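For loop-heavy code, the same forwarding idea can be kept in one place by splitting each routine into a kernel that takes raw __restrict parameters and a thin struct-taking wrapper. The following is only a sketch under stated assumptions: BufferR, scale_add_kernel and scale_add are hypothetical names that do not appear in the question, and whether Clang actually vectorizes the loop should be checked in the generated assembly for the compiler version in use.

```cpp
#include <cstddef>

// Hypothetical aggregate, modeled on ParamR from the question.
struct BufferR {
    float *__restrict data;
    std::size_t size;
};

// Kernel: __restrict is on the function parameters, which is the form
// the workaround above relies on.
inline void scale_add_kernel(float *__restrict dst,
                             const float *__restrict src,
                             std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += k * src[i];
    }
}

// Wrapper: unpacks the struct members and forwards them to the kernel.
// The caller must still guarantee that dst and src do not overlap.
inline void scale_add(BufferR dst, BufferR src, float k) {
    const std::size_t n = dst.size < src.size ? dst.size : src.size;
    scale_add_kernel(dst.data, src.data, n, k);
}
```

Callers keep using the struct-based interface, and only the wrapper needs to know about the split.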

Nagging answered 13/12, 2021 at 19:33 Comment(2)
I would dare call this a bug in the optimizer, as it misses a significant opportunity in a simple program. - Broadspectrum
For a sound compiler to perform an optimizing transform, it must not only ensure that the transform would likely improve performance, but also that all possible corner cases where the transform might adversely affect program behavior have been adequately considered and handled. Even if all corner cases are in fact handled, the fact that a compiler writer blocks optimizations in cases which would be difficult to prove sound is hardly a bug. - Spatula
