What stops GCC's __restrict__ qualifier from working?

Here is some fairly straightforward code, compiled with -O2 (GCC 4.8.5):

unsigned char *linebuf;
int yuyv_tojpegycbcr(unsigned char *buf, int w)
{
    int col;
    unsigned char *restrict pix = buf;
    unsigned char *restrict line = linebuf;

    for (col = 0; col < w - 1; col += 2)
    {
        line[col*3] = pix[0];
        line[col*3 + 1] = pix[1];
        line[col*3 + 2] = pix[3];
        line[col*3 + 3] = pix[2];
        line[col*3 + 4] = pix[1];
        line[col*3 + 5] = pix[3];
        pix += 4;
    }
    return 0;
}

And here is the corresponding assembly:

0000000000000000 <yuyv_tojpegycbcr>:
   0:   83 fe 01                cmp    $0x1,%esi
   3:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # a <yuyv_tojpegycbcr+0xa>
   a:   7e 4e                   jle    5a <yuyv_tojpegycbcr+0x5a>
   c:   83 ee 02                sub    $0x2,%esi
   f:   31 d2                   xor    %edx,%edx
  11:   d1 ee                   shr    %esi
  13:   48 8d 74 76 03          lea    0x3(%rsi,%rsi,2),%rsi
  18:   48 01 f6                add    %rsi,%rsi
  1b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  20:   0f b6 0f                movzbl (%rdi),%ecx
  23:   48 83 c2 06             add    $0x6,%rdx
  27:   48 83 c7 04             add    $0x4,%rdi
  2b:   48 83 c0 06             add    $0x6,%rax
  2f:   88 48 fa                mov    %cl,-0x6(%rax)
  32:   0f b6 4f fd             movzbl -0x3(%rdi),%ecx
  36:   88 48 fb                mov    %cl,-0x5(%rax)
  39:   0f b6 4f ff             movzbl -0x1(%rdi),%ecx
  3d:   88 48 fc                mov    %cl,-0x4(%rax)
  40:   0f b6 4f fe             movzbl -0x2(%rdi),%ecx
  44:   88 48 fd                mov    %cl,-0x3(%rax)
  47:   0f b6 4f fd             movzbl -0x3(%rdi),%ecx
  4b:   88 48 fe                mov    %cl,-0x2(%rax)
  4e:   0f b6 4f ff             movzbl -0x1(%rdi),%ecx
  52:   88 48 ff                mov    %cl,-0x1(%rax)
  55:   48 39 f2                cmp    %rsi,%rdx
  58:   75 c6                   jne    20 <yuyv_tojpegycbcr+0x20>
  5a:   31 c0                   xor    %eax,%eax
  5c:   c3                      retq   

When compiled without the restrict qualifier, the output is identical: lots of intermixed loads and stores. Some values are loaded twice, and it looks like no optimisation happened. If pix and line do not alias, I expect the compiler to be smart enough to, among other things, load pix[1] and pix[3] only once.

Do you know of anything that could prevent the restrict qualifier from taking effect?

PS: With a newer GCC (4.9.2) on another architecture (ARMv7), the result is similar. Here is a test script to compare the generated code with and without restrict.

#!/bin/sh
gcc -c -o test.o -std=c99 -O2 yuyv_to_jpegycbcr.c
objdump -d test.o > test.S


gcc -c -o test2.o -O2 -D restrict='' yuyv_to_jpegycbcr.c
objdump -d test2.o > test2.S
Contexture answered 29/2, 2016 at 12:32 Comment(20)
Any reason you don't use the standard restrict qualifier? - Scut
Because using -std=c99 breaks my code, probably because I did not set the right feature test macros. I can fix this, but I don't think it would make a difference. - Contexture
You should expect vectorization as well here (assuming the device you are compiling for supports it). - Homeopathy
Well, at the very least run the code through the preprocessor and check that they're still there afterwards... - Homeopathy
The qualifier is still present after preprocessing. - Contexture
If standard C breaks your code, you should fix your code, not just put a thick layer of paint on the symptoms. - Scut
@Olaf Using -std=c99 enables ANSI mode, and it did not play well with some system includes in another C file. Anyway, it is fixed now. Happy? - Contexture
restrict is a hint, not a command. How about disabling strict aliasing altogether (-fno-strict-aliasing)? And what about caching pix[1] and pix[3] yourself within the loop? You could also copy them, thereby removing any alias. - Overhear
What do you mean by "it enables ANSI mode"? ANSI would be C89/C90, which is not standard. The standard headers work fine with standard C. And if you use GNU extensions, use -std=gnu11, or gnu99 at least. Also note that C99 is also not standard C (anymore). The only C standard is C11. - Scut
@black It should not change anything: I use two pointers of the same type, so they can be considered aliases whether strict aliasing is enforced or not. - Contexture
@Olaf Go read the feature_test_macros man page. - Contexture
@black: That option should generate even less optimised code, as the compiler now cannot assume pointers don't alias. Also, it has a different purpose than restrict. - Scut
How do you expect to get useful answers without providing compilable code? - Hyracoid
@EOF Do you want a standalone compilable example? I will do that. I doubt it will add any value. All the necessary information is present, IMO. - Contexture
@Contexture Your question boils down to "Why doesn't my compiler optimize this properly?" How can anybody tell you the reason if you don't give them the code the compiler sees? - Hyracoid
@EOF ??? The code is in the question, and the preprocessed code is identical. That rules out preprocessor magic, so you can see what the compiler sees. I get your point, so I will reduce it to a standalone compilable test case. - Contexture
@Contexture I'm not talking about the preprocessor, I'm talking about struct and function definitions. You know, the thing the compiler will complain about when you try to feed it the function you've posted? - Hyracoid
Try marking the function parameters restrict rather than the local variables. - Obligatory
@EOF Example modified to be a compilable test case. - Contexture
@Obligatory You're right. Do you want to post an answer? - Contexture

Put the restrict on the function parameters rather than the local variables.

In my experience, most compilers (including GCC) honor restrict only when it is specified on function parameters. Uses on local variables within a function are ignored.
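As a minimal sketch of that suggestion applied to the function in the question: the name yuyv_to_ycbcr_restrict and the extra line parameter are assumptions made for illustration (the original writes to the global linebuf, which cannot be restrict-qualified in a parameter list). It needs -std=c99 or later so that restrict is a keyword. Whether the compiler then merges the duplicate loads still has to be checked in the generated assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant of yuyv_tojpegycbcr: the output buffer is passed in
 * as a parameter instead of being read from the global linebuf, so both
 * pointers can carry restrict in the parameter list. */
int yuyv_to_ycbcr_restrict(unsigned char *restrict line,
                           const unsigned char *restrict pix, int w)
{
    int col;

    /* Sanity check only; it cannot verify the no-overlap promise that
     * the caller makes by passing restrict-qualified arguments. */
    assert(line != NULL && pix != NULL);

    for (col = 0; col < w - 1; col += 2) {
        line[col * 3]     = pix[0];
        line[col * 3 + 1] = pix[1];
        line[col * 3 + 2] = pix[3];
        line[col * 3 + 3] = pix[2];
        line[col * 3 + 4] = pix[1];
        line[col * 3 + 5] = pix[3];
        pix += 4;
    }
    return 0;
}
```

A caller would pass the old global explicitly, for example yuyv_to_ycbcr_restrict(linebuf, buf, w), and must guarantee that the two buffers do not overlap.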

I suspect this has to do with aliasing analysis being done at the function-level rather than the basic-block level. But I have no evidence to back this up. Furthermore, it probably varies by compiler and compiler version.

Either way, these sorts of things are pretty finicky to rely on. So if the performance matters, either you optimize it manually, or you remember to revisit it every time you upgrade or change compilers.
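For the manual route, one possible sketch (not taken from the answer) is to read each 4-byte input group into locals before doing any store. The function name yuyv_to_ycbcr_manual and the line parameter are hypothetical. Because every load in an iteration precedes every store, the compiler has no reason to reload pix[1] or pix[3], whatever it concludes about aliasing. Note that the result differs from the original code only if the buffers actually overlap, which restrict would have forbidden anyway.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical manual variant: no restrict needed, the loads are hoisted
 * by hand so each input byte is read exactly once per iteration. */
int yuyv_to_ycbcr_manual(unsigned char *line, const unsigned char *pix, int w)
{
    int col;

    /* Sanity check only; overlap between the buffers is not detected. */
    assert(line != NULL && pix != NULL);

    for (col = 0; col < w - 1; col += 2) {
        /* Read the whole input group first... */
        const unsigned char p0 = pix[0];
        const unsigned char p1 = pix[1];
        const unsigned char p2 = pix[2];
        const unsigned char p3 = pix[3];

        /* ...then write the six output bytes in the original order. */
        line[col * 3]     = p0;
        line[col * 3 + 1] = p1;
        line[col * 3 + 2] = p3;
        line[col * 3 + 3] = p2;
        line[col * 3 + 4] = p1;
        line[col * 3 + 5] = p3;
        pix += 4;
    }
    return 0;
}
```

As with the restrict version, the only way to confirm the effect is to compare the assembly, for instance with the objdump script from the question.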

Obligatory answered 1/3, 2016 at 8:55 Comment(4)
Per a comment on gcc.gnu.org/bugzilla/show_bug.cgi?id=60712, it seems GCC only applies restrict to function parameters. - Contexture
I am seeing exactly this with block matrix multiplication, which has six loops. If I put the innermost three loops in a static function with restrict parameters, the code is twice as fast as when I don't declare a function. I see the same effect with GCC (6.3) and Clang (4.0). So it appears compilers ignore restrict on local variables, exactly as you say. I don't know about ICC. - Bryozoan
And sometimes, even when I declare a separate static function, the compiler gives a worse result. Sometimes I have to declare the inner function with restrict in a separate object file. So you have to do exactly what you say: check the assembly the first time and after each upgrade or compiler change. - Bryozoan
@Zboson I've given up on restrict semantics. Way too inconsistent. Falls apart after enough layers of inlining. Now I do those optimizations manually. - Obligatory
