Why are gcc and clang not hoisting strlen out of this loop?
Asked Answered
S

3

11

Consider the following code:

#include <string.h>

void bar(char c);

void foo(const char* restrict ss) {
    for (int i = 0; i < strlen(ss); ++i) {
        bar(*ss);
    }
}    

I would expect the strlen(ss) to be hoisted out of the loop in these essentially ideal conditions; and yet - it isn't, neither by clang 5.0 nor by gcc 7.3 with maximum optimization (-O3).

Why is this the case?

Note: Inspired by (my answer to) this question.

Swordbill answered 28/1, 2018 at 0:31 Comment(11)
General advice: if you want a function hoisted out of a loop do it yourself. C is better at it than most languages in allowing this optimization to work.Inferno
@Joshua: Well, yes, but you could say that about an endless number of optimizations which compiler do apply themselves.Swordbill
Why would you expect it to? Functions can have side effects, and moving the call outside the loop can therefore change the semantics of the program.Playhouse
@AndrewHenle: Are you talking about the potential side-effects of bar()?Swordbill
The compiler is definitely allowed to hoist it. Going by C11 6.7.3.1, "During each execution of [foo], let L be any lvalue that has &L based on [ss]. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: [the type ss points to] shall not be const-qualified... If these requirements are not met, then the behavior is undefined."Exhibitor
You might also want to consider the possibility that no-one's actually ever coded up this optimisation, even if it is valid. There's no requirement that programs be optimised maximally, I suspect an implementation would still be valid if all i variables were stored on a slow hard disk on a server farm in Botswana :-)Derina
Feel free to propose such an optimization to GCC and/or to Clang.Footplate
I consider that question to be opinion based, so I voted to close it. Any "why GCC don't optimize in such a way" question can be answered with "because nobody contributed to GCC to propose that optimization" and why it is so is really a matter of opinion.Footplate
In contrast, a question such as "why does GCC optimize in such way" can be objectively answered with references to standard, and/or by pointing some internal GCC passes doing that optimization.Footplate
@einpoklum: I remember when the C compiler wasn't allowed to assume that standard library functions do what they say they do.Inferno
@BasileStarynkevitch It is not necessarily opinion-based, namely if there is a factual reason in the language or the code forbidding the optimization. One such reason could be the semantics that you understand restrict has, which apparently differ from the OP's. I think the OP is asking in order to see whether they didn't know, overlooked or misunderstood something. An answer of "there is nothing in the language or the code forbidding it" is therefore valuable.Commute
E
6

The other answers claim that the strlen call cannot be hoisted because the string's contents might change between calls. Those answers do not properly account for the semantics of restrict; even if bar had access to the string through a global variable or some other mechanism, the semantics of restrict pointers to const types should(see caveat) prohibit bar from modifying the string.

From C11, N1570 draft, 6.7.3.1:

1 Let D be a declaration of an ordinary identifier that provides a means of designating an object P as a restrict-qualified pointer to type T.

2 If D appears inside a block and does not have storage class extern, let B denote the block. If D appears in the list of parameter declarations of a function definition, let B denote the associated block. Otherwise, let B denote the block of main (or the block of whatever function is called at program startup in a freestanding environment).

3 In what follows, a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E.137) Note that ''based'' is defined only for expressions with pointer types.

4 During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified. Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause. If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment. If these requirements are not met, then the behavior is undefined.

5 Here an execution of B means that portion of the execution of the program that would correspond to the lifetime of an object with scalar type and automatic storage duration associated with B.

Here, the declaration D is const char* __restrict__ ss, and the associated block B is the body of foo. All lvalues through which strlen accesses the string have &L based on ss(see caveat), and those accesses occur during the execution of B (since, by the definition in section 5, the execution of strlen is part of the execution of B). ss points to a const-qualified type, so by section 4, the compiler is allowed to assume string elements accessed by strlen are not modified during the execution of foo; modifying them would be undefined behavior.

(caveat)The above analysis assumes that strlen accesses the string through "ordinary" pointer dereferencing or indexing. If strlen uses techniques like SSE intrinsics or inline assembly, it is not clear to me whether such accesses technically count as using an lvalue to access the value of the object that it designates. If they do not count as such, the protections of restrict may not apply, and the compiler may not be able to perform the hoisting.

Maybe the above caveat voids the protections of restrict. Maybe the compiler doesn't know enough about the definition of strlen to analyze its interaction with restrict (I'm surprised it wasn't inlined). Maybe the compiler is free to perform the hoisting and just didn't realize it; perhaps some relevant optimization is not implemented, or it failed to propagate the necessary information between the right compiler components. Determining the exact cause would require much more familiarity with the GCC and Clang internals than I possess.

Further-simplified tests that eliminate strlen and the loop show that Clang definitely has some support for restrict-pointer-to-const optimizations, but I wasn't able to observe any such support from GCC.

Exhibitor answered 28/1, 2018 at 4:24 Comment(4)
This would mean that foo is providing a guarantee, or at least allowing the compiler to assert, that bar - even when implemented in another compilation unit - doesn't modify the memory pointed to by ss, or at least those elements accessed by strlen? Even if the bar doesn't honour the guarantee, that's pretty much the same situation as C99's restrict for non-overlapping regions, or C++11's noexcept. But that quoted section is harder to read than an EULA:)Undue
I have tried with different implementations of strlen: int attribute ((noinline)) attribute ((const)) strlen_(char const * restrict a) { int i = 0; while ('\0' != *a) { ++i; ++a; } return i; } When one forbid inline, the loop is optimized out. So, inline optimization can affect negatively other optimizations.Precast
None of the precludes a function that can not modify a const-qualified argument from having side effects. If a function can have side effects, it can not be moved from the loop.Playhouse
@AndrewHenle: The function the questioner wants hoisted is strlen, which doesn't have side effects. (Of course, it's possible that the parts of the compiler that know that and the parts of the compiler that know about restrict couldn't combine their information to perform the hoisting; that would fall under the "failed to propagate the necessary information" hypothesis.)Exhibitor
C
2

Since strlen is being passed a pointer and it is possible that the contents of the memory it is pointing to will change between invocations to strlen so optimizing the call away could introduce bugs. If you can guarantee to the gcc that the function will always return the same value it will optimize it away. From the function attribute documentation:

const

Many functions do not examine any values except their arguments, and have no effects except to return a value. Calls to such functions lend themselves to optimization such as common subexpression elimination. The const attribute imposes greater restrictions on a function’s definition than the similar pure attribute below because it prohibits the function from reading global variables. Consequently, the presence of the attribute on a function declarations allows GCC to emit more efficient code for some calls to the function. Decorating the same function with both the const and the pure attribute is diagnosed.

So taking out the external dependency on strlen, look at the difference in the following two compilations:

int baz (const char* s) __attribute__ ((pure));

void foo(const char* __restrict__ ss)
{
    for (int i = 0; i < baz(ss); ++i)
        bar(*ss);
}

Yields:

foo:
        push    rbp
        push    rbx
        mov     rbp, rdi
        xor     ebx, ebx
        sub     rsp, 8
        jmp     .L2
.L3:
        movsx   edi, BYTE PTR [rbp+0]
        add     ebx, 1
        call    bar
.L2:
        mov     rdi, rbp
        call    baz
        cmp     eax, ebx
        jg      .L3
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret

But if we change the pure attribute on baz to const you can see the call hoisted out of the loop:

foo:
        push    r12
        push    rbp
        mov     r12, rdi
        push    rbx
        xor     ebx, ebx
        call    baz
        mov     ebp, eax
        jmp     .L2
.L3:
        movsx   edi, BYTE PTR [r12]
        add     ebx, 1
        call    bar
.L2:
        cmp     ebp, ebx
        jg      .L3
        pop     rbx
        pop     rbp
        pop     r12
        ret

So maybe hunt through your header files and see how strlen is declared.

Convert answered 28/1, 2018 at 2:24 Comment(3)
BTW, I suggest to use gcc -O2 -fverbose-asm -S and to show the emitted .s assembler fileFootplate
@BasileStarynkevitch Do you think that would demonstrate the principle more clearly?Convert
"Since strlen is being passed a pointer and it is possible that the contents of the memory it is pointing to will change between invocations to strlen" <- No, it is not possible, due to the restrict semantics and the fact that foo() does not change it.Swordbill
F
1

ss might be some global variable because you could call foo with some global array like char str[100]; as its argument (e.g. by having foo(str); in your main)...

and bar could modify that global variable (then strlen(ss) could change at every loop).

BTW restrict has perhaps not the meaning you believe. Read carefully section §6.7.3 of the C11 standard and §6.7.3.1. IMHO restrict is in practice mostly useful on two formal arguments of the same function to express the fact that they are not "aliases" or "overlapping" pointers (if you guess what I really mean), and perhaps optimization efforts on restrict have probably been focused on such cases.

Perhaps (but unlikely), on your particular program, a compiler might optimize as you want if you invoke it as gcc -flto -fwhole-program -O3 (on every translation unit, and at program link time). I won't bet on that (but I leave you to check).

Why is this case?

As to why current GCC (or Clang) does not optimize like you want it to, it is because nobody has written such an optimization pass and have it enabled at -O3.

Compilers are not required to do optimizations, just allowed to do some of them (at the choice of their implementors).

Since it is free software, feel free to propose a patch by contributing to GCC (or to Clang). You might need a whole year of work, and you are not sure that your optimization would be accepted (because in practice nobody codes like you show, or because your optimization would be too specific, so unlikely to be triggered, but still slowing down the compiler). But you are welcome to try.

Even if §6.7.3.1 allows your optimization (as the answer by user2357112 demonstrates), it might practically not worth the effort to implement it.

(my intuition is that implementing such an optimization is hard, and won't profit much to existing programs in practice)

BTW, you can definitely experiment such an optimization by coding some GCC plugin doing it (since the plugin framework was designed for such experimentations). You might discover that coding such an optimization is a lot of work and that practically speaking it does not improve the performance of most existing programs (e.g. in a Linux distribution), because few people code that way.

Both GCC and Clang are free software projects, and contributors to them are (from the point of view of e.g. the FSF) volunteers. So feel free to improve GCC (or Clang) like you want it to optimize. By past experience, contributing even a small piece of code to GCC takes a lot of time. And GCC is a huge program (about ten millions lines of code), so understanding its internals is not easy.

Footplate answered 28/1, 2018 at 0:38 Comment(10)
It's __restrict__'ed, so regardless of whether it's a global or what-not, we're guaranteed nothing will change it while it's being used in foo().Swordbill
Unfortunately, restrict has not the (powerful) meaning you want it to have.Footplate
The standard section you quote is not a limitation on what compilers are allowed to assume from the presence of a restrict qualifier. It doesn't support the argument you're trying to make.Exhibitor
Removed it, and I won't copy all of §6.7.3 and §6.7.3.1 into my answer.Footplate
Even after your deletion I think you're misinterpreting what compilers are allowed to do with __restrict__. Of course I might be wrong. The question is, really, what the compiler authors think they're allowed to do.Swordbill
BTW, you could propose a patch doing your optimization, since GCC is free software. I don't think that coding such a patch is easy (it might take you a year of work), and I am not sure it would be accepted easily by the GCC community. But you can try... Good luck on that!Footplate
@BasileStarynkevitch: I doubt this is a bug, it's probably "intentional", for some definition of the word.Swordbill
Yes, the intention is related to my explanationFootplate
Even if my understanding of restrict is not fully correct (and I do admit I never understood all of it; I rarely use it, and when I do it is on two formals of same function), the GCC community is not required to optimize like OP wishes it to.Footplate
__restrict__ is a gcc extension, it doesn't have to correspond to C11 restrictColter

© 2022 - 2024 — McMap. All rights reserved.