I was thinking about the utility of the __restrict keyword (standard as restrict in C since C99, but a non-standard extension in C++) and how its effect can be emulated by carefully declaring (disjoint) value objects.
restrict is usually explained in terms of indirect memory accesses that may or may not refer to overlapping addresses:
    // Adding the non-standard `restrict` keyword would hint that dreaded_function_call is never called.
    void foo(Pointer1 a, Pointer2 b) {
        *a = 5;
        *b = 6;
        if (*a == *b) dreaded_function_call();  // in isolation, this function may or may not be called
    }
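To make the aliasing concern concrete, here is a minimal sketch that contrasts a plain-pointer version with a __restrict-qualified one. The names foo_may_alias, foo_no_alias and the counter dreaded_calls are hypothetical, and the body given to dreaded_function_call is only a stand-in so that calls can be counted. __restrict is assumed to be available as a compiler extension (GCC, Clang and MSVC accept it).

```cpp
#include <cassert>

// Stand-in body for dreaded_function_call (assumption): it only counts calls.
static int dreaded_calls = 0;
void dreaded_function_call() { ++dreaded_calls; }

// No aliasing promise: a and b may point to the same int,
// so the compiler has to keep the comparison and the call.
void foo_may_alias(int* a, int* b) {
    *a = 5;
    *b = 6;  // if b == a, this overwrites the 5
    if (*a == *b) dreaded_function_call();
}

// __restrict is a promise by the caller that a and b never overlap.
// Under that promise *a is still 5 after the store to *b,
// so the compiler is allowed to drop the branch entirely.
void foo_no_alias(int* __restrict a, int* __restrict b) {
    *a = 5;
    *b = 6;
    if (*a == *b) dreaded_function_call();
}
```

Calling foo_may_alias(&x, &x) takes the branch, because both stores hit the same int. Passing the same pointer twice to foo_no_alias would break the promise and is undefined behavior, which is exactly why the compiler may assume it never happens.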
If the compiler can prove that a and b do not overlap, then dreaded_function_call() is never called and does not even need to be referenced in the compiled output.
This is exactly what I achieve in the following example: with GCC at least, dreaded_function_call doesn't even appear in the generated machine code.
    #include <vector>

    template<class It> void modify_v(It);
    void dreaded_function_call();

    template<class It1, class It2>
    void foo(It1 a, It2 b) {
        *a = 5;
        *b = 6;
        if (*a == *b) dreaded_function_call();
    }

    int main() {
        std::vector<int> v1(10); modify_v(v1.begin());
        std::vector<int> v2(10); modify_v(v2.begin());
        foo(v1.begin() + 5, v2.begin() + 5);
    }
However, if I slightly change the code so that the vectors themselves are produced by separate function calls, I lose this optimization.
I can observe this because the generated code branches and still considers the possibility of calling dreaded_function_call.
    std::vector<int> make_v();
    ...
    int main() {
        std::vector<int> v1 = make_v();
        std::vector<int> v2 = make_v();
        foo(v1.begin() + 5, v2.begin() + 5);
    }
What is going on here?
make_v returns by copy, so v1 and v2 should be disjoint, just like in the case above.
Yet the compiler doesn't perform the same optimization.
Is this just a missed optimization opportunity, or would this optimization be outright invalid?
Here is the compiled code that illustrates both cases with GCC: https://godbolt.org/z/WKKzs1edc
(Clang gives the same behavior.)
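To spell out what "disjoint" means here, the following sketch checks at run time that two vectors returned by value own non-overlapping element storage. The body of make_v is an assumption (the code above only declares it), and storage_is_disjoint is a hypothetical helper. This only demonstrates the property at run time. It says nothing about whether a compiler can prove it when make_v is opaque.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical definition; the original code only declares make_v.
std::vector<int> make_v() { return std::vector<int>(10); }

// Returns true when the element ranges of x and y do not overlap.
// std::less_equal on pointers provides a total order even across
// different allocations, unlike the built-in <= operator.
bool storage_is_disjoint(const std::vector<int>& x, const std::vector<int>& y) {
    const int* xb = x.data();
    const int* xe = xb + x.size();
    const int* yb = y.data();
    const int* ye = yb + y.size();
    std::less_equal<const int*> le;
    return le(xe, yb) || le(ye, xb);  // x ends before y starts, or vice versa
}
```

With this helper, two results of make_v() compare as disjoint, and so does a vector and its copy, while a vector compared with itself does not.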
EDIT: As one of the comments points out, the culprit might be that the compiler cannot see how the vectors are allocated, since make_v is hidden from view, even if make_v returns a copy. (Could copy elision also be interfering here?)
In this case, the following code makes it clearer whether the copy is visible or not:
    std::vector<int> make_v();
    ...
    int main() {
        std::vector<int> v0 = make_v();
        std::vector<int> v1 = v0;
        std::vector<int> v2 = v0;
        foo(v1.begin() + 5, v2.begin() + 5);
    }
Indeed, in this case GCC still doesn't optimize (it doesn't effectively restrict the pointers), but Clang does, as illustrated here: https://godbolt.org/z/4xhq1975x
So there is a combination of factors here, the last one being the compiler itself. But in my opinion, return-by-copy and the analysis of the copy constructor should be enough.
If this is a consequence of return by move or RVO, then I would say that this is a hidden cost of these features. Just my opinion.
make_v says it returns by copy; what other option does it have? I understand that the compiler cannot track how the vector inside the function is allocated, but then it is returned by copy. It seems that only the copy constructor needs to be analyzed. Having said that, it looks like the problem is the copy constructor, because such a copy forfeits the optimization as well: godbolt.org/z/qrv3v15Gs – Midget
std::vector with another class that has its copy and move constructors deleted, and it will still work. In C++17, that is. – Robins