Restricted access for allocated arrays belonging to separate object
Asked Answered
M

1

4

I was thinking about the utility of non-standard __restrict keyword in C and C++ and how its effect can be emulated by carefully declare (disjoint) value objects .

Restrict is usually explained through indirect memory access that might or might not overlap memory addresses:

int foo(Pointer1 a, Pointer2 b) {  // adding non-standard `restrict` keyword might hint that dreaded_function_call is never called
    *a = 5;
    *b = 6;
    if(*a == *b) dreaded_function_call();  // in isolation this function may or may not be called
}

If the compiler can prove that a and b do not overlap then the dreaded_function_call() is never called or referred in the compilation.

This is exactly what I achieve in this example, with GCC at least, dreaded_function_call doesn't even appear in the generated machine code.

#include<vector>

template<class It> void modify_v(It);

void dreaded_function_call();

template<class It1, class It2>
void foo(It1 a, It2 b) {
    *a = 5;
    *b = 6;
    if(*a == *b) dreaded_function_call();
}

int main() {
    std::vector<int> v1(10); modify_v(v1.begin());
    std::vector<int> v2(10); modify_v(v2.begin());

    foo(v1.begin() + 5, v2.begin() + 5);
}

However, if I slightly change the code to generate the vector themselves from separate function calls, I loose this optimization. I can observe this since the generated code will branch and still consider the possibly of calling dreaded_function_call.

std::vector<int> make_v();
...
int main() {
    std::vector<int> v1 = make_v();
    std::vector<int> v2 = make_v();

    foo(v1.begin() + 5, v2.begin() + 5);
}

What is going on here? make_v returns by copy, so v1 and v2 should be disjoint just like in the case above. Yet the compiler doesn't do the same optimization.

Is this just missed opportunity for optimization, or this optimization would be just outright invalid?

Here is the compiled code that illustrates both cases with GCC: https://godbolt.org/z/WKKzs1edc

(Clang gives the same behavior.)


EDIT: As one of the comments points out, the culprit might be the fact that the compiler cannot see how the vectors are allocated since make_v is hidden from view, even if make_v returns a copy. (Also copy elision could be interfering here?)

In this case, the following code could be more clear with respect to whether the copy is visible or not,

std::vector<int> make_v();
...
int main() {
    std::vector<int> v0 = make_v();

    std::vector<int> v1 = v0;
    std::vector<int> v2 = v0;

    foo(v1.begin() + 5, v2.begin() + 5);
}

Indeed, in this case GCC doesn't still optimize (doesn't effectively restricts pointer) but Clang does. As illustrated here, https://godbolt.org/z/4xhq1975x

So there is a combination of factors here, the last factor being the compiler itself. But in my opinion, it seems that return-by-copy and the analysis of the copy constructor should be enough.

If this is a consequence of return by move or RVO, then I would say that this is a hidden cost of these features. Just my opinion.

Midget answered 20/10 at 20:45 Comment(4)
'make_v returns by copy' -- that depends on the implementation of make_v, which the compiler can't see here. The optimization you're expecting is possible only if the compiler can analyze all constructors of vector and prove that each constructor allocates unique memory block...Robins
@IgorG, the signature of make_v says it returns by copy, what other option does it have. I understand that the compiler cannot track how the vector inside the function is allocated but then it is returned by copy. Seems that only the copy constructor needs to be analyzed. Having said that, it looks like the problem is the copy constructor, because such copy forfeits the optimization as well: godbolt.org/z/qrv3v15GsMidget
Starting with C++17 there's mandatory copy elision. The value can be moved or constructed directly in v1 and v2. Replace std::vector with another class that has its copy and move constructors deleted, and it will still work. In C++17, that is.Robins
@IgorG, yeah, that is my suspicion, if that is the case, it would be a pity if RVO is hindering this optimization. But I don't think this is the case, because even when the compiler can see the copy it refuses to make the optimization.Midget
W
1

Each std::vector<int> holds a pointer that points to the actual memory region holding the elements.

We know from the library specification in the standard, that it is impossible for two std::vector<int> objects to refer to the same element objects or the same storage.

However, from the actual core language point-of-view, there is nothing preventing you from writing a make_v function that always returns std::vector<int> objects that have their internal pointer initialized to the same value. In particular, even if there is no public member function to achieve this, it is always possible to use tricks to access private data members directly. Class invariants aren't actually usable by the optimizer, even if it was clever enough to understand them.

Of course, any way that you could attempt to do this would have undefined behavior under the library rules, but the compilers (with some smaller exceptions) do not know or take into account the library specification and instead work with the standard library code just like any other code.

In your first example the compiler can see through inlining to the actual operator new calls used to allocate memory for the elements. And two calls to operator new must always produce pointers to disjoint memory without intervening operator delete call to the first pointer value. That's knowledge that is built-in to the compiler's optimizer.

No pointer or reference that could be used to access the std::vector<int> objects is given to modify_v and no such pointers escape the function, so the compiler also knows that modify_v can't modify where the vector objects' internal pointers point, nor can modify_v call operator delete on the pointer.

Witenagemot answered 20/10 at 21:6 Comment(10)
Yes, I though about this too, but how about this, godbolt.org/z/bcz3G6YGe , in this case it can see through the actual operator new that makes the copy, and still it doesn't do the optimization.Midget
@Midget That seems to be a missed optimization opportunity somewhere in GCC. Clang does optimize the call away: godbolt.org/z/d5v6xq7xMWitenagemot
@ser17732522, excellent, so the conclusion is that 1) the compiler seems to need to see the copy explicitly, returning by value might not be enough, and 2) GCC still can miss it. As illustrated here: godbolt.org/z/4xhq1975xMidget
@Midget Yes, without a copy (not move) from the return values, there is no reasonable way for a compiler to know that the vectors point to different sequences of elements. And sure, the more complex the example becomes, the less likely it will be that the compiler will be able to find the best optimization result.Witenagemot
@Midget Generally, if optimizations like this are important, it is a good idea to make as much code as possible visible to the compiler so that it can inline as much as possible, i.e. define functions as inline functions in header files. Alternatively, to not loose separate compilation benefits, use LTO.Witenagemot
@Midget Also, in C++23 there is [[assume]] for this purpose. You can use it to give the compiler a proof that the memory isn't overlapping before calling foo.Witenagemot
As much as I hope, I would be surprised if LTO could do this. I wish I am shown differently. Yes, I know that the more code is visible the more optimizations can be made, yet some users of libraries might still do separate compilation. This is for this library of mine, which also deals with arrays and can be subject of aliasing issues: gitlab.com/correaa/boost-multi/-/blob/master/…Midget
How would you use [[assume]] in this case. I tried with [[assume(a != b)]]; and it didn't help. godbolt.org/z/EdG5x43dvMidget
@Midget You are correct, I can't get it to work with [[assume]] or [[unreachable]] in either GCC or Clang. I don't really understand why though. This shouldn't be too hard for the compilers to use during optimization. From a formal perspective: Of course you are not allowed to compare iterators into different containers. However something like &*a != &*b should be valid as test and would guarantee that dreaded_function_call can't be called without the program having UB. A simply if testing this also doesn't prevent the call from optimized away. Seems like a missed optimization.Witenagemot
I had to do this crazy thing to recover the optimization (in clang only), godbolt.org/z/c6x5Eha9h . (returning a const vector from make_v unfortunately didn't help).Midget

© 2022 - 2024 — McMap. All rights reserved.