This is a follow up to this question, Restricted access for allocated arrays belonging to separate object, after discovering that returning by value across compilation units doesn't help to hint the compiler about disjoint memory allocations.
I have two functions make_v1
and make_v2
that return an array-like object, by value.
(This could be std::vector
for example, the important point is that the copy constructor clearly allocates memory).
#include<vector>
template<class It> void modify_v(It);
void dreaded_function();
template<class It1, class It2>
void foo(It1 a, It2 b) {
*a = 5;
*b = 6;
if(*a == *b) dreaded_function(); // this is never called if a and b do not overlap
}
int main() {
std::vector<int> v1 = make_v1();
std::vector<int> v2 = make_v2();
foo(v1.begin(), v2.begin());
}
As you can see in the compiled code, there is no assumption made optimizing the call to dreaded_function
.
This is just an example of pointer aliasing, just to diagnose the situation.
There are well known cases of pointer aliasing creating worst performance problems.
https://godbolt.org/z/345PKrsnx
What high-level hints I can give the compiler, in C++ (or perhaps using extensions) that the iterator ranges are at memory regions that cannot overlap?
If this were C, or if I were using pointers and C++ extensions I could use __restrict
and a pointer interface, but I want something more high-level.
(Thanks @TedLygmo for pointing out that __restrict somehow works for std::vector iterators).
The only option that I found was to be very explicit about the copy:
...
std::vector<int> v1 = static_cast<std::vector<int> const&>(make_v1());
std::vector<int> v2 = static_cast<std::vector<int> const&>(make_v2());
This at least convinces clang that it can make the optimization. (and at the cost of an extra copy I am pretty sure.)
https://godbolt.org/z/nYfKeTKqj
But this is ugly and it is also making a copy, also if make_v
becomes an included function later, the solution will have a cost.
(Also it still doesn't convince GCC!)
Is this the only way I can well the compiler than v1
and v2
memory do not overlap? Is there a more streamlined solution?
Should [[assume]] work for this eventually?
At this point, the best I am hoping is some keyword/extension that "simulates" that the result of make_v1
/make_v2
is copied into v1
/v2
.
__restrict
? – Dubbingfoo
likefoo_restricted
that takes restricted pointer and call itfoo_restricted_contiguous(&*v1.begin(), &*v2.begin())
? – Fauch__restrict__
actually works: godbolt.org/z/dr79r967z - Now I wantrestrict
as a keyword in C++ too :-) – Frag__restrict
can only be applied to a pointer argument. Anyway I was hopping for something more generic than this, since now I have a version of templatefoo
that might not work as expected for other arguments (I want to control de optimization from the user code, as whenmake_v
are visible functions). – Fauchrestrict
equivalent. GCC and Clang have__restrict__
, and Visual C++ has__declspec(restrict)
. Note thatrestrict
does not act as an assurance or assertion that the pointers are non-aliasing, the onus is on the programmer to ensure the pointers are non-aliasing. If the pointers are aliasing, then undefined behavior — hijinks & hilarity ensues. C++23[[assume(p != q)]]
might allow for the same kind of optimization. – Carbamatev1
andv2
objects. I thinkrestrict
is a very dangerous feature and a partial fix at best, I don't even understand how it works for iterators. For can I make it for work forv1
andv2
themselves? – Fauchmake_v
are visible. – Fauchfoo_detail(&*a, &*b);
where that function uses__restrict__
. – Fraga
andb
can't overlap. You can't compare them unless they're iterators to the same collection or pointers to members of the same object. – Lunchmake_v1
/make_v2
return by value. Practical answer: in the same way that whenmake_v1
/make_v2
are visible to the compiler godbolt.org/z/8he5cc7Ed the compiler can tell thata
andb
do not overlap. – Faucha
andb
can't overlap?" I want to achieve this via optimizations for this particular set ofv1
andv2
. I don't want to touch the definition offoo
, but I can change themain
code. Perhaps there is marker that mimics the explicit copy construction ofv1
andv2
or add an assumption for the compiler on the lines of "assume that v1.data() != v2.data()" or more specifically to tell the compiler that "v1.data()" is from a different allocation than "v2.data()". – Faucharr.RESTRICT_begin()
or bymake_restricted(arr.begin())
. Not sure how it will scale in general. – Fauch