Tricks to avoid pointer aliasing in generic code
Asked Answered
F

2

5

This is a follow up to this question, Restricted access for allocated arrays belonging to separate object, after discovering that returning by value across compilation units doesn't help to hint the compiler about disjoint memory allocations.

I have two functions make_v1 and make_v2 that return an array-like object, by value. (This could be std::vector for example, the important point is that the copy constructor clearly allocates memory).

#include<vector>

template<class It> void modify_v(It);

void dreaded_function();

template<class It1, class It2>
void foo(It1 a, It2 b) {
    *a = 5;
    *b = 6;
    if(*a == *b) dreaded_function();  // this is never called if a and b do not overlap
}

int main() {
    std::vector<int> v1 = make_v1();
    std::vector<int> v2 = make_v2();

    foo(v1.begin(), v2.begin());
}

As you can see in the compiled code, there is no assumption made optimizing the call to dreaded_function. This is just an example of pointer aliasing, just to diagnose the situation. There are well known cases of pointer aliasing creating worst performance problems.

https://godbolt.org/z/345PKrsnx

What high-level hints I can give the compiler, in C++ (or perhaps using extensions) that the iterator ranges are at memory regions that cannot overlap?

If this were C, or if I were using pointers and C++ extensions I could use __restrict and a pointer interface, but I want something more high-level.

(Thanks @TedLygmo for pointing out that __restrict somehow works for std::vector iterators).

The only option that I found was to be very explicit about the copy:

    ...
    std::vector<int> v1 = static_cast<std::vector<int> const&>(make_v1());
    std::vector<int> v2 = static_cast<std::vector<int> const&>(make_v2());

This at least convinces clang that it can make the optimization. (and at the cost of an extra copy I am pretty sure.)

https://godbolt.org/z/nYfKeTKqj

But this is ugly and it is also making a copy, also if make_v becomes an included function later, the solution will have a cost. (Also it still doesn't convince GCC!)

Is this the only way I can well the compiler than v1 and v2 memory do not overlap? Is there a more streamlined solution? Should [[assume]] work for this eventually?

At this point, the best I am hoping is some keyword/extension that "simulates" that the result of make_v1/make_v2 is copied into v1/v2.

Fauch answered 21/10, 2024 at 18:23 Comment(15)
Perhaps make local pointers to the iterator contents, and make them __restrict?Dubbing
@HolyBlackCat, are you suggesting writing a non-generic foo like foo_restricted that takes restricted pointer and call it foo_restricted_contiguous(&*v1.begin(), &*v2.begin())?Fauch
Using __restrict__ actually works: godbolt.org/z/dr79r967z - Now I want restrict as a keyword in C++ too :-)Frag
@TedLyngmo, thanks. I could have sweared that last time I tried I got a compilation error saying that __restrict can only be applied to a pointer argument. Anyway I was hopping for something more generic than this, since now I have a version of template foo that might not work as expected for other arguments (I want to control de optimization from the user code, as when make_v are visible functions).Fauch
@Fauch Yes, using restrict doesn't feel right unless it's very clear that one isn't allowed to call it with overlapping data.Frag
C++ does not have a restrict equivalent. GCC and Clang have __restrict__, and Visual C++ has __declspec(restrict). Note that restrict does not act as an assurance or assertion that the pointers are non-aliasing, the onus is on the programmer to ensure the pointers are non-aliasing. If the pointers are aliasing, then undefined behavior — hijinks & hilarity ensues. C++23 [[assume(p != q)]] might allow for the same kind of optimization.Carbamate
@TedLyngmo, right, I want to control the semantics from the v1 and v2 objects. I think restrict is a very dangerous feature and a partial fix at best, I don't even understand how it works for iterators. For can I make it for work for v1 and v2 themselves?Fauch
@Eljay, yes, what I am saying is that compilers already have the technology to detect aliasing, semantically at least. But this only works when the body of make_v are visible.Fauch
@TedLyngmo, ah, you actually used pointer for restrict to work, the function stops being generic completely because it implicitly only works for pointers. godbolt.org/z/nh8vb11dTFauch
@Fauch Yeah, it's perhaps possible to use the called function as a proxy that calls foo_detail(&*a, &*b); where that function uses __restrict__.Frag
I don't understand how you intend to ensure that a and b can't overlap. You can't compare them unless they're iterators to the same collection or pointers to members of the same object.Lunch
@MarkRansom, Conceptual answer: because I know make_v1/make_v2 return by value. Practical answer: in the same way that when make_v1/make_v2 are visible to the compiler godbolt.org/z/8he5cc7Ed the compiler can tell that a and b do not overlap.Fauch
@MarkRansom, "how I indend to ensure that a and b can't overlap?" I want to achieve this via optimizations for this particular set of v1 and v2. I don't want to touch the definition of foo, but I can change the main code. Perhaps there is marker that mimics the explicit copy construction of v1 and v2 or add an assumption for the compiler on the lines of "assume that v1.data() != v2.data()" or more specifically to tell the compiler that "v1.data()" is from a different allocation than "v2.data()".Fauch
@TedLyngmo, I found a partial solution (below as an answer). Since I control the container (Multi-dimensional arrays) I can have a special member and iterator created with arr.RESTRICT_begin() or by make_restricted(arr.begin()). Not sure how it will scale in general.Fauch
Nice! I like it!Frag
F
3

I found a hacky way to do this. I realized that I can use the __restrict keyword in member (pointer) variables.

So, basically, if I "reimplement" vector iterator with a restrict pointer then I can get away with this, in principle.

In practice this only helps on GCC, and not in clang.

template<class Iterator>
struct restricted_vector_iterator {
    explicit restricted_vector_iterator(Iterator it) : p_{&*it} {}

    decltype(auto) operator*() const {return *p_;}

    decltype(auto) operator++(int) {++p_; return *this;}

 private:
    typename Iterator::pointer __restrict p_;
};


...
    foo(restricted_vector_iterator{v1.begin()}, restricted_vector_iterator{v2.begin()});

https://godbolt.org/z/oE1f1vnPr

This looks like casting the iterator to pointer but in generic code a restrict class could work differently for different iterator types. Also I am not modifying "library" code, only user code. (The user knows that the ranges do not overlap.)

Of course this is not ideal.

First, It doesn't scale well, because every critical iterator will have to be wrapped/reimplemented (__restrict cannot be applied to the iterator class, only to the contained pointer). This would be much simpler if __restrict worked recursively for the pointer member inside a class (specifically an iterator.)

Second, it sounds kind of dangerous that one can keep a restricted pointer alive in the wild, maybe I can make this wrapper iterator non-copyable or at least "hard to copy".

Third, it is not clear to me how pointer arithmetic will work (are pointers calculated from other restricted pointers, also restricted?, are restricted among them?)

Fourth, it is not portable, both because __restrict is an extension, and because at the end it only does the optimization on GCC it seems.

If you have any comments let me know.

Finally, if I had control over the container, I could have a different method like v1.RESTRICT_begin() for example.

Fauch answered 24/10, 2024 at 8:40 Comment(0)
C
1

The general way to convince the compiler of an arbitrary boolean predicate is to __builtin_assume it, something like this:

#include <version>
#if defined(__clang__)
 #define ASSUME(x) __builtin_assume(x)
#elif defined(_MSC_VER)
 #define ASSUME(x) __assume(x)
#elif __has_builtin(__builtin_unreachable)
 #define ASSUME(x) do { if (!(x)) __builtin_unreachable(); } while (0)
#elif defined(__cpp_lib_unreachable)
 #include <utility>
 #define ASSUME(x) do { if (!(x)) std::unreachable(); } while (0)
#endif

(GCC supports __builtin_unreachable but not __builtin_assume. MSVC supports it but only as __assume, not __builtin_assume.)

Unfortunately, no existing compiler seems to understand the following sequence (Godbolt):

int misunderstood(int *p, int *q) {
  ASSUME(p != q);
  *p = 1;
  *q = 2;
  return *p;  // should return "1", surely
}

OTOH, both GCC 10+ and Clang 11+ understand this sequence just fine:

bool understood(int *p, int *q) {
  ASSUME(p != q);
  *p = 1;
  *q = 2;
  return (p == q);  // returns "false"
}

so you'd think there's no actual reason they couldn't go on to understand that, if two pointers point different places, then modifying one place can't possibly modify the other. And vice versa, it's ironic that neither GCC nor Clang understand that two independently __restrict'ed pointers can't be equal. It's almost as if both compilers have an internal idea of "aliasing" which is somehow orthogonal to "equaling."

Clavicytherium answered 27/10, 2024 at 17:3 Comment(2)
Yeah, I have to learn exactly how different compilers interpret __restrict. It could be in a non-standard non-uniform way. Although in this particular case it should work, I think it makes sense that aliasing and inequality are different things. aliasing might only make sense in relation to the * operation. Also inequalities should persist after arithmetic, restrict p, restrict q means (loosely at least) that q != p, but it also means that q + n != p. At least is what in my mind, should make __restrict useful.Fauch
Right, the set of pairs (p,q) such that "p equals q" ought to be a strict subset of the set of pairs (p,q) such that "p aliases q." But the examples above indicate that GCC and Clang don't understand that: they both seem to think it's possible that "p doesn't alias q (enforced via __restrict) and yet p equals q anyway." I think that's a missed-optimization bug in both compilers.Clavicytherium

© 2022 - 2025 — McMap. All rights reserved.