Why does vectorization behave differently for almost the same code?

Here are three free functions that do the same thing, but the loop in the first one is not vectorized while the loops in the other two are. Why is that?

#include <vector>

typedef std::vector<double> Vec;

void update(Vec& a, const Vec& b, double gamma) {
    const size_t K = a.size();
    for (size_t i = 0; i < K; ++i) { // not vectorized
        a[i] = b[i] * gamma - a[i];
    }
}

void update2(Vec& a, const Vec& b, double gamma) {
    for (size_t i = 0; i < a.size(); ++i) { // vectorized
        a[i] = b[i] * gamma - a[i];
    }
}

void update3(Vec& a, size_t K, const Vec& b, double gamma) {
    for (size_t i = 0; i < K; ++i) { // vectorized
        a[i] = b[i] * gamma - a[i];
    }
}

int main(int argc, const char* argv[]) {
    Vec a(argc), b(argc); // b must be at least as long as a, otherwise b[i] reads out of bounds
    update(a, b, 0.5);
    update2(a, b, 0.5);
    update3(a, a.size(), b, 0.5);
    return 0;
}

Relevant messages from the compiler (VS2013):

1>  c:\home\dima\trws\trw_s-v1.3\trws\test\vector.cpp(7) : info C5002: loop not vectorized due to reason '1200'
1>  c:\home\dima\trws\trw_s-v1.3\trws\test\vector.cpp(13) : info C5001: loop vectorized
1>  c:\home\dima\trws\trw_s-v1.3\trws\test\vector.cpp(19) : info C5001: loop vectorized

From a comment by @tony:

Reason 1200: "Loop contains loop-carried data dependences that prevent vectorization. Different iterations of the loop interfere with each other such that vectorizing the loop would produce wrong answers, and the auto-vectorizer cannot prove to itself that there are no such data dependences." source
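
To illustrate the kind of interference this message describes, here is a minimal sketch. It assumes raw pointers instead of std::vector so that the two ranges can partially overlap, and both helper names (update_ptr, update_ptr_loads_first) are hypothetical, not part of the code above. The second helper only emulates "load everything from b before storing to a", which is roughly what a wide vector load does within one chunk; it is not what any particular compiler emits.

```cpp
#include <cstddef>
#include <vector>

// Same arithmetic as update(), written on raw pointers so that the
// two ranges are allowed to overlap (hypothetical helper).
void update_ptr(double* a, const double* b, std::size_t n, double gamma) {
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = b[i] * gamma - a[i];
    }
}

// Emulation of "loads first, stores later": b is snapshotted before
// any element of a is written (hypothetical helper, for comparison only).
void update_ptr_loads_first(double* a, const double* b, std::size_t n, double gamma) {
    const std::vector<double> b_copy(b, b + n);  // snapshot taken before any store
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = b_copy[i] * gamma - a[i];
    }
}
```

If a is passed as p + 1 and b as p for the same buffer p, iteration i of update_ptr reads the value that iteration i - 1 just wrote, so the two helpers produce different results. When the ranges do not overlap, they agree. A compiler that cannot rule out the overlapping case has to either stay sequential or guard the vectorized loop with a check.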

Nikaniki answered 8/5, 2015 at 21:21 Comment(9)
Try a different compiler? Others (GCC and Clang) do vectorize all three functions. – Isiah
What is "reason 1200" documented as? – Psychosomatic
Reason 1200: "Loop contains loop-carried data dependences that prevent vectorization. Different iterations of the loop interfere with each other such that vectorizing the loop would produce wrong answers, and the auto-vectorizer cannot prove to itself that there are no such data dependences." source – Roanna
I'm surprised that any compiler is able to vectorize any of those at all. The compiler needs to prove either that a and b do not alias or that &a[0] < &b[0]. – Stouthearted
@Stouthearted Or, much more easily, the compiler can generate both the sequential and the vectorized version, plus a runtime test that checks whether a and b overlap and dispatches to the appropriate one. – Isiah
@Mysticial, from sum-of-overlapping-arrays-auto-vectorization-and-restrict: GCC builds a version with and a version without overlap (the latter it can vectorize), and when the function is called it checks for overlap first. With restrict, the version with overlap is not built and the check is removed. – Ethology
All three loops seem suboptimal to me because the compiler assumes that a and b alias. For best performance, use restricted pointers to access the data elements instead. – Holbrook
@Holbrook C++ does not have restrict. – Strasser
@Strasser But MSVC has a restrict keyword (__restrict), although ideally a C99 implementation with a C++ wrapper is best. – Holbrook
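
The two ideas from these comments, a runtime overlap test and restricted pointers, can be sketched by hand as follows. This is only an illustration under assumptions: __restrict is a compiler extension (accepted by MSVC, GCC and Clang) rather than standard C++, all function names here are hypothetical, and clamping the length to the shorter vector is a choice made for this sketch. Whether a given compiler actually vectorizes the fast path still has to be checked in its report.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Fast path: __restrict promises the compiler that a and b do not overlap.
// Calling this with overlapping ranges is undefined behavior.
void update_noalias(double* __restrict a, const double* __restrict b,
                    std::size_t n, double gamma) {
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = b[i] * gamma - a[i];
    }
}

// Fallback: no promise is made, so overlapping ranges are handled correctly.
void update_may_alias(double* a, const double* b, std::size_t n, double gamma) {
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = b[i] * gamma - a[i];
    }
}

// True when [a, a + n) and [b, b + n) share no element.
bool ranges_disjoint(const double* a, const double* b, std::size_t n) {
    std::less_equal<const double*> le;  // well-defined order even for unrelated pointers
    return le(a + n, b) || le(b + n, a);
}

// Dispatcher: test for overlap once, then pick the matching loop.
void update_checked(std::vector<double>& a, const std::vector<double>& b, double gamma) {
    // Assumption of this sketch: process only as many elements as both vectors have.
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    if (ranges_disjoint(a.data(), b.data(), n)) {
        update_noalias(a.data(), b.data(), n, gamma);
    } else {
        update_may_alias(a.data(), b.data(), n, gamma);
    }
}
```

The check runs once per call rather than once per iteration, which is why this kind of dispatch is cheap compared with the loop it guards.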

My guess is that this is a deeply internal compiler implementation issue, such as the stage at which the auto-vectorizer kicks in and the state of the code's internal representation at that point. When I tried MSVC 2017, the result was more in line with what one would expect: it auto-vectorized update() and update3() but not update2(), giving reason 501 for line 14, which is documented as:

Induction variable is not local; or upper bound is not loop-invariant.
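
One way to address both halves of that message is to read the bound once and keep the induction variable local, as in the sketch below. The function name update_hoisted is hypothetical, and taking the data pointers up front is an extra assumption of this sketch, not something the message asks for. This is essentially the shape of update(), which VS2013 rejected for a different reason, so there is no guarantee for any particular compiler version. On MSVC, compiling with /Qvec-report:2 prints the C5001/C5002 messages needed to check.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// Precondition (same as the original functions): b.size() >= a.size().
void update_hoisted(Vec& a, const Vec& b, double gamma) {
    const std::size_t n = a.size();  // upper bound read once, so it is loop-invariant
    double* pa = a.data();           // element pointers taken once, outside the loop
    const double* pb = b.data();
    for (std::size_t i = 0; i < n; ++i) {  // i is a local induction variable
        pa[i] = pb[i] * gamma - pa[i];
    }
}
```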

Surf answered 2/3, 2018 at 16:15 Comment(0)
