VC++ no longer vectorizes simple for loops with range-based syntax

Before replacing a lot of my "old" for loops with range-based for loops, I ran some tests with Visual Studio 2013:

std::vector<int> numbers;

for (int i = 0; i < 50; ++i) numbers.push_back(i);

int sum = 0;

//vectorization
for (auto number = numbers.begin(); number != numbers.end(); ++number) sum += *number;

//vectorization
for (auto number = numbers.begin(); number != numbers.end(); ++number) {
    auto && ref = *number;
    sum += ref;
}

//definition of range based for loops from http://en.cppreference.com/w/cpp/language/range-for
//vectorization
for (auto __begin = numbers.begin(),
    __end = numbers.end();
    __begin != __end; ++__begin) {
    auto && ref = *__begin;
    sum += ref;
}

//no vectorization :(
for (auto number : numbers) sum += number;

//no vectorization :(
for (auto& number : numbers) sum += number;

//no vectorization :(
for (const auto& number : numbers) sum += number;

//no vectorization :(
for (auto&& number : numbers) sum += number;

printf("%d\n", sum); // sum is an int, so %d rather than %f

Looking at the disassembly, the standard for loops were all vectorized:

00BFE9B0  vpaddd      xmm1,xmm1,xmmword ptr [eax]  
00BFE9B4  add         ecx,4  
00BFE9B7  add         eax,10h  
00BFE9BA  cmp         ecx,edx  
00BFE9BC  jne         main+140h (0BFE9B0h)  

but the range-based for loops were not:

00BFEAC6  add         esi,dword ptr [eax]  
00BFEAC8  lea         eax,[eax+4]  
00BFEACB  inc         ecx  
00BFEACC  cmp         ecx,edi  
00BFEACE  jne         main+256h (0BFEAC6h)  

Is there any reason why the compiler couldn't vectorize these loops?

I would really like to use the new syntax, but losing vectorization is too high a price.

I just saw this question, so I tried the /Qvec-report:2 flag, which reports a different reason:

loop not vectorized due to reason '1200'

that is:

Loop contains loop-carried data dependences that prevent vectorization. Different iterations of the loop interfere with each other such that vectorizing the loop would produce wrong answers, and the auto-vectorizer cannot prove to itself that there are no such data dependences.

Is this the same bug? (I also tried with the latest VC++ compiler, "Nov 2013 CTP".)

Should I report it on MS Connect too?
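For anyone wanting to reproduce this, here is a minimal sketch that puts each loop form in its own function, so that each /Qvec-report:2 line can be matched to a loop. The function names (make_numbers, sum_iterator, sum_range_for, sum_pointer, run_all) are made up for illustration and are not part of the original test. Moving the loops out of main may also change what the optimizer does, so treat the result as indicative only.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical helper: builds the same 0..49 input as the test above.
std::vector<int> make_numbers() {
    std::vector<int> numbers;
    for (int i = 0; i < 50; ++i) numbers.push_back(i);
    return numbers;
}

// Classic iterator loop.
int sum_iterator(const std::vector<int>& numbers) {
    int sum = 0;
    for (auto it = numbers.begin(); it != numbers.end(); ++it) sum += *it;
    return sum;
}

// Range-based for loop.
int sum_range_for(const std::vector<int>& numbers) {
    int sum = 0;
    for (auto number : numbers) sum += number;
    return sum;
}

// Raw pointer loop over the vector's storage.
int sum_pointer(const std::vector<int>& numbers) {
    int sum = 0;
    const int* first = numbers.data();
    const int* last = first + numbers.size();
    for (const int* p = first; p != last; ++p) sum += *p;
    return sum;
}

// Runs every form, checks that they agree, and prints the sum.
int run_all() {
    const std::vector<int> numbers = make_numbers();
    const int expected = sum_iterator(numbers);
    assert(sum_range_for(numbers) == expected);
    assert(sum_pointer(numbers) == expected);
    std::printf("%d\n", expected);
    return expected;
}
```

Calling run_all() from main and building with /Ox /arch:AVX plus /Qvec-report:2 should give one report line per loop, which makes it easier to see which form the vectorizer rejects.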

Edit

Due to the comments, I ran the same test with a raw int array instead of a vector, so no iterator class is involved, just raw pointers.

Now all loops are vectorized except the two "simulated range-based" loops.

Compiler says this is due to reason '501':

Induction variable is not local; or upper bound is not loop-invariant.

I don't get what's going on...

const size_t size = 50;
int numbers[size];

for (size_t i = 0; i < size; ++i) numbers[i] = i;

int sum = 0;

//vectorization
for (auto number = &numbers[0]; number != &numbers[0] + size; ++number) sum += *number;

//vectorization
for (auto number = &numbers[0]; number != &numbers[0] + size; ++number) {
    auto && ref = *number;
    sum += ref;
}

//definition of range based for loops from http://en.cppreference.com/w/cpp/language/range-for
//NO vectorization ?!
for (auto __begin = &numbers[0],
    __end = &numbers[0] + size;
    __begin != __end; ++__begin) {
    auto && ref = *__begin;
    sum += ref;
}

//NO vectorization ?!
for (auto __begin = &numbers[0],
    __end = &numbers[0] + size;
    __begin != __end; ++__begin) {
    auto && ref = *__begin;
    sum += ref;
}

//vectorization ?!
for (auto number : numbers) sum += number;

//vectorization ?!
for (auto& number : numbers) sum += number;

//vectorization ?!
for (const auto& number : numbers) sum += number;

//vectorization ?!
for (auto&& number : numbers) sum += number;

printf("%d\n", sum); // sum is an int, so %d rather than %f
Joust answered 15/11, 2014 at 1:13 Comment(10)
It seems the compiler can't look through the iterator type. Try using your range-based for emulation with &v[0] and &v[0] + v.size() to confirm this suspicion. - Ghirlandaio
@DietmarKühl If I have understood correctly, I tried: for (auto __begin = &numbers[0], __end = &numbers[0] + numbers.size(); __begin != __end; ++__begin) { auto && ref = *__begin; sum += ref; } But this also vectorizes. - Joust
If the version using pointers vectorizes the loop, clearly the iterator wrapping the pointer upsets the compiler: the type returned from std::vector<T>::begin() doesn't have to be T* (or T const*). It seems the compiler can't detect that this iterator is nothing more than a thin wrapper over a pointer. - Ghirlandaio
@DietmarKühl I did the same with raw pointers and an array, please see the edit. - Joust
If it isn't the iterator vs. pointer upsetting the compiler, it surely is something else. Possibly the compiler doesn't like the use of end instead of __begin != __begin + size. I don't have MSVC++ to check for myself... - Ghirlandaio
I wonder how GCC and ICC behave. I don't have them in front of me to try it. - Garlic
Complete list of compiler flags used? Can you double check that all of your "vectorization" and "NO vectorization" labels are correct? Your comment seems to disagree with some of the comments in the above code at first glance. Note that you did not bind the range_expression to an rvalue reference in your "definition of range for" samples. Your emulation is also wrong in a few ways: your begin-expression and end-expression do not line up perfectly with what range-for is supposed to do in a few cases. - Articulation
@Yakk My first comment was a test with std::vector<int> using &numbers[0] instead of numbers.begin(), whereas the edit uses raw pointers and an array. About the "ranged for definition", I don't know what to do with "auto && __range = range_expression" because "__range" is not used (cf. link). Complete list of flags: /GS- /GL /analyze- /W3 /Zc:wchar_t /Zi /Gm- /Ox /Ob2 /Fd"Release\vc120.pdb" /fp:precise /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_LIB" /D "_MBCS" /fp:except- /errorReport:prompt /WX- /Zc:forScope /arch:AVX /Gd /Oy- /Oi /MD /Fa"Release\" /nologo /Fo"Release\" /Ot /Fp"Release\test.pch". - Joust
@realprog The begin expression and end expression use the range expression. There are 3 possible begin/end expressions mandated, involving members, std::begin, etc. - Articulation
So, here comes the advice: never try to reason about the implementation (temper) of a compiler, especially with respect to optimization :) - Wizen
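A sketch of the fuller expansion raised in the comments above, assuming the C++11 wording: the range expression is first bound to auto&& __range, and the begin and end expressions are then taken from that binding (member begin()/end() for class types, __range and __range + bound for arrays). The names range, first and last stand in for the reserved double-underscore identifiers, and the function names are hypothetical. Whether this form changes what the VC++ vectorizer does has not been checked here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Class-type range: begin-expr and end-expr are member calls on the bound range.
int sum_desugared_vector(std::vector<int>& numbers) {
    int sum = 0;
    {
        auto&& range = numbers;  // plays the role of __range
        for (auto first = range.begin(), last = range.end(); first != last; ++first) {
            auto&& number = *first;  // the range declaration, bound to *__begin
            sum += number;
        }
    }
    return sum;
}

// Array range: begin-expr is the array itself, end-expr is array + bound.
template <std::size_t N>
int sum_desugared_array(int (&numbers)[N]) {
    int sum = 0;
    {
        auto&& range = numbers;  // binds as int (&)[N], no copy
        for (auto first = range, last = range + N; first != last; ++first) {
            auto&& number = *first;
            sum += number;
        }
    }
    return sum;
}

// Checks both expansions against a plain range-based for loop.
void check_desugared() {
    std::vector<int> v;
    int a[50];
    for (int i = 0; i < 50; ++i) {
        v.push_back(i);
        a[i] = i;
    }
    int expected = 0;
    for (auto number : v) expected += number;
    assert(sum_desugared_vector(v) == expected);
    assert(sum_desugared_array(a) == expected);
}
```

The third form, which calls begin(__range) and end(__range) found by argument-dependent lookup, is left out because neither std::vector nor a raw array uses it.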

My guess is that a range-based for loop does not know up front whether the object is a vector, an array, or a linked list, so the compiler cannot decide beforehand to vectorize the loop. Range-based for loops are the equivalent of foreach loops in other languages. There might be a way to hint the compiler to vectorize the loop, using a macro, a pragma, or a compiler setting. To check, try the code with other compilers and see what you get; I would not be surprised if the other compilers also produce non-vectorized assembly.

Rothenberg answered 21/1, 2015 at 23:1 Comment(1)
Have you tested the code on other compilers and checked the results? - Rothenberg
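One way to probe the guess above is to ask the compiler which iterator types a range-based for loop uses. The sketch below assumes C++11 and uses a made-up function name, sum_with_known_types. The static_asserts compile only if the types are already fixed at compile time, which suggests the container kind is visible to the compiler; it says nothing about why the vectorizer still gives up.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical check: the iterator types behind each range-for are static.
int sum_with_known_types() {
    std::vector<int> numbers(50, 1);  // fifty ones
    int raw[50] = {};                 // fifty zeros

    static_assert(
        std::is_same<decltype(numbers.begin()), std::vector<int>::iterator>::value,
        "range-for over a vector iterates with std::vector<int>::iterator");
    static_assert(
        std::is_same<decltype(std::begin(raw)), int*>::value,
        "range-for over an array iterates with int*");

    int sum = 0;
    for (auto number : numbers) sum += number;
    for (auto number : raw) sum += number;
    assert(sum == 50);
    return sum;
}
```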
