Do compilers usually emit vector (SIMD) instructions when not explicitly told to do so?

C++17 adds extensions for parallelism to the standard library (e.g. std::sort(std::execution::par_unseq, arr, arr + 1000), which will allow the sort to be done with multiple threads and with vector instructions).

I noticed that Microsoft's experimental implementation mentions that the VC++ compiler lacks support to do vectorization over here, which surprises me - I thought that modern C++ compilers are able to reason about the vectorizability of loops, but apparently the VC++ compiler/optimizer is unable to generate SIMD code even if explicitly told to do so. The seeming lack of automatic vectorization support contradicts the answers for this 2011 question on Quora, which suggests that compilers will do vectorization where possible.

Perhaps compilers vectorize only very obvious cases, such as a std::array<int, 4>, and nothing more, in which case C++17's explicit parallelization would be useful.

Hence my question: Do current compilers automatically vectorize my code when not explicitly told to do so? (To make this question more concrete, let's narrow this down to Intel x86 CPUs with SIMD support, and the latest versions of GCC, Clang, MSVC, and ICC.)

As an extension: Do compilers for other languages do better at automatic vectorization (perhaps due to language design), which would explain why the C++ standards committee considered explicit (C++17-style) vectorization necessary?

Nievelt answered 3/6, 2017 at 4:2 Comment(15)
As usual, depends on the compiler, meaning that the only possible answer right now is "maybe". Please ask for a specific compiler. Also, the standard will never specify any vectorization or optimization at all - C++ runs on many different architectures, some of which don't support vector instructions or certain operations. Everything optimization related is implementation-defined.Kally
@Kally I've amended the question to reflect that I'm focusing on four modern C++ compilers. I know the standard will never specify any vectorization at all; I thought it was implied by my question that I'm only concerned with hardware that has the necessary SIMD vector instructions. Nevertheless, have stated that explicitly now.Nievelt
Nothing in your link says anything about VC++ being unable to emit SIMD instructions. Also, "Last edited Apr 17, 2014 at 12:17 AM"Figurine
why not? msdn.microsoft.com/en-us/library/hh872235.aspx gcc.gnu.org/projects/tree-ssa/vectorization.html llvm.org/docs/Vectorizers.html software.intel.com/en-us/compiler_15.0_vec_c en.wikipedia.org/wiki/Automatic_vectorizationWayfarer
All that page says is that this specific experimental implementation of an experimental library does nothing special when told it can vectorize execution. It says nothing about whether the compiler will vectorize it anyway, or whether the compiler could autovectorize other code.Figurine
You know, I never really used VC since I a) realized some bottleneck is 300% faster in GCC without changing anything, b) got fed up with the many incompatibilities to the standard. ... Don't be surprised if VC isn't the fastest one and/or lacks features. It's normal. (And don't get me started on the std lib. Mutex :o)Fray
@Fray And when did you last use VC? They're in the process of completely replacing their entire frontend from what I understand.Kally
@Kally I'm clearly not talking about the UI.Fray
@Fray I wasn't talking about the GUI either. I meant the compiler frontend. I thought it would've been clear enough from the context.Kally
@Kally While I knew that the standard compliance is slowly improving; the plan of a major redesign of the frontend is indeed news to me; thanks (just reading an article from 2015).... (However, the performance and stdlib problems are still there. Again, VS users, don't be too surprised)Fray
"whenever there is an opportunity" is an extreme overstatement turning the compiler into some kind of omnipotent god - the reality is of course "whenever the compiler is capable of detecting and using the opportunity". There are many vectorisable patterns that compilers don't implement.Helton
@Figurine Saying that VC++ is unable to generate SIMD code when explicitly told to do so was an overstatement. I took "We are currently working on the compiler support to properly implement the std::vec policy" to mean that the compiler (as opposed to the library) lacks proper support for vectorization in general. Then, the usefulness of C++17's explicit parallelization would be due to the compiler being unable to recognize certain vectorizable patterns, or due to external information not available to the compiler.Nievelt
VC++ can generate vectorized instructions. Those library functions won't.Lipinski
MSVC generates vector instructions, of course, but it does a very poor job of auto-vectorizing. Other compilers (GCC, Clang, and ICC) do a better job, but they're still not perfect.Wernsman
Godbolt supports Clang, GCC, ICC, and MSVC. Look at the assembly to check vectorization. All these compilers can vectorize, depending on the code: godbolt.org/g/DB4rYO Sondra

The best compiler for automatically spotting SIMD style vectorisation (when told it can generate opcodes for the appropriate instruction sets of course) is the Intel compiler in my experience (which can generate code to do dynamic dispatch depending on the actual CPU if required), closely followed by GCC and Clang, and MSVC last (of your four).

This is perhaps unsurprising I realise - Intel do have a vested interest in helping developers exploit the latest features they've been adding to their offerings.

I work quite closely with Intel, and while they are keen to demonstrate how well their compiler auto-vectorises, they also rightly point out that it lets you use pragma simd constructs to tell the compiler which assumptions can or can't be made (where that is unclear from the code alone), and hence lets the compiler vectorise further without you resorting to intrinsics.

This, I think, points at the issue with hoping that the compiler (for C++ or another language) will do all the vectorisation work... if you have simple vector processing loops (e.g. multiplying all the elements in a vector by a scalar) then yes, you could expect that 3 of the 4 compilers would spot that.
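
As a rough illustration of the "simple loop" case, here is a minimal sketch (the function name `scale` is made up for this example). Every iteration is independent and the accesses are contiguous, which is the pattern auto-vectorisers tend to handle well. Whether a given compiler actually vectorises it depends on version, target and flags, so treat that as something to verify rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: multiply n contiguous floats by a scalar, in place.
// No iteration depends on another, and there is only one array involved,
// so the compiler has no aliasing or ordering questions to resolve.
void scale(float* data, std::size_t n, float k) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= k;
    }
}
```

To check what happened, you can ask the compiler for a vectorisation report instead of reading the assembly: `-fopt-info-vec` with GCC, `-Rpass=loop-vectorize` with Clang, or `/Qvec-report:1` with MSVC, each with optimisation enabled (for example `-O3` or `/O2`).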

But for more complicated code, the vectorisation gains that can be had come not from simple loop unwinding and combining iterations, but from actually using a different or tweaked algorithm, and that's going to be hard if not impossible for a compiler to do completely alone. Whereas if you understand how vectorisation might be applied to an algorithm, and you can structure your code to allow the compiler to see the opportunities to do so, perhaps with pragma simd constructs or OpenMP, then you may get the results you want.
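
One small example of giving the compiler information it cannot infer on its own is a floating-point reduction. Vectorising the sum below changes the order of the additions, which can change rounding, so compilers are generally not allowed to do it under default floating-point rules. The OpenMP `simd` pragma with a `reduction` clause states that reordering is acceptable. This is a sketch under assumptions: the function name `dot` is made up here, and the pragma only takes effect when OpenMP SIMD support is switched on (for example `-fopenmp-simd` with GCC or Clang, or `-qopenmp-simd` with ICC); otherwise it is ignored and the loop still computes the same result in scalar order.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: dot product of two float arrays of length n.
float dot(const float* a, const float* b, std::ptrdiff_t n) {
    assert((a != nullptr && b != nullptr) || n <= 0);
    float sum = 0.0f;
    // Tells the compiler that partial sums may be accumulated in separate
    // SIMD lanes and combined at the end, i.e. the additions may be reordered.
    #pragma omp simd reduction(+:sum)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because the pragma licenses reordering, results can differ in the last bits between the scalar and vector builds for general inputs; that is the trade-off you are accepting.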

Vectorisation comes when the code has a certain mechanical sympathy for the underlying CPU and memory bus - if you have that then I think the Intel compiler will be your best bet. Without it, changing compilers may make little difference.

Can I recommend Matt Godbolt's Compiler Explorer as a way to actually test this? Put your C++ code in there and look at what different compilers actually generate. Very handy... it doesn't include older versions of MSVC (I think it currently supports VC++ 2017 and later versions) but will show you what different versions of ICC, GCC, Clang and others can do with code...

Handspring answered 4/6, 2017 at 10:10 Comment(4)
Godbolt's Compiler Explorer does include MSVC. It didn't from the beginning, but it does now, and has for several months.Wernsman
Hi Cody.. goodness me, yes, "Microsoft (R) C/C++ Optimizing Compiler Version 19.10.25017" is available as "x86 CL 19 2017 RTW" (I was expecting something more like "MSVC" as a label), thanks for the heads-up.Handspring
Thanks for pointing out Godbolt's compiler explorer, I'll be sure to check it out!Nievelt
@Handspring could you edit your answer please to reflect the level of MSVC support in MSVC?Singhalese
