How do I enable the SSE4.2 instruction set in Visual C++?

I am using the BRIEF descriptor in OpenCV in Visual C++ 2010 to match points in two images.

The paper on the BRIEF descriptor says that matching can be sped up:

"The BRIEF descriptor uses hamming distance, which can be done extremely fast on modern CPUs that often provide a specific instruction to perform a XOR or bit count operation, as is the case in the latest SSE instruction set."

With SSE4.2 enabled, matching should be faster. My question is simply: how do I do this in Visual C++?

An alternative could be to use another compiler that supports SSE4, for instance Intel's ICC. Is this really necessary?

Giuditta answered 17/4, 2012 at 9:5 Comment(1)
Do you know the difference between Visual Studio (IDE) and Visual C++ (Programming Language)? :)Coir
6

Unfortunately, it doesn't work like that.

The C/C++ compiler can be told to use a specific instruction set under Project -> C/C++ -> Code Generation -> Enable Enhanced Instruction Set. But that setting does almost nothing, and in your case, absolutely nothing. That's because some CPU instructions cannot be easily expressed in C statements. Some compilers (like Intel's) are better at this than others, but for what you want to achieve, no compiler is smart enough.

What you have to do is to find the specific algorithm, learn the SSE instructions and rewrite the algorithm with those instructions manually. You can write in pure assembly, or use intrinsic functions, which can be called from C/C++, and will issue SSE instructions when compiled.
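As a rough illustration of the intrinsic route, here is a minimal sketch of a Hamming distance over descriptors stored as 64-bit words. The names `popcount64` and `hamming_distance` are made up for this example and are not part of OpenCV; the 4-word (32-byte) descriptor size is an assumption about how BRIEF was configured. On 64-bit MSVC it uses the `__popcnt64` intrinsic from `<intrin.h>`, on GCC/clang the `__builtin_popcountll` builtin, and otherwise a plain bit-clearing loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>  // __popcnt64
// MSVC emits the popcnt instruction directly for this intrinsic, so the
// caller is responsible for making sure the CPU supports it.
static inline unsigned popcount64(std::uint64_t x) {
    return static_cast<unsigned>(__popcnt64(x));
}
#elif defined(__GNUC__) || defined(__clang__)
// GCC/clang builtin: becomes a popcnt instruction when the target allows it
// (e.g. -mpopcnt or -msse4.2), and a software fallback otherwise.
static inline unsigned popcount64(std::uint64_t x) {
    return static_cast<unsigned>(__builtin_popcountll(x));
}
#else
// Portable fallback: clear the lowest set bit until none are left.
static inline unsigned popcount64(std::uint64_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        ++n;
    }
    return n;
}
#endif

// Hamming distance between two binary descriptors of `words` 64-bit words.
// Assumption: a 32-byte BRIEF descriptor would be passed with words == 4.
unsigned hamming_distance(const std::uint64_t* a,
                          const std::uint64_t* b,
                          std::size_t words) {
    assert(a != nullptr && b != nullptr);
    unsigned distance = 0;
    for (std::size_t i = 0; i < words; ++i) {
        distance += popcount64(a[i] ^ b[i]);  // XOR marks differing bits
    }
    return distance;
}
```

Whether this actually beats the matcher you are already using is something you would have to measure; it only shows the XOR-plus-popcount pattern the paper refers to.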

Reciprocation answered 17/4, 2012 at 11:7 Comment(1)
A good C++ library like libstdc++ (typical for GNU/Linux) or the newer libc++ will use popcnt to implement std::bitset<64>::count() when it's guaranteed-available at compile time. But MSVC is designed around a model of one binary that does runtime dispatching, and its library may not have optimizations like that even for -arch:AVX (which implies SSE4.2 and thus popcnt) so in practice this answer is probably right for MSVC.Guideboard
3

The MSVC compiler has an /arch option for specifying the minimum architecture you want your program to target. Setting it like /arch:SSE2 will tell the compiler to assume that the CPU supports the SSE2 instructions, and it will automatically use them whenever the optimizer determines it's appropriate.

However, MSVC has no /arch:SSE4 or /arch:SSE42 option. A peek into the standard library implementation suggests that /arch:AVX or /arch:AVX2 also implies SSE4.2. For example, the MSVC implementation of the C++20 library function std::popcount will do a runtime check of the processor to see if it can use the SSE4.2 popcnt instruction. But if you target AVX, it skips the runtime check and just assumes the processor supports it.
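For illustration, a small sketch that leans on the library instead of intrinsics. The helper names (`popcount_word`, `hamming_distance_std`, `HAVE_STD_POPCOUNT`) are invented for this example. It uses `std::popcount` when the compiler reports C++20 and falls back to `std::bitset<64>::count()` otherwise. Whether either one compiles down to a bare `popcnt` depends on the compiler, library version and target options, so it is worth checking the generated assembly rather than assuming.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// MSVC only reports the real language level in __cplusplus with
// /Zc:__cplusplus, so _MSVC_LANG is checked as well.
#if (defined(_MSVC_LANG) && _MSVC_LANG >= 202002L) || (__cplusplus >= 202002L)
#include <bit>  // std::popcount (C++20)
#define HAVE_STD_POPCOUNT 1
#endif

// Count the set bits in one 64-bit word using whatever the library offers.
inline int popcount_word(std::uint64_t x) {
#ifdef HAVE_STD_POPCOUNT
    return std::popcount(x);
#else
    return static_cast<int>(std::bitset<64>(x).count());
#endif
}

// Hamming distance between two equally sized binary descriptors.
int hamming_distance_std(const std::vector<std::uint64_t>& a,
                         const std::vector<std::uint64_t>& b) {
    assert(a.size() == b.size());
    int distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        distance += popcount_word(a[i] ^ b[i]);
    }
    return distance;
}
```

With MSVC this would be built with something like `/std:c++20 /O2 /arch:AVX` to get the behavior described above, where the run-time check is skipped.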

I think gcc and clang do have specific options for enabling SSE4 and SSE4.2. Update: Peter Cordes confirms in the comments: "To enable popcnt specifically, -mpopcnt, or for SSE4.2 -msse4.2 which implies popcnt."
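One way to see what a given set of flags actually turned on is to look at the predefined macros. The sketch below uses a made-up helper, `enabled_popcount_features`. It relies on GCC and clang defining `__SSE4_2__` and `__POPCNT__` when those instruction sets are enabled, and on `__AVX__` being defined under `-mavx` or MSVC's `/arch:AVX`. As far as I know, MSVC has not traditionally defined `__SSE4_2__`, so treat that part as an assumption.

```cpp
#include <string>

// Report which popcnt-related instruction sets the compiler was told it may
// assume at compile time. This says nothing about the CPU the program ends
// up running on.
std::string enabled_popcount_features() {
    std::string features;
#if defined(__SSE4_2__)
    features += "SSE4.2 ";
#endif
#if defined(__POPCNT__)
    features += "POPCNT ";
#endif
#if defined(__AVX__)
    features += "AVX ";
#endif
    if (features.empty()) {
        features = "none";
    }
    return features;
}
```

Printing the result from a build with `-msse4.2` and from one without it should show the difference.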

You can also use intrinsic functions for built-in instructions if you don't want to rely on the optimizer and the library implementation to find the optimal instructions.

Deca answered 14/2, 2021 at 20:5 Comment(3)
For GCC/clang: -march=nehalem or later, or -march=znver1 (Zen) will enable instruction sets those CPUs support (and tune for the one you specify), which does include popcnt. Or -march=native. To enable popcnt specifically, -mpopcnt, or for SSE4.2 -msse4.2 which implies popcnt.Guideboard
Glad to hear MSVC has some CPU-feature support for popcnt in std::popcount. Does it also expose that via std::bitset<64>::count(), which is the standard way to get at a hopefully-platform-optimized popcount?Guideboard
@PeterCordes: I haven't looked at std::bitset specifically, but I'd be surprised if it wasn't taking advantage of popcnt. After peeking at the std::popcount implementation, I confirmed the generated code used popcnt (and that /arch:AVX eliminates the run-time check and fallback) using Godbolt Compiler Explorer. It should be simple to do the same for std::bitset.Deca
1

You can also pass the /arch: settings in an undocumented way, as /d2... options, for example /d2archAVX.

/d2archSSE42 is accepted this way. It is the only such setting that is not available through the documented /arch: option.

Jubbulpore answered 14/2 at 19:47 Comment(0)
1

It seems like Visual Studio 17.11.5 (and toolset 14.41) added /arch:SSE4.2.

It does not make the compiler use very many SSE4.2 instructions, on par with the old undocumented /d2archSSE42, but it is now being documented: https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170

Tulle answered 3/11 at 4:2 Comment(0)
