Which versions of Windows support/require which CPU multimedia extensions? (How to check if SSE or AVX are fully usable?)

So far I have managed to find out that:

  • SSE and SSE2 are mandatory for Windows 8 and later (and of course for any 64-bit OS)
  • AVX is only supported by Windows 7 SP1 or later

Are there any caveats regarding using SSE3, SSSE3, SSE4.1, SSE 4.2, AVX2 and AVX-512 on Windows?

Some clarification: I need this to determine which OSes my program will run on if I use instructions from one of the SSE/AVX sets.

Viscometer answered 3/12, 2015 at 14:52 Comment(2)
I don't think it's a SU question, I doubt coding in assembly is something even the most super users do. I'll reword the question to make it more clear that I am trying to utilize the opcodes in my programs.Viscometer
I don't see how this question is just "about general computing software". The sole fact that it contains "Windows" does not mean it is off-topic here. This is a question about platforms from perspective of programmers, and it is clearly important to those who code in SSE/AVX/etc.Cranky

Extensions that introduce new architectural state require special OS support, because the OS has to save/restore more data on context switches. Extensions that only add instructions operating on existing state do not: from the OS's perspective, there's nothing extra it needs to do to let user-space code run SSSE3 instructions, if the OS supports SSE.

SSE, AVX, and AVX512 are the extensions that introduced new architectural state.

  • SSE introduced the xmm regs (and MXCSR for rounding modes and FP exception state)
  • AVX introduced ymm (the lower half of which are the old xmm regs).
  • AVX512 introduced zmm (extending the x/ymm regs), and also doubled the number of vector regs in 64bit mode: zmm0-zmm31. x/y/zmm16..31 are only accessible with AVX-512 encodings of vector instructions (EVEX prefix), and thus interestingly can be used without requiring vzeroupper, and aren't affected by it.
    k0..k7 64-bit mask registers (or 16-bit without AVX-512BW in Xeon Phi) are also new in AVX-512.

You check for CPU support for SSE or AVX the usual way, with the CPUID instruction.
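As a rough sketch of that CPUID check, the C code below decodes the SSE-family bits of CPUID leaf 1 and the AVX2 bit of leaf 7. The bit positions come from Intel's CPUID documentation; the helper names (`sse_level`, `read_cpuid`, `print_simd_report`) and the numeric level encoding are made up for this illustration. It assumes GCC/Clang's `<cpuid.h>` or MSVC's `<intrin.h>`, and it reports only what the CPU advertises, not whether the OS has enabled the state.

```c
#include <stdio.h>

#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* CPUID leaf 1 feature bits. */
#define EDX_SSE    (1u << 25)
#define EDX_SSE2   (1u << 26)
#define ECX_SSE3   (1u << 0)
#define ECX_SSSE3  (1u << 9)
#define ECX_SSE41  (1u << 19)
#define ECX_SSE42  (1u << 20)
/* CPUID leaf 7, subleaf 0. */
#define EBX_AVX2   (1u << 5)

/* Highest SSE level for which that level and all lower ones are reported.
   Hypothetical encoding: 0 none, 10 SSE, 20 SSE2, 30 SSE3, 31 SSSE3,
   41 SSE4.1, 42 SSE4.2. */
int sse_level(unsigned ecx, unsigned edx)
{
    if (!(edx & EDX_SSE))   return 0;
    if (!(edx & EDX_SSE2))  return 10;
    if (!(ecx & ECX_SSE3))  return 20;
    if (!(ecx & ECX_SSSE3)) return 30;
    if (!(ecx & ECX_SSE41)) return 31;
    if (!(ecx & ECX_SSE42)) return 41;
    return 42;
}

/* Fills regs with EAX, EBX, ECX, EDX. Returns 0 if the leaf is not
   available (or the target is not x86), non-zero otherwise. */
int read_cpuid(unsigned leaf, unsigned subleaf, unsigned regs[4])
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int r[4];
    int i;
    __cpuid(r, 0);                        /* r[0] = highest basic leaf */
    if ((unsigned)r[0] < leaf) return 0;
    __cpuidex(r, (int)leaf, (int)subleaf);
    for (i = 0; i < 4; i++) regs[i] = (unsigned)r[i];
    return 1;
#elif defined(__i386__) || defined(__x86_64__)
    return __get_cpuid_count(leaf, subleaf,
                             &regs[0], &regs[1], &regs[2], &regs[3]);
#else
    (void)leaf; (void)subleaf; (void)regs;
    return 0;
#endif
}

/* Prints what the CPU advertises. The AVX2 bit alone does not mean AVX2
   is usable: the OS must also have enabled YMM state. */
void print_simd_report(void)
{
    unsigned r1[4], r7[4];
    if (!read_cpuid(1, 0, r1)) {
        puts("CPUID leaf 1 unavailable");
        return;
    }
    printf("SSE level: %d\n", sse_level(r1[2], r1[3]));
    if (read_cpuid(7, 0, r7))
        printf("AVX2 bit:  %d\n", (r7[1] & EBX_AVX2) != 0);
}
```

Calling `print_simd_report()` from `main` prints the detected level on an x86 machine. Keeping the decoding in `sse_level` separate from the CPUID read makes the bit logic easy to check with hand-written register values.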

To prevent silent data corruption when using a new extension on a multi-tasking OS that doesn't save/restore the new architectural state on context switches, the instructions of such an extension (SSE, for example) fault as illegal instructions if the OS hasn't set the corresponding OS-support bit in a control register. So vector extensions "don't work" on OSes that don't know about saving/restoring the necessary state for that extension.


For SSE, there may not be any clean OS-independent way to detect that the OS has promised to save/restore SSE state on context switches by setting the CR4.OSFXSR, CR4.OSXMMEXCPT etc. bits, because even reading a control register is privileged, and there's no CPUID bit that reflects the setting. SSE support is so widespread that you'd have to be using a really ancient version (or homebrew) OS for this to be a problem.


For AVX, we don't need to ask the OS to find out whether AVX is usable (supported by hardware and enabled by the OS): user-space can run xgetbv and check the enabled-feature flags to see if the OS has enabled AVX instructions to run without faulting.

From Intel's intro to AVX:

  • Verify that the operating system supports XGETBV using CPUID.1:ECX.OSXSAVE bit 27 = 1.
  • At the same time, verify that CPUID.1:ECX bit 28=1 (Intel AVX supported) and/or bit 25=1 (AES supported) ... (and other bits for FMA, AES, and PCLMULQDQ)
  • Issue XGETBV, and verify that the feature-enabled mask at bits 1 and 2 are 11b (XMM state and YMM state enabled by the operating system).
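The three steps in Intel's list can be sketched in C roughly as follows. This is an illustration, not Intel's reference code: the function names `avx_usable_from` and `avx_usable` are invented here, the GCC/Clang path assumes `<cpuid.h>` and inline asm, and the MSVC path assumes the `__cpuid` and `_xgetbv` intrinsics.

```c
#if defined(_MSC_VER)
#include <intrin.h>
#include <immintrin.h>
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

#define ECX_OSXSAVE (1u << 27)   /* CPUID.1:ECX, OS has enabled XSAVE/XGETBV */
#define ECX_AVX     (1u << 28)   /* CPUID.1:ECX, CPU supports AVX */
#define XCR0_SSE    (1ull << 1)  /* XMM state enabled by the OS */
#define XCR0_AVX    (1ull << 2)  /* YMM state enabled by the OS */

/* Pure decision logic: takes raw CPUID.1:ECX and XCR0 values. */
int avx_usable_from(unsigned leaf1_ecx, unsigned long long xcr0)
{
    const unsigned cpu_bits = ECX_OSXSAVE | ECX_AVX;
    const unsigned long long os_bits = XCR0_SSE | XCR0_AVX;
    return (leaf1_ecx & cpu_bits) == cpu_bits
        && (xcr0 & os_bits) == os_bits;
}

/* Queries the running machine. XGETBV is only executed after OSXSAVE
   has been confirmed, because it faults otherwise. */
int avx_usable(void)
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int r[4];
    __cpuid(r, 1);
    if (!((unsigned)r[2] & ECX_OSXSAVE)) return 0;
    return avx_usable_from((unsigned)r[2], _xgetbv(0));
#elif defined(__i386__) || defined(__x86_64__)
    unsigned eax, ebx, ecx, edx, lo, hi;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return 0;
    if (!(ecx & ECX_OSXSAVE)) return 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return avx_usable_from(ecx, ((unsigned long long)hi << 32) | lo);
#else
    return 0;   /* not an x86 target */
#endif
}
```

A program would typically call `avx_usable()` once at startup and pick an AVX or non-AVX code path based on the result.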

It may be easier to call an OS-provided function to detect OS support, instead of using inline asm or a feature-detect library to do all this. For example, Win7SP1 introduced GetEnabledXStateFeatures along with support for AVX CPUs. (It's unlikely or maybe impossible to find Win7SP1 running on a CPU without SSE, so for SSE you can just check CPUID and OS version.)

This is also understood to be a promise that the OS's context switches will correctly save/restore the full state, although of course a buggy, malicious, or esoteric OS (perhaps cooperative multi-tasking?) could be different. For mainstream OSes including Windows, it does mean YMM registers will keep their values just like you'd expect.
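For the OS-provided route, a hedged sketch of querying GetEnabledXStateFeatures is shown below. Looking the function up dynamically through GetProcAddress is an assumption added here so the same binary can still start on Windows versions older than Win7SP1; the helper names `mask_has_avx` and `os_enabled_avx` are hypothetical. The fallback definition of `XSTATE_MASK_AVX` (bit 2, matching the YMM bit in XCR0) only takes effect when the SDK headers do not already define it.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Bit 2 of the XState feature mask is the AVX (YMM) component.
   Defined here only if the SDK header has not already done so. */
#ifndef XSTATE_MASK_AVX
#define XSTATE_MASK_AVX (1ull << 2)
#endif

/* Pure helper: does an enabled-features mask include AVX state? */
int mask_has_avx(unsigned long long enabled)
{
    return (enabled & XSTATE_MASK_AVX) != 0;
}

/* Returns 1 if Windows reports AVX state as enabled, 0 if it does not
   (or the API is missing, as on systems before Win7SP1), and -1 when
   not built for Windows. */
int os_enabled_avx(void)
{
#ifdef _WIN32
    typedef DWORD64 (WINAPI *get_xstate_fn)(void);
    HMODULE k32 = GetModuleHandleW(L"kernel32.dll");
    get_xstate_fn fn = NULL;
    if (k32 != NULL)
        fn = (get_xstate_fn)(void (*)(void))
                 GetProcAddress(k32, "GetEnabledXStateFeatures");
    if (fn == NULL) return 0;
    return mask_has_avx(fn());
#else
    return -1;
#endif
}
```

This only answers the "has the OS enabled the state" half of the question; feature levels that add no new state (SSE4.2, AVX2, and so on) still need a CPUID check.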


The same is true for AVX512: you can check the CPUID feature bit for the instruction set, and check that the OS has promised to manage the new architectural state on context switches by enabling the right bits with XSETBV. (So you should check with XGETBV.) Check that the XGETBV result ANDed with 0xE6 equals 0xE6.
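To make the 0xE6 mask less opaque, here is a small C sketch that spells out which XCR0 bits it covers and combines them with the CPUID bits. It assumes the caller has already read CPUID.1:ECX, CPUID.(EAX=7,ECX=0):EBX and XCR0 (via XGETBV with ECX=0); the macro and function names are invented for this example.

```c
#define ECX_OSXSAVE     (1u << 27)   /* CPUID.1:ECX */
#define EBX_AVX512F     (1u << 16)   /* CPUID.(EAX=7,ECX=0):EBX */

#define XCR0_SSE        (1ull << 1)  /* XMM registers */
#define XCR0_AVX        (1ull << 2)  /* upper halves of YMM0-15 */
#define XCR0_OPMASK     (1ull << 5)  /* mask registers k0-k7 */
#define XCR0_ZMM_HI256  (1ull << 6)  /* upper halves of ZMM0-15 */
#define XCR0_HI16_ZMM   (1ull << 7)  /* ZMM16-31 */

/* All state components AVX-512 needs; this evaluates to 0xE6. */
#define XCR0_AVX512_ALL \
    (XCR0_SSE | XCR0_AVX | XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

/* Returns 1 only if the CPU reports AVX-512F, the OS has enabled XSAVE,
   and every required state component is enabled in XCR0. */
int avx512f_usable_from(unsigned leaf1_ecx, unsigned leaf7_ebx,
                        unsigned long long xcr0)
{
    if (!(leaf1_ecx & ECX_OSXSAVE)) return 0;
    if (!(leaf7_ebx & EBX_AVX512F)) return 0;
    return (xcr0 & XCR0_AVX512_ALL) == XCR0_AVX512_ALL;
}
```

For example, an XCR0 value of 0x7 (x87, SSE and AVX state only) fails the check even on a CPU that reports AVX-512F, which is the situation on an OS that predates AVX-512 support.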

Prone answered 3/12, 2015 at 16:37 Comment(18)
So this means I cannot use AVX512 extensions in my programs running under OSes released before 2013, since that's when it was announced. Do you know which Windows versions will receive updates that make them handle zmm registers correctly?Viscometer
Actually, you should use IsProcessorFeaturePresent and GetEnabledXStateFeatures because they tell you not only whether the CPU feature exists, but also whether the OS supports it. It would be bad to detect (say) AVX support in the CPU, and then use AVX instructions, only to find that your AVX state gets corrupted at every context switch because the OS doesn't have AVX context switching support.Millesimal
@Alexey: It takes time for a tested-well-enough-to-ship implementation to be ready, after Intel announces things. AVX512 still isn't present in any normal CPU that's for sale, only the Knight's Landing many-core stuff. Anyway, I have no idea what Windows' AVX512 support will be like. It's quite easy to add OS support for saving/restoring the extra state. Just set a couple more feature bits for XSAVE/XRSTOR, and the CPU will save the extra state.Prone
@RaymondChen It's not as bad as you think actually. The processor enforces it. It will issue an illegal instruction exception if the program tries to execute an AVX instruction when the XSAVE bit is not set. The XSAVE bit is unset by default for backwards compatibility. Once the OS sets it, it has "accepted the contract" that it will preserve ymm state across context switches. Then the processor will let you execute AVX instructions.Buzzell
@Mysticial: thanks, I'd been meaning to come back and correct this answer after learning that Intel did make an OS-independent way to detect AVX OS support.Prone
To be fair, it would've been insane to leave such a loophole that would cause data-corruption. It's very easy to do accidentally and even VS2013 messed this up by having their math.h library try to use FMA instructions without checking XSAVE. My primary test box (a Haswell) has a bunch of OS's installed - one of which is Vista (no AVX support). I use it to test these sorts of things (among others).Buzzell
@AlexGuteniev: Thanks for the edit. I guess Win7SP1 (which introduced GetEnabledXStateFeatures) won't run on ancient CPUs without SSE1? If it could, you could just use that function if available instead of also checking CPUID. Otherwise, yes for SSE check CPUID and then GetEnabledXStateFeatures even being available would imply that the OS supports it, so there'd be no need to call it.Prone
I don't think there's ever a point in calling GetEnabledXStateFeatures to check for SSE support by the OS. Windows 2000 and later supported SSE (maybe even starting in Windows 98). Targeting systems older than WinXP is hard: you'll need to obtain old toolsets (say, Visual Studio 2005 was the last to support Win98), and older offline documentation (as the information about old systems is not kept online). So basically if your app can run, SSE is supported by the OS. And if GetEnabledXStateFeatures is available, then AVX is supported by the OS, not just SSEJoy
@AlexGuteniev: OS support for AVX doesn't help if your CPU doesn't have AVX or SSE. If you can use GetEnabledXStateFeatures, you don't also need to run CPUID and decode it as well, right? Instead use only it to check for CPU & OS support for whatever SIMD feature level. The question then becomes whether Win7SP1 can run on a CPU without SSE1, like Pentium II or a 32-bit VM that doesn't pass through or emulate SSE. I wouldn't be surprised if the answer is "no" even for a 32-bit kernel, but it's not obvious to me.Prone
@AlexGuteniev: I googled and ghacks.net/2018/06/21/… says Win7 update support was dropped for CPUs without SSE2, but IDK if that was before or after SP1. So the presence of GetEnabledXStateFeatures may imply SSE2 or not. But you need to call it (or check CPUID yourself) to see if you can use anything higher.Prone
If you can use GetEnabledXStateFeatures, you don't also need to run CPUID and decode it as well, right? -- Not sure. It is the replacement for XGETBV query, not a full query.Joy
The function to replace the full feature query is IsProcessorFeaturePresent. It is present from old OS (was there at least since WinNT4/Win98). But it may not support detection of some features in older OS; for example in Windows 2000 it does not support detection of SSE2, whereas SSE2 if present in CPU, can be used.Joy
Although I was told to use this function instead of __cpuid to work around __cpuid misinformation github commentJoy
@AlexGuteniev: Oh good point about feature levels that didn't introduce new architectural state. An OS function wouldn't report SSE2 support if the OS only knows about SSE1, but that's sufficient to context-switch the XMM registers, and set the bits that control faulting of all SSE instructions. (And I was just editing my previous comment after noticing the article I found was that win7 update support was dropped for non-SSE2 CPUs, not that initial Win7 dropped support vs. earlier Windows versions.)Prone
@AlexGuteniev: That github comment is odd; I'd hardly call it "buggy" for the BIOS to not disable CPUID feature bits based on figuring out what OS was going to boot! How would the BIOS even know in the general case of a multi-boot system with GRUB, and why would it even make sense for CPUID not to reflect the ability to execute cmpxchg16b whether or not the OS uses it? The code suggestion is correct, but the explanation for why you need to ask the OS seems totally wild. (Also, lock-free user-space code can interact with other user-space threads without it mattering what the OS does.)Prone
(Asking whether any pre-compiled libraries that might interact with this object will use CX16 is relevant, though, if you want atomic_ref<128bit> to be compatible with atomic<128bit> or some interlocked functions I guess. Anyway, off topic here, definitely still crazy to think the BIOS should disable CPUID feature bits based on knowing something about what OS will boot. Not sure if such disabling is even possible, other than by hypervisors when CPUID causes a vmexit.)Prone
AVX/AVX2 flags of IsProcessorFeaturePresent are undocumented, so not sure what OS they need. It makes __cpuid+_xgetvb a more appealing solution. (Also don't want to add an API call to otherwise potentially portable parts of code)Joy
@AlexGuteniev: Yeah, agreed that calling OS functions only makes sense to query for support for context-switching new architectural state, not later feature levels like SSE4 or AVX2 that the OS might not know about. And then probably only for SSE1, since the portable way to check via xgetbv is no harder and is fully portable, and you need CPUID anyway if you want something other than just SSE1 or AVX1. Or AVX-512F I guess. (BV not VB; I'm guessing the mnemonic stands for "bit vector" or something. But I catch myself typoing it often.)Prone
