x86 microarchitecture/SIMD market share

Where can I find data about "market share" of x86 microarchitectures? What percentage of users of x86-family CPUs have a CPU that supports SSE4.2, AVX, AVX2, etc.?

I'm distributing precompiled binaries for my program, and I would like to know what is the best optimization target, and which SIMD extensions can be reasonably used without runtime checks.

I can find overall Intel vs AMD market share data, but not a breakdown of generations of Intel's and AMD's CPUs. Ideally I'd like breakdown also per OS and per country, but even general global stats for microarchitectures would be better than nothing.

Cyndie asked 28/10, 2018 at 22:14 Comment(5)
Have you considered shipping multiple binaries/DLLs/SOs and figuring out the proper one during the installation? Data like this might not be very easy to find. - Radioman
@DanielKamilKozar I have considered this (as well as multiversioned functions), but I'm hoping to eliminate that kind of complexity. - Cyndie
Anything newer than SSE2 (baseline for x86-64) without runtime checks is risky if there's no fallback or install-time detection. AVX and BMI1/2 are very far from being baseline, because Intel is still selling Celeron/Pentium chips with VEX prefix decoding disabled (presumably to make use of silicon with defects in 256-bit execution units), but SSE4.2 is getting closer and SSSE3 is a possibility. See "Most recent processor without support of SSSE3 instructions?" and "Mac OSX minumum support sse version". - Horney
"Do all 64 bit intel architectures support SSSE3/SSE4.1/SSE4.2 instructions?" has a link to the Valve Hardware Survey for Steam clients (currently showing SSE3 as ~100% installed base, but SSSE3 only at 97%), so if you're shipping a PC game that should correlate pretty well with your target audience. For server stuff, you might easily be able to set an SSE4.2 minimum. - Horney
@PeterCordes That's great info. Please post it as an answer! - Cyndie

Anything newer than SSE2 (baseline for x86-64) without runtime checks is risky if there's no fallback or install-time detection.

AVX and BMI1/2 are sadly very far from being baseline, because Intel is still selling Celeron/Pentium chips with VEX prefix decoding disabled (presumably to make use of silicon with defects in 256-bit execution units), but SSE4.2 is getting closer, and SSSE3 is a possibility. See "Most recent processor without support of SSSE3 instructions?" and "Mac OSX minumum support sse version".
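For reference, a runtime check of the kind discussed here could look roughly like the sketch below. It assumes GCC or Clang on x86, since it relies on the `__builtin_cpu_init` / `__builtin_cpu_supports` builtins; the `simd_level` enum and the function names are made up for illustration and are not part of any library.

```c
#include <stdio.h>

/* Feature levels in increasing order. These names are illustrative only. */
enum simd_level { SIMD_SSE2 = 0, SIMD_SSSE3, SIMD_SSE42, SIMD_AVX2 };

/* Return the highest level (of the few we care about) that this CPU reports.
   Assumes GCC/Clang; on non-x86 targets the builtins are skipped and the
   function just returns the baseline label so the sketch still compiles. */
enum simd_level detect_simd_level(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();  /* harmless to call more than once */
    if (__builtin_cpu_supports("avx2"))   return SIMD_AVX2;
    if (__builtin_cpu_supports("sse4.2")) return SIMD_SSE42;
    if (__builtin_cpu_supports("ssse3"))  return SIMD_SSSE3;
#endif
    return SIMD_SSE2;  /* SSE2 is the x86-64 baseline */
}

/* Human-readable name for a level, e.g. for logging or an installer. */
const char *simd_level_name(enum simd_level level)
{
    switch (level) {
    case SIMD_AVX2:  return "avx2";
    case SIMD_SSE42: return "sse4.2";
    case SIMD_SSSE3: return "ssse3";
    case SIMD_SSE2:  return "sse2";
    }
    return "sse2";
}

/* Print the detected level. */
void report_simd_level(void)
{
    printf("best supported level: %s\n", simd_level_name(detect_simd_level()));
}
```

The file containing this check should itself be compiled for the baseline (no `-march=haswell` or similar), otherwise the compiler is free to use newer instructions before the check ever runs.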

"Do all 64 bit intel architectures support SSSE3/SSE4.1/SSE4.2 instructions?" has a link to the Valve Hardware Survey for Steam clients (currently showing SSE3 as ~100% installed base, but SSSE3 only at 97%), so if you're shipping a PC game that should correlate pretty well with your target audience. The breakdowns are a bit weird for some entries, though. For example, fcmov (x87 branchless conditional-move) is reported as having gone down to 97.5%, but every P6-compatible CPU has it. You won't find a CPU with SSE2 but without FCMOV. Perhaps newer versions of Steam aren't testing for it. And perhaps older versions of Steam aren't testing for CMPXCHG16B? So take the numbers with a grain of salt, but they're probably fairly sensible for SSE2/3/SSSE3/SSE4.x, and AVX.

For server stuff, you might easily be able to set an SSE4.2 minimum. Atom/Silvermont support it, and so do AMD's and VIA's low-power architectures, so energy-efficient servers can run it. Ancient mainstream CPUs don't tend to get much use for servers outside of personal home-server use, because they're often slower than a cheaper modern machine that runs cooler.

(Silvermont isn't likely to support AVX soon, let alone AVX2 or FMA.)


You don't have to limit yourself to a single binary. You could even let people pick when they download, or your installer could select at install time.

Or you could have a run-time wrapper that picks an executable and dynamic libraries, so you effectively get runtime dispatching while still being able to compile with gcc -O3 -march=haswell or whatever to let the compiler use new instruction sets all over the place (beneficial especially for BMI1/BMI2 for efficient single-uop variable-count shifts).

Another option is dynamic linker tricks, either on a whole-library basis or on a per-function basis, like glibc uses to resolve memset to __memset_avx2_unaligned_erms. See "perf report shows this function "__memset_avx2_unaligned_erms" has overhead. does this mean memory is unaligned?"
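As a rough illustration of per-function dispatch, GCC (and recent Clang) can generate the variants and the resolver for you with the `target_clones` attribute, which on ELF/glibc targets is built on the same ifunc mechanism as far as I know. This is only a sketch: `sum_i32` and the `MULTIVERSIONED` macro are hypothetical names, and the guard assumes x86-64 Linux with glibc.

```c
#include <stddef.h>

/* Only request clones where the mechanism is expected to exist
   (assumption: x86-64 Linux with glibc). Elsewhere, build a plain function. */
#if defined(__x86_64__) && defined(__linux__)
#define MULTIVERSIONED __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define MULTIVERSIONED
#endif

/* The compiler emits one copy of this function per listed target plus a
   resolver; the dynamic linker picks a copy once, at load time, based on
   the CPU it is running on. Callers just call sum_i32(). */
MULTIVERSIONED
long sum_i32(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

The trade-off is the one mentioned for dynamic libraries: a multiversioned function is called through an indirection and can't be inlined into its callers, so it suits a few hot loops better than lots of tiny helpers.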

All of these (except the per-function dynamic linker tricks) are easier than making your code aware of instruction-set extensions at runtime, and have zero performance overhead. (Unless you put stuff in a dynamic library when you wouldn't have otherwise, so it can't inline.)

Horney answered 28/10, 2018 at 23:17 Comment(3)
Very nice answer! Aren't Atom/Silvermont CPUs used in routers/NAS/microservers rather than full-fledged servers? It's a blurry line that classifies devices, though. It's fascinating how Intel can fit a boatload of cores almost everywhere today. - Dustydusza
@MargaretBloom: yes, Silvermont does get used in NAS / microserver type things. I don't know if there are high-density servers that use it that people would actually put in data centers to serve up a bunch of hard drives. If you google "avoton server", there are plenty of hits, though. e.g. logicsupply.com/ml600g-10 is a fanless microserver like you were talking about. Supermicro does mention a 1U chassis for their mini-ITX Avoton: supermicro.com/products/motherboard/atom/x10/a1sai-2750f.cfm - Horney
@MargaretBloom: Oh right, current generation Atom server isn't Avoton anymore. supermicro.com/products/motherboard/Atom lists a bunch of newer boards, including some with a Flex ATX form factor and a 16-core C3958 CPU (TDP = 31W) and up to 256GB registered-ECC DIMM, 4x 10GBase-T, and 2x SATA3. Or a Mini-ITX board with 12x SATA3 + 2x 10Gb ethernet. Those are full-fledged servers, with M.2 and PCIe slots. - Horney

The simple way to solve this problem (speaking as an ex-games programmer) is to compile a binary for each CPU level you wish to support (e.g. SSE2, SSE4, AVX2). The 'executable' for the game is then just a cpuid check, which runs the correct exe depending on which CPU is detected.
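A launcher along those lines might look something like this on a POSIX system. It is a sketch under assumptions: the `.avx2` / `.sse42` / `.sse2` suffix scheme and all function names are invented for illustration, and the CPU check uses the GCC/Clang `__builtin_cpu_supports` builtin rather than raw cpuid.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical naming scheme: "<base>.avx2", "<base>.sse42", "<base>.sse2". */
const char *variant_suffix(int has_avx2, int has_sse42)
{
    if (has_avx2)  return ".avx2";
    if (has_sse42) return ".sse42";
    return ".sse2";
}

/* Write "<base><suffix>" into out. Returns 0 on success, -1 if it doesn't fit. */
int variant_path(char *out, size_t cap, const char *base, const char *suffix)
{
    int n = snprintf(out, cap, "%s%s", base, suffix);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Replace the current process with the best matching variant of `base`.
   Only returns (with -1) if the path is too long or exec fails. */
int launch_best_variant(const char *base, char *const argv[])
{
    char path[4096];
    int has_avx2 = 0, has_sse42 = 0;
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    has_avx2  = __builtin_cpu_supports("avx2") != 0;
    has_sse42 = __builtin_cpu_supports("sse4.2") != 0;
#endif
    if (variant_path(path, sizeof path, base, variant_suffix(has_avx2, has_sse42)) != 0)
        return -1;
    execv(path, argv);  /* does not return on success */
    perror(path);
    return -1;
}
```

The launcher's `main` would just call `launch_best_variant` with the install path of the real binaries and its own `argv`, and the launcher itself should be built for the baseline. Note that a variant built with something like `-march=haswell` enables more than AVX2 (BMI1/BMI2, FMA, and so on), so a real check would need to test every extension that variant was compiled to use.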

Nierman answered 30/10, 2018 at 3:21 Comment(0)
