Use the Intel® Software Development Emulator (SDE) to run an executable on an emulated CPU that supports future instruction sets. It's freeware (a free download, but not open source) and is available for Linux, Windows, and I think also OS X.
https://software.intel.com/en-us/articles/debugging-applications-with-intel-sde has step-by-step instructions for debugging with it on Windows or Linux. SDE can work as a GDB remote: run sde -debug -- ./your-program, then in another terminal run gdb ./your-program and use target remote :portnumber to connect to the SDE process so you can set breakpoints and single-step.
You might be able to do the same thing with QEMU, if they've added support for emulating AVX512. QEMU can also act as a GDB remote.
QEMU definitely has a configurable instruction set, e.g. you could tell it to emulate an x86 with AVX but not AVX2 (like Sandybridge). SDE can probably do the same thing.
You could even emulate something you won't find on real hardware, like AVX2 without BMI1/2, to verify that your CPUID checks don't assume one feature implies another when that isn't guaranteed.
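As a rough illustration of what "not assuming one feature implies another" looks like in code, here is a minimal sketch in C. The struct, enum, and function names are hypothetical, and it assumes GCC or Clang on x86, where the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins are available; your own dispatcher may query CPUID differently.

```c
#include <stdbool.h>

/* Hypothetical feature flags; each one is queried on its own. */
struct cpu_features {
    bool avx2;
    bool bmi1;
    bool bmi2;
};

/* Hypothetical code paths a dispatcher might choose between. */
enum kernel { KERNEL_SCALAR, KERNEL_AVX2, KERNEL_AVX2_BMI2 };

/* Pick a path using only the features that path actually needs.
   AVX2 being present never enables the BMI2 path by itself. */
enum kernel choose_kernel(struct cpu_features f)
{
    if (f.avx2 && f.bmi2)
        return KERNEL_AVX2_BMI2;
    if (f.avx2)
        return KERNEL_AVX2;
    return KERNEL_SCALAR;
}

/* Query the running (or emulated) CPU. Assumes GCC or Clang on x86;
   on any other target every feature is reported as absent. */
struct cpu_features detect_features(void)
{
    struct cpu_features f = { false, false, false };
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
    f.bmi1 = __builtin_cpu_supports("bmi") != 0;
    f.bmi2 = __builtin_cpu_supports("bmi2") != 0;
#endif
    return f;
}
```

Calling `choose_kernel(detect_features())` under an emulated CPU that has AVX2 but not BMI1/2 should select `KERNEL_AVX2`. If a dispatcher written this way ever reached a BMI2 path there, it would be treating AVX2 as implying BMI2.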
Remember that both are essentially useless for performance testing; they're only good for checking the correctness of your vectorization. IACA could be useful to get an idea of performance on SKX, but it's far from perfect and doesn't model memory bottlenecks at all, only the actual pipeline in some level of detail.