How to test AVX-512 instructions w/o supported hardware? [closed]

I'm trying to learn x86-64's new AVX-512 instructions, but neither of my computers supports them. I tried using various disassemblers (from Visual Studio to online ones: 1, 2) to see the instructions for specific opcode encodings, but I'm getting somewhat conflicting results. Plus, it would be nice to run some instructions and see their actual output.

So I'm wondering if there is an online service that lets me compile a small piece of x86-64 assembly and run it, or step through it, on a specific processor? (Say, Intel's Sandy Bridge, Cannon Lake, etc.)

Sastruga answered 12/8, 2018 at 1:59 Comment(0)
I
12

Use Intel® Software Development Emulator, aka SDE to run an executable on an emulated CPU that supports future instruction-sets. It's freeware (not open source, but a free download), and is available for Linux, Windows, and I think also OS X.

https://software.intel.com/en-us/articles/debugging-applications-with-intel-sde has step-by-step instructions for how to debug with it on Windows or Linux: SDE can work as a GDB remote, so you can run sde -debug -- ./your-program, then in another terminal run gdb ./your-program and use target remote :portnumber to connect to the SDE process so you can set breakpoints and single-step.


You might be able to do the same thing with QEMU, if they've added support for emulating AVX512. QEMU can also act as a GDB remote.

QEMU definitely has configurable instruction-set stuff, e.g. you could tell it to emulate an x86 with AVX but not AVX2 (like Sandy Bridge). SDE can probably do the same thing.

You could even tell it to emulate something you won't find on real hardware, like AVX2 but not BMI1/2, if you want to verify that your CPUID checks don't assume anything implies anything else that isn't guaranteed.


Remember that these are both essentially useless for performance testing; they only help you check the correctness of your vectorization. IACA could be useful to get an idea of performance on SKX, but it's far from perfect and doesn't model memory bottlenecks at all. (Only the actual pipeline in some level of detail.)

Infiltration answered 12/8, 2018 at 2:41 Comment(9)
Yeah, I thought about an emulator too. I may try it. Although it's quite limiting. Stepping through code with a debugger would be my optimal solution. As for other online disassemblers, as my experience shows, most run on processors that don't support AVX512. I need to see if Amazon or Microsoft's Azure has a plan that supports low cost CPU rental. (like Hans Musgrave suggested.)Sastruga
@MikeF: My answer shows how you can single-step through the emulated code with a debugger. (Or at least links to an Intel article about how to do that on Windows. I only quoted the Linux part, because it's a couple simple commands.)Infiltration
@MikeF: If you literally just want a disassembler, use objdump -drwC -Mintel or Agner Fog's objconv to convert machine code into asm text. Your CPU doesn't have to support AVX512 for a disassembler to work, no emulation or anything needed. Or if you're compiling C or C++, use godbolt.org to get asm output from the compiler directly, without creating an executable and then disassembling it. e.g. godbolt.org/g/YsVuAX has some example functions with compiler output from gcc, clang, and MSVC.Infiltration
Thanks, Peter. And no, I don't need just a disassembler. (I can get them from many sources.) What I wanted is to test run those AVX512 instructions on the actual hardware. I'm currently trying to install a Windows 10 VM in a 30-day free trial Azure account. If that doesn't have a CPU that supports AVX-512, I'll look more closely into your suggested emulator. I appreciate all your suggestions though!Sastruga
@MikeF: Are you doing that for performance testing? Your question doesn't say that, so a free emulator you can run on your desktop to single-step AVX512 code seems a lot better to me.Infiltration
I just want to learn about those new AVX-512 instructions. They added a bunch of new encodings (with the EVEX prefix) that are hard to understand just by reading the Intel documentation. So idk, it's always been easier for me to first read the docs and then run some tests. So that's my main goal so far.Sastruga
@MikeF: That's exactly what you can do with an emulator, like my answer explains, without having to remote-desktop to a cloud VM to run a debugger there. That's how I learned AVX512. (Actually I spent more time just looking at compiler-generated asm for stuff I tried with intrinsics; I think I only actually ran things in SDE once or twice. Seeing what syntax was accepted by NASM was another way I learned how/when you could use masking and broadcast loads, and rounding-mode overrides.)Infiltration
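For readers who want something to compare emulator output against, here is a minimal sketch in plain C of the masking behaviour mentioned in the comment above. It models a 16-lane, 32-bit masked add in scalar code, so it runs on any CPU. The function names are hypothetical helpers made up for illustration; they are meant to mirror what the intrinsics _mm512_mask_add_epi32 (merge-masking) and _mm512_maskz_add_epi32 (zero-masking) compute, not to replace them.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* a 512-bit register holds 16 lanes of 32 bits */

/* Hypothetical helper: wrap-around 32-bit add, done in unsigned arithmetic
   to avoid signed-overflow undefined behaviour. */
int32_t wrap_add32(int32_t x, int32_t y)
{
    return (int32_t)((uint32_t)x + (uint32_t)y);
}

/* Hypothetical scalar model of merge-masking: lanes whose mask bit is 1
   receive a[i] + b[i]; lanes whose mask bit is 0 keep the old dst[i]. */
void model_mask_add_epi32(int32_t dst[LANES], uint16_t k,
                          const int32_t a[LANES], const int32_t b[LANES])
{
    for (size_t i = 0; i < LANES; i++) {
        if ((k >> i) & 1u) {
            dst[i] = wrap_add32(a[i], b[i]);
        }
    }
}

/* Hypothetical scalar model of zero-masking: lanes whose mask bit is 0
   are set to zero instead of keeping their old value. */
void model_maskz_add_epi32(int32_t dst[LANES], uint16_t k,
                           const int32_t a[LANES], const int32_t b[LANES])
{
    for (size_t i = 0; i < LANES; i++) {
        dst[i] = ((k >> i) & 1u) ? wrap_add32(a[i], b[i]) : 0;
    }
}
```

One way to use it: run the real instruction under the emulator with the same inputs and mask, dump the destination register, and compare lane by lane against the model's output.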
Yep, that's exactly what I'm trying to learn. Thanks. Although I'm on Windows. Can I use it with Visual Studio, do you know?Sastruga
@MikeF: IDK, read the Intel white papers I linked. They have a Windows section. I assume so, Intel typically cares about Windows at least as much as Linux. But I don't use Windows so I didn't read that part.Infiltration
S
3

There are online tools which allow you to at least select different assembly dialects, but I'm not seeing anything that supports Xeon Phi or Skylake. However, the Intel C++ and Fortran compilers support cross-compiling for those additional architectures. It seems you're using Windows, and that is directly supported.

An additional route would include renting an AWS EC2 C5 instance to play with which natively supports AVX-512. For learning purposes, this can be done for as little as $0.085/hr for a reserved instance or $0.0185/hr if you're fine with Spot pricing.

Scissure answered 12/8, 2018 at 2:26 Comment(10)
Hey, thanks. Your AWS idea sounds very interesting. Although I've never dealt with them before. Where do you take all these prices from? And also what is "spot pricing"?Sastruga
Pricing varies over time, but this link should stay up to date. The "spot" instances differ from the "on-demand" instances in that you don't get a machine instantly allocated necessarily. Amazon uses them to fill the gaps in the normal usage and is willing to offer a discount since something is better than nothing (as long as that something exceeds their operating overhead). Your testing likely doesn't require lots of resources or persistent storage between instances on their machines, so the cheapest option should work fine.Scissure
Examining your comment on the other answer, AWS is Amazon, and Azure has a comparable product with AVX-512. Their pricing is competitive -- not outdoing the spot instances but handily beating AWS on-demand products.Scissure
Yep, thanks. I'll try to dig through it. So far it's all very confusing. Let me try to get it straight. I'd rent a VM that I can install, say, Windows on and then remote into it, right? If so, it would be a good idea, as I can run a remote debugger on it with Visual Studio. What confuses me is their naming in that list you linked. Say t1.micro, t2.small, and so on -- million things on that list. Also how do I select which CPU it will run on?Sastruga
Those cloud services are IMO needlessly complex. You'd rent a VM and be able to choose what kind of VM it is (e.g. Windows). You don't have to install the OS. You'd need to dig into the docs to verify the CPU type, or you can take my word for it that Amazon is bragging about AVX512 in the C5 instances and that Microsoft is bragging about it in their Fv2 instances. Both providers use Skylake processors which have the newer version of the AVX512 instruction set. To select which kind of, for example, C5 instance you want, you'd need to compare their other properties like RAM. Cheapest should work.Scissure
They support so many services that documentation takes a little while to wade through till you get used to it. It's to the point where being knowledgeable in AWS is an actual employable skill.Scissure
You bet! Hey, I just noticed that Azure supports free account for 30 days. That may be all I need. Do you think it's worth trying to sign up for that? Or do they run those free accounts on some under-powered CPUs?Sastruga
Usually free accounts are limited to a form of "micro" instance, with the exact terminology varying between cloud providers. Those will typically have enough hours you could run it constantly all month for free. You only pay for your usage anyway (no monthly fees) though, so for debugging and playing with the AVX-512 instruction set you'll probably come in at under a dollar, especially if you're familiar with other SIMD instruction sets.Scissure
Hans, I just finished setting up Azure 30-day free account. Here's what I found. Their standard run-of-the-mill VM with a client version of Win10 was installed on the Intel(R) Xeon(R) CPU E5-2673 v3 which is a Haswell CPU that doesn't support AVX512. So I had to go with their F2s_v2 Standard Compute optimized plan and Win10 Datacenter Server OS that was installed on Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz that supported AVX512F (bit 16) and AVX512VL (bit 31) but did not support AVX512_IFMA (bit 21). I could then remote into it with VS debugger.Sastruga
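The bit numbers quoted in the comment above are positions in the EBX register returned by CPUID leaf 7, subleaf 0. A minimal sketch of checking them from C follows; the helper names ebx_has and read_leaf7_ebx are hypothetical, and the CPUID read assumes GCC or Clang on x86 (MSVC would need its own intrinsic, such as __cpuidex). Each feature is tested on its own bit, so nothing is inferred from another feature being present.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID.(EAX=7,ECX=0):EBX, matching the numbers in the
   comment above. */
enum {
    BIT_AVX512F     = 16,
    BIT_AVX512_IFMA = 21,
    BIT_AVX512VL    = 31
};

/* Hypothetical helper: test one feature bit in a raw EBX value. */
bool ebx_has(uint32_t ebx, unsigned bit)
{
    return ((ebx >> bit) & 1u) != 0;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>

/* Hypothetical wrapper: read leaf 7, subleaf 0 with the GCC/Clang helper.
   Returns false if the CPU does not report that leaf. */
bool read_leaf7_ebx(uint32_t *ebx_out)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    *ebx_out = ebx;
    return true;
}
#else
/* Assumption: on other compilers or architectures this sketch simply
   reports that the leaf is unavailable. */
bool read_leaf7_ebx(uint32_t *ebx_out)
{
    (void)ebx_out;
    return false;
}
#endif
```

A caller would pass the value from read_leaf7_ebx to ebx_has once per feature, for example ebx_has(ebx, BIT_AVX512F). These bits only say what the CPU reports; code that actually executes AVX-512 instructions should also confirm that the OS has enabled the ZMM and mask register state (via XGETBV), which this sketch leaves out.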
I'm not sure whether or not they'll let me use it for free for the next 30 days, but aside from having taken several hours to set up, this is a way to run my tests on actual (albeit VM'ed) hardware. Now I'll try reading the Intel emulator's white paper that Peter Cordes suggested in another post. Maybe it's an easier solution.Sastruga
