I'm currently trying to compile software for use on an HPC cluster using the Intel compilers. The login node, where I compile and prepare the computations, uses Intel Xeon Gold 6148 processors, while the compute nodes use either Haswell (Intel Xeon E5-2660 v3 / Intel Xeon E5-2680 v3) or Skylake processors (Intel Xeon Gold 6138).
As far as I understand from the processors' specifications, my login node supports Intel SSE4.2, AVX, AVX2 and AVX-512, but my Haswell compute nodes only go up to AVX2, while my Skylake compute nodes also support AVX-512.
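To confirm what each node type actually reports, a tiny helper can be compiled once and run on the login node, a Haswell node and a Skylake node. This is a minimal sketch assuming GCC or Clang, which provide __builtin_cpu_init and __builtin_cpu_supports; whether the Intel compiler accepts the same builtins is an assumption to check. The function name highest_runtime_isa is hypothetical. On these Intel chips each extension implies the ones below it, so the first match is the widest one available.

```c
/* Hypothetical helper: reports the widest x86 SIMD extension that the CPU
   running this code supports (not what the binary was compiled for).
   Relies on the GCC/Clang builtins __builtin_cpu_init/__builtin_cpu_supports. */
const char *highest_runtime_isa(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();                 /* make sure the CPU model data is initialized */
    if (__builtin_cpu_supports("avx512f"))
        return "AVX-512";                 /* expected on the Skylake nodes and the login node */
    if (__builtin_cpu_supports("avx2"))
        return "AVX2";                    /* expected on the Haswell nodes */
    if (__builtin_cpu_supports("avx"))
        return "AVX";
    if (__builtin_cpu_supports("sse4.2"))
        return "SSE4.2";
    return "older than SSE4.2";
#else
    return "not x86";                     /* the builtins above are x86-only */
#endif
}
```

Calling it from a small main and printing the result on each node type should show AVX2 on the Haswell nodes and AVX-512 on the Skylake nodes.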
If I compile with the option -xHost on the login node, it should automatically use the highest instruction set available. But which one is the highest? And how can I ensure that my program runs on both kinds of compute nodes with the best performance? Do I have to compile two versions?
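One way to see which instruction set -xHost (or any -march/-x setting) selected is to compile a small file with exactly the same flags and inspect the predefined ISA macros. A minimal sketch, assuming the compiler defines the usual __AVX512F__, __AVX2__, __AVX__ and __SSE4_2__ macros (GCC and Clang do; for the Intel compilers this is assumed here). The function name compiled_isa is hypothetical. On a Xeon Gold 6148 login node, -xHost is expected to target AVX-512, so a binary built there with -xHost would not be expected to run on the Haswell nodes.

```c
/* Hypothetical helper: reports the instruction-set level this translation
   unit was compiled for, based purely on predefined compiler macros.
   Nothing here is checked at run time. */
const char *compiled_isa(void)
{
#if defined(__AVX512F__)
    return "AVX-512";                  /* e.g. built with -march=skylake-avx512 */
#elif defined(__AVX2__)
    return "AVX2";                     /* e.g. built with -march=haswell */
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#else
    return "baseline (below SSE4.2)";  /* e.g. default x86-64 target */
#endif
}
```

Comparing this compile-time answer with what the CPU reports at run time on a given node shows whether a build will be able to run there.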
Bonus question: which -march do I have to specify in this case?
-march=skylake-avx512. Otherwise, you can use at most -march=haswell as a baseline, with AVX-512 only via runtime CPU detection. Or, if you can compile separate versions for each node type, do that. (If you have any tasks that don't benefit from AVX-512, let them run on the Haswell nodes.) – Knurly
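The runtime CPU detection approach could look roughly like this: build the program with -march=haswell as the baseline, compile a hot function additionally for AVX-512, and pick the variant when it is called. This is a minimal sketch assuming GCC or Clang (function-level target attributes plus __builtin_cpu_supports); add_arrays and its helpers are hypothetical names for illustration, and whether the loops actually get vectorized depends on the optimization level (e.g. -O3).

```c
#include <stddef.h>

/* Generic version: uses whatever the file-wide baseline (e.g. -march=haswell) allows. */
static void add_generic(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

#if defined(__x86_64__) || defined(__i386__)
/* Same loop, but this one function may be compiled with AVX2 instructions. */
__attribute__((target("avx2")))
static void add_avx2(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* Same loop, but this one function may be compiled with AVX-512F instructions,
   even though the rest of the program targets Haswell. */
__attribute__((target("avx512f")))
static void add_avx512(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
#endif

/* Dispatcher: picks the widest variant the current CPU supports.
   For simplicity it re-checks on every call; real code would resolve once. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) {
        add_avx512(dst, a, b, n);   /* Skylake nodes */
        return;
    }
    if (__builtin_cpu_supports("avx2")) {
        add_avx2(dst, a, b, n);     /* Haswell nodes */
        return;
    }
#endif
    add_generic(dst, a, b, n);
}
```

With the Intel compilers, the -ax option (for example -xCORE-AVX2 -axCORE-AVX512) asks the compiler to generate an additional AVX-512 code path alongside the AVX2 baseline and to dispatch between them automatically, which avoids writing this by hand.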