For testing purposes, I am writing short assembly snippets for Intel's Xeon Phi with the ICC inline assembler. I now want to use masked vector instructions, but I cannot get the inline assembler to accept them.
For code like this:
vmovapd -64(%%r14, %%r10), %%zmm0{%%k1}
I get the error message
/tmp/icpc5115IWas_.s: Assembler messages:
/tmp/icpc5115IWas_.s:563: Error: junk `%k1' after register
I tried a lot of different combinations, but nothing worked. The compiler version is intel64/13.1up03 under Linux, using GAS syntax.
Edit: The code above actually works with non-extended assembler. So this:
__asm__("vmovapd -64(%r14, %r10), %zmm0{%k1} ");
works, while the following does not:
__asm__("vmovapd -64(%[src], %%r10), %%zmm0{%%k1} "
:
: [src]"r"(src)
:);
I first suspected the double % that extended asm requires before register names, but writing a single % before k1 does not work either.
add {%0, %1 | %1, %0} to write code that works with either AT&T or Intel syntax, so you can compile it with or without -masm=intel. – Einhorn
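The dialect-alternative syntax from the comment can be sketched as follows. This is a minimal illustration, not the asker's code: `add_via_asm` is a hypothetical helper, and it uses a plain `add` on general-purpose registers so that it runs on any x86-64 machine rather than only on a Xeon Phi. Because `{`, `|` and `}` are consumed by the compiler in extended asm, a literal brace has to be escaped; GCC documents `%{`, `%|` and `%}` for that purpose. Whether ICC 13.1 honors the same escapes is an assumption that was not verified here.

```c
#include <stdint.h>

/* Hypothetical helper (not from the question): adds b to a with a single
   inline-asm instruction on x86-64 and falls back to plain C elsewhere. */
static int64_t add_via_asm(int64_t a, int64_t b)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* {att | intel}: the compiler keeps only the alternative that matches
       the selected dialect, so these braces never reach the assembler. */
    __asm__("add {%1, %0 | %0, %1}"
            : "+r"(a)   /* %0: read-write accumulator */
            : "r"(b)    /* %1: value to add */
            : "cc");    /* add modifies the flags */
#else
    a += b;             /* portable fallback for other targets */
#endif
    return a;
}

/* Assumption, untested here: a mask register in extended asm would need
   its braces escaped so they are emitted literally, for example
       "vmovapd -64(%[src], %%r10), %%zmm0%{%%k1%}"
   instead of "...%%zmm0{%%k1}". */
```

Calling `add_via_asm(2, 3)` should return 5 whether or not the file is compiled with -masm=intel. Read this way, the unescaped `{%%k1}` in the failing snippet is probably parsed as a dialect alternative instead of being passed through as a mask operand.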