Use -masm=intel and don't use any .att_syntax directives in your inline asm. This works with GCC (and, I think, ICC), and with any constraints you use. Other methods don't. (See "Can I use Intel syntax of x86 assembly with GCC?" for a simple answer saying that; this answer explores exactly what goes wrong, including with clang 13 and earlier.)
That also works in clang 14 and later. (Which isn't released yet but the patch is part of current trunk; see https://reviews.llvm.org/D113707).
Clang 13 and earlier would always use AT&T syntax for inline asm, both in substituting operands and in assembling as op src, dst. Even worse, clang -masm=intel would do that even when taking the Intel side of an asm template using dialect alternatives like asm("add {att | intel}" : ... )!
clang -masm=intel did still control how it printed asm after its built-in assembler turned an asm() statement into some internal representation of the instruction. For example, Godbolt shows clang 13 -masm=intel assembling add %0, 1 as add dword ptr [1], eax, while clang trunk produces add eax, 1.
Some of the rest of this answer talking about clang hasn't been updated for this new clang patch.
Clang does support Intel syntax inside MSVC-style asm blocks, but that's terrible (no constraints, so inputs / outputs have to go through memory).
If you were hard-coding register names with clang, -masm=intel would be usable (or the equivalent -mllvm --x86-asm-syntax=intel). But it chokes on mov %eax, 5 in Intel-syntax mode, so you can't let %0 expand to an AT&T-syntax register name.
-masm=intel makes the compiler use .intel_syntax noprefix at the top of its asm output file, and use Intel syntax when generating asm from C outside your inline-asm statement. Using .att_syntax at the bottom of your asm template breaks the compiler's own asm output, hence the error messages: operands like PTR [rbp-4] look like junk to the assembler (which is expecting AT&T syntax).
The "too many operands for mov" error is because in AT&T syntax, mov eax, ebx is a mov from a memory operand (with symbol name eax) to a memory operand (with symbol name ebx).
Some people suggest using .intel_syntax noprefix and .att_syntax prefix around your asm template. That can sometimes work, but it's problematic, and incompatible with the preferred method of -masm=intel.
Problems with the "sandwich" method:
When the compiler substitutes operands into your asm template, it will do so according to -masm=. This will always break for memory operands (the addressing-mode syntax is completely different).
It will also break with clang even for registers. Clang's built-in assembler does not accept %eax as a register name in Intel-syntax mode, and doesn't accept .intel_syntax prefix (as opposed to the noprefix that's usually used with Intel syntax).
Consider this function:
int foo(int x) {
    asm(".intel_syntax noprefix \n\t"
        "add %0, 1 \n\t"
        ".att_syntax"
        : "+r"(x)
        );
    return x;
}
It assembles as follows with GCC (Godbolt):
movl %edi, %eax
.intel_syntax noprefix
add %eax, 1 # AT&T register name in Intel syntax
.att_syntax
The sandwich method depends on GAS accepting %eax as a register name even in Intel-syntax mode. GAS from GNU Binutils does, but clang's built-in assembler doesn't.
On a Mac, even using real GCC, the asm output has to assemble with an as that's based on clang, not GNU Binutils.
Using clang on that source code complains:
<source>:2:35: error: unknown token in expression
asm(".intel_syntax noprefix \n\t"
^
<inline asm>:2:6: note: instantiated into assembly here
add %eax, 1
^
(The first line of the error message didn't handle the multi-line string literal very well. If you use ; instead of \n\t and put everything on one line, the clang error message works better, but the source is a mess.)
I didn't check what happens with "ri" constraints when the compiler picks an immediate; it will still decorate it with $, but I don't know whether GAS silently ignores that, too, in Intel-syntax mode.
PS: your asm statement has a bug: you forgot an early-clobber on your output operand, so nothing is stopping the compiler from picking the same register for the %0 output and the %2 input that you don't read until the 2nd instruction. Then mov will destroy an input.
But using mov as the first or last instruction of an asm template is usually also a missed-optimization bug. In this case you can and should just use lea %0, [%1 + %2] to let the compiler add with the result written to a 3rd register, non-destructively. Or just wrap the add instruction (using a "+r" operand and an "r" operand), and let the compiler worry about data movement. If it had to load the value from memory anyway, it can put it in the right register so no mov is needed.
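As a rough sketch of the two fixes just described (the function names are hypothetical, since the question's original code isn't reproduced here): the first version keeps the mov + add pair but marks the output as early-clobber, and the second wraps only the add and leaves data movement to the compiler. The templates use the {att | intel} dialect alternatives so the sketch builds with or without -masm=intel, and asm is spelled __asm__ so it also builds in strict ISO C modes. The non-x86 branch is just a fallback I added so the file still compiles elsewhere.

```c
/* Hypothetical names; a sketch, not the question's actual code. */

/* mov + add: %0 is written before %2 is read, so the output needs "=&r"
   (early-clobber) to keep it out of the registers holding the inputs. */
int add_mov_then_add(int a, int b) {
    int out;
#if defined(__x86_64__) || defined(__i386__)
    __asm__("mov{l %1, %0 | %0, %1}\n\t"
            "add{l %2, %0 | %0, %2}"
            : "=&r"(out)
            : "r"(a), "r"(b)
            : "cc");
#else
    out = a + b;   /* fallback for non-x86 targets (assumption for portability) */
#endif
    return out;
}

/* Wrapping only the add: "+r" makes a both input and output, so the
   compiler decides where the values live and no mov is needed in the template. */
int add_single(int a, int b) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__("add{l %1, %0 | %0, %1}"
            : "+r"(a)
            : "r"(b)
            : "cc");
#else
    a += b;        /* fallback for non-x86 targets (assumption for portability) */
#endif
    return a;
}
```

The second form is usually the better choice: with only one instruction in the template, there is no window in which an output can overwrite a not-yet-read input, so no early-clobber is needed.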
PS: it's possible to write inline asm that works with -masm=intel or att, using GNU C inline asm dialect alternatives. e.g.
void atomic_inc(int *p) {
    asm( "lock add{l $1, %0 | %0, 1}"
         : "+m" (*p)
         :: "memory"
       );
}
compiles with gcc -O2 (-masm=att is the default) to
atomic_inc(int*):
    lock addl $1, (%rdi)
    ret
Or with -masm=intel to:
atomic_inc(int*):
    lock add DWORD PTR [rdi], 1
    ret
Notice that the l suffix is required for AT&T, and the dword ptr is required for Intel, because memory, immediate doesn't imply an operand-size. And that the compiler filled in valid addressing-mode syntax for both cases.
This compiles with clang, but with clang 13 and earlier only the AT&T alternative ever gets used, even with -masm=intel.
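The same dialect-alternative trick can be applied to the register-operand foo from earlier, instead of sandwiching the template between .intel_syntax and .att_syntax directives. This is only a sketch: foo_portable is a name I made up, the l suffix on the AT&T side is optional with a register operand (the register implies the size), and the non-x86 branch is a fallback I added so the file compiles elsewhere.

```c
/* Hypothetical variant of foo(): no syntax-switching directives, so the
   compiler's own asm output is never left in the wrong mode. */
int foo_portable(int x) {
#if defined(__x86_64__) || defined(__i386__)
    /* AT&T side: addl $1, %0     Intel side: add %0, 1 */
    __asm__("add{l $1, %0 | %0, 1}"
            : "+r"(x)
            :
            : "cc");
#else
    x += 1;   /* fallback for non-x86 targets (assumption for portability) */
#endif
    return x;
}
```

Because the compiler picks the alternative that matches -masm=, the operand substitution and the instruction syntax always agree, which is exactly what the sandwich method can't guarantee.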
... -masm=intel to tell C to generate Intel assembler syntax, but before you end your extended assembler template you tell it to switch to AT&T syntax with .att_syntax. The code generator has no idea you have done this in the template, so it is still emitting Intel. When passed to the GNU assembler, everything after .att_syntax will be Intel but you've told the assembler to treat it as AT&T. Remove .att_syntax. If you are always using -masm=intel you need not bother switching to Intel syntax at the beginning of the assembler template. – Pyrargyrite
... .intel_syntax and .att_syntax by using gcc -masm=intel ./example works. – Fiftyfifty
... gcc -S output, and noticing that compiler-generated Intel-syntax instructions came after your .att_syntax directive. The assembler error message would point you to the right line number. – Microdot
... asm as a single instruction that consumes the inputs and produces the outputs. If it's a single instruction, why "waste" one of the very few registers it has that it could be using for something else? To tell the compiler not to reuse any regs, use "=&r" (aka earlyclobber) for the output constraint. – Gonidium