That's correct; asking for a pointer as input to inline asm does not imply that the pointed-to memory is also an input or output or both. With a register input and register output, for all gcc knows your asm just aligns a pointer by masking off the low bits, or adds a constant to it. (In which case you would want it to optimize away a dead store.)
The simple option is asm volatile and a "memory" clobber (see Footnote 1).
The narrower, more specific way you're asking for is to use a "dummy" memory operand as well as the pointer in a register. Your asm template doesn't reference this operand (except maybe inside an asm comment to see what the compiler picked). It tells the compiler which memory you actually read, write, or read+write.
Dummy memory input: "m" (*(const int (*)[]) iptr) or output: "=m" (*(int (*)[]) iptr). Or of course "+m" with the same syntax.
That syntax is casting to a pointer-to-array and dereferencing, so the actual input is a C array. (If you actually have an array, not pointer, you don't need any casting and can just ask for it as a memory operand.)
If you leave the size unspecified with [], that tells GCC that any memory accessed relative to that pointer is an input, output, or in/out operand. If you use [10] or [some_variable], that tells the compiler the specific size. With runtime-variable sizes, gcc in practice misses the optimization that iptr[size+1] is not part of the input.
GCC documents this and therefore supports it. I think it's not a strict-aliasing violation if the array element type is the same as the pointed-to type, or maybe if it's char.
(from the GCC manual)
An x86 example where the string memory argument is of unknown length.
asm("repne scasb"
: "=c" (count), "+D" (p)
: "m" (*(const char (*)[]) p), "0" (-1), "a" (0));
If you can avoid using an early-clobber on the pointer input operand, the dummy memory input operand will typically pick a simple addressing mode using that same register.
But if you do use an early-clobber for strict correctness of an asm loop, sometimes a dummy operand will make gcc waste instructions (and an extra register) on a base address for the memory operand. Check the asm output of the compiler.
Background:
This is a widespread bug in inline-asm examples which often goes undetected because the asm is wrapped in a function that doesn't inline into any callers that tempt the compiler into reordering stores, merging them, or doing dead-store elimination.
GNU C inline asm syntax is designed around describing a single instruction to the compiler. The intent is that you tell the compiler about a memory input or memory output with a "m" or "=m" operand constraint, and it picks the addressing mode.
Writing whole loops in inline asm requires care to make sure the compiler really knows what's going on (or asm volatile plus a "memory" clobber), otherwise you risk breakage when changing the surrounding code, or enabling link-time optimization that allows for cross-file inlining.
See also Looping over arrays with inline assembly for using an asm statement as the loop body, still doing the loop logic in C. With actual (non-dummy) "m" and "=m" operands, the compiler can unroll the loop by using displacements in the addressing modes it chooses.
Footnote 1: A "memory" clobber gets the compiler to treat the asm like a non-inline function call (that could read or write any memory except for locals that escape analysis has proved have not escaped). The escape analysis includes input operands to the asm statement itself, but also any global or static variables that any earlier call could have stored pointers into. So usually local loop counters don't have to be spilled/reloaded around an asm statement with a "memory" clobber.
asm volatile is necessary to make sure the asm isn't optimized away even if its output operands are unused (because you require the undeclared side effect of writing memory to happen).
Or for memory that is only read by asm, you need the asm to run again if the same input buffer contains different input data. Without volatile, the asm statement could be CSEd out of a loop. (A "memory" clobber does not make the optimizer treat all memory as an input when considering whether the asm statement even needs to run.)
asm with no output operands is implicitly volatile, but it's a good idea to make it explicit. (The GCC manual has a section on asm volatile).
e.g. asm("... sum an array ..." : "=r"(sum) : "r"(pointer), "r"(end_pointer) : "memory") has an output operand so is not implicitly volatile. If you used it like this:
arr[5] = 1;
total += asm_sum(arr, len);
memcpy(arr, foo, len);
total += asm_sum(arr, len);
Without volatile, the 2nd asm_sum could be optimized away, on the assumption that the same asm with the same input operands (pointer and length) will produce the same output. You need volatile for any asm that's not a pure function of its explicit input operands. If it isn't optimized away, then the "memory" clobber will have the desired effect of requiring memory to be in sync.
__asm__ volatile ("nop":"=r"(iptr)::), it seems that it works. – Dwight
"=r" means that the pointer may change, so the compiler cannot assume that iptr is the same after the inlined asm statement. That's why it cannot optimize away the first write to iptr. It also does not constrain writes to other pointers. It appears to me that this is the desired behavior. – Dwight
= meaning that the asm might change the pointer value, so gcc has to make both writes (since they might write to different locations). However, it doesn't mean gcc has to do the write before calling the inline asm (although it happens to in this case) and so it doesn't work in general (you can construct a similar example where it fails). – Exarchate
"+r"? Using "=r" doesn't even require the compiler to pass the pointer value into the asm, I think? – Exarchate