I've been reading up on compiler optimizations vs CPU optimizations, and volatile vs memory barriers.
My current understanding is that CPU optimizations and compiler optimizations are orthogonal, i.e. they can occur independently of each other. One thing is still unclear to me, though.
However, the article "volatile considered harmful" makes the point that volatile should not be used. Linus's post makes similar claims. The main reasoning, IIUC, is that marking a variable as volatile disables all compiler optimizations on accesses to that variable (even the harmless ones), while still not providing protection against memory reordering. Essentially, the point is that it's not the data that needs to be handled with care, but rather a particular access pattern.
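To make the "access pattern, not data" idea concrete, here is a minimal sketch of how I understand it, assuming GCC or Clang: the variable itself stays non-volatile, and only the one access that must not be optimized away goes through a volatile-qualified pointer. The names read_once_int, my_flag and poll_flag are hypothetical, made up for illustration. This mirrors the idea behind the kernel's ACCESS_ONCE()/READ_ONCE(), but it is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a kernel API): force one real load of *p
 * at this access only, by reading through a volatile-qualified pointer. */
static inline int read_once_int(const int *p)
{
    assert(p != NULL);
    return *(const volatile int *)p;
}

/* Hypothetical flag; deliberately NOT declared volatile. */
static int my_flag;

/* Polls the flag at most max_polls times.
 * Returns 1 if the flag was seen non-zero, 0 otherwise. */
static int poll_flag(long max_polls)
{
    for (long i = 0; i < max_polls; i++) {
        if (read_once_int(&my_flag) != 0)
            return 1;
    }
    return 0;
}
```

Every other access to my_flag remains free for the compiler to optimize as usual. As with volatile itself, this says nothing about reordering done by the CPU.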
Now, the "volatile considered harmful" article gives the following example of a busy loop waiting for a flag:
while (my_variable != what_i_want) {}
and makes the point that the compiler can optimize the access to my_variable so that it happens only once instead of on every iteration of the loop. The solution, so the article claims, is the following:
while (my_variable != what_i_want)
    cpu_relax();
It is said that cpu_relax() acts as a compiler barrier (earlier versions of the article said that it's a memory barrier).
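My current guess at what "compiler barrier" means here, written as a minimal sketch and assuming GCC or Clang: an inline-asm statement with a "memory" clobber, which tells the compiler that memory may have changed, so values cached in registers have to be reloaded afterwards. The names my_compiler_barrier and spin_until are hypothetical and exist only for this example; the real cpu_relax() is architecture-specific and, as far as I can tell, also emits a CPU hint instruction on some architectures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. The asm body is empty, so no instruction is emitted;
 * the "memory" clobber only constrains what the compiler may assume. */
#define my_compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Spins until *p == want or max_spins iterations have passed.
 * Returns the number of iterations actually spun. */
static long spin_until(const int *p, int want, long max_spins)
{
    long spins = 0;

    assert(p != NULL);
    while (*p != want && spins < max_spins) {
        /* Without this, the compiler could hoist the load of *p
         * out of the loop, since nothing in the loop writes to it. */
        my_compiler_barrier();
        spins++;
    }
    return spins;
}
```

Nothing in this sketch is a CPU memory barrier: the asm body is empty, so no fence instruction is generated.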
I have several gaps here:
1) Is the implication that gcc has special knowledge of the cpu_relax() call, and that it translates to a hint to both the compiler and the CPU?
2) Is the same true for other calls such as smp_mb() and the like?
3) How does that work, given that cpu_relax() is essentially defined as a C macro? If I manually expand cpu_relax(), will gcc still respect it as a compiler barrier? How can I know which calls are respected by gcc?
4) What is the scope of cpu_relax() as far as gcc is concerned? In other words, what is the scope of reads that gcc cannot optimize once it sees the cpu_relax() call? From the CPU's perspective, the scope is wide (memory barriers place a mark in the read or write buffer). I would guess gcc uses a smaller scope, perhaps the C scope?
cpu_relax in a busy loop), the document I linked to claims that this also causes gcc to treat the memory as volatile - i.e. not cache it in a register. How can I know what scope this applies to? I would assume that this doesn't disable register-caching across the entire function or compilation unit, but how can I know? – Inattentive