OpenMP atomic and non-atomic reads/writes produce the same instructions on x86_64
Asked Answered
C

1

6

According to the OpenMP Specification (v4.0), the following program contains a possible data race due to unsynchronized read/write of i:

int i{0}; // std::atomic<int> i{0};

void write() {
// #pragma omp atomic write // seq_cst
   i = 1;
}

int read() {
   int j;
// #pragma omp atomic read // seq_cst
   j = i; 
   return j;
}

int main() {
   #pragma omp parallel
   { /* code that calls both write() and read() */ }
}

Possible solutions that came to my mind are shown in the code as comments:

  1. to protect write and read of i with #pragma omp atomic write/read,
  2. to protect write and read of i with #pragma omp atomic write/read seq_cst,
  3. to use std::atomic<int> instead of int as a type of i.

Here are the compilers-generated instructions on x86_64 (with -O2 in all cases):

GNU g++ 4.9.2:               i = 1;        j = i;
original code:               MOV           MOV
#pragma omp atomic:          MOV           MOV
// #pragma omp atomic seq_cst:  MOV           MOV
#pragma omp atomic seq_cst:  MOV+MFENCE    MOV    (see UPDATE)
std::atomic<int>:            MOV+MFENCE    MOV

clang++ 3.5.0:               i = 1;        j = i;
original code:               MOV           MOV
#pragma omp atomic:          MOV           MOV
#pragma omp atomic seq_cst:  MOV           MOV
std::atomic<int>:            XCHG          MOV

Intel icpc 16.0.1:           i = 1;        j = i;
original code:               MOV           MOV
#pragma omp atomic:          *             *
#pragma omp atomic seq_cst:  *             *
std::atomic<int>:            XCHG          MOV

* Multiple instructions with calls to __kmpc_atomic_xxx functions.

What I wonder is why the GNU/clang compiler does not generate any special instructions for #pragma omp atomic writes. I would expect similar instructions as for std::atomic, i.e, either MOV+MFENCE or XCHG. Any explanation?

UPDATE

g++ 5.3.0 produces MFENCE for #pragma omp atomic write seq_cst. That is the correct behavior, I believe. Without seq_cst, it produces plain MOV, which is sufficient for non-SC atomicity.

There was a bug in my Makefile, g++ 4.9.2 produces MFENCE for CS atomic write as well. Sorry guys for that.

Clang 3.5.0 does not implement the OpenMP SC atomics, thanks Hristo Iliev for pointing this out.

Carabin answered 17/2, 2016 at 16:32 Comment(6)
My GCC 4.9.2 generates an mfence immediately after movl $1, i(%rip) for the sequentially consistent atomic write.Frayda
Also, Clang 3.5.0 only supports the regular non-sequentially consistent atomics. It doesn't even have a full OpenMP 3.1 support - see here.Frayda
Your GCC 4.9.2 generates mfence for OpenMP SC atomic write? That is, with i being of type int? My GCC only for std::atomic<int>.Carabin
I just realized that g++ 5.3.0 produces mfence for SC OpenMP atomic writes. So, the problem was with (my) g++ 4.9.2.Carabin
I wonder how your 4.9.2 is different from my 4.9.2. I doubt that the machine specification in GCC could be different. What OS and distribution are you using?Frayda
@Hristo Iliev: You are absolutely right, I had a bug in my Makefile, a wrong source was used :(. Guys, I am so sorry about that. Thanks for you time and help.Carabin
R
1

There are two possibilities.

  1. The compiler is not obligated to convert C++ code containing a data race into bad machine code. Depending on the machine memory model, the instructions normally used may already be atomic and coherent. Take that same C++ code to another architecture and you may start seeing the pragmas cause differences that didn't exist on x86_64.

  2. In addition to potentially causing use of different instructions and/or extra memory fence instructions, the atomic pragmas (as well std::atomic and volatile) also constrain the compiler's own code reordering optimizations. They may not apply to your simply case, but you certainly could see that common-subexpression elimination, including hoisting computations outside a loop, may be affected.

Ravishment answered 17/2, 2016 at 16:39 Comment(7)
I agree, however, MOV alone should not be enough for sequentially consistent atomic store (see, e.g., here, or Herb Sutter's lecture around 0:35:00). Therefore, I would expect XCHG or MFENCE for #pragma omp atomic write seq_cst.Carabin
However, assuming that the data is correctly aligned, mov is sufficient for the simple atomic (without seq_cst), since in X86 tearing cannot occur. (All of the bytes making up the value are written atomically by mov). Without the seq_cst the atomic construct does not also imply an OpenMP "flush".Acetamide
@Jim Cownie: Are your sure about flush? From OpenMP Spec. 4.0: A flush region with a list is implied at the following locations: At entry to and exit from the atomic operation performed in a non-sequentially consistent atomic region, where the list contains only the storage location designated as x according to the description of the syntax of the atomic construct in Section 2.12.6 on page 127. And: A flush region without a list is implied at the following locations: ... At entry to and exit from the atomic operation performed in a sequentially consistent atomic region.Carabin
@Daniel: See the 4.5 spec, which says: "Any atomic construct with a seq_cst clause forces the atomically performed operation to 17 include an implicit flush operation without a list." Since it calls out that case, I assume that that also implies that a construct without a seq_cst does not imply a flush. It also explicitly (though in non-normative text) says that "a non-sequentially consistent atomic construct has the same semantics as a memory_order_relaxed atomic operation in C++11/C11". (And, the text you quote only requires flushing the variable in question anyway, which a simple mov does.)Acetamide
@Jim Cownie: The text I quoted is in 4.5 spec. as well, thus even non SC atomics imply flush, though flush with a list. And you are right that mov provides this. For SC atomics, I would expect additional fence to prevent reordering at CPU level, but it's not put in the program by g++ in my case. That is what I wodner about; does sequential consistency mean something different for OpenMP and C++11 memory models?Carabin
@Daniel. I think we're in violent agreement. The non-SC read/write operations don't need any memory fences on X86. The SC ones do. (And we were being confused by a gcc bug :-( ).Acetamide
@Jim I think so as well :). Thanks for helpCarabin

© 2022 - 2024 — McMap. All rights reserved.