C++11 (g++ thread sanitized) Ordering nonatomic operations with atomics (false positive?)

Asked 15/8, 2016 at 9:43 Answered 23/3, 2017 at 16:48

Solved c++multithreading c++11 thread-sanitizer

I am experimenting with g++ and thread sanitizer and I think I am getting false positives. Is this true, or am I making some big mistake?

Program (cut&paste from Anthony Williams: C++ Concurrency in Action, page 145, listing 5.13)

#include <atomic>
#include <thread>
#include <assert.h>
bool x=false;
std::atomic<bool> y;
std::atomic<int> z;
void write_x_then_y()
{
  x=true;
  std::atomic_thread_fence(std::memory_order_release);
  y.store(true,std::memory_order_relaxed);
}
void read_y_then_x()
{
  while(!y.load(std::memory_order_relaxed));
  std::atomic_thread_fence(std::memory_order_acquire);
  if(x)
    ++z;
}
int main()
{
  x=false;
  y=false;
  z=0;
  std::thread a(write_x_then_y);
  std::thread b(read_y_then_x);
  a.join();
  b.join();
  assert(z.load()!=0);
}

Compiled with:

g++ -o a -g -Og -pthread a.cpp -fsanitize=thread

g++ version

~/build/px> g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/6.1.1/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --disable-libgcj --with-isl --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 6.1.1 20160621 (Red Hat 6.1.1-3) (GCC)

I am getting:

~/build/px> ./a
==================
WARNING: ThreadSanitizer: data race (pid=13794)
  Read of size 1 at 0x000000602151 by thread T2:
    #0 read_y_then_x() /home/ostri/build/px/a.cpp:17 (a+0x000000401014)
    #1 void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) /usr/include/c++/6.1.1/functional:1400 (a+0x000000401179)
    #2 std::_Bind_simple<void (*())()>::operator()() /usr/include/c++/6.1.1/functional:1389 (a+0x000000401179)
    #3 std::thread::_State_impl<std::_Bind_simple<void (*())()> >::_M_run() /usr/include/c++/6.1.1/thread:196 (a+0x000000401179)
    #4 <null> <null> (libstdc++.so.6+0x0000000baaae)

  Previous write of size 1 at 0x000000602151 by thread T1:
    #0 write_x_then_y() /home/ostri/build/px/a.cpp:9 (a+0x000000400fbd)
    #1 void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) /usr/include/c++/6.1.1/functional:1400 (a+0x000000401179)
    #2 std::_Bind_simple<void (*())()>::operator()() /usr/include/c++/6.1.1/functional:1389 (a+0x000000401179)
    #3 std::thread::_State_impl<std::_Bind_simple<void (*())()> >::_M_run() /usr/include/c++/6.1.1/thread:196 (a+0x000000401179)
    #4 <null> <null> (libstdc++.so.6+0x0000000baaae)

  Location is global 'x' of size 1 at 0x000000602151 (a+0x000000602151)

  Thread T2 (tid=13797, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x000000028380)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0x0000000badc4)
    #2 main /home/ostri/build/px/a.cpp:26 (a+0x000000401097)

  Thread T1 (tid=13796, finished) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x000000028380)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0x0000000badc4)
    #2 main /home/ostri/build/px/a.cpp:25 (a+0x00000040108a)

SUMMARY: ThreadSanitizer: data race /home/ostri/build/px/a.cpp:17 in read_y_then_x()
==================
ThreadSanitizer: reported 1 warnings

I got this warning in more complex program, and I've thought it is my bug, but now even a "school book program" displays the same behaviour. Is it (i.e. some compiler switch is missing) me or g++?

UPDATED Taken from link

#if defined(__SANITIZE_THREAD__)
#define TSAN_ENABLED
#elif defined(__has_feature)
#if __has_feature(thread_sanitizer)
#define TSAN_ENABLED
#endif
#endif

#ifdef TSAN_ENABLED
#define TSAN_ANNOTATE_HAPPENS_BEFORE(addr) \
    AnnotateHappensBefore(__FILE__, __LINE__, (void*)(addr))
#define TSAN_ANNOTATE_HAPPENS_AFTER(addr) \
    AnnotateHappensAfter(__FILE__, __LINE__, (void*)(addr))
extern "C" void AnnotateHappensBefore(const char* f, int l, void* addr);
extern "C" void AnnotateHappensAfter(const char* f, int l, void* addr);
#else
#define TSAN_ANNOTATE_HAPPENS_BEFORE(addr)
#define TSAN_ANNOTATE_HAPPENS_AFTER(addr)
#endif

#include <atomic>
#include <thread>
#include <assert.h>
bool x=false;
std::atomic<bool> y;
std::atomic<int> z;
void write_x_then_y()
{
  x=true;
  std::atomic_thread_fence(std::memory_order_release);
  TSAN_ANNOTATE_HAPPENS_BEFORE(&x);
  y.store(true,std::memory_order_relaxed);
}
void read_y_then_x()
{
  while(!y.load(std::memory_order_relaxed));
  std::atomic_thread_fence(std::memory_order_acquire);
  TSAN_ANNOTATE_HAPPENS_AFTER(&x);
  if(x)
    ++z;
}
{
  x=false;
  y=false;
  z=0;
  std::thread a(write_x_then_y);
  std::thread b(read_y_then_x);
  a.join();
  b.join();
  assert(z.load()!=0);
}

Compile command

g++ -o a -g -Og -pthread a.cpp -fsanitize=thread -D__SANITIZE_THREAD__

Basile answered 15/8, 2016 at 9:43 Comment(3)

Isn't your release fence in wrong place? – Dredger 15/8, 2016 at 10:5

I've changed the book reference. Now it points to the right example. The example is cut&paste. If the fences are in the wrong place they are in the wrong place in the book also. ;-) Where would you put the release fence? – Basile 15/8, 2016 at 16:12

I think I found the answer in link – Basile 15/8, 2016 at 16:23

TL;DR: This is a TSAN false positive. The code is valid.

Thread 1:

  x=true;                                              // W
  std::atomic_thread_fence(std::memory_order_release); // A
  y.store(true,std::memory_order_relaxed);             // X

Thread 2:

  while(!y.load(std::memory_order_relaxed));           // Y
  std::atomic_thread_fence(std::memory_order_acquire); // B
  if(x)                                                // R
     ++z;

[atomics.fences]/2:

A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation.

Let's go through the list:

[✔] "there exist atomic operations X and Y, both operating on some atomic object M": obvious. M is y.
[✔] "A is sequenced before X": obvious ([intro.execution]/14 for those who want a citation).
[✔] "X modifies M": obvious.
[✔] "Y is sequenced before B" : obvious.
[✔] "and Y reads the value written by X...": that's the only way that loop could terminate.

Therefore, the release fence A synchronizes with the acquire fence B.

The write W is sequenced before A, and the read R is sequenced after B, therefore W inter-thread happens before, and so happens before, R. [intro.races]/9-10:

An evaluation A inter-thread happens before an evaluation B if

A synchronizes with B, or

A is dependency-ordered before B, or

for some evaluation X

A synchronizes with X and X is sequenced before B, or

A is sequenced before X and X inter-thread happens before B, or

A inter-thread happens before X and X inter-thread happens before B.

An evaluation A happens before an evaluation B (or, equivalently, B happens after A) if:

A is sequenced before B, or

A inter-thread happens before B.

There is no data race because of the happens-before relationship ([intro.races]/19):

The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.

Moreover, the read R is guaranteed to read the value written by W, because W is the visible side effect, there being no other side effect on x after the threads started ([intro.races]/11):

A visible side effect A on a scalar object or bit-field M with respect to a value computation B of M satisfies the conditions:

A happens before B and

there is no other side effect X to M such that A happens before X and X happens before B.

The value of a non-atomic scalar object or bit-field M, as determined by evaluation B, shall be the value stored by the visible side effect A.

Erethism answered 15/8, 2016 at 21:4 Comment(1)

detailed explanation – Basile 3/9, 2016 at 7:0

memory_order_relaxed imposes no restraints on reordering.

memory_order_acquire doesn't prevent reordering past the fence from above. It only prevents ordering from below. This means that the code can be executed as if:

std::atomic_thread_fence(std::memory_order_acquire);
if(x)
  ++z;
while(!y.load(std::memory_order_relaxed));

This will cause a data race as the read in if(x) races with x=true.

You need fences with memory_order_acq_rel or memory_order_seq_cst semantics in both functions which prevents reordering in both directions.

Theseus answered 15/8, 2016 at 10:29 Comment(13)

A bit inaccurate. An operation with acquire semantics prevents moving subsequent loads prior to it. An operation with release semantics prevents moving preceding stores past it. – Supposal 15/8, 2016 at 16:32

@MaximEgorushkin That would not be correct according to my understanding. Here is an argument from authority: preshing.com/20120913/acquire-and-release-semantics that argues that any operation, not just load or store, cannot float across a barrier. I would love to read an argument that supports your claim. – Theseus 15/8, 2016 at 16:54

What Preshing says disagrees with the C++ standard. I would say Preshing underspecified the effects. – Supposal 15/8, 2016 at 17:5

@MaximEgorushkin That is not the C++ standard. Please post the exact reference to the paragraph in the standard and the standard version. – Theseus 15/8, 2016 at 17:9

@MaximEgorushkin I cannot find citations from the link you posted in the actual C++ standard. – Theseus 15/8, 2016 at 17:15

Can you find them in Preshing's blog post? – Supposal 15/8, 2016 at 17:20

@MaximEgorushkin Therefore you don't have an argument backing your claim. As I have said, my argument is from authority. Here is another expert: channel9.msdn.com/Shows/Going+Deep/… saying that fences prevent read and write orderings. – Theseus 15/8, 2016 at 17:25

Sutter's video actually gets them right, I was about to recommend watching it. In full though. – Supposal 15/8, 2016 at 17:31

As with argument is from authority, I find C++ reference more authoritative than Preshing. – Supposal 15/8, 2016 at 17:32

@MaximEgorushkin Where in the talk does Herb support you claim? – Theseus 15/8, 2016 at 17:33

I suggest you a file a bug report to C++ reference if you think it is wrong. – Supposal 15/8, 2016 at 17:34

@MaximEgorushkin While this discrepancy is interesting, it doesn't influence the example OP cited. Either way, the example is causing a data race. – Theseus 15/8, 2016 at 17:59

@MaximEgorushkin But see Herb Sutter's comments under Pershing's post. Also we are talking about fences here, which are subtly different from plain store-release on atomic objects (which is what cppreference is aiming at, I think). Personally, I find it easier to just reason about whether there is a happens-before relationship than reordering. – Erethism 15/8, 2016 at 20:45

Unfortunately ThreadSanitizer cannot comprehend memory fences. This is because it reasons in the terms of happens-before relation between accesses to certain objects, and there's no object in the fence operation.

If you replace the relaxed load + acquire fence with an acquire load, and the release fence + relaxed store with a release store, TSan would correctly detect the happens-before relation between the store and the load.

Also note GCC's TSan implementation may fail to instrument atomics at O0 (see https://mcmap.net/q/511780/-why-does-threadsanitizer-report-a-race-with-this-lock-free-example).

Deanedeaner answered 23/3, 2017 at 16:48 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags