Where is the race in this thread sanitizer warning?
The code below produces a warning when run with ThreadSanitizer on macOS. I can't see where the race is. The control block shared by shared_ptr and weak_ptr is thread-safe, and pushing to and popping from the std::queue are done with a lock held.

#include <future>
#include <memory>
#include <queue>

class Foo {
public:
  Foo() {
    fut = std::async(std::launch::async, [this] {
      while (!shouldStop) {
        std::scoped_lock lock(mut);
        while (!requests.empty()) {
          std::weak_ptr<float> requestData = requests.front();
          requests.pop();
          (void)requestData;
        }
      }
    });
  }

  ~Foo() {
    shouldStop.store(true);
    fut.get();
  }

  void add(const std::weak_ptr<float> subscriber) {
    std::scoped_lock lock(mut);
    requests.push(subscriber);
  }

private:
  std::atomic<bool> shouldStop = false;
  std::future<void> fut;
  std::queue<std::weak_ptr<float>> requests;
  std::mutex mut;
};

int main() {
  Foo foo;

  int numIterations = 100000;

  while (--numIterations) {
    auto subscriber = std::make_shared<float>();

    foo.add(subscriber);
    subscriber.reset();
  }

  std::this_thread::sleep_for(std::chrono::seconds(1));
}

Warning with stack trace:

WARNING: ThreadSanitizer: data race (pid=11176)
  Write of size 8 at 0x7b0800000368 by thread T1 (mutexes: write M16):
    #0 operator delete(void*) <null>:1062032 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x4f225)
    #1 std::__1::__shared_ptr_emplace<float, std::__1::allocator<float> >::__on_zero_shared_weak() new:272 (minimal:x86_64+0x1000113de)
    #2 std::__1::weak_ptr<float>::~weak_ptr() memory:5148 (minimal:x86_64+0x100010762)
    #3 std::__1::weak_ptr<float>::~weak_ptr() memory:5146 (minimal:x86_64+0x100002448)
    #4 Foo::Foo()::'lambda'()::operator()() const minimal_race.cpp:15 (minimal:x86_64+0x10000576e)
    #5 void std::__1::__async_func<Foo::Foo()::'lambda'()>::__execute<>(std::__1::__tuple_indices<>) type_traits:4345 (minimal:x86_64+0x1000052f0)
    #6 std::__1::__async_func<Foo::Foo()::'lambda'()>::operator()() future:2323 (minimal:x86_64+0x100005268)
    #7 std::__1::__async_assoc_state<void, std::__1::__async_func<Foo::Foo()::'lambda'()> >::__execute() future:1040 (minimal:x86_64+0x100005119)
    #8 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<Foo::Foo()::'lambda'()> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<Foo::Foo()::'lambda'()> >*> >(void*) type_traits:4286 (minimal:x86_64+0x10000717c)

  Previous atomic write of size 8 at 0x7b0800000368 by main thread:
    #0 __tsan_atomic64_fetch_add <null>:1062032 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x24cdd)
    #1 std::__1::shared_ptr<float>::~shared_ptr() memory:3472 (minimal:x86_64+0x1000114d4)
    #2 std::__1::shared_ptr<float>::~shared_ptr() memory:4502 (minimal:x86_64+0x100002488)
    #3 main memory:4639 (minimal:x86_64+0x10000210b)

SUMMARY: ThreadSanitizer: data race new:272 in std::__1::__shared_ptr_emplace<float, std::__1::allocator<float> >::__on_zero_shared_weak()

edit: I compile it with:

clang++ -std=c++17 -g -fsanitize=thread -o test  minimal_race.cpp

clang version:

$ clang++ --version
clang version 7.1.0 (tags/RELEASE_710/final)
Target: x86_64-apple-darwin18.7.0
Thread model: posix
InstalledDir: /usr/local/opt/llvm@7/bin

I am using macOS 10.14.6

Sharecropper answered 18/11, 2019 at 11:42 Comment(17)
I think this belongs in Code Review. – Angers
No, it's fine here. There's a problem with the OP's code; they're not just looking to improve it. A race condition is a very serious problem. – Humor
Tried it a couple of times with ThreadSanitizer with gcc and clang on Linux and could not reproduce the issue. – Angers
Wouldn't this cause a potential deadlock? With std::launch::async, it's up to std::async to determine how to schedule your requests, according to cppreference.com. This means that potentially, when Foo is constructed, the future locks the mutex (which it doesn't unlock until shouldStop is true). If Foo::add is then called, it will try to lock the mutex, waiting for the future to unlock it, which it never does. – Clipper
@Clipper It does unlock, actually; the scoped_lock lives inside the body of the outer while loop, so it is released at the end of every iteration. Look again. – Ommatophore
@Ommatophore Ah right, it does; my bad. – Clipper
Managed to get a better trace with libc++ on clang on Linux... I don't know how to post it here, so you can see it in this link instead. Compiled with clang++ -std=c++17 -O0 -ggdb -fsanitize=thread -stdlib=libc++ -o x x.cpp; it does not happen with libstdc++. – Angers
I've run it on macOS 10.14.6 with Apple clang version 11.0.0 (clang-1100.0.33.12), built like this: clang++ -std=c++17 -g -fsanitize=thread sanitizerWarning.cpp -o sanitizerWarning, and running the code doesn't produce any sanitizer warnings. Same with clang++ -std=c++17 -O2 -fsanitize=thread sanitizerWarning.cpp -o sanitizerWarning. So voting to close as can't reproduce. – Denadenae
Add -stdlib=libc++ @MarekR – Angers
@Angers No changes. – Denadenae
Maybe this is a false positive with an older version of clang? @Sharecropper, please provide the OS version, clang version, and build flags. – Denadenae
I'm confused... you make a shared_ptr, put it on the queue of non-owning weak_ptrs, and then immediately reset it, after which it's destroyed. Are you trying to get a nullptr on the queue? Resetting/destroying something that is used in another thread without any mutex locking sounds to me like a potential race condition. The race seems to occur in the destructor of shared_ptr. – Steamroller
Yes, you are right, this is weird (bad logic), but it should not trigger detection of a race condition. On my machine the thread sanitizer doesn't find anything. – Denadenae
I edited the post to include the build flags, clang version, and OS version. – Sharecropper
Hi, did you try my answer, and does it fix things? If not, could you post the trace you get from it? – Angers
I am almost sure that the suggestion to use std::shared_ptr in all places will get rid of the warning, since the object will then not be deleted directly after being pushed onto the queue. The example above is very minimal; I removed everything I could to keep it as small as possible (like getting a shared_ptr by calling lock() on the weak_ptr after it is popped from the queue; a sketch of that pattern follows these comments). – Sharecropper
As I said, I do believe it to be a bug; from what I can tell this should not happen. – Angers
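
A minimal, hypothetical sketch of the pattern described in the OP's comment above (the worker promoting each popped weak_ptr with lock() before using it); the function name drainRequests and the printed message are illustrative only, not from the OP's real code:

#include <iostream>
#include <memory>
#include <mutex>
#include <queue>

// Hypothetical sketch: drain a queue of weak_ptrs under a lock and promote
// each entry with lock(); entries whose owning shared_ptr has already been
// destroyed come back expired and are simply skipped.
void drainRequests(std::queue<std::weak_ptr<float>>& requests, std::mutex& mut) {
  std::scoped_lock lock(mut);
  while (!requests.empty()) {
    std::weak_ptr<float> requestData = requests.front();
    requests.pop();
    if (std::shared_ptr<float> data = requestData.lock()) {
      std::cout << "subscriber still alive, value = " << *data << '\n';
    }  // otherwise the subscriber is already gone; nothing to do
  }
}

int main() {
  std::queue<std::weak_ptr<float>> requests;
  std::mutex mut;

  auto alive = std::make_shared<float>(1.0f);
  requests.push(alive);                          // still owned by main, lock() succeeds
  requests.push(std::make_shared<float>(2.0f));  // temporary dies immediately, entry expires

  drainRequests(requests, mut);
}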

I think this might be a bug with libc++ and clang in their weak_ptr/shared_ptr implementation...

Changing the weak_ptr queue to a shared_ptr one fixes the problem, even with old clang versions.

The changes are, at line 25:

  void add(const std::shared_ptr<float> subscriber) {

at line 33:

  std::queue<std::shared_ptr<float>> requests;

and rewriting the while loop at line 42 as:

  while (--numIterations) {
    auto subscriber = std::make_shared<float>();

    foo.add(std::move(subscriber));
  }
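
For completeness, a sketch of the whole program with those three changes applied, assuming everything else from the question is kept unchanged (the comments are added for illustration):

#include <atomic>
#include <chrono>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

class Foo {
public:
  Foo() {
    fut = std::async(std::launch::async, [this] {
      while (!shouldStop) {
        std::scoped_lock lock(mut);
        while (!requests.empty()) {
          // the queue element keeps the float alive until it is popped and
          // requestData goes out of scope at the end of this iteration
          std::shared_ptr<float> requestData = requests.front();
          requests.pop();
          (void)requestData;
        }
      }
    });
  }

  ~Foo() {
    shouldStop.store(true);
    fut.get();
  }

  // was line 25: take a shared_ptr instead of a weak_ptr
  void add(const std::shared_ptr<float> subscriber) {
    std::scoped_lock lock(mut);
    requests.push(subscriber);
  }

private:
  std::atomic<bool> shouldStop = false;
  std::future<void> fut;
  // was line 33: the queue now owns the objects while they wait to be processed
  std::queue<std::shared_ptr<float>> requests;
  std::mutex mut;
};

int main() {
  Foo foo;

  int numIterations = 100000;

  // was line 42: no reset(); ownership is handed over to Foo
  while (--numIterations) {
    auto subscriber = std::make_shared<float>();
    foo.add(std::move(subscriber));
  }

  std::this_thread::sleep_for(std::chrono::seconds(1));
}
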
Angers answered 18/11, 2019 at 12:19 Comment(5)
Have you been able to reproduce it? Which clang version? – Denadenae
To me this sounds logical, as it prevents the objects from being destroyed. The race condition seems to occur in the destructor of shared_ptr. – Steamroller
clang++ -v: clang version 9.0.0 (tags/RELEASE_900/final), Target: x86_64-pc-linux-gnu, Thread model: posix, InstalledDir: /usr/lib/llvm/9/bin, Selected GCC installation: /usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0, Candidate multilib: .;@m64, Candidate multilib: 32;@m32, Selected multilib: .;@m64 @MarekR – Angers
@MarekR I even posted my own backtrace, with a link to it and the build flags, in a comment on the question. – Angers
@Steamroller If I understand correctly how weak_ptr/shared_ptr interactions are supposed to work, using them from different threads should be safe (not the pointed-to data, just the wrapper classes), at least in the way the OP uses them, with a shared_ptr and a weak_ptr being different objects. – Angers
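
To illustrate the rule discussed in the last comment, a small self-contained sketch (my example, not code from this thread): two distinct handles that share one control block are used from two threads with no extra locking, which is allowed; only concurrent access to the same shared_ptr or weak_ptr object would need external synchronization.

#include <memory>
#include <thread>

int main() {
  auto owner = std::make_shared<float>(42.0f);  // handle used by the main thread
  std::weak_ptr<float> observer = owner;        // separate handle, copied into the worker

  // Each thread touches only its own handle; the shared control block uses
  // atomic reference counts, so no additional locking is needed here.
  std::thread worker([observer] {
    if (auto locked = observer.lock()) {        // may or may not succeed; both are fine
      volatile float value = *locked;           // the float stays alive while 'locked' exists
      (void)value;
    }
  });

  owner.reset();  // the main thread drops its reference concurrently
  worker.join();
}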
