Catching and debugging invalid use of reference to local variable inside moved lambda

I've come across a hard-to-debug situation in one of my real projects where I was accidentally accessing a reference to a local variable inside a lambda that had been moved. The access was being done from another thread, but the moved lambda was kept alive until the second thread was finished.

The bug only occurred with optimizations disabled and was caused by careless refactoring.

I've created a minimal example (available here on wandbox) that reproduces the issue:

#include <chrono>
#include <iostream>
#include <thread>

using namespace std::chrono_literals;

struct state
{
    int x = 100;
};

template <typename TF>
void eat1(TF&& f)
{
    // Call the lambda.
    f();

    // Simulate waiting for the second thread
    // to finish.
    std::this_thread::sleep_for(1000ms);
}

template <typename TF>
void eat0(TF&& f)
{
    // Move the lambda to some other handler.
    eat1(std::forward<TF>(f));
}

void use_state(state& s)
{
    // Will print `100`.
    std::cout << s.x << "\n";

    // Separate thread. Note that `s` is captured by
    // reference.
    std::thread t{[&s]
        {
            // Simulate computation delay.
            std::this_thread::sleep_for(500ms);

            // Undefined behavior: `s` no longer exists here,
            // so this typically prints garbage.
            std::cout << s.x << "\n";
        }};

    t.detach();
}

int main()
{
    eat0([]
        {
            // Local lambda variable that will be accessed
            // after the lambda is moved.
            state s;

            // Function that takes `s` by reference and
            // accesses it in a separate thread after the
            // lambda is moved.
            use_state(s);
        });
}

Surprisingly, none of the sanitizers and warning flags managed to help here.

I've tried the following combinations of compilers and sanitizers, with

-Wall -Wextra -Wpedantic -g -O0

flags always enabled:

  • Compilers: g++ 6.1.1 on Arch Linux x64; clang++ 3.8.0 on Arch Linux x64; g++ 5.3.1 on Fedora x64; clang++ 3.7.0 on Fedora x64.

  • Sanitizers: -fsanitize=address; -fsanitize=undefined; -fsanitize=thread.

None of the combinations produced any helpful diagnostic. I expected either AddressSanitizer to tell me I was accessing a dangling reference, or UndefinedBehaviorSanitizer to catch UB while accessing it, or ThreadSanitizer to tell me a separate thread was accessing an invalid memory location.

Is there a reliable way to diagnose this problem? Should I post this example to any of the sanitizers' bug trackers as a feature request/defect?

Aboveground answered 3/9, 2016 at 13:58 Comment(2)
C++ is complicated. There's nothing wrong with capturing by reference; it's part of the language. If the threads' timing is different, this is well-defined. I can't see how it's possible for the compiler to deduce the threads' runtime timing. You can't rely on the compiler catching everything. - Chrischrism
Well, I tried the static analysis in MSVS that uses the C++ Core Guidelines, and it did not pick it up either. Either that check has not been implemented yet or there is no rule covering this case yet. Not sure if you want to raise an issue there to see if this could be diagnosed. (link) - Autogenesis

valgrind's memcheck tool caught this problem at default settings. However, nasty bugs of this kind can escape memcheck, and I am not sure that the problem would be caught in the real program.
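Besides memcheck, one runtime check that the list of sanitizer flags in the question does not mention is AddressSanitizer's stack-use-after-return detection. It is a run-time option (`detect_stack_use_after_return=1`, usually passed through the `ASAN_OPTIONS` environment variable) and it has not always been enabled by default, so plain `-fsanitize=address` may miss accesses to the frame of a function that has already returned. The sketch below assumes a build with `-fsanitize=address` and uses the ASan runtime's `__asan_default_options` hook to switch the check on from inside the program. Whether it flags this particular example has not been verified here.

```cpp
// Assumption: the program is compiled and linked with -fsanitize=address.
// If this function is defined, the AddressSanitizer runtime calls it at
// startup to obtain default options. Without ASan it is just an unused
// function and has no effect.
extern "C" const char* __asan_default_options()
{
    // Ask ASan to also track stack frames of functions that have returned.
    return "detect_stack_use_after_return=1";
}
```

Options set through the `ASAN_OPTIONS` environment variable are applied on top of these defaults, so the hook only changes what happens when nothing is specified at run time.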

The fact that the first lambda was moved is not relevant to the problem (though maybe it complicated the debugging process). The problem is due to accessing a local variable of a function that has already returned (again, the fact that the access happened from a different thread just made the investigation more difficult but didn't contribute to the bug in any other way). The fact that the first lambda was kept alive should by no means protect you: the local variables belong to the lambda invocation and not the lambda itself.
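Since the local variables belong to the invocation, one way to avoid the dangling reference is to give the state shared ownership, so that the second thread keeps it alive itself. The following is only a minimal sketch under that assumption, not the fix used in the real project: `use_state_shared` and `run_lambda` are hypothetical names, and `std::async` stands in for the detached `std::thread` so that the result can be observed through a future.

```cpp
#include <chrono>
#include <future>
#include <memory>
#include <thread>
#include <utility>

using namespace std::chrono_literals;

struct state
{
    int x = 100;
};

// Hypothetical replacement for `use_state`: takes shared ownership of
// the state instead of a reference to a local variable.
std::future<int> use_state_shared(std::shared_ptr<state> s)
{
    // The task captures the shared_ptr by value, so the `state` object
    // stays alive until the task has finished with it.
    return std::async(std::launch::async, [s = std::move(s)]
        {
            // Simulate computation delay.
            std::this_thread::sleep_for(50ms);
            return s->x;
        });
}

// Hypothetical stand-in for `eat0`/`eat1`: invokes the lambda and
// returns after the lambda invocation has already ended.
std::future<int> run_lambda()
{
    auto f = []
    {
        // Heap-allocated and reference-counted rather than a plain
        // local, so it can outlive this invocation.
        auto s = std::make_shared<state>();
        return use_state_shared(std::move(s));
    };
    return f();
}
```

In this sketch, `run_lambda().get()` yields `100` even though the lambda invocation has returned before the task reads `x`. Joining the thread before the lambda returns would be another option when the work does not need to outlive the call.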

Rolandrolanda answered 3/9, 2016 at 14:48 Comment(1)
I ran this through valgrind also. valgrind did not complain, not a peep. - Chrischrism
