How can I verify lock-free algorithms?
Asked Answered
W

5

29

In theory, it should be possible to at least brute force a verification of a lock-free algorithm (there are only so many combinations of function calls intersecting). Are there any tools or formal reasoning processes available to actually prove that a lock-free algorithm is correct (ideally it should also be able to check for race conditions and the ABA problem as well)?

Note: If you know a way to just prove one point (e.g. only prove that it is safe from the ABA problem) or a problem I haven't mentioned then post the solution anyway. In the worst case scenario, each method can be done in turn to fully verify it.

Waterresistant answered 15/1, 2010 at 13:43 Comment(6)
formally through a state transition graph.Beauty
There's no hardware support to do such a check, a check like this can only be done if running instructions on multiple CPUs can somehow be synchronized, so that the race is achieved deterministically. Current hardware can't do this.Vizza
@Pop: you don't need special hw to prove an algorithm won't lock.Beauty
Publish the algorithm and let as many people look at it as you can. That doesn't prove the correctness but hopefully the N-th set of eyes will see some corner case you didn't think of.Mommy
I agree - peer-review is a god-send!Ferroelectric
@Pop i think virtual machine with record-replay feature can help in achieving race condition deterministically. though it's not fully developed yet... but i guess it's available as open-source now....Earl
P
23

You should definitely try the Spin model checker.

You write a program-like model in a simple C-like language called Promela, which Spin internally translates into a state machine. A model can contain multiple parallel processes.

What Spin then does is check every possible interleaving of instructions from each process for whatever conditions you want to test -- typically, absence of race conditions, freedom from deadlocks etc. Most of these tests can be easily written using assert() statements. If there is any possible execution sequence that violates an assertion, the sequence is printed out, otherwise you are given the "all-clear".

(Well, in actual fact it uses a much fancier and faster algorithm to accomplish this, but that is the effect. By default, all reachable program states are checked.)

This is an incredible program, it won the 2001 ACM System Software Award (other winners include Unix, Postscript, Apache, TeX). I got started using it very quickly, and in a couple of days was able to implement models of the MPI functions MPI_Isend() and MPI_Irecv() in Promela. Spin found a couple of tricky race conditions in one segment of parallel code I converted across to Promela for testing.

Palmitin answered 15/1, 2010 at 14:5 Comment(6)
Spin works well. Note, however, that although the learning curve is much less steep than using, say, PVS or Isabelle (theorem provers), it's still pretty tricky. To really use SPIN you often need to reduce the state space; and that means adding assumptions that limit verification, and you also need to know what to look for in the first place. Still, spin's a very solid and relatively easy tool, as long as you don't expect any magic.Dracaena
Oh, and don't forget memory barriers; spin (AFAIK) assumes all writes are atomic and instantly visible, you'll need to reason about memory barriers separately (which is generally necessary for low-or-no-lock techniques)Dracaena
@Eamon: Absolutely, there is a learning curve and I found I needed to seriously abstract/downsize some things to come up with feasible models. Re memory barriers: you might be able to use channels to model this -- instead of a global var, have a process to which you "send" read and write requests via a channel and "receive" all reads via another channel.Palmitin
This looks exactly like what I've been looking for. Been doing lock-free programing for almost a year now, just haven't had a decent way to prove my algorithms work (though we haven't had any problems with the ones I've been confident to place into production code).Waterresistant
@Grant: Another thing to bear in mind is that Spin can only prove correctness for finite problem sizes (e.g. 1 producer, 3 consumers). But OTOH, it's sometimes possible to prove mathematically that e.g. "if it works for 3 it will work for any number". The original Spin paper mentions a case where reasoning like this, plus Spin, enabled proof of correctness of a network protocol.Palmitin
@ j_random_hacker: Yeah, I assumed as much (NP problem and all), luckily I currently work on game consoles only, so I know the maximum number of physical cores (i.e. maximum number of consumers/produces == number of cores, we never have more than 1 thread per core except on single core systems in which case there are 2).Waterresistant
A
9

Spin is indeed excellent, but also consider Relacy Race Detector by Dmitriy V'jukov. It is purpose-built for verifying concurrent algorithms including non-blocking (wait-/lock-free) algorithms. It's open source and liberally licensed.

Relacy provides POSIX and Windows synchronization primitives (mutexes, condition variables, semaphores, CriticalSections, win32 events, Interlocked*, etc), so your actual C++ implementation can be fed to Relacy for verification. No need to develop a separate model of your algorithm as with Promela and Spin.

Relacy provides C++0x std::atomic (explicit memory ordering for the win!) so you can use pre-processor #defines to select between Relacy's implementation and your own platform specific atomic implementation (tbb::atomic, boost::atomic, etc).

Scheduling is controllable: random, context-bound, and full search (all possible interleavings) available.

Here's an example Relacy program. A few things to note:

  • The $ is a Relacy macro that records execution information.
  • rl::var<T> flags "normal" (non-atomic) variables that also need to be considered as part of the verification.

The code:

#include <relacy/relacy_std.hpp>

// template parameter '2' is number of threads
struct race_test : rl::test_suite<race_test, 2>
{
    std::atomic<int> a;
    rl::var<int> x;

    // executed in single thread before main thread function
    void before()
    {
        a($) = 0;
        x($) = 0;
    }

    // main thread function
    void thread(unsigned thread_index)
    {
        if (0 == thread_index)
        {
            x($) = 1;
            a($).store(1, rl::memory_order_relaxed);
        }
        else
        {
            if (1 == a($).load(rl::memory_order_relaxed))
                x($) = 2;
        }
    }

    // executed in single thread after main thread function
    void after()
    {
    }

    // executed in single thread after every 'visible' action in main threads
    // disallowed to modify any state
    void invariant()
    {
    }
};

int main()
{
    rl::simulate<race_test>();
}

Compile with your normal compiler (Relacy is header-only) and run the executable:

struct race_test
DATA RACE
iteration: 8

execution history:
[0] 0:  atomic store, value=0, (prev value=0), order=seq_cst, in race_test::before, test.cpp(14)
[1] 0:  store, value=0, in race_test::before, test.cpp(15)
[2] 0:  store, value=1, in race_test::thread, test.cpp(23)
[3] 0:  atomic store, value=1, (prev value=0), order=relaxed, in race_test::thread, test.cpp(24)
[4] 1:  atomic load, value=1, order=relaxed, in race_test::thread, test.cpp(28)
[5] 1:  store, value=0, in race_test::thread, test.cpp(29)
[6] 1: data race detected, in race_test::thread, test.cpp(29)

thread 0:
[0] 0:  atomic store, value=0, (prev value=0), order=seq_cst, in race_test::before, test.cpp(14)
[1] 0:  store, value=0, in race_test::before, test.cpp(15)
[2] 0:  store, value=1, in race_test::thread, test.cpp(23)
[3] 0:  atomic store, value=1, (prev value=0), order=relaxed, in race_test::thread, test.cpp(24)

thread 1:
[4] 1:  atomic load, value=1, order=relaxed, in race_test::thread, test.cpp(28)
[5] 1:  store, value=0, in race_test::thread, test.cpp(29)
[6] 1: data race detected, in race_test::thread, test.cpp(29)

Recent versions of Relacy also provide Java and CLI memory models if you're into that sort of thing.

Audieaudience answered 2/2, 2010 at 11:29 Comment(2)
that looks great, I'll have to give it a try as wellWaterresistant
Looks very nice! Seems it has now moved to 1024cores.net/home/relacy-race-detector BTW.Palmitin
F
4

I don't know what platform or language you're using - but on the .Net platform there is a Microsoft Research project called Chess which is looking very promising at helping those of us doing multithreaded components - including lock-free.

I've not used it a huge amount, mind.

It works (crude explanation) by explicitly interleaving threads in the tightest possible ways to actually force your bugs out into the wild. It also analyses code to find common mistakes and bad patterns - similar to code analysis.

In the past, I've also built special versions of the code in question (through #if blocks etc) that add extra state-tracking information; counts, versions etc that I can then dip into within a unit test.

The problem with that is that it takes a lot more time to write your code, and you can't always add this kind of stuff without changing the underlying structure of the code that's already there.

Ferroelectric answered 15/1, 2010 at 13:46 Comment(6)
+1, very interesting, especially how non-intrusive it is. But this comment: channel9.msdn.com/shows/Going+Deep/… suggests that it doesn't (yet) consider pre-emptions between arbitrary instructions. So the fewer locks/mutexes etc. you have in place, the worse Chess will do in finding bugs -- and if your multithreaded coded contains no locks at all Chess won't find anything.Palmitin
yes it is an ambitious project with some way to go - I love the current latest version number: v0.1.30626.0. They clearly have quite a way to go before a v1.0!Ferroelectric
I found this page: projectbentley.com/work/chess/lockfreequeue.php#test1 by somebody who's been playing CHESS (sorry, couldn't resist!) to try and detect lock-free errors. He discovered that marking things volatile caused it to pick up lock-free errors with the test written correctly. Doesn't mean it would work for all lock-free stuff, though. But there we see it - in this case you'd probably have to change the declaration of some of your variables to 'volatile' with an #if...#endif for a 'TESTING' build.Ferroelectric
Good that they implemented the volatile "marker" for .NET programs. Although I don't use .NET, I think it would be nice if there was a way to convince the compiler that all your variables were volatile without writing it everywhere.Palmitin
I agree - volatility in .Net is frowned upon by the framework designers. Our interlocked operations, which require a 'ref' to an object, throw compiler warnings because a 'ref' to a volatile value is considered dodgy. I wish there was a lower level API to interlocked read/write operations and memory barriers in .Net (don't even get me started on SIMD!) - I'm not convinced that our Interlocked operations get turned into the associated CPU instructions, I think they utilise Windows' API for it.Ferroelectric
@Andras: Yes, the ref/volatile warning in the interlocked API is unfortunate. The last time I looked into the .NET interlocked operations' implementations, they all seemed to be using compare-exchange internally (the Win32 version does compile down to the right CPU instruction, by the way), which is correct but sub-optimal for things like fetch-and-add (which can be done in a single hardware instruction without looping, albeit with potentially high latency).Forkey
F
4

Data race detection is an NP hard problem [Netzer&Miller 1990]

I heard about the tools Lockset, and DJit+ (they teach it in the CDP course). Try reading the slides, and googling what they reference to. It contains some interesting information.

Flexor answered 15/1, 2010 at 13:54 Comment(4)
Looks interesting, but from skimming the Powerpoint it seems that neither tool can make any guarantees, since they only examine whatever instruction sequences the scheduler gives them, not all possible sequences as Spin and Chess do. Also, got any links for the tools?Palmitin
In Uni they only teach you the theory, haha :). I did find the original lockset paper, though: cs.washington.edu/homes/tom/pubs/eraser.pdf . I don't remember playing around with any of these tools in the course, though... hmm... They might be only uni-level implementations for paper purposes. In this case, I think that maybe I should remove my comment altogether.Flexor
I realized this would be an NP problem, thats why I made the comment about brute force. Its still possible to do thorough testing, although the more complex the problem (more operations/input combinations), the longer it will take to test all possible sequences.Waterresistant
Don't remove this useful answer! :)Palmitin
P
3

If you want to really verify lock-free code (as opposed to exhaustively testing a small instance), you can use VCC (http://vcc.codeplex.com), a deductive verifier for concurrent C code which has been used to verify some interesting lock-free algorithms (e.g. lock-free lists and resizable hashtables using hazard pointers, optimistic multiversion transaction processing, MMU virtualization, various synchronization primitives, etc.). It does modular verification, and has been used to verify nontrivial chunks of industrial code (up to about 20KLOC).

Note, however, that VCC is a verifier, not a bug hunting tool; you will have to do substantial annotation on your code to get it to verify, and the learning curve can be a bit steep. Note also that it assumes sequential consistency (as do most tools).

BTW, peer review is not a good way to verify a concurrent algorithm (or even a sequential one). There's a long history of famous people publishing concurrent algorithms in important journals, only to have bugs discovered years later.

Pseudonym answered 5/11, 2013 at 9:42 Comment(2)
Looks interesting, this is the kind of thing I was hoping to find (I've never been keen on peer review for this kind of stuff, it's too easy to overlook complex problems, or think that verifying one part means another can be assumed correct).Waterresistant
Actually, peer review can be effective for lock-free concurrent programming if you are disciplined about how you do lock-free programming. You have to take object invariants seriously, and for linearizable data structures, you have to be very explicit about where the operation "appears" to happen.Pseudonym

© 2022 - 2024 — McMap. All rights reserved.