Can you avoid locking by guaranteeing that multiple threads won't access the same memory?

5

42

Say I have a large array and I want to process the contents with multiple threads. If I delegate each thread to a specific section, guaranteeing no overlap, does that eliminate any need for locking, assuming the threads don't access any other memory outside the array?

Something like this (sketched in C++11 with std::thread):

#include <thread>

int array[9000000];

struct chunk { int start; int end; };

void do_something(chunk c) {
    // this thread only touches elements in [c.start, c.end)
    for (int i = c.start; i < c.end; i++)
        array[i] *= 2;   // do something with array
}

int main() {
    chunk chunk1 = {0, 5000000};
    chunk chunk2 = {5000000, 9000000};

    std::thread thread1(do_something, chunk1);
    std::thread thread2(do_something, chunk2);

    thread1.join();
    thread2.join();
    // do something else with the altered array
}
Strafford answered 12/12, 2013 at 16:14 Comment(0)
31

In a conforming C++11 compiler this is safe, per [intro.memory] (§1.7):

A memory location is either an object of scalar type or a maximal sequence of adjacent bit-fields all having non-zero width. [...] Two threads of execution (1.10) can update and access separate memory locations without interfering with each other.

C11 gives identical guarantees (they even use the same wording) in §3.14.
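
To illustrate the bit-field caveat in that definition, here is a made-up struct (not from the question, purely for illustration):

struct flags {
    unsigned a : 4;   // a and b are adjacent non-zero-width bit-fields, so they
    unsigned b : 4;   // form one memory location: unsynchronised writes to a and
                      // b from two different threads are a data race
    unsigned c;       // c is a separate memory location: one thread may update c
                      // while another updates a or b, with no race
};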

In a C++03 compiler this is not guaranteed to work by the standard, but it might still work if the compiler provides similar guarantees as an extension.

Playmate answered 12/12, 2013 at 16:21 Comment(2)
Is there a compiler that doesn't provide this behavior in C++03? I thought that, although it wasn't formalized, it was a pretty common feature on modern hardware...Chem
It has nothing to do with the hardware. Let's say I want to copy 15 bytes from src to dst. The compiler reads the one byte of dst that should stay unchanged, copies 16 bytes using a vector register, and writes that byte back, all while another thread was looking at that byte. Your source code didn't access that memory, but the compiled code did. A conforming C11 or C++11 compiler won't do this. Remember that adjacent bit-fields can count as the same memory location.Dougie
18

Yes: if you can guarantee that no two threads will access the same element, then there's no need for any further synchronisation.

There is only a conflict (and therefore a potential data race) if two threads access the same memory location (with at least one of them modifying it) without synchronisation.
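
Concretely (a made-up sketch using C++11 std::thread, just to illustrate the definition):

#include <thread>

int data[2];

int main() {
    // No conflict: each thread accesses its own memory location.
    std::thread t1([] { data[0] = 1; });
    std::thread t2([] { data[1] = 2; });
    t1.join();
    t2.join();

    // Conflict, and hence a data race: same memory location, at least one of
    // the accesses is a write, and no synchronisation between the two threads.
    std::thread t3([] { data[0] = 3; });
    std::thread t4([] { int x = data[0]; (void)x; });
    t3.join();
    t4.join();
}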

(NOTE: this answer is based on the C++11 memory model. I've just noticed that you're also asking about C; I believe that C11 specifies a very similar memory model, but I can't say for sure that the answer is also valid for C. For older versions of both languages, thread-safety was implementation-dependent.)

Kaiserslautern answered 12/12, 2013 at 16:17 Comment(3)
"...if two threads access the same memory location without synchronisation" and one of those accesses is a write.Leisurely
@AndrewDurward: Indeed, I simplified slightly. The relevant point for this question is that if they don't access the same location, there can't be a conflict.Kaiserslautern
If the 'chunks' overlap on cache lines, you may see reduced performance when two threads on different CPU cores process items sitting on the same cache line. For best performance, make the chunk size a multiple of the platform's cache line size and keep the chunks properly aligned. But regardless, no synchronization is needed.Fouquiertinville
8

Yes, you can indeed.

TCMalloc is a good example: it gives each thread its own thread-local cache, so most allocations and deallocations never need to take a lock.

Socialminded answered 12/12, 2013 at 16:17 Comment(0)
4

Yes.

You do not even need to guarantee that no two threads access the same memory location. All you need to guarantee is that no thread modifies any location that another thread accesses (regardless of whether that other access is a read or a write).

Given either no concurrent access at all or read-only concurrent access, you're good to go without locking.

Monocarpic answered 12/12, 2013 at 19:3 Comment(2)
A single modifier isn't enough to guarantee safety. A reader, reading data as it's written, may get invalid results.Corene
@ugoren: Right. That is why I said "no single thread writing". A reader running concurrently with a writer may get incoherent results (not "garbled", just incoherent, due to how all modern CPUs work, but that is usually bad enough). In other words, any number of readers is safe, but not a writer on the same location as a reader.Monocarpic
3

An important performance note!

Right, you don't need explicit locking, since, as others have said, no memory location is shared.

But you may trigger implicit hardware synchronization: arbitrary chunk boundaries are likely to cause cache lines to be shared between threads (not much of an issue with the sizes in your example, though). This is known as false sharing.

A higher-level approach such as OpenMP resolves these kinds of issues (see the sketch after this list):

  • chunk alignment (and the number of threads) is tuned to the underlying hardware.
  • it gives better control over the thread pool, e.g. amortizing the cost of thread creation.
  • it's easier to write, and actually less intrusive.
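
For example (a minimal sketch, not from the original answer; the per-element work is a placeholder), the loop from the question collapses to a single OpenMP pragma, and the runtime handles chunking and thread management (compile with e.g. -fopenmp):

#include <omp.h>

int array[9000000];

int main(void) {
    // schedule(static) hands each thread one contiguous block of iterations,
    // so at most one cache line per pair of threads can be falsely shared.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < 9000000; i++)
        array[i] *= 2;   // placeholder for the real per-element work

    // implicit barrier: every thread has finished before execution continues
    return 0;
}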
Durga answered 18/12, 2013 at 1:38 Comment(0)
