Does pthread_mutex_lock contain a memory fence instruction? [duplicate]

Do pthread_mutex_lock and pthread_mutex_unlock call memory fence/barrier instructions? Or do lower-level instructions like compare_and_swap implicitly have memory barriers?

Roi answered 10/6, 2014 at 9:53

Do pthread_mutex_lock and pthread_mutex_unlock functions call memory fence/barrier instructions?

They do, and so does thread creation.

Note, however, that there are two types of memory barriers: compiler and hardware.

Compiler barriers only prevent the compiler from reordering reads and writes and from speculating variable values; they don't prevent the CPU from reordering.

Hardware barriers prevent the CPU from reordering reads and writes. A full memory fence is usually the most expensive kind of barrier; most of the time you only need operations with acquire and release semantics (to implement spinlocks and mutexes).

With multi-threading you need both barriers most of the time.

Any call to a function whose definition is not available in the current translation unit (and that is not an intrinsic) acts as a compiler memory barrier. pthread_mutex_lock, pthread_mutex_unlock, and pthread_create also issue hardware memory barriers to prevent the CPU from reordering reads and writes.

From Programming with POSIX Threads by David R. Butenhof:

Pthreads provides a few basic rules about memory visibility. You can count on all implementations of the standard to follow these rules:

  1. Whatever memory values a thread can see when it calls pthread_create can also be seen by the new thread when it starts. Any data written to memory after the call to pthread_create may not necessarily be seen by the new thread, even if the write occurs before the thread starts.

  2. Whatever memory values a thread can see when it unlocks a mutex, either directly or by waiting on a condition variable, can also be seen by any thread that later locks the same mutex. Again, data written after the mutex is unlocked may not necessarily be seen by the thread that locks the mutex, even if the write occurs before the lock.

  3. Whatever memory values a thread can see when it terminates, either by cancellation, returning from its start function, or by calling pthread_exit, can also be seen by the thread that joins with the terminated thread by calling pthread_join. And, of course, data written after the thread terminates may not necessarily be seen by the thread that joins, even if the write occurs before the join.

  4. Whatever memory values a thread can see when it signals or broadcasts a condition variable can also be seen by any thread that is awakened by that signal or broadcast. And, one more time, data written after the signal or broadcast may not necessarily be seen by the thread that wakes up, even if the write occurs before it awakens.

Also see C++ and Beyond 2012: Herb Sutter - atomic<> Weapons for more details.

Negligee answered 10/6, 2014 at 10:16

Please take a look at section 4.12 of the POSIX specification.

Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. [emphasis mine]

The section then lists the functions that synchronize memory, followed by a few additional notes.

If an architecture needs memory barrier instructions to meet that requirement, the implementation must use them.

About compare_and_swap: that isn't in POSIX; check the documentation for whatever you are using. For instance, IBM defines a compare_and_swap function for AIX 5.3 that doesn't have full memory barrier semantics. A note in its documentation says:

If compare_and_swap is used as a locking primitive, insert an isync at the start of any critical sections.

From this documentation we can guess that IBM's compare_and_swap has release semantics, since the documentation does not require a barrier at the end of the critical section. The acquiring processor needs to issue an isync to make sure it is not reading stale data, but the publishing processor doesn't have to do anything.

At the instruction level, some processors have compare and swap with certain synchronizing guarantees, and some don't.

Cambridge answered 10/6, 2014 at 14:18
Nice reference to POSIX; I didn't know POSIX specified the exact list of functions for that. +1 – Negligee
