Please explain from Linux, Windows perspectives?
I am programming in C#, would these two terms make a difference. Please post as much as you can, with examples and such....
Thanks
Please explain from Linux, Windows perspectives?
I am programming in C#, would these two terms make a difference. Please post as much as you can, with examples and such....
Thanks
For Windows, critical sections are lighter-weight than mutexes.
Mutexes can be shared between processes, but always result in a system call to the kernel which has some overhead.
Critical sections can only be used within one process, but have the advantage that they only switch to kernel mode in the case of contention - Uncontended acquires, which should be the common case, are incredibly fast. In the case of contention, they enter the kernel to wait on some synchronization primitive (like an event or semaphore).
I wrote a quick sample app that compares the time between the two of them. On my system for 1,000,000 uncontended acquires and releases, a mutex takes over one second. A critical section takes ~50 ms for 1,000,000 acquires.
Here's the test code, I ran this and got similar results if mutex is first or second, so we aren't seeing any other effects.
HANDLE mutex = CreateMutex(NULL, FALSE, NULL);
CRITICAL_SECTION critSec;
InitializeCriticalSection(&critSec);
LARGE_INTEGER freq;
QueryPerformanceFrequency(&freq);
LARGE_INTEGER start, end;
// Force code into memory, so we don't see any effects of paging.
EnterCriticalSection(&critSec);
LeaveCriticalSection(&critSec);
QueryPerformanceCounter(&start);
for (int i = 0; i < 1000000; i++)
{
EnterCriticalSection(&critSec);
LeaveCriticalSection(&critSec);
}
QueryPerformanceCounter(&end);
int totalTimeCS = (int)((end.QuadPart - start.QuadPart) * 1000 / freq.QuadPart);
// Force code into memory, so we don't see any effects of paging.
WaitForSingleObject(mutex, INFINITE);
ReleaseMutex(mutex);
QueryPerformanceCounter(&start);
for (int i = 0; i < 1000000; i++)
{
WaitForSingleObject(mutex, INFINITE);
ReleaseMutex(mutex);
}
QueryPerformanceCounter(&end);
int totalTime = (int)((end.QuadPart - start.QuadPart) * 1000 / freq.QuadPart);
printf("Mutex: %d CritSec: %d\n", totalTime, totalTimeCS);
From a theoretical perspective, a critical section is a piece of code that must not be run by multiple threads at once because the code accesses shared resources.
A mutex is an algorithm (and sometimes the name of a data structure) that is used to protect critical sections.
Semaphores and Monitors are common implementations of a mutex.
In practice there are many mutex implementation availiable in windows. They mainly differ as consequence of their implementation by their level of locking, their scopes, their costs, and their performance under different levels of contention. See CLR Inside Out - Using concurrency for scalability for an chart of the costs of different mutex implementations.
Availiable synchronization primitives.
The lock(object)
statement is implemented using a Monitor
- see MSDN for reference.
In the last years much research is done on non-blocking synchronization. The goal is to implement algorithms in a lock-free or wait-free way. In such algorithms a process helps other processes to finish their work so that the process can finally finish its work. In consequence a process can finish its work even when other processes, that tried to perform some work, hang. Usinig locks, they would not release their locks and prevent other processes from continuing.
In addition to the other answers, the following details are specific to critical sections on windows:
InterlockedCompareExchange
operationIn linux, I think that they have a "spin lock" that serves a similar purpose to the critical section with a spin count.
Critical Section and Mutex are not Operating system specific, their concepts of multithreading/multiprocessing.
Critical Section Is a piece of code that must only run by it self at any given time (for example, there are 5 threads running simultaneously and a function called "critical_section_function" which updates a array... you don't want all 5 threads updating the array at once. So when the program is running critical_section_function(), none of the other threads must run their critical_section_function.
mutex* Mutex is a way of implementing the critical section code (think of it like a token... the thread must have possession of it to run the critical_section_code)
A mutex is an object that a thread can acquire, preventing other threads from acquiring it. It is advisory, not mandatory; a thread can use the resource the mutex represents without acquiring it.
A critical section is a length of code that is guaranteed by the operating system to not be interupted. In pseudo-code, it would be like:
StartCriticalSection();
DoSomethingImportant();
DoSomeOtherImportantThing();
EndCriticalSection();
The 'fast' Windows equal of critical selection in Linux would be a futex, which stands for fast user space mutex. The difference between a futex and a mutex is that with a futex, the kernel only becomes involved when arbitration is required, so you save the overhead of talking to the kernel each time the atomic counter is modified. That .. can save a significant amount of time negotiating locks in some applications.
A futex can also be shared amongst processes, using the means you would employ to share a mutex.
Unfortunately, futexes can be very tricky to implement (PDF). (2018 update, they aren't nearly as scary as they were in 2009).
Beyond that, its pretty much the same across both platforms. You're making atomic, token driven updates to a shared structure in a manner that (hopefully) does not cause starvation. What remains is simply the method of accomplishing that.
Just to add my 2 cents, critical Sections are defined as a structure and operations on them are performed in user-mode context.
ntdll!_RTL_CRITICAL_SECTION +0x000 DebugInfo : Ptr32 _RTL_CRITICAL_SECTION_DEBUG +0x004 LockCount : Int4B +0x008 RecursionCount : Int4B +0x00c OwningThread : Ptr32 Void +0x010 LockSemaphore : Ptr32 Void +0x014 SpinCount : Uint4B
Whereas mutex are kernel objects (ExMutantObjectType) created in the Windows object directory. Mutex operations are mostly implemented in kernel-mode. For instance, when creating a Mutex, you end up calling nt!NtCreateMutant in kernel.
In Windows, a critical section is local to your process. A mutex can be shared/accessed across processes. Basically, critical sections are much cheaper. Can't comment on Linux specifically, but on some systems they're just aliases for the same thing.
Great answer from Michael. I've added a third test for the mutex class introduced in C++11. The result is somewhat interesting, and still supports his original endorsement of CRITICAL_SECTION objects for single processes.
mutex m;
HANDLE mutex = CreateMutex(NULL, FALSE, NULL);
CRITICAL_SECTION critSec;
InitializeCriticalSection(&critSec);
LARGE_INTEGER freq;
QueryPerformanceFrequency(&freq);
LARGE_INTEGER start, end;
// Force code into memory, so we don't see any effects of paging.
EnterCriticalSection(&critSec);
LeaveCriticalSection(&critSec);
QueryPerformanceCounter(&start);
for (int i = 0; i < 1000000; i++)
{
EnterCriticalSection(&critSec);
LeaveCriticalSection(&critSec);
}
QueryPerformanceCounter(&end);
int totalTimeCS = (int)((end.QuadPart - start.QuadPart) * 1000 / freq.QuadPart);
// Force code into memory, so we don't see any effects of paging.
WaitForSingleObject(mutex, INFINITE);
ReleaseMutex(mutex);
QueryPerformanceCounter(&start);
for (int i = 0; i < 1000000; i++)
{
WaitForSingleObject(mutex, INFINITE);
ReleaseMutex(mutex);
}
QueryPerformanceCounter(&end);
int totalTime = (int)((end.QuadPart - start.QuadPart) * 1000 / freq.QuadPart);
// Force code into memory, so we don't see any effects of paging.
m.lock();
m.unlock();
QueryPerformanceCounter(&start);
for (int i = 0; i < 1000000; i++)
{
m.lock();
m.unlock();
}
QueryPerformanceCounter(&end);
int totalTimeM = (int)((end.QuadPart - start.QuadPart) * 1000 / freq.QuadPart);
printf("C++ Mutex: %d Mutex: %d CritSec: %d\n", totalTimeM, totalTime, totalTimeCS);
My results were 217, 473, and 19 (note that my ratio of times for the last two is roughly comparable to Michael's, but my machine is at least four years younger than his, so you can see evidence of increased speed between 2009 and 2013, when the XPS-8700 came out). The new mutex class is twice as fast as the Windows mutex, but still less than a tenth the speed of the Windows CRITICAL_SECTION object. Note that I only tested the non-recursive mutex. CRITICAL_SECTION objects are recursive (one thread can enter them repeatedly, provided it leaves the same number of times).
I found the explanations stating that critical sections protect a code section from being entered by multiple threads quite misleading. There is no point in protecting code, as code is read only and can't be modified by multiple threads. What one usually wants is to protect data from being modified by multiple threads, leading to incoherent state. Commonly a mutex (or critical section, fulfilling the same purpose) should be associated with some piece of data. Each code section accessing this data should aquire the mutex/critical section and release when it's finished accessing the data. This may be considerably more fine grained than just locking out threads from entering a function. Also, from my experience, locking functions by some synchronisation is much more prone for errors, in particular dead locks. A good article covering that topic can be found here: https://www.bogotobogo.com/cplusplus/multithreaded4_cplusplus11B.php
So, in summary (recursive) mutexes and critical sections basically fulfill the same purpose, which is rather not protecting code, but protecting data instead.
Critical sections may be implemented more efficiently than plain kernel mutexes. The example given in the first answer is a bit misleading, because it does not depict what the synchronisation primitive is designed for: synchronize access to sth. from multiple threads. The example just measures the trivial case, when the critical section/mutex is never owned by another thread. While critical sections could be more efficient if, for instance, two threads access data in short, interlocked periods, they could prove less efficient if we got lots of threads accessing the same piece of data. Each thread would spinlock until giving up and waiting for the semaphore, part of the implementation of the critical section. Such a case should also be considered when measuring execution times.
A C functions is called reentrant if it uses its actual parameters only.
Reentrant functions can be called by multiple threads at the same time.
Example of reentrant function:
int reentrant_function (int a, int b)
{
int c;
c = a + b;
return c;
}
Example of non reentrant function:
int result;
void non_reentrant_function (int a, int b)
{
int c;
c = a + b;
result = c;
}
The C standard library strtok()
is not reentrant and can't be used by 2 or more threads at the same time.
Some platform SDK's comes with the reentrant version of strtok()
called strtok_r()
;
© 2022 - 2024 — McMap. All rights reserved.