What is the difference between atomic and critical in OpenMP?

I can do this

#pragma omp atomic
g_qCount++;

but isn't this the same as

#pragma omp critical
g_qCount++;

?

Noles answered 17/10, 2011 at 18:35 Comment(0)

The effect on g_qCount is the same, but what's done is different.

An OpenMP critical section is completely general - it can surround any arbitrary block of code. You pay for that generality, however, by incurring significant overhead every time a thread enters and exits the critical section (on top of the inherent cost of serialization).

(In addition, in OpenMP all unnamed critical sections are considered identical (if you prefer, there's only one lock for all unnamed critical sections), so that if one thread is in one [unnamed] critical section as above, no thread can enter any [unnamed] critical section. As you might guess, you can get around this by using named critical sections).
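To illustrate named critical sections, here's a minimal sketch (the names lock_a and lock_b, and the functions around them, are invented for illustration, not from the original answer):

```c
int a = 0, b = 0;

void bump_a(void) {
    // This critical section uses a lock named lock_a.
    #pragma omp critical(lock_a)
    a++;
}

void bump_b(void) {
    // lock_b is an independent lock: a thread updating b
    // does not block a thread updating a under lock_a.
    #pragma omp critical(lock_b)
    b++;
}
```

With unnamed critical sections, both updates would contend for the single shared lock even though they touch unrelated variables.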

An atomic operation has much lower overhead. Where available, it takes advantage of the hardware providing (say) an atomic increment operation; in that case there's no lock/unlock needed on entering/exiting the line of code, it just performs the atomic increment, which the hardware guarantees can't be interfered with.

The upsides are that the overhead is much lower, and one thread being in an atomic operation doesn't block any (different) atomic operations about to happen. The downside is the restricted set of operations that atomic supports.

Of course, in either case, you incur the cost of serialization.

Calculating answered 17/10, 2011 at 20:11 Comment(3)
"you could loose portability" - I'm not sure this is true. The standard (version 2.0) specifies which atomic operations are allowed (basically things like ++ and *=) and that if they aren't supported in hardware, they might be replaced by critical sections. – Clovis
@DanRoche: Yes, you're quite right. I don't think that statement was ever correct; I'll correct it now. – Calculating
A few days ago I followed an OpenMP tutorial, and as far as I understood, the two pieces of code do differ. The results can differ because the critical section only assures that the statement is executed by one thread at a time; yet it is possible that the statement g_qCount = g_qCount + 1 on thread 1 stores the result only in a write buffer, not in RAM, and when thread 2 fetches g_qCount it simply reads the value in RAM, not the write buffer. An atomic instruction assures the data is flushed to memory. – Bootie

In OpenMP, all the unnamed critical sections are mutually exclusive.

The most important difference between critical and atomic is that atomic can protect only a single update to a variable, and only with a specific set of operators.
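As a sketch of where critical is needed and atomic is not enough: a read-compare-write sequence spans several statements, so atomic cannot cover it. (The function and names below are invented for illustration.)

```c
// Find the maximum of an array in parallel.
int parallel_max(const int *data, int n) {
    int max_val = data[0];
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        // The read, the comparison, and the write are three steps;
        // atomic protects only a single update, so critical is required.
        #pragma omp critical
        {
            if (data[i] > max_val)
                max_val = data[i];
        }
    }
    return max_val;
}
```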

Ilene answered 27/2, 2012 at 14:40 Comment(1)
This would have been better as a comment (or an edit) on the previous answer. – Dagenham

Critical section:

  • Ensures serialisation of blocks of code.
  • Can be extended to serialise groups of blocks with proper use of the "name" tag.
  • Slower!

Atomic operation:

  • Is much faster!
  • Only ensures the serialisation of a particular operation.

Maintenon answered 23/12, 2013 at 2:41 Comment(1)
But this answer is very readable and would be a great sum-up of the first answer. – Nonrepresentational

The fastest way is neither critical nor atomic. Roughly speaking, an addition in a critical section is about 200 times more expensive than a simple addition, while an atomic addition is about 25 times more expensive than a simple addition.

The fastest option (not always applicable) is to give each thread its own counter and perform a reduction when you need the total sum.
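OpenMP's reduction clause expresses exactly this per-thread-counter pattern. A minimal sketch (the function name is invented for illustration):

```c
// Each thread accumulates into its own private copy of count;
// OpenMP combines the copies with + at the end of the loop.
long parallel_count(long n) {
    long count = 0;
    #pragma omp parallel for reduction(+:count)
    for (long i = 0; i < n; i++) {
        count++;   // no atomic or critical needed on the private copy
    }
    return count;
}
```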

Wreak answered 26/1, 2014 at 14:39 Comment(2)
I disagree with all the numbers you mention in your explanation. Assuming x86_64, the atomic operation will have a few cycles of overhead (synchronizing a cache line) on top of a cost of roughly a cycle. If you would otherwise have a ''true sharing'' cost, the overhead is nil. A critical section incurs the cost of a lock. Depending on whether the lock is already taken or not, the overhead is roughly two atomic instructions OR two runs of the scheduler plus the sleep time - and that will usually be significantly more than 200x. – Embolus
The option you are suggesting might lead to a huge memory demand that we might not have at our disposal. For instance, if I'm working on data of 1000x1000x1000 cells with 10 or 100 threads, the internal copies created for each thread will surely saturate the RAM. – Syd

The limitations of atomic are important. They are detailed in the OpenMP specs. MSDN offers a quick cheat sheet, and I wouldn't be surprised if it never changes (Visual Studio 2012 still ships an OpenMP implementation dating from March 2002). To quote MSDN:

The expression statement must have one of the following forms:

x binop= expr

x++

++x

x--

--x

In the preceding expressions: x is an lvalue expression with scalar type. expr is an expression with scalar type, and it does not reference the object designated by x. binop is not an overloaded operator and is one of +, *, -, /, &, ^, |, <<, or >>.
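A small sketch of the permitted forms in use (the function is invented for illustration; each pragma covers exactly one of the shapes listed above):

```c
// Exercises three of the permitted atomic forms.
int atomic_forms_demo(void) {
    int x = 0;
    #pragma omp parallel for
    for (int i = 0; i < 100; i++) {
        #pragma omp atomic
        x += 2;        // x binop= expr
        #pragma omp atomic
        x++;           // postfix increment
        #pragma omp atomic
        --x;           // prefix decrement
    }
    return x;   // 100 iterations * (+2 + 1 - 1) = 200
}
```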

I recommend using atomic when you can, and named critical sections otherwise. Naming them is important; you'll avoid debugging headaches this way.

Elate answered 1/11, 2013 at 19:31 Comment(1)
That's not all of them; there are other atomic clauses, such as #pragma omp atomic update (or read, write, capture), which allow some other useful statements. – Wager

Already great explanations here. However, we can dive a bit deeper. To understand the core difference between the atomic and critical section concepts in OpenMP, we have to understand the concept of lock first. Let's review why we need to use locks.

A parallel program is executed by multiple threads. Deterministic results are possible only if we synchronize these threads. Of course, synchronization is not always required; we are referring to the cases where it is necessary.

In order to synchronize the threads in a multi-threaded program, we use locks. Locks come into play when access must be restricted to one thread at a time. The lock implementation may vary from processor to processor. Let's find out how a simple lock may work from an algorithmic point of view.

1. Define a variable called lock.
2. For each thread:
   2.1. Read the lock.
   2.2. If lock == 0, lock = 1 and goto 3    // Try to grab the lock
       Else goto 2.1    // Wait until the lock is released
3. Do something...
4. lock = 0    // Release the lock

The given algorithm can be implemented in assembly as follows. We'll assume a single processor and analyze the behavior of locks on it. For this exercise, assume one of the following processors: MIPS, Alpha, ARM or Power.

try:    LW   R1, lock     ; read the lock word
        BNEZ R1, try      ; lock taken -> spin
        ADDI R1, R1, #1   ; lock = 1
        SW   R1, lock     ; write it back (not atomic with the load!)

This program seems to be OK, but it is not: it suffers from the very problem it is meant to solve, synchronization. Let's find the problem. Assume the initial value of lock to be zero. If two threads run this code, one might reach SW R1, lock before the other has even read the lock variable. Thus, both of them think the lock is free.

To solve this issue, processors provide a Read-Modify-Write instruction rather than simple LW and SW. It is a complex instruction (consisting of sub-instructions) which assures that the lock-acquisition procedure is performed by only a single thread at a time. Read-Modify-Write differs from simple reads and writes in how it loads and stores: it uses LL (Load Linked) to load the lock variable and SC (Store Conditional) to write to it. An additional Link Register is used to assure that the lock acquisition is done by a single thread. The algorithm is given below.

1. Define a variable called lock.
2. For each thread:
   2.1. Read the lock and put the address of lock variable inside the Link Register.
   2.2. If (lock == 0) and (&lock == Link Register), lock = 1 and reset the Link Register then goto 3    // Try to grab the lock
       Else goto 2.1    // Wait until the lock is released
3. Do something...
4. lock = 0    // Release the lock

If another thread has assumed the lock to be free, the reset link register prevents it from writing the incremented value to the lock. Thus, exclusive access to the lock variable is achieved.
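The same acquire/release idea can be sketched with C11 atomics, where a compare-and-swap plays the role of the LL/SC pair. This is only an illustrative sketch of the lock concept, not how OpenMP itself is implemented; the names are invented.

```c
#include <stdatomic.h>

atomic_int lock_word = 0;

// Acquire: the write of 1 succeeds only if lock_word is still 0,
// so only one thread can win the race -- the CAS analogue of LL/SC.
void spin_acquire(void) {
    int expected = 0;
    while (!atomic_compare_exchange_weak(&lock_word, &expected, 1)) {
        expected = 0;   // a failed CAS overwrote expected; reset and retry
    }
}

// Release: plain atomic store of 0.
void spin_release(void) {
    atomic_store(&lock_word, 0);
}
```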

The core difference between critical and atomic comes from this idea:

Why use a lock (a separate variable) when we can use the actual variable (the one we are operating on) as the lock?

Using a separate variable for the lock leads to the critical section, while using the actual variable as the lock leads to the atomic concept. The critical section is useful when we perform many computations (more than one line) on the actual variable: if the result of those computations fails to be written to the actual variable, the whole procedure must be repeated, which can give poor performance compared with simply waiting for the lock to be released before entering a computation-heavy region. Thus, it is recommended to use the atomic directive whenever you perform a single update (x++, x--, ++x, --x, etc.) and the critical directive when the region is more computationally complex.

Lutanist answered 2/5, 2018 at 8:13 Comment(0)

The critical clause applies mutual exclusion to a code block: it guarantees that only one thread executes the block at a time, and once that thread completes the block, the other threads are welcome to acquire the lock and execute it.

The atomic clause applies only to a single update statement, but the difference is not just the size of the expression. The atomic clause protects the memory location being assigned on the left-hand side, and only guarantees that assignment; the right-hand side expression is not protected. So if a function call appears on the right of the statement, it may execute in parallel.

#pragma omp atomic
a = 5 + fnk();

Here fnk() could be called by multiple threads at the same time, but the assignment to a must be mutually exclusive. In my test run the fnk() calls were interleaved by other threads, and we got the results 0, 2, 2 and 0 respectively. That wouldn't be the case if we'd used the critical clause.

Crandale answered 12/4, 2022 at 19:14 Comment(1)
Please ignore that count is a shared variable and the result may differ from run to run; focus on the concurrent function calls that occurred in the atomic section. The assignment itself is executed atomically. – Crandale

atomic is a single-statement critical section; i.e., you lock for the execution of one statement.

A critical section is a lock on a block of code.

A good compiler will translate your second code the same way it does the first.

Weaponry answered 18/12, 2013 at 17:17 Comment(1)
That's just wrong. Please don't talk about stuff you don't understand. – Jolson

© 2022 - 2024 — McMap. All rights reserved.