OpenMP - critical section + reduction
Asked Answered
H

2

5

I'm currently learning Parallel Programming using C and OpenMP. I wanted to write simple code where two shared values are beeing incremented by multiple threads. Firstly I used reduction directive and it worked as it was meant to. Then I switched to using the critical directive to initiate critical section - it also worked. Out of curiosity I've tried to merge these two solution and check the behaviour. I expected two valid, equal values.

code:

#include <stdio.h>
#include <stdlib.h>
#include "omp.h"

#define ITER 50000

int main( void )
{
    int x, y;
    #pragma omp parallel reduction(+:x,y)
    {
       #pragma omp for
       for (int i = 0; i < ITER; i++ )  
       {
            x++;
            #pragma omp critical
            y++;
       }
    }

    printf("non critical = %d\ncritical = %d\n", x, y);
    return 0;
}

output:

non critical = 50000
critical = 4246432

Of course output is random when it comes to 'critical' (variable y), the other behaves as expected and is always 50000.

The behaviour of x is understandable - reduction makes it private in scope of single thread. After the incrementation values from threads are summed up and passed to the non-local x.

What I don't understand is the behaviour of y. It's private just like x but it's also inside the critical section so it 'has more than one reason' to be inaccessible from other threads. Yet what, I think, happens is the race condition. Did the critical somehow made y public (shared)?

I know that this code makes no sense since it's enough to use only one of reduction / critical. I'd like just to know what's behind such behaviour.

Hon answered 3/2, 2016 at 11:35 Comment(6)
And who initializes the variables? Seems like UB.Oconnor
OpenMP does that in reduction directive automatically (0 for + and -, 1 for * and /)Hon
For private copies, but not for the visible variable. What happens if you initialize it?Oconnor
I was convinced that the reduction implicitly privatises reduced values.Hon
@nullPointer You are technically correct. But why rely on OpenMP implicit behavior when you can be verbose and fully convince yourself that what you are writing is working exactly as you expect it to?Sashenka
I can't agree more - it's better to write a little more code just to avoid assumptions and get things straightforward as they should be. Nevertheless, at some point while playing with the new toy, which OpenMP is for me, I obtained such code with behaviour I wouldn't expect. Examples like these are IMO the most valuable since they give you a little experience. Even though it's basic incrementation.Hon
H
8

Your code simply exhibits undefined behaviour and the presence of critical has nothing to do with you getting wrong results.

Did the critical somehow made y public (shared)?

No, it did not. It only slows down the loop by preventing the concurrent execution of the threads.

What you are missing is that the result of the reduction operation is combined with the initial value of the reduction variable, i.e. with the value the variable had before the parallel region. In your case, both x and y have random initial values and therefore you are getting random results. That the initial value x happens to be 0 in your case and that's why you are getting the correct result for it is simply UB. Initialising both x and y makes your code behave as expected.

The OpenMP specification states:

The reduction clause specifies a reduction-identifier and one or more list items. For each list item, a private copy is created in each implicit task or SIMD lane, and is initialized with the initializer value of the reduction-identifier. After the end of the region, the original list item is updated with the values of the private copies using the combiner associated with the reduction-identifier.

Here is the execution of your original code with 4 threads:

$ icc -O3 -openmp -std=c99 -o cnc cnc.c
$ OMP_NUM_THREADS=1 ./cnc
non critical = 82765
critical = 50000
$ OMP_NUM_THREADS=4 ./cnc
non critical = 82765
critical = 50000
$ OMP_NUM_THREADS=4 ./cnc
non critical = 50000
critical = 50000
$ OMP_NUM_THREADS=4 ./cnc
non critical = 82765
critical = 50194
$ OMP_NUM_THREADS=4 ./cnc
non critical = 82767
critical = 2112072800

The first run with one thread demonstrates that it is not due to a data race.

With int x=0, y=0;:

$ icc -O3 -openmp -std=c99 -o cnc cnc.c
$ OMP_NUM_THREADS=4 ./cnc
non critical = 50000
critical = 50000
$ OMP_NUM_THREADS=4 ./cnc
non critical = 50000
critical = 50000
$ OMP_NUM_THREADS=4 ./cnc
non critical = 50000
critical = 50000
$ OMP_NUM_THREADS=4 ./cnc
non critical = 50000
critical = 50000
Hoagland answered 4/2, 2016 at 14:26 Comment(0)
T
7

The primary problem with your code is that x and y are not initialized. A second problem is that the variable used in the critical section should be shared instead of a reduction variable, although this should only affect performance, not correctness.

I've corrected your code and modified it to demonstrate how reduce, critical and atomic all produce the same result.

Source

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char* argv[])
{
    int iter = (argc>1) ? atoi(argv[1]) : 50000;
    int r=0, c=0, a=0;

    printf("OpenMP threads = %d\n", omp_get_max_threads() );

    #pragma omp parallel reduction(+:r) shared(c,a)
    {
        #pragma omp for
        for (int i = 0; i < iter; i++ ) {
            r++;
            #pragma omp critical
            c++;
            #pragma omp atomic
            a++;
        }
    }
    printf("reduce   = %d\n"
           "critical = %d\n"
           "atomic   = %d\n", r, c, a);
    return 0;
}

Compile

icc -O3 -Wall -qopenmp -std=c99 redcrit.c

Output

OpenMP threads = 4
reduce   = 50000
critical = 50000
atomic   = 50000
Truett answered 3/2, 2016 at 13:57 Comment(5)
Thank you for a great answer! Double count explains a lot. However, I'm not entirely sure about the 'undefined behaviour' you've mentioned. I have this course at the university and our tutor has never initialised the reduced value - I was told that they are initiated by the OpenMP with 0 or 1, depending on the reduction's operator. Tbh I've done the same, and beside the situation above, it's always worked. EDIT: Although, I must say that when I initialised the values x,y without changing anything else the code worked just as @Oconnor suggestedHon
Yes, I see that OpenMP defines the default initializers for reduction variables. However, ISO C does not have a default initializer for local variables (https://mcmap.net/q/15564/-what-happens-to-a-declared-uninitialized-variable-in-c-does-it-have-a-value), which means your program has undefined behavior if OpenMP is not enabled. You don't want that. Also, the critical variable does not have a default initializer.Truett
I believe your last sentence is also very crucial in terms of my problem. Once again thanks for helping me understand it :)Hon
@nullPointer Indeed, I didn't call that out explicitly, but yes, that would explain your results. Thanks for noting it.Truett
Your first sentence makes no sense to me. The presence of critical in no way changes the reduction properties besides slowing the code down. The code is simply UB because x and y are not initialised. Setting them to 0 before the parallel regions results in the correct output. The private copies of the reduction variables are initialised to zero, but then the result from the reduction is added to the values the variables had before the region, hence UB.Hoagland

© 2022 - 2024 — McMap. All rights reserved.