Calculate the histogram with OpenMP
Asked Answered
C

4

5

I want to parallelize this code getting the best performance. "histogram" stores number of appareances of a certain colour (there are 10 different colours, so the size of histogram is 10). "img" is an array which stores a certain image information. In each index of img is stored a colour (int value, range 0..9). This is the code:

for( i=0; i<N1; i++ ){
  for( j=0; j<N2; j++ ){
    histogram[ img[i][j] ]  = histogram[ img[i][j] ] + 1;
  }
}

I tried this but the performance is so bad (worse than serial execution):

#pragma omp parallel for schedule(static, N1/nthreads) private(i,j)
for(i=0; i<N1; i++){
  for(j=0; j<N2; j++)
  {
    #pragma omp atomic
    histogram[img[i][j]]++;
  }
}

Any suggestions? Thank you.

Chenopod answered 14/2, 2014 at 11:12 Comment(0)
S
6

I already went into detail on how to to this here Fill histograms (array reduction) in parallel with OpenMP without using a critical section

It's the same as an array reduction. OpenMP does not have built in support for this in C/C++ (but it does in Fortran) so you have to do it yourself.

The easy solution is to create private version of the histogram, fill them in parallel, and them merge them into one histogram in a critical section. In your case you can do that like this:

int i, histogram[10];
for(i=0; i<10; i++) histogram[i] = 0;
#pragma omp parallel
{
    int i, j, histogram_private[10];
    for(i=0; i<10; i++) histogram_private[i] = 0;
    #pragma omp for nowait
    for(i=0; i<N1; i++) {
       for(j=0; j<N2; j++) {    
           histogram_private[img[i][j]]++;
       }
    }      
    #pragma omp critical 
    {
        for(i=0; i<10; i++) histogram[i] += histogram_private[i];
    }
}

It's possible to merge in parallel as well but that's more complicated. See the first link I mentioned for more details.

Selfindulgent answered 14/2, 2014 at 19:40 Comment(5)
I tried this but I get this message error: "Segment violation ('core' generated)". It happens in "#pragma omp for nowait" section.Chenopod
@Ortzi. It works fine for me. You can see the results and even edit/compile the code at coliru.stacked-crooked.com/a/c63d92bf5389fff8Selfindulgent
Try with 5000 length N1 and N2. That's the problem: coliru.stacked-crooked.com/a/56bdf5e3bf756ceaChenopod
@Ortzi, I can't do everything for you! You're trying to allocate a 5000x5000 int array on the stack now! That's 95MB! You have to use malloc for such large arrays.Selfindulgent
@Ortzi, here, I fixed this using malloc for large arrays coliru.stacked-crooked.com/a/7e862724c4a66278Selfindulgent
E
6

With OpenMP 4.5, you can simply use a array reduction in C:

int histogram[BINS] = {0};
#pragma omp parallel for reduction(+:hist)
for(i=0; i<N1; i++) {
   for(j=0; j<N2; j++) {    
       histogram[img[i][j]]++;
   }
}
Eogene answered 23/6, 2017 at 8:29 Comment(4)
This is probably the best answer nowadays, as it reflects exactly what you are trying to achieve, and lets the compiler/runtime library to do the hard work. Less error-prone and probably more efficient.Dickie
@JorgeBellón regarding the proposed edit. Do you know whether the section specification is in fact necessary? I tested with just the array and GCC 7 is fine with it. The standard says "If the list item is an array or array section, ...", but also "C / C++: The list items that appear in the reduction clause may include array sections." So I'm not 100% sure. Also hist[:] should be the same as hist[:BINS] if the compiler knows the length, right?Eogene
Hi @Zalan. You may be completely right. I tried to find that part of the spec, but I'm afraid I was unable to do so. I would really appreciate if you could refer to the chapter to me. I remember that using pointers was not allowed, but I read somewhere that they changed this restriction in 4.5 to allow whole arrays to be reduced.Dickie
@JorgeBellón I was referring to page 205:14 and section 2.4Eogene
A
0

You want to create a kind of "reduction" so each thread should have its own histogram array and you have to merge all histo in a 2nd loop .... See pseudo code below:

histogram = new int[256];
histogram_thread = new int[nbthread * 256];

#pragma omp parallel 
for(i=0; i<N1; i++){
  current_thread_id = omp_get_thread_num();
  for(j=0; j<N2; j++)
  {
    histogram_thread[current_thread_id*256 + img[i][j]]++;
  }
}

//merge
for(unsigned int ui = 0 ; ui < 256 ; ++ui)
{ 
   for(int t=0; t<nbthread ; ++t) 
   {
      histogram [i] += histogram_thread[t * 256 + i];
   }
}

delete [] histogram_thread;
Acth answered 14/2, 2014 at 13:55 Comment(0)
P
0

I faced the same problem, making an algorithm for applying auto-levels transformation on an image (having RGB values in [0;255] region) to adjust contrast.

That was my solution for histogram part:

int hist[256] = {0};

#pragma omp parallel
{
    int hist_thread[256] = {0};
    #pragma omp for schedule(static)
    for (int i = 0; i < img.data_size; i++) {
        hist_thread[img.data[i]]++;
    }
    for (int i = 0; i < 256; i++) {
        #pragma omp atomic
        hist[i] += hist_thread[i];
    }
}

So the idea is to create a private histogram hist_thread for each thread. And after counting merge them with omp atomic instruction.

Have to mention that it is better to use omp critical section when number of possible values is much more than 10 or 256 (around 10^5 or 10^6). It is because performing of N_THREADS critical-s with N_VALUES iterations will be most of times less than performing N_THREADS * N_VALUES atomic-s in parallel. But difference is not big, though. Code for solution with critical is contained in @Z boson's answer.

Peristalsis answered 3/10 at 11:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.