I want to parallelize this code and get the best possible performance. "histogram" stores the number of appearances of each colour (there are 10 different colours, so the size of histogram is 10). "img" is a 2D array that stores the image data: each element of img holds a colour (an int value in the range 0..9). This is the code:
for (i = 0; i < N1; i++) {
    for (j = 0; j < N2; j++) {
        histogram[img[i][j]] = histogram[img[i][j]] + 1;
    }
}
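To make the setup easy to reproduce, here is a minimal self-contained sketch of the serial version. The function name `histogram_serial`, the `NUM_COLOURS` macro, and the flat row-major layout of `img` are assumptions made for illustration; my real code indexes `img[i][j]` as shown above.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_COLOURS 10  /* from the description: colours are 0..9 */

/* Hypothetical helper (not the original code): counts how often each
 * colour occurs in an n1 x n2 image stored row-major in a flat array.
 * `histogram` must have room for NUM_COLOURS entries; it is zeroed here. */
void histogram_serial(const int *img, size_t n1, size_t n2, long *histogram)
{
    for (int c = 0; c < NUM_COLOURS; c++)
        histogram[c] = 0;

    for (size_t i = 0; i < n1; i++) {
        for (size_t j = 0; j < n2; j++) {
            int colour = img[i * n2 + j];
            /* Guard against values outside the stated 0..9 range. */
            assert(colour >= 0 && colour < NUM_COLOURS);
            histogram[colour]++;
        }
    }
}
```

Calling it on a 2 x 3 image containing the colours 0, 1, 1, 9, 9, 9 should leave `histogram[0]` at 1, `histogram[1]` at 2 and `histogram[9]` at 3.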
I tried the following, but the performance is very bad (worse than the serial execution):
#pragma omp parallel for schedule(static, N1/nthreads) private(i,j)
for (i = 0; i < N1; i++) {
    for (j = 0; j < N2; j++) {
        #pragma omp atomic
        histogram[img[i][j]]++;
    }
}
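One variant I am considering, but have not benchmarked, is to give each thread its own private histogram and merge the partial counts once at the end, so that no atomic update is needed per pixel. The sketch below is only an illustration under assumptions: the name `histogram_private`, the `NUM_COLOURS` macro, and the flat row-major `img` layout are hypothetical and not part of my real code.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_COLOURS 10  /* from the description: colours are 0..9 */

/* Hypothetical sketch: every thread counts into its own `local` array,
 * then adds it to the shared `histogram` inside a critical section.
 * `img` is assumed to be an n1 x n2 image stored row-major. */
void histogram_private(const int *img, long n1, long n2, long *histogram)
{
    for (int c = 0; c < NUM_COLOURS; c++)
        histogram[c] = 0;

    #pragma omp parallel
    {
        long local[NUM_COLOURS] = {0};  /* private to each thread */

        /* Rows are split among the threads; nowait lets a thread start
         * merging as soon as it has finished its own rows. */
        #pragma omp for schedule(static) nowait
        for (long i = 0; i < n1; i++) {
            for (long j = 0; j < n2; j++) {
                int colour = img[i * n2 + j];
                assert(colour >= 0 && colour < NUM_COLOURS);
                local[colour]++;
            }
        }

        /* One merge per thread instead of one atomic per pixel. */
        #pragma omp critical
        for (int c = 0; c < NUM_COLOURS; c++)
            histogram[c] += local[c];
    }
}
```

This compiles with or without OpenMP enabled (for example with or without `-fopenmp` on GCC); without it the pragmas are ignored and the function runs serially. I do not know yet whether it is actually faster for my image sizes.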
Any suggestions? Thank you.