OpenCL CLK_LOCAL_MEM_FENCE causing abort trap 6

I'm doing an exercise on image convolution (info here) using OpenCL. When I use images that are not square (r x c), barrier(CLK_LOCAL_MEM_FENCE) makes the program stop with abort trap 6.

What I do is essentially fill the local memory with the proper values, wait for that fill to finish using barrier(CLK_LOCAL_MEM_FENCE), and then calculate the output values.

With such images, barrier(CLK_LOCAL_MEM_FENCE) seems to cause the problem: if I comment out that call, everything works fine (which is weird, since there is then no synchronization). Any idea what may cause this?

EDIT: the problem occurs when the height, the width, or both are not a multiple of the local work size (16 x 16). The global work size is always a pair of multiples of 16, such as (512 x 512).

int c = get_global_id(0); 
int r = get_global_id(1); 

int lc = get_local_id(0);
int lr = get_local_id(1);

// this ignores indexes out of the input image.
if (c >= ImageWidth || r >= ImageHeight) return;

// fill a local array...

barrier(CLK_LOCAL_MEM_FENCE);

if (c < outputImageWidth && r < outputImageHeight)
{
     // LOCAL DATA PROCESSED  
     OutputImage[r* outputImageWidth +c] = someValue;
}
Backlog answered 5/2, 2016 at 9:35 Comment(4)
How are we supposed to guess without code? Maybe you're not calling the barrier from all threads in the work group? - Computerize
Code please. You are using it in a loop, and when the loop size is not square it is hanging. - Outtalk
@DarkZero, would you look at the EDIT? I've added some pseudo-code from my kernel. I'm not sure whether it depends on the parts of the code I did not post here, but the fact is that when I comment out the barrier, it seems to work fine (it does not stop). - Backlog
See, you had the exact problem I was expecting. The barrier is only encountered by the items that pass the first condition, not by all the items, so it blocks the execution and aborts. - Outtalk

OpenCL requires that each work-group barrier is executed by every work-item in that work-group.

In the code that you have posted, you have an early exit clause to prevent out-of-range accesses. This is a common trick for getting nice work-group sizes in OpenCL 1.X, but unfortunately this breaks the above condition, and this will lead to undefined behaviour (typically either a hang or a crash).
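The "nice work-group sizes" trick usually means that the host rounds the global size up to a multiple of the local size, which is what creates out-of-range work-items in the first place. A minimal sketch in plain C, assuming a hypothetical helper named round_up (it is not part of the OpenCL API):

```c
#include <stddef.h>

/* Hypothetical helper: round n up to the next multiple of `multiple`.
 * `multiple` must be greater than zero. */
static size_t round_up(size_t n, size_t multiple)
{
    return ((n + multiple - 1) / multiple) * multiple;
}
```

For example, with 16 x 16 work-groups a 500 x 300 image would be launched with a global size of round_up(500, 16) x round_up(300, 16), that is 512 x 304. The work-items with c >= 500 or r >= 300 are the ones that take the early return and never reach the barrier.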

You will need to modify your kernel to avoid this, by either removing the early exit clause (and perhaps clamping out-of-range work-items instead, if applicable), or by restructuring the kernel so that out-of-range work-items continue at least as far as the barrier before exiting.
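The clamping option can be sketched as index arithmetic in plain C. This is only an illustration under assumptions: the names clamp_index and load_clamped are hypothetical, the image is assumed to be a row-major float array, and replicating the nearest edge pixel is just one possible border policy for a convolution.

```c
#include <stddef.h>

/* Hypothetical helper: clamp i into [0, n - 1]; n must be at least 1. */
static int clamp_index(int i, int n)
{
    if (i < 0) return 0;
    if (i >= n) return n - 1;
    return i;
}

/* Hypothetical helper: read pixel (c, r) from a row-major image.
 * Out-of-range coordinates are clamped, so the nearest edge pixel
 * is returned instead of reading outside the buffer. */
static float load_clamped(const float *image, int width, int height,
                          int c, int r)
{
    int cc = clamp_index(c, width);
    int rr = clamp_index(r, height);
    return image[(size_t)rr * (size_t)width + (size_t)cc];
}
```

In a kernel, every work-item would perform a load like this with its get_global_id values to fill its slot of the local array, then reach the barrier, and only afterwards skip the output write if it lies outside the image. OpenCL C already provides a built-in clamp function for integer types, so the first helper would not be needed there.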

Ferdelance answered 5/2, 2016 at 14:10 Comment(1)
That was exactly what I was supposing. Thanks a lot for the confirmation! - Backlog

You can fix it by reordering the code so that every work-item reaches the barrier before the early exit, without otherwise changing the behaviour:

int c = get_global_id(0); 
int r = get_global_id(1); 

int lc = get_local_id(0);
int lr = get_local_id(1);

// fill a local array... with all the threads
// ie: for(i=0;i<size;i+=get_local_size(0))
//        ...

barrier(CLK_LOCAL_MEM_FENCE);

// this ignores indexes out of the input image.
if (c >= ImageWidth || r >= ImageHeight) return;

if (c < outputImageWidth && r < outputImageHeight)
{
     // LOCAL DATA PROCESSED  
     OutputImage[r* outputImageWidth +c] = someValue;
}
Outtalk answered 5/2, 2016 at 16:16 Comment(0)
