For a given kernel, why are all work-groups always the same size? I read somewhere (for the case in which we don't specify the local work size) that OpenCL creates 3 work-groups of 217 work-items each for a kernel with 651 work-items, since 651 is divisible by 3, while it creates 653 work-groups of 1 work-item each for a kernel with 653 work-items, since 653 is a prime number.
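To make those numbers concrete, here is a minimal sketch of the kind of rule I imagine the runtime applies when the local size is left unspecified: pick the largest divisor of the global size that fits under the device's maximum work-group size. This is only my guess at the behaviour, not something I know any OpenCL implementation to do; the function name and the limit of 256 are assumptions of mine.

```c
#include <stddef.h>

/* Hypothetical helper (my own, not part of OpenCL): return the largest
   divisor of global_size that does not exceed max_group_size.
   Assumes global_size > 0 and max_group_size > 0. */
size_t largest_divisor_at_most(size_t global_size, size_t max_group_size)
{
    size_t candidate = max_group_size < global_size ? max_group_size
                                                    : global_size;
    /* Walk downwards until a value divides global_size evenly.
       The loop always terminates because 1 divides everything. */
    while (global_size % candidate != 0)
        candidate--;
    return candidate;
}
```

With an assumed limit of 256, this gives 217 for 651 work-items (so 3 work-groups) and 1 for 653 work-items (so 653 work-groups), which matches what I read.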
Suppose we specify the local_work_size (i.e. the number of work-items in a work-group) as, say, 5, and we give the total number of work-items (global_work_size) as 9. How will the work-groups be created? Is this why the global_work_size has to be a multiple of the local_work_size? If the data requires only 9 work-items, how do I increase it to 10 (the next multiple of the local_work_size, 5)?
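For the 9-to-10 case, this is roughly what I think the host would have to do: round the global size up to the next multiple of the local size before enqueueing the kernel. It is a sketch under my own assumptions; the function name is hypothetical, and I assume the kernel would then need a bounds check so that the extra work-item does nothing.

```c
#include <stddef.h>

/* Hypothetical helper: round global_size up to the next multiple of
   local_size. Assumes local_size > 0. */
size_t round_up_global_size(size_t global_size, size_t local_size)
{
    size_t remainder = global_size % local_size;
    if (remainder == 0)
        return global_size;          /* already a multiple */
    return global_size + (local_size - remainder);
}

/* Assumption: inside the kernel, the padding work-items would be
   skipped with a guard such as
       if (get_global_id(0) >= n) return;
   where n is the real element count (9 here), passed as an argument. */
```

So with 9 real items and a local size of 5, the padded global size would be 10, giving 2 work-groups of 5, with the tenth work-item returning early.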
Why can't the host allocate the memory for the result array if it doesn't know how many work-groups will execute the kernel?
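My reading of this point is that the result array has one slot per work-group (for example, one partial result from each group), so its length is the global size divided by the local size. The sketch below shows that dependency as I understand it; the helper name and the choice of float results are my own assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate one float slot per work-group.
   Assumes the global size is a multiple of the local size.
   Returns NULL on a bad argument or a failed allocation;
   the caller is responsible for calling free(). */
float *alloc_per_group_results(size_t global_size, size_t local_size,
                               size_t *num_groups_out)
{
    if (local_size == 0 || global_size % local_size != 0)
        return NULL;
    size_t num_groups = global_size / local_size;
    *num_groups_out = num_groups;
    /* The allocation size depends directly on the number of groups,
       which is unknown if the runtime picks the local size itself. */
    return calloc(num_groups, sizeof(float));
}
```

If that is right, then leaving the local size unspecified means the host cannot compute num_groups ahead of time, which would explain the statement. Is that the correct interpretation?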
Please help. I read all of this here: http://www.openclblog.com/2011/09/work-group-sizes.html