I've recently been playing with compute shaders and I'm trying to determine the optimal way to set up my [numthreads(x,y,z)] and Dispatch() calls. My demo window is 800x600 and I am launching one thread per pixel. I am performing 2D texture modifications, nothing too heavy.
My first try was to specify
[numthreads(32,32,1)]
My Dispatch() calls are always
Dispatch(ceil(screenWidth / (float)numThreads.x), ceil(screenHeight / (float)numThreads.y), 1)
So for the first instance that would be
Dispatch(25,19,1)
This ran at 25-26 fps. I then reduced it to [numthreads(4,4,1)], which ran at 16 fps. Increasing that to [numthreads(16,16,1)] started yielding nice results of about 30 fps. Halving the Y dimension to [numthreads(16,8,1)] managed to push it to 32 fps.
My question is: is there an optimal way to determine the number of threads so I can utilize the GPU most effectively, or is it just good ol' trial and error?