I've been reading a lot about multi-threaded rendering. People have been proposing all kinds of weird and wonderful schemes for submitting work to the GPU with threads in order to speed up their frame rates and get more stuff rendered, but I'm having a bit of a conceptual problem with the whole thing and I thought I'd run it by the gurus here to see what you think.
As far as I know, the basic unit of concurrency on a GPU is the Warp. That is to say, it's down at the pixel level rather than higher up at the geometry submission level. So given that the unit of concurrency on the GPU is the warp, the driver must be locked down pretty tightly with mutexes to prevent multiple threads screwing up each other's submissions. If this is the case, I don't see where the benefit is of coding to D3D or OpenGL multi-threading primitives.
Surely the most efficient method of using your GPU in a multi-threading scenario is at the higher, abstract level, where you're collecting together batches of work to do, before submitting it? I mean rather than randomly interleving commands from multiple threads, I would have thought a single block accepting work from multiple threads, but with a little intelligence inside of it to make sure things are ordered for better performance before being submitted to the renderer, would be a much bigger gain if you wanted to work with multiple threads.
So, whither D3D/OpenGL multi-threaded rendering support in the actual API?
Help me with my confusion!