How to do a blocking wait for a compute shader with Direct3D 11?

I have a post processing pipeline that uses a compute shader to process a texture and writes it to a RWByteAddressBuffer.

The content of the RWByteAddressBuffer is then sent to an FPGA device via direct memory access (AMD DirectGMA technology). In other words, I instruct an external device to read the physical bytes of this buffer without the Direct3D API knowing about it.

Here is the essence of the code:

_context->CSSetShaderResources(0, 1, _nonMsaaSrv.GetAddressOf());
_context->CSSetUnorderedAccessViews(0, 1, _unorderedAccessView.GetAddressOf(), nullptr);
_context->CSSetShader(_converter.Get(), nullptr, 0);
_context->Dispatch(1920, 1200, 1);

// ... wait for direct3d compute shader to finish processing?
// send the bytes to the fpga:
_dmaController->StartDMA(_d3dBufferPhysicalAddress, fpgaLogicalAddress);

Everything works, but I could not find a way to block the thread, or to get an event that signals when the compute shader has completed its work on the GPU.

This question suggests a solution that uses ID3D11Query to do some kind of polling, but it is my understanding that this is simply a busy wait. I was hoping to find a better solution that would let the thread block while waiting for some kind of event. With APIs such as CUDA / OpenCL this is pretty trivial.

So is it possible to do a blocking wait for a compute shader in Direct3D 11? If so, how?

Whited answered 5/3, 2019 at 14:45 Comment(0)

If there is no need to support Windows 7 / 8, it is possible to achieve this using the updated interfaces ID3D11Device5, ID3D11DeviceContext4 & ID3D11Fence that are available on Windows 10 v1703 and later.

Creating the fence object:

HR(_d3dDevice->CreateFence(0, D3D11_FENCE_FLAG_NONE, __uuidof(ID3D11Fence), reinterpret_cast<void**>(_syncFence.GetAddressOf())));

In the processing loop, we dispatch the compute shader and enqueue a signal with an incremented counter right after it:

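++_syncCounter;
_context->Dispatch(1920, 1200, 1);
// Enqueue a GPU-side signal that is reached once the preceding work has completed.
HR(_context->Signal(_syncFence.Get(), _syncCounter));
// Ask the fence to set the event when it reaches the new counter value.
HR(_syncFence->SetEventOnCompletion(_syncCounter, _syncEvent.get()));

// Wait for the event (could be on a different thread).
_syncEvent.wait(); // WaitForSingleObject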

Examples (for Direct3D 12, though) can be found here.

Whited answered 7/3, 2019 at 10:28 Comment(1)
As a clarification, this is only available on ID3D11Device5 (Windows 10 v1703 and later), so it'll only work for applications that don't need to target earlier versions of Windows. Can you update your answer for future readers? Charin

The ID3D11Query is the mechanism you're looking for; there is nothing event-based in Direct3D 11. It's a polling mechanism, but not the same as a normal busy wait on the CPU.

You can always profile it to see what load it adds, especially if you add a delay between calls to ID3D11DeviceContext::GetData and try various intervals (10 ms, 100 ms, etc.) to see whether your performance improves.
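To make the delayed polling idea concrete, here is a minimal sketch of a poll-with-sleep loop. It is deliberately independent of Direct3D: `PollUntilDone`, its parameters, and the doubling back-off are my own hypothetical choices for illustration, not part of any Direct3D API. The predicate you pass in is where the actual query check would go.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

// Hypothetical helper (not a Direct3D API): calls `isDone` repeatedly until it
// returns true or `timeout` elapses. Between polls the thread sleeps, and the
// sleep interval doubles each time up to `maxSleep`, so the CPU is not spun.
template <class Predicate>
bool PollUntilDone(Predicate isDone,
                   std::chrono::milliseconds timeout,
                   std::chrono::microseconds firstSleep = std::chrono::microseconds(100),
                   std::chrono::microseconds maxSleep = std::chrono::milliseconds(10))
{
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + timeout;
    auto sleep = firstSleep;

    for (;;) {
        if (isDone()) {
            return true;   // work finished
        }
        if (Clock::now() >= deadline) {
            return false;  // gave up after the timeout
        }
        std::this_thread::sleep_for(sleep);
        sleep = std::min(sleep * 2, maxSleep);  // back off, capped at maxSleep
    }
}
```

In the Direct3D 11 case, I would expect the predicate to wrap an event query: create an ID3D11Query with D3D11_QUERY_EVENT, call `_context->End(query.Get())` right after the Dispatch, and have the predicate return `_context->GetData(query.Get(), nullptr, 0, 0) == S_OK`. The sleep intervals above are guesses; profile with your own workload to pick sensible values.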

Charin answered 6/3, 2019 at 22:5 Comment(2)
Thanks, but I think there is a better way. See my posted answer. Whited
This is good to know. It's only available on ID3D11Device5 (Windows 10 v1703 and later), so it won't work for the applications I work on, at least until we no longer support Windows 7 or 8. Charin
