How to synchronize CPU and GPU using fence in DirectX / Direct3D 12?

Asked 24/10, 2019 at 10:52 Answered 18/3, 2020 at 10:50

I'm starting to learn Direct3D 12 and am having difficulty understanding CPU-GPU synchronization. As far as I understand, a fence (ID3D12Fence) is no more than a UINT64 (unsigned long long) value used as a counter, but its methods confuse me. Below is part of the source code from a D3D12 example (https://github.com/d3dcoder/d3d12book).

void D3DApp::FlushCommandQueue()
{
    // Advance the fence value to mark commands up to this fence point.
    mCurrentFence++;

    // Add an instruction to the command queue to set a new fence point.  Because we 
    // are on the GPU timeline, the new fence point won't be set until the GPU finishes
    // processing all the commands prior to this Signal().
    ThrowIfFailed(mCommandQueue->Signal(mFence.Get(), mCurrentFence));

    // Wait until the GPU has completed commands up to this fence point.
    if(mFence->GetCompletedValue() < mCurrentFence)
    {
        HANDLE eventHandle = CreateEventEx(nullptr, false, false, EVENT_ALL_ACCESS);

        // Fire event when GPU hits current fence.  
        ThrowIfFailed(mFence->SetEventOnCompletion(mCurrentFence, eventHandle));

        // Wait until the GPU hits current fence event is fired.
        WaitForSingleObject(eventHandle, INFINITE);
        CloseHandle(eventHandle);
    }
}

As far as I understand, this part is trying to 'flush' the command queue, which basically means making the CPU wait until the GPU reaches the given fence value, so that the CPU and GPU have identical fence values.

Q. If Signal() is a function that lets the GPU update the fence value inside the given ID3D12Fence, why is the mCurrentFence value needed?

The Microsoft documentation says it "Updates a fence to a specified value." What specified value? What I need is to get the value of the last completed command list, not to set or specify one. What is this specified value for?

To me, it seems it should look like this:

// Suppose mCurrentFence is 1 after submitting 1 command list (index 0), and the thread reaches this point for the FIRST time
ThrowIfFailed(mCommandQueue->Signal(mFence.Get()));
// At this point the fence value inside mFence is updated
if (mFence->GetCompletedValue() < mCurrentFence)
{
...
}

If mFence->GetCompletedValue() is 0,

if (0 < 1)

the GPU hasn't executed the command list (index 0) yet, so the CPU has to wait until the GPU catches up. Then it makes sense to call SetEventOnCompletion, WaitForSingleObject, etc. If it is 1,

if (1 < 1)

the GPU has completed the command list (index 0), so the CPU does not need to wait.

mCurrentFence would then be incremented wherever a command list is executed:

mCommandQueue->ExecuteCommandLists(_countof(cmdsLists), cmdsLists);
mCurrentFence++;

Insistence answered 24/10, 2019 at 10:52 Comment(0)

mCommandQueue->Signal(mFence.Get(), mCurrentFence) sets the fence value to mCurrentFence as soon as all previously queued commands on the command queue have been executed. In this case, the "specified value" is mCurrentFence.

At the start, both the fence value and mCurrentFence are 0. Next, mCurrentFence is incremented to 1. Then we call mCommandQueue->Signal(mFence.Get(), 1), which sets the fence to 1 as soon as everything previously submitted to that queue has been executed. Finally, we call mFence->SetEventOnCompletion(1, eventHandle) followed by WaitForSingleObject to wait until the fence reaches 1.

Replace 1 with 2 for the next iteration and so on.

Note that mCommandQueue->Signal is a non-blocking operation: it does not set the fence value immediately, only after all previously submitted GPU commands have been executed. In this example, mFence->GetCompletedValue() < mCurrentFence will therefore almost always be true right after the Signal call; the check only skips the wait in the rare case where the GPU has already caught up.
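
To make the deferred behaviour concrete, here is a minimal sketch in portable C++ rather than real D3D12. ToyFence, ToyQueue and DemoDeferredSignal are hypothetical names invented for this illustration, and the model is single-threaded, so it only shows the ordering of events, not real CPU/GPU concurrency.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Toy stand-in for ID3D12Fence: a 64-bit value that only the queue writes.
struct ToyFence {
    std::uint64_t completed = 0;
    std::uint64_t GetCompletedValue() const { return completed; }
};

// Toy stand-in for a command queue: items run strictly in submission order.
class ToyQueue {
public:
    // Plays the role of ExecuteCommandLists: enqueue some "GPU" work.
    void Submit(std::function<void()> work) { pending_.push_back(std::move(work)); }

    // Plays the role of Signal: returns at once and only enqueues the write.
    void Signal(ToyFence& fence, std::uint64_t value) {
        pending_.push_back([&fence, value] { fence.completed = value; });
    }

    // The "GPU" makes progress: run the oldest pending item, if there is one.
    bool Step() {
        if (pending_.empty()) return false;
        std::function<void()> item = std::move(pending_.front());
        pending_.pop_front();
        item();
        return true;
    }

private:
    std::deque<std::function<void()>> pending_;
};

// Walks through one "flush" and returns the final fence value.
inline std::uint64_t DemoDeferredSignal() {
    ToyFence fence;
    ToyQueue queue;
    int batchesDone = 0;

    queue.Submit([&batchesDone] { ++batchesDone; });
    queue.Signal(fence, 1);                    // the CPU picks the target value
    assert(fence.GetCompletedValue() == 0);    // nothing has run yet

    queue.Step();                              // the submitted work runs
    assert(batchesDone == 1 && fence.GetCompletedValue() == 0);

    queue.Step();                              // now the signal itself is reached
    return fence.GetCompletedValue();
}
```

In this model the fence only changes when the queued signal is reached, which is why the caller has to say in advance which value should be written.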

"Why is that mCurrentFence value needed?"

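It is not strictly needed in this particular case, but keeping track of the fence value on the CPU avoids an additional API call. FlushCommandQueue always waits for its own Signal before returning, so no signal is ever left pending when the next one is issued. Under that condition (and only then, because otherwise GetCompletedValue() + 1 could repeat a value that is already queued) you could also do:

// Retrieve the last completed value of the fence and increment it by one (additional API call)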
auto nextFence = mFence->GetCompletedValue() + 1;
ThrowIfFailed(mCommandQueue->Signal(mFence.Get(), nextFence));

// Wait until the GPU has completed commands up to this fence point.
if(mFence->GetCompletedValue() < nextFence)
{
    HANDLE eventHandle = CreateEventEx(nullptr, false, false, EVENT_ALL_ACCESS);  
    ThrowIfFailed(mFence->SetEventOnCompletion(nextFence, eventHandle));
    WaitForSingleObject(eventHandle, INFINITE);
    CloseHandle(eventHandle);
}
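
The difference between the two approaches shows up once signals are queued without waiting in between. The following is a small sketch in plain C++, not D3D12; ModelQueue, SignalDerived, SignalCounted and DemoPendingSignals are hypothetical names made up for the illustration, and the "GPU" is modeled as a simple list of pending fence writes.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical model: the "GPU" applies queued fence signals in order.
struct ModelQueue {
    std::deque<std::uint64_t> pendingSignals;
    std::uint64_t completed = 0;   // what GetCompletedValue() would return

    void Signal(std::uint64_t value) { pendingSignals.push_back(value); }

    // The "GPU" reaches the next queued signal, if any.
    void Step() {
        if (pendingSignals.empty()) return;
        completed = pendingSignals.front();
        pendingSignals.pop_front();
    }
};

// Variant 1: derive the next value from the fence itself.
inline std::uint64_t SignalDerived(ModelQueue& q) {
    std::uint64_t next = q.completed + 1;
    q.Signal(next);
    return next;
}

// Variant 2: keep a CPU-side counter, like mCurrentFence.
inline std::uint64_t SignalCounted(ModelQueue& q, std::uint64_t& counter) {
    q.Signal(++counter);
    return counter;
}

// Two signals queued back to back, without waiting in between.
inline void DemoPendingSignals() {
    ModelQueue derived;
    std::uint64_t d1 = SignalDerived(derived);
    std::uint64_t d2 = SignalDerived(derived);
    assert(d1 == 1 && d2 == 1);   // both submissions got the same value

    ModelQueue counted;
    std::uint64_t counter = 0;
    std::uint64_t c1 = SignalCounted(counted, counter);
    std::uint64_t c2 = SignalCounted(counted, counter);
    assert(c1 == 1 && c2 == 2);   // each submission is distinguishable

    counted.Step();               // only the first signal has been reached
    assert(counted.completed == c1 && counted.completed < c2);
}
```

With the derived value, a wait on the second submission would already be satisfied when the first signal is reached; the CPU-side counter avoids that ambiguity.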

Procambium answered 27/10, 2019 at 14:12 Comment(4)

To split the submitting part from the waiting part, would it be okay to write something like the code below? – Insistence 27/10, 2019 at 14:34

void SynchronizeWithGPU() { if (mFence->GetCompletedValue() < m_nextFence) { HANDLE eventHandle = CreateEventEx(nullptr, false, false, EVENT_ALL_ACCESS); ThrowIfFailed(mFence->SetEventOnCompletion(m_nextFence, eventHandle)); WaitForSingleObject(eventHandle, INFINITE); CloseHandle(eventHandle); } } – Insistence 27/10, 2019 at 14:36

and place the signal part near mCommandQueue->ExecuteCommandLists()? It seems this would leave more time before the GPU executes the Signal command, since the Signal is not processed immediately. – Insistence 27/10, 2019 at 14:39

It looks okay to me. – Graft 27/10, 2019 at 15:29

As a complement to Felix's answer:

Keeping track of a fence value (e.g. mCurrentFence) is useful for waiting on more specific points within the command queue.

For example, say we're using this setup:

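ComPtr<ID3D12CommandQueue> queue;
ComPtr<ID3D12Fence> queueFence;
UINT64 fenceVal = 0;
// Event used by waitFor; assumed here to be an auto-reset event created once
// (check for nullptr in real code).
HANDLE fenceEv = CreateEvent(nullptr, FALSE, FALSE, nullptr);

// Queues a signal with the next fence value and returns that value.
UINT64 incrementFence()
{
    fenceVal++;
    queue->Signal(queueFence.Get(), fenceVal); // CHECK HRESULT
    return fenceVal;
}

// Blocks until the fence has reached waitVal (or the timeout expires).
void waitFor(UINT64 waitVal, DWORD timeout = INFINITE)
{
    if (queueFence->GetCompletedValue() < waitVal)
    {
        queueFence->SetEventOnCompletion(waitVal, fenceEv); // CHECK HRESULT
        WaitForSingleObject(fenceEv, timeout);
    }
}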

Then we can do the following (pseudocode):

SUBMIT COMMANDS 1
cmds1Complete = incrementFence();
    .
    . <- CPU STUFF
    .
SUBMIT COMMANDS 2
cmds2Complete = incrementFence();
    .
    . <- CPU STUFF
    .
waitFor(cmds1Complete)
    .
    . <- CPU STUFF (that needs COMMANDS 1 to be complete,
      but COMMANDS 2 is NOT required to be completed [but also could be])
    .
waitFor(cmds2Complete)
    .
    . <- EVERYTHING COMPLETE
    .

Since we keep track of fenceVal, we can also write a flush function that simply waits for the tracked fenceVal (rather than a value saved from an earlier incrementFence call). This is essentially what FlushCommandQueue does: because it inlines the signal, it always waits for the most recent value, which is why, as Felix said, tracking the value just saves an API call:

void flushCmdQueue()
{
    waitFor(incrementFence());
}

This example is somewhat more complex than the original question requires, but I think it matters when asking why mCurrentFence is tracked.
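
The pseudocode timeline in this answer can be checked with a small single-threaded model in plain C++. This is only a sketch: Entry, MiniGpu and DemoSelectiveWait are hypothetical names, and waitFor here advances the simulated GPU itself instead of blocking on an event as the real code would.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// One queue entry: either a batch of commands (label) or a fence signal (value).
struct Entry {
    bool isSignal;
    std::string label;     // used when isSignal is false
    std::uint64_t value;   // used when isSignal is true
};

struct MiniGpu {
    std::deque<Entry> queue;
    std::uint64_t fenceCompleted = 0;    // what GetCompletedValue() would return
    std::uint64_t fenceVal = 0;          // CPU-side counter, like mCurrentFence
    std::vector<std::string> finished;   // batches the simulated GPU has executed

    void submit(const std::string& label) { queue.push_back({false, label, 0}); }

    // Queues a signal with the next value and returns that value.
    std::uint64_t incrementFence() {
        ++fenceVal;
        queue.push_back({true, "", fenceVal});
        return fenceVal;
    }

    // Lets the simulated GPU run only until the fence reaches target.
    void waitFor(std::uint64_t target) {
        while (fenceCompleted < target && !queue.empty()) {
            Entry e = queue.front();
            queue.pop_front();
            if (e.isSignal) fenceCompleted = e.value;
            else finished.push_back(e.label);
        }
    }
};

// Mirrors the timeline: submit two batches, then wait for each in turn.
inline void DemoSelectiveWait() {
    MiniGpu gpu;
    gpu.submit("COMMANDS 1");
    std::uint64_t cmds1Complete = gpu.incrementFence();
    gpu.submit("COMMANDS 2");
    std::uint64_t cmds2Complete = gpu.incrementFence();

    gpu.waitFor(cmds1Complete);
    assert(gpu.finished.size() == 1);    // COMMANDS 2 has not been needed yet

    gpu.waitFor(cmds2Complete);
    assert(gpu.finished.size() == 2);    // everything is complete
}
```

Because each submission has its own fence value, the first wait in this model returns as soon as the first batch is done, regardless of what was queued afterwards.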

Abranchiate answered 18/3, 2020 at 10:50 Comment(0)
