In DX12, what ordering guarantees do multiple ExecuteCommandLists calls provide?

Assume a single-threaded application. If you call ExecuteCommandLists twice (A and B), is A guaranteed to execute all of its commands on the GPU before any of the commands from B start? The closest thing I can find in the documentation is this, but it doesn't really seem to guarantee that A finishes before B starts:

Applications can submit command lists to any command queue from multiple threads. The runtime will perform the work of serializing these requests in the order of submission.

As a point of comparison, I know that this is explicitly not guaranteed in Vulkan:

vkQueueSubmit is a queue submission command, with each batch defined by an element of pSubmits as an instance of the VkSubmitInfo structure. Batches begin execution in the order they appear in pSubmits, but may complete out of order.

However, I'm not sure if DX12 works the same way.

Frank Luna's book says:

The command lists are executed in order starting with the first array element

However, in that context he's talking about calling ExecuteCommandLists once with two command lists (C and D). Does that behave the same as two individual calls? My colleague argues that this still only guarantees that they are started in order, not that C finishes before D starts.

Is there clearer documentation somewhere that I'm missing?

I asked the same question on the DirectX forums. Here's an answer from Microsoft engineer Jesse Natalie:

Calling ExecuteCommandLists twice guarantees that the first workload (A) finishes before the second workload (B). Calling ExecuteCommandLists with two command lists allows the driver to merge the two command lists such that the second command list (D) may begin executing work before all work from the first (C) has finished.

Specifically, the application is allowed to insert a fence signal or wait between A and B, and the driver has no visibility into this, so the driver must ensure that everything in A is complete before the fence operation. There is no such opportunity in a single call to the API, so the driver can optimize that scenario.

Source: http://forums.directxtech.com/index.php?topic=5975.0
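To make the two cases in that answer concrete, here is a minimal sketch of both submission patterns. The helper names SubmitSeparately and SubmitTogether are hypothetical, and the functions are templated only so the sketch compiles without the Windows SDK; in a real application Queue, List and Fence would be ID3D12CommandQueue, ID3D12CommandList and ID3D12Fence, whose ExecuteCommandLists and Signal methods are called the same way. The comments restate the ordering described in the quoted answer, not anything I have verified independently.

```cpp
#include <cstdint>

// Hypothetical helper: two separate ExecuteCommandLists calls (A, then B).
// Per the quoted answer, A must be complete before B, because the
// application is free to place a fence operation between the two calls.
template <typename Queue, typename List, typename Fence>
void SubmitSeparately(Queue& queue, List* a, List* b,
                      Fence* fence, std::uint64_t fenceValue)
{
    List* const first[] = { a };
    queue.ExecuteCommandLists(1, first);

    // Optional: a fence signal between the two submissions. The fence
    // reaches fenceValue once everything submitted before it has finished.
    queue.Signal(fence, fenceValue);

    List* const second[] = { b };
    queue.ExecuteCommandLists(1, second);
}

// Hypothetical helper: one ExecuteCommandLists call with two lists (C, D).
// Per the quoted answer, the driver may merge them, so D may begin before
// all of C's work has finished.
template <typename Queue, typename List>
void SubmitTogether(Queue& queue, List* c, List* d)
{
    List* const lists[] = { c, d };
    queue.ExecuteCommandLists(2, lists);
}
```

With a real queue you would call these as SubmitSeparately(*commandQueue, listA, listB, fence, value). If D depends on results from C, the second pattern presumably needs explicit synchronization inside the command lists (for example resource barriers), since the single call alone does not promise that C has finished first.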
