Why does vkAcquireNextImageKHR() never block my thread?

I am using the Vulkan graphics API (via BGFX) to render, and I have been measuring how much wall-clock time my calls take.

What I do not understand is that vkAcquireNextImageKHR() is always fast and never blocks, even though I disable the time-out and use a semaphore to wait for presentation.

The presentation is locked to a 60Hz display rate, and I do see my main loop taking 16.6 or 33.3 ms per iteration.

Shouldn't the wait for the display rate show up in the duration of the vkAcquireNextImageKHR() call?

The profiler measures this call at 0.2 ms or so, never a substantial part of a frame.

VkResult result = vkAcquireNextImageKHR(
    m_device
  , m_swapchain
  , UINT64_MAX
  , renderWait
  , VK_NULL_HANDLE
  , &m_backBufferColorIdx
);

Target hardware is a handheld console.

The whole purpose of Vulkan is to alleviate CPU bottlenecks. Making the CPU stop until the GPU is ready for something would be the opposite of that, especially if the CPU itself isn't actually going to use the result of this operation.

As such, all the vkAcquireNextImageKHR function does is let you know which image in the swap chain will be ready to use next. The Vulkan term for this is "available". This is the minimum that needs to happen in order for you to be able to use that image (for example, by building command buffers that reference the image in some way). However, an image being "available" doesn't mean that it is ready for use.

This is why this function requires you to provide a semaphore and/or a fence. These will be signaled when the image can actually be used, and the image cannot be used in a batch of work submitted to the GPU (despite being "available") until they are signaled. You can build the command buffers that use the image, but if you submit those command buffers, you have to ensure that the commands that use it wait on that synchronization.

If the process which consumes the image is just a bunch of commands in a command buffer (i.e., something you submit with vkQueueSubmit), you can simply have that batch of work wait on the semaphore given to the acquire operation. That means all of the waiting happens on the GPU. Where it belongs.

The fence is there if you (for some reason) want the CPU to be able to wait until the acquired image is ready for use. But Vulkan, as an explicit, low-level API, forces you to explicitly say that this is what you want (and it almost never is what you want).

Because "available" is a much looser definition than "ready for use", the GPU doesn't have to actually be done with the image. The system only needs to figure out which image it will be done with next. So any CPU waiting that needs to happen is minimized.
