Why does vkAcquireNextImageKHR() never block my thread?

I am using the Vulkan graphics API (via BGFX) to render, and I have been measuring how much wall-clock time my calls take.

What I do not understand is that vkAcquireNextImageKHR() is always fast and never blocks, even though I disable the time-out and use a semaphore to wait for presentation.

The presentation is locked to a 60Hz display rate, and I see that my main loop does indeed take 16.6 or 33.3 ms per iteration.

Shouldn't I see the wait-time for this display rate show up in the length of the vkAcquireNextImageKHR() call?

The profiler measures this call as 0.2ms or so, and never a substantial part of a frame.

VkResult result = vkAcquireNextImageKHR(
    m_device
  , m_swapchain
  , UINT64_MAX
  , renderWait
  , VK_NULL_HANDLE
  , &m_backBufferColorIdx
);

Target hardware is a handheld console.

Catechu answered 26/2, 2020 at 17:53 Comment(0)

The whole purpose of Vulkan is to alleviate CPU bottlenecks. Making the CPU stop until the GPU is ready for something would be the opposite of that. Especially if the CPU itself isn't actually going to use the result of this operation.

As such, all the vkAcquireNextImageKHR function does is let you know which image in the swap chain will be ready to use next. The Vulkan term for this is "available". This is the minimum that needs to happen in order for you to be able to use that image (for example, by building command buffers that reference the image in some way). However, an image being "available" doesn't mean that it is ready for use.

This is why this function requires you to provide a semaphore and/or a fence. These will be signaled when the image can actually be used, and the image cannot be used in a batch of work submitted to the GPU (despite being "available") until these are signaled. You can build the command buffers that use the image, but if you submit those command buffers, you have to ensure that the commands that use it wait on that synchronization.

If the process which consumes the image is just a bunch of commands in a command buffer (i.e., something you submit with vkQueueSubmit), you can simply have that batch of work wait on the semaphore given to the acquire operation. That means all of the waiting happens on the GPU. Where it belongs.

The fence is there if you (for some reason) want the CPU to be able to wait until the acquired image is ready for use. But Vulkan, as an explicit, low-level API, forces you to explicitly say that this is what you want (and it almost never is what you want).

Because "available" is a much looser definition than "ready for use", the GPU doesn't have to actually be done with the image. The system only needs to figure out which image it will be done with next. So any CPU waiting that needs to happen is minimized.

Germinal answered 26/2, 2020 at 18:19 Comment(14)
Thanks. I misread the docs: the semaphore will be signaled, not waited upon. So in that case, I would expect the block to show up in the vkQueueSubmit() that uses that semaphore in pWaitSemaphores, but strangely, it is not showing up there either. I see it in a vkQueueWaitIdle() instead. - Catechu
@Bram: Queue submit operations don't wait on semaphores; the GPU waits on semaphores. That's why they're GPU constructs and not POSIX mutexes or somesuch. GPU wait operations shouldn't force CPU waits. Also, you should basically never call vkQueueWaitIdle. - Germinal
One tricky bit to consider is that if you only ever use semaphores, then the Vulkan driver will happily let you queue up an arbitrary number of frame submissions. So your display could be on frame 5 and your CPU is working on generating frame 900. That's the value in associating a fence with each image, but not checking the fence until that image comes around again from the acquire. - Andalusite
@NicolBolas Ah, thanks. You have been very helpful. So, yeah. BGFX uses vkQueueWaitIdle() a lot, every time it switches to a new VkFramebuffer. I tried taking those waits out, but that causes GPU crashes. github.com/bkaradzic/bgfx/blob/… - Catechu
@Jherico: It's pretty much impossible to render a non-static scene without, at some point, doing a GPU/CPU sync. If you change the contents of memory or need to alter a descriptor set, you're going to have to prevent changing memory/sets that the GPU is using. And that requires a sync. But it only happens as you need it and where you need it, rather than being a built-in part of some API function. - Germinal
@Bram: "I tried taking those waits out, but that causes GPU crashes." If they're relying on that for synchronizing other things too (like memory/descriptor accesses), then just yanking them out won't work. My point is that in a well-constructed application, the CPU should never be waiting for a queue to idle, as that represents losing GPU performance. The CPU may wait for a particular submission to complete, but that would be a wait on a fence, not for a queue to stop doing anything. - Germinal
@NicolBolas Granted, but most of my direct experience has been working with Vulkan sample code, where it's very easy to create an entire (toy) application that has no per-frame sync points. - Andalusite
This makes sense, but I couldn't figure out by myself that vkAcquireNextImageKHR does not block. Should I assume that API calls don't block the CPU unless explicitly specified? What really confuses me is the timeout parameter... - Crinoline
One of the possible return codes for vkAcquireNextImageKHR is VK_TIMEOUT, which doesn't make sense to me unless the function can indeed block CPU execution :S - Crinoline
@tuket: It does not block until the next image is available. But it does have to block until the display engine can figure out what the next image actually will be. - Germinal
Actually, both vkWaitSemaphores and vkWaitForFences have a timeout parameter, so I'm not sure what the point of having another one is. - Crinoline
@NicolBolas Ah, makes sense, thank you. I have found the documentation that talks about this: "If the specified timeout period expires before an image is acquired, vkAcquireNextImageKHR returns VK_TIMEOUT" and "The presentation engine may not have finished reading from the image at the time it is acquired, so the application must use semaphore and/or fence to ensure..." - Crinoline
Misleading answer. When the timeout parameter is non-zero, vkAcquireNextImageKHR() does block on the CPU side until an image becomes available. This behavior is documented on the VK_KHR_swapchain man page. The OP is probably waiting for other fences, which caps the main loop to the 60Hz display rate. So by the time vkAcquireNextImageKHR() is called, the image is already available and it returns immediately. - Tamtama
@ValentinMilea: I've adjusted the answer to use more correct Vulkan terminology. - Germinal
