Avoid waiting on SwapBuffers

I have discovered that SwapBuffers in OpenGL will busy-wait as long as the graphics card isn't done with its rendering or if it's waiting on V-Sync.

This is a problem for me because I don't want to waste 100% of a CPU core just waiting for the card to finish. I'm not writing a game, so I have nothing more productive to do with those CPU cycles; I just want to yield them to some other process in the operating system.

I've found callback functions such as glutTimerFunc and glutIdleFunc that could work for me, but I don't want to use GLUT. Still, GLUT must somehow use the normal GL functions to do this, right?

Is there any function such as "glReadyToSwap" or similar? In that case I could check it every millisecond or so and decide whether to wait a while longer or do the swap. I could also imagine skipping SwapBuffers and writing my own similar function that doesn't busy-wait, if someone could point me in the right direction.

Branca answered 29/4, 2011 at 8:39 Comment(0)

SwapBuffers is not busy-waiting; it just blocks your thread in driver context, which makes Windows calculate the CPU usage incorrectly: Windows determines CPU usage from how much CPU time the idle process gets plus how much time programs spend outside driver context. SwapBuffers blocks in driver context, so your program obviously takes that CPU time away from the idle process. But your CPU is doing literally nothing in that time; the scheduler happily passes the time to other processes. The idle process, on the other hand, does nothing but immediately yield its time to the rest of the system, so the scheduler jumps right back into your process, which is still blocked in the driver, and Windows counts that as "clogging the CPU". If you measured the actual power consumption or heat output, for a simple OpenGL program it would stay rather low.

This irritating behaviour is actually an OpenGL FAQ!

Just create additional threads for parallel data processing: keep OpenGL in one thread and the data processing in the other. If you want to bring down the reported CPU usage, adding a Sleep(0) or Sleep(1) after SwapBuffers will do the trick (a small sketch of that variant follows the timing code below). The Sleep(1) makes your process spend a little time blocked in user context, so the idle process gets more time, which evens out the numbers. If you don't want to sleep, you may do the following:

const float time_margin = ...; // some safety margin
float display_refresh_period;  // something like 1./60. or so

void render()
{
    float rendertime_start = get_time();

    render_scene();
    glFinish();                // block until the GPU has finished rendering

    float rendertime_finish = get_time();
    float time_to_finish = rendertime_finish - rendertime_start;

    // sleep for whatever is left of the current refresh period,
    // keeping a safety margin before the next V-Sync
    float time_rest = display_refresh_period
                      - fmodf(time_to_finish + time_margin, display_refresh_period);
    sleep(time_rest);          // pseudocode: assumes a sleep that takes fractional seconds
    SwapBuffers();
 }
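
For the simpler Sleep variant mentioned above, a minimal sketch might look like this (Windows-only; render_scene() is the same placeholder as in the code above, and the device context hdc is an assumption, not something from the original answer):

    void render_simple(HDC hdc)
    {
        render_scene();
        SwapBuffers(hdc);  // blocks in driver context until the swap (and V-Sync) has happened
        Sleep(1);          // yield at least one scheduler time slice in user context,
                           // so the reported CPU usage evens out
    }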

In my programs I use this kind of timing, but for another reason: I let SwapBuffers block without any helper Sleeps; however, I give some other worker threads that time to do stuff on the GPU through a shared context (like updating textures), and I have the garbage collector running. It's not really necessary to time it exactly, but having the worker threads finish just before SwapBuffers returns allows you to start rendering the next frame almost immediately, since most mutexes are already unlocked by then.

Emelyemelyne answered 29/4, 2011 at 8:44 Comment(24)
Well, the specifics of how the waiting occurs are not that important to me. What I can say is that the fans in the computer speed up, and I assume they react to heat, meaning the CPU DOES run at 100% as it DOES get hot. Even if it didn't, and it's just Windows thinking it runs at 100%, that still means Windows won't give any CPU time to any other process because it thinks it's already used 100%! However you look at it, you waste CPU time that could be used by something else, either by burning it away or idling it.Branca
Did you try running a different program that also consumes a lot of CPU time?Emelyemelyne
^^ Together with your OpenGL program I meant.Emelyemelyne
I think I have enough computer knowledge to know when a CPU runs flat out. What I think happens in your program is 1) render, 2) get stuck on glFinish until the card is done, 3) sleep away the rest of the 1/60th second, 4) SwapBuffers. However, in #2 you WILL busy-wait because, as far as I know, glFinish is platform-independent and cannot use any method other than busy-waiting to halt execution. In a game or similar this is not that important, since you can do useful stuff between #1 and #2, but in a normal Windows program you don't want to burn away 100% CPU on one core just to show some 3D image.Branca
Should busy-wait loops be power intensive? The ALU, FPUs etc. are not involved. Busy waiting is wasteful, but it shouldn't be particularly hot.Municipalize
And you WILL burn away 100% CPU as soon as the rendering time of the scene exceeds 1/60th of a second (i.e. goes below 60 FPS)Branca
Let me put it this way: have any of you ever used a 3D (CAD, painting, point handling) program in Windows that constantly runs 100% of one core flat out? I sure haven't, which leads me to believe there IS some way to avoid that.Branca
In what way does the platform independence of glFinish make it a busy wait? glFinish is just an API call into driver context. OpenGL is actually not a library, despite the 'L' in its name. Yes, it can be implemented as a library, but in most cases the small library you link is just a trampoline into driver code. glFinish is implemented there, usually in the form of a wait on a mutex which is released by the GPU's main driver when the V-Sync interrupt arrives. Also, CAD programs don't constantly redraw, since they don't play animations. Redraw happens there due to user interaction.Emelyemelyne
Well I can safely say I'm not experienced enough to know exactly how all this works. The only thing I can take away from this seems to be "tough luck", there is no way to poll the status of the rendering, at least none that anyone here has mentioned.Branca
Another thing about glFinish(): glFinish() blocks the program until all pending OpenGL rendering requests have been processed. On Windows this is implemented in the driver. On GLX, glFinish waits for the particular event sent by the X11 server that indicates completion; this is essentially socket programming: events are waited for with the select syscall on the socket, and the program only uses CPU time if there are events to be read from the socket. Other events that are received are queued for later processing – glFinish on GLX is event driven!Emelyemelyne
@DaedalusAlpha: Because there is no such method. You can call glFinish() which will block until rendering is done – and don't worry, the 100% CPU usage reported is not actually consumed; all the other processes receive their CPU time just fine. It's really just a mistake in how Windows reports CPU usage, not in how the scheduler assigns it. The same goes for SwapBuffers, which BTW does an implicit glFinish, too.Emelyemelyne
Well, it's a bit annoying to listen to the fans going flat out and it feels so wrong, but if there is no other way... I did find something called sync objects, however; maybe that is what I want? opengl.org/wiki/Sync_Objects With sync objects I can wait with 0 as a timeout, effectively polling the status without getting stuck.Branca
I don't think your fans spin up due to waiting in SwapBuffers. After all, the CPU also has to do stuff to send commands to the GPU. So far you never showed us any actual drawing code. If you're using immediate mode (glBegin(...); for(i 0...n) glVertex(vertex[i]); glEnd();) you'll use OpenGL very inefficiently and spend a lot of CPU time doing all those function calls.Emelyemelyne
I use glDrawArrays with VBO buffers, so everything should happen GPU-side, and I have observed that all delays happen at the SwapBuffers function, not the render function. I can accept that SwapBuffers is blocking and I don't really care how it's blocking, if I just can get some way of going around it and only call it when there actually is something to swap.Branca
@DaedalusAlpha: The blocking of SwapBuffers mostly stems from waiting for the V-Sync to happen. You can disable this (i.e. the swap will happen immediately after the implied glFinish) in your GPU driver's options. This also leads to an increase in framerate, since the display refresh will no longer put a delay on it — not V-Syncing is also the operation mode in which the CPU is put under a much larger load. Also it's not all happening on the GPU; especially very large VBO batches are chopped into pieces by the driver. I found batches of about 1k to 2k vertices working best.Emelyemelyne
Well I appreciate all answers and the intricacies of SwapBuffer and all, but what I really wanted to know was "is there any way to determine if SwapBuffer/glFinish is going to block or not without actually calling them?"Branca
@DaedalusAlpha: You could use Sync_Objects (or NV_fence if available) to test if glFinish would block. If there's no V-Sync enabled and you used a Sync_Object, then SwapBuffers will return almost immediately; otherwise it will block until the V-Sync. However, there's really very little benefit in doing it that way; it won't save any CPU time. Sync_Objects were introduced so that one can memory-map buffer objects that had been used previously, without having to wait for the GPU to yield the buffer object.Emelyemelyne
Does this Sleep workaround take into consideration the SwapBuffers execution time? Because on a GTX 560 Ti it runs in less than a millisecond, but on Intel HD 3000 it takes 1-2 ms. Also it doesn't really feel good to use sleeps, because V-Sync should take care of this. I reported this 100% CPU usage while waiting on sync to NVIDIA. Seems like this is an NVIDIA-only problem (no 100% CPU on ATI or Intel HD), but it doesn't seem that they care about it.Repel
@Demion: Like I already explained, that 100% CPU usage is merely a calculation error, because SwapBuffers blocks in kernel mode, which is accounted for as 100% CPU utilization. SwapBuffers will return only after V-Sync happened, so if your display has a refresh rate of 60Hz it will block for up to 16ms, depending on the exact time within a refresh period the call was made. A 0ms execution time, however, should only be reported if V-Sync has been disabled. BUT NVidia recently introduced some trickery into their drivers that dynamically changes the V-Sync swap-buffer behavior, which might explain this.Emelyemelyne
I understand this. But to avoid Windows reporting 100% CPU usage you should waste this CPU time in user code, for example with Sleep. For example, the refresh rate is 60Hz, so that is 16 ms per frame. We can calculate how long the OpenGL drawing functions take to execute and then Sleep(16ms - renderTime). But the problem is that on a slow computer SwapBuffers itself may consume some time (even without vsync). So if we render for 4ms and then Sleep 16ms - 4ms = 12ms, and the SwapBuffers execution time is 2ms, then the frame will be 18ms (4ms render, 12ms sleep, 2ms SwapBuffers), not 16ms, and this will reduce fps.Repel
@Demion: Sleep doesn't waste CPU time. It yields the CPU time to another thread or process in a wait state. This yielding of CPU time to another process is all we're interested in. Technically a Sleep(0); would do the trick. There's also a syscall Yield() in the Win32 API, which should have the same effect, but usually has no influence on the CPU usage calculation. Anyway, the CPU utilization shown is wrong and no Sleep is actually required. Other processes receive their CPU time just fine without those tricks. It's simply a wrongly displayed value that doesn't represent the actual usage.Emelyemelyne
I understood already that the CPU usage shown is wrong. And I put it badly when I said Sleep wastes CPU time; I meant that we can use the Sleep workaround so the user doesn't see 100% CPU usage. But it is hard to calculate the exact Sleep period, because on slow computers (my netbook for example) not only the render (gl commands) but SwapBuffers itself takes 1-2ms to execute. Also, Sleep(0) changes nothing in my case; 100% CPU is still shown.Repel
In other words, I just need a workaround so the user doesn't see this fake 100% CPU usage. I thought about Sleep(16ms - renderTime); but this will only work on powerful computers where the SwapBuffers function executes in less than 1ms. On slow netbooks this will cause an fps reduction, because SwapBuffers itself consumes more than 1ms (a variable 1-2 ms in my case)Repel
I'm pretty sure this entire answer is applicable to only a particular version of a particular vendor's driver (which would be why the asker experienced different behavior)Catheycathi

The popular answer here is wrong. Windows is not reporting the CPU usage "wrongly", lol. OpenGL, with vsync on, even while rendering a blank screen, is actually burning 100% of one thread of your CPU (you can check your CPU temps).

But the solution is simple: just call DwmFlush(); before or after SwapBuffers.
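
A minimal sketch of what that might look like (Windows-only; assumes desktop composition/DWM is active, an existing device context hdc with a current GL context, and linking against dwmapi.lib; the helper name present_frame is made up for illustration):

    #include <windows.h>
    #include <dwmapi.h>    // DwmFlush(); link with dwmapi.lib

    void present_frame(HDC hdc)
    {
        SwapBuffers(hdc);  // queue the swap
        DwmFlush();        // wait for the compositor's next frame instead of spinning on V-Sync
    }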

Kayekayla answered 14/11, 2022 at 17:39 Comment(0)

Though eglSwapBuffers does not busy-wait, a legitimate use for a non-blocking eglSwapBuffers is to have a more responsive GUI thread that can listen for user input or exit signals instead of waiting for OpenGL to finish swapping buffers. I have a solution to half of this problem. First, in your main loop you buffer up your OpenGL commands to execute on your swapped-out buffer. Then you poll a sync object to see if your commands have finished executing on your swapped-out buffer. Then you can swap buffers if the commands have finished executing. Unfortunately, this solution only asynchronously waits for the commands to finish executing on your swapped-out buffer and does not asynchronously wait for vsync. Here is the code:

 void process_gpu_stuff(struct gpu_context *gpu_context)
 {
     int errnum = 0;

     switch (gpu_context->state) {
     case BUFFER_COMMANDS:
         glDeleteSync(gpu_context->sync_object);
         gpu_context->sync_object = 0;

         real_draw(gpu_context);
         glFlush();

         gpu_context->sync_object = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
         if (0 == gpu_context->sync_object) {
             errnum = get_gl_error();
             break;
         }
         gpu_context->state = SWAP_BUFFERS;
         break;

     case SWAP_BUFFERS:
         /* Poll to see if the buffer is ready for swapping; if
          * it is not ready we can listen for updates in the
          * meanwhile. */
         switch (glClientWaitSync(gpu_context->sync_object, 0, 1000U)) {
         case GL_ALREADY_SIGNALED:
         case GL_CONDITION_SATISFIED:
             if (EGL_FALSE == eglSwapBuffers(display, surface)) {
                 errnum = get_egl_error();
                 break;
             }
             gpu_context->state = BUFFER_COMMANDS;
             break;

         case GL_TIMEOUT_EXPIRED:
             /* Do nothing. */
             break;

         case GL_WAIT_FAILED:
             errnum = get_gl_error();
             break;
         }
         break;
     }
 }
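
For completeness, a hedged sketch of how this state machine might be driven from the GUI thread's event loop (should_quit, poll_user_input and the gpu_context initialisation are assumptions for illustration, not part of the original answer):

     /* Hypothetical driver loop: handle input first, then advance the GPU
      * state machine by one step; glClientWaitSync above waits at most
      * 1000 ns, so this never blocks the GUI thread for long. */
     while (!should_quit) {
         poll_user_input();
         process_gpu_stuff(&gpu_context);
     }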
Subsistent answered 4/12, 2014 at 22:37 Comment(0)
