Is calling glFinish necessary when synchronizing resources between OpenGL contexts?
I am using two OpenGL contexts in my application.

The first one is used to render data; the second one to load resources in the background and generate VBOs and textures.

When my loading context generates a VBO and hands it to my rendering thread, I get invalid data (all zeroes) in the VBO unless I call glFlush or glFinish on the loading context after creating it.

I suspect this happens because my loading context never swaps buffers or does anything else that tells the driver to start working through its command queue, so the upload commands just sit there unexecuted (which leaves the VBO empty as seen from the rendering context).
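For reference, this is roughly what my loading thread does (a sketch; `vertexData` and `size` are placeholders, and it assumes the loading context is current on this thread with function pointers loaded via GLEW/glad or similar):

```c
#include <GL/gl.h>

/* Hypothetical upload helper, run on the loading context's thread. */
GLuint upload_vbo(const void *vertexData, GLsizeiptr size)
{
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, size, vertexData, GL_STATIC_DRAW);

    /* Without this, the upload may still sit in this context's command
     * queue, and the rendering context can observe zeroed contents. */
    glFlush(); /* or glFinish(), or a fence sync object */
    return vbo;
}
```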

From what I've seen, this flush is not necessary on Windows (tested with an NVIDIA GPU; it works even without the flushes) but is necessary on Linux/macOS.

This page in Apple's documentation says that calling glFlush is necessary: https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/OpenGLESApplicationDesign/OpenGLESApplicationDesign.html

If your app shares OpenGL ES objects (such as vertex buffers or textures) between multiple contexts, you should call the glFlush function to synchronize access to these resources. For example, you should call the glFlush function after loading vertex data in one context to ensure that its contents are ready to be retrieved by another context.

But is calling glFinish or glFlush actually necessary, or are there simpler/lighter commands that achieve the same result? (And which one is needed, glFlush or glFinish?)

Also, is there documentation or a reference somewhere that covers this? I couldn't find any mention of it, and the behavior seems to differ between implementations.

Smoot answered 17/9, 2020 at 9:2 Comment(0)

If you manipulate the contents of any object in thread A, those contents are not visible to some other thread B until two things have happened:

  1. The commands modifying the object have completed. glFlush does not complete commands; you must use glFinish or a sync object to ensure command completion.

    Note that the completion needs to be communicated to thread B, but the synchronization command has to be issued on thread A. So if thread A uses glFinish, it must then use some CPU synchronization to communicate to thread B that the work is finished. If you use fence sync objects instead, you create the fence on thread A, then hand it over to thread B, which can test/wait on that fence.

  2. The object must be re-bound to the context of thread B. That is, you have to bind it to that context after the commands have completed (either directly with a glBind* command or indirectly by binding a container object that has this object attached to it).

This is detailed in Chapter 5 of the OpenGL specification.
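A sketch of those two steps using a fence sync object (GL 3.2+ / ARB_sync; function names as in the core spec, error handling elided, and the hand-off channel between threads is left to you):

```c
#include <GL/gl.h> /* in practice, a loader such as glad provides GLsync/ARB_sync */

/* Thread A (producer), its context current, after the modifying commands: */
GLsync publish_after_upload(void)
{
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush(); /* ensure the fence itself is submitted; otherwise B could wait forever */
    return fence; /* hand this to thread B through your own thread-safe channel */
}

/* Thread B (consumer), its context current, before using the object: */
void acquire_before_use(GLsync fence, GLuint vbo)
{
    glWaitSync(fence, 0, GL_TIMEOUT_IGNORED); /* GPU-side wait; or glClientWaitSync
                                                 with GL_SYNC_FLUSH_COMMANDS_BIT for
                                                 a CPU-side wait */
    glDeleteSync(fence);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);       /* step 2: re-bind in this context */
}
```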

Shontashoo answered 17/9, 2020 at 13:50 Comment(5)
Thanks for the answer. From the specs, I was almost sure that I had to use glFinish, but I got confused by Apple's document. If I get this right, is that document wrong? Or is there a behavioral difference between OSes?Smoot
Apple's doc seems to say that their framework takes care of some sort of synchronization between contexts via glFlush: "Contexts that are on different threads can share object resources. For example, it is acceptable for one context in one thread to modify a texture, and a second context in a second thread to modify the same texture. The shared object handling provided by the Apple APIs automatically protects against thread errors. And, your application is following the 'one thread per context' guideline." But this is nonstandard; flush + sync seems safer.Punnet
See the WWDC talk "Taking Advantage of Multiple GPUs" (Apple has tried to scrub the videos off the internet; there's a GitHub list somewhere with slides & links)Punnet
From that presentation: OpenGL on Mac uses flush-then-bind semantics. The producer context must flush ("glFlush or glFinish"); the consumer context must bind ("glBindTexture"). Even if it already has the object bound, it must rebind. They also note that the same applies to IOSurface (which is probably how this is implemented)Punnet
See mail-archive.com/[email protected]/… for some comments on flush/bind and sync points by an Apple dev. I suspect that when you create a shared OpenGL context on OS X, their driver implicitly uses flush/bind to set up the necessary fences (e.g. if, upon a bind, the tracked texture state has been marked "dirty", a glWaitSync is muxed into the command stream or something). This works seamlessly for IOSurface, so it must be implemented at a low levelPunnet

For Apple's implementation of OpenGL in particular (and more generally for IOSurfaces, which appear to be how shared textures/VBOs are implemented under the hood), the answer appears to be that glFlush alone (which is actually equivalent to waitUntilScheduled in Metal) is sufficient. This is subtle and not properly documented:

  • First, glFlush() on Apple platforms is actually closer to Metal's waitUntilScheduled: it does not merely trigger an asynchronous pipeline flush but blocks until everything has been submitted to the GPU (though it does not wait for execution to finish). You can read more about this in https://issues.angleproject.org/issues/40096854, https://issues.chromium.org/issues/40857406 and https://chromium-review.googlesource.com/c/angle/angle/+/3863951

  • Moreover, the kernel appears to play an active role in tracking dependencies. There is a key line in the WWDC 2010 presentation "Taking Advantage of Multiple GPUs" where the presenter says that if you don't do a glFlush() before an IOSurfaceLock(), the kernel has no way of knowing how long it needs to wait before it can do the DMA. This implies that when you do

bind IOSurface
// draw
glFlush() // blocks until all commands are submitted to the GPU

and on another thread

IOSurfaceLock()

the lock call will block until the GPU finishes its work. Of course, you have to use appropriate CPU-level synchronization (e.g. a pthread condition variable) so that the lock is only taken after the glFlush() on the other thread.
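A macOS-only sketch of that pattern (not runnable outside a live GL context; the producer flushes and then signals, and the consumer waits for the signal before locking the surface):

```c
#include <pthread.h>
#include <OpenGL/gl.h>
#include <IOSurface/IOSurface.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
static int flushed = 0;

/* GL thread, after drawing into the IOSurface-backed texture/FBO: */
void producer_done(void)
{
    glFlush();                      /* everything submitted to the GPU */
    pthread_mutex_lock(&mtx);
    flushed = 1;                    /* CPU-level "I have flushed" signal */
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mtx);
}

/* Consumer thread: */
void consumer_read(IOSurfaceRef surface)
{
    pthread_mutex_lock(&mtx);
    while (!flushed)
        pthread_cond_wait(&cv, &mtx);
    pthread_mutex_unlock(&mtx);
    /* Blocks until the GPU work scheduled before the flush is done: */
    IOSurfaceLock(surface, kIOSurfaceLockReadOnly, NULL);
    /* ...read pixels... */
    IOSurfaceUnlock(surface, kIOSurfaceLockReadOnly, NULL);
}
```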

From the above links

All work that has been waitUntilScheduled will be completed before an IOSurface is used by any subsequent commands.

You can also see in https://www.chromium.org/developers/design-documents/iosurface-meeting-notes/ that there is the line

rendering correctness is determined just by how the command buffers are serialized to the GPU.

which seems to imply that if you have

T1: bind, draw to FBO IOSurface, flush
T2: bind, draw FBO to screen, flush

so long as the bind in T2 happens after the flush in T1, Apple's framework takes care of maintaining things. Note that if there is a situation like

T1: bind, draw to FBO IOSurface (incomplete), flush
T2: bind, draw FBO to screen, flush
T1: draw to FBO IOSurface (rest), flush

where T2 doesn't strictly wait until T1 finishes before it flushes, then you could get incomplete drawing. The link also seems to imply that even without CPU-level sync, if you just have

T1: bind, draw to FBO IOSurface, flush
T2: bind, draw FBO to screen, flush

both going independently, then you'd never get an incomplete frame (I guess that if T1 flushes first, T2's commands won't be sent to the GPU until T1's have finished, and vice versa), but that seems too risky to rely on. The link above suggests this is true for separate processes as well, not just threads, which is really surprising.
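The CPU-side ordering (making sure T2's bind only happens after T1's flush) is ordinary thread synchronization. A minimal runnable sketch with the GL calls stubbed out as comments (all names here are hypothetical, not part of any API):

```c
#include <pthread.h>
#include <stddef.h>

static int t1_flushed = 0;          /* set once T1 has called glFlush() */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;

static void *t1_draw_and_flush(void *arg)
{
    (void)arg;
    /* bind IOSurface FBO, draw..., then glFlush() would go here */
    pthread_mutex_lock(&mtx);
    t1_flushed = 1;                 /* publish "flush done" to T2 */
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mtx);
    return NULL;
}

static void *t2_bind_and_present(void *arg)
{
    int *bound_after_flush = arg;
    pthread_mutex_lock(&mtx);
    while (!t1_flushed)             /* never bind before T1's flush */
        pthread_cond_wait(&cv, &mtx);
    *bound_after_flush = 1;         /* now safe to glBindTexture() and draw */
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Runs the pair once; returns 1 iff T2 bound only after T1's flush. */
int run_pair(void)
{
    int bound_after_flush = 0;
    pthread_t a, b;
    pthread_create(&a, NULL, t1_draw_and_flush, NULL);
    pthread_create(&b, NULL, t2_bind_and_present, &bound_after_flush);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return bound_after_flush;
}
```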

Punnet answered 5/7 at 0:40 Comment(2)
See also #54505084Punnet
Using an explicit sync might still be better anyhow, since it makes things more robust. See codereview.chromium.org/1273563002Punnet
