However, there is nowhere that OpenGL spec mentioned "threads executed in lockstep". It only mentioned "The relative order of invocations of the same shader type are undefined.".
You say this as if the wording of the GL spec would not cover the "lockstep" situation. But "The relative order of invocations of the same shader type are undefined." actually covers that. Given two shader invocations A and B, this statement means that you must not assume any of the following:
- that A is executed before B
- that B is executed before A
- that A and B are executed in parallel
- that A and B are not executed in parallel
- that parts of A are executed before the same or other parts of B
- that parts of B are executed before the same or other parts of A
- that parts of A and B are executed in parallel
- that parts of A and B are not executed in parallel
- ... (probably a lot more) ...
The undefined order means you can never wait on the result of another invocation, because there is no guarantee that the other invocation has produced that result before the wait. The exceptions are situations where the GL spec makes certain extra guarantees, namely:
- when using explicit synchronization mechanisms like barrier()
- there are some weak ordering guarantees between different shader stages
(For example, you may assume that all vertex shader invocations for a primitive have already happened when a fragment of that very primitive is processed.)
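The first exception can be sketched as a compute shader, the one stage where the work group is a well-defined invocation group. This is only an illustration under assumptions: the buffer names `Input` and `Output`, their bindings, and the work group size of 64 are hypothetical and do not come from the question, and the buffers are assumed to hold exactly one element per dispatched invocation.

```glsl
#version 460 core
layout(local_size_x = 64) in;

// Hypothetical buffers, assumed to hold one float per dispatched invocation.
layout(std430, binding = 0) readonly  buffer Input  { float in_values[];  };
layout(std430, binding = 1) writeonly buffer Output { float out_values[]; };

// One slot per invocation of the work group.
shared float tile[64];

void main()
{
    uint lid = gl_LocalInvocationID.x;
    uint gid = gl_GlobalInvocationID.x;

    // Each invocation publishes its own value to shared memory.
    tile[lid] = in_values[gid];

    // Make the shared write visible, then wait until every invocation
    // of this work group has reached this point.
    memoryBarrierShared();
    barrier();

    // Only now is it valid to read a value written by another invocation.
    uint neighbor = (lid + 1u) % 64u;
    out_values[gid] = tile[neighbor];
}
```

Without the barrier() call, the read of `tile[neighbor]` would depend on the relative order of two invocations, which is exactly what the spec leaves undefined. Note that barrier() only synchronizes invocations within one work group; nothing comparable exists between arbitrary fragment shader invocations.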
For example, the GLSL Spec, Version 4.60 explains the concept of "invocation groups" in section 8.18:
Implementations of the OpenGL Shading Language may optionally group multiple shader
invocations for a single shader stage into a single SIMD invocation group, where invocations are
assigned to groups in an undefined implementation-dependent manner.
and the accompanying GL 4.6 core profile spec defines "invocation groups" in section 7.9 as
An invocation group [...] for a compute shader is the set of
invocations in a single work group. For graphics shaders, an invocation group is
an implementation-dependent subset of the set of shader invocations of a given
shader stage which are produced by a single drawing command. For MultiDraw*
commands with drawcount greater than one, invocations from separate draws are
in distinct invocation groups.
So except for compute shaders, the GL gives you only draw-call granularity over the invocation groups. This section of the spec also has the following footnote to make this absolutely clear:
Because the partitioning of invocations into invocation groups is implementation-dependent
and not observable, applications generally need to assume the worst case of all invocations in a draw belong to a single invocation group.
So besides that stronger statement about undefined relative invocation order, the spec also covers "in-lockstep" SIMD processing, and makes it very clear that you have little control over it in the graphics pipeline.
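To make the consequence concrete, here is a minimal sketch of the kind of per-pixel spin lock that these rules do not support. It is not the asker's actual shader: the name `lock_image` is taken from the comments, while the `r32ui` format, the binding, and the compare-and-swap loop are assumptions made for illustration.

```glsl
#version 460 core

// Assumed lock image: 0 = free, 1 = taken (format and binding are hypothetical).
layout(binding = 0, r32ui) coherent uniform uimage2D lock_image;

out vec4 frag_color;

void main()
{
    ivec2 pos = ivec2(gl_FragCoord.xy);

    // UNSAFE: this loop waits on another invocation releasing the lock.
    // The spec gives no guarantee that the current holder makes progress
    // while this invocation spins. If both are in the same SIMD invocation
    // group, the holder may never get to run its release, and the loop
    // may never terminate.
    while (imageAtomicCompSwap(lock_image, pos, 0u, 1u) != 0u) {
        // busy-wait
    }

    // ... critical section ...

    // Release the lock.
    imageAtomicExchange(lock_image, pos, 0u);

    frag_color = vec4(1.0);
}
```

Whether such a loop hangs, triggers a driver timeout, or happens to work depends on how the implementation groups and schedules invocations, which is precisely what the application cannot observe or control.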
lock_available != 0. And if imageStore is changed to imageAtomicExchange(lock_image, pos, 0), it still results in APPCRASH. – Danonorwegian