How to use atomic operations on an SSBO in a compute shader

Example code

Here is a bare-bones compute shader to illustrate my question

layout(local_size_x = 64) in;

// Persistent LIFO structure with a count of elements
layout(std430, binding = 0) restrict buffer SMyBuffer
{
    int count;
    float data[];
} MyBuffer;

bool AddDataElement(uint i);
float ComputeDataElement(uint i);

void main()
{
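    // Stride by the total number of invocations so that, with more than one
    // workgroup, no two invocations process the same i.
    for (uint i = gl_GlobalInvocationID.x; i < some_end_condition; i += gl_WorkGroupSize.x * gl_NumWorkGroups.x)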
    {
        if (AddDataElement(i))
        {
            // We want to store this data piece in the next available free space
            uint dataIndex = atomicAdd(MyBuffer.count, 1);
            // [1] memoryBarrierBuffer() ?
            MyBuffer.data[dataIndex] = ComputeDataElement(i);
        }
    }
}

Explanation

SMyBuffer is a stack of elements (data[]) with a count of the current number of elements. When a certain condition is met, the compute shader increments the count atomically. This operation returns the previous index which is used to index data[] to store the new element. This guarantees that no two shader invocations overwrite each other's elements.
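
The reservation step described above can be sketched as a small helper. This is only a minimal sketch under assumptions: the PushElement name, the capacity check against data.length(), and the "undo and drop" overflow policy are illustrative additions that are not part of the original shader.

```glsl
#version 430

layout(local_size_x = 64) in;

layout(std430, binding = 0) restrict buffer SMyBuffer
{
    int count;
    float data[];
} MyBuffer;

// Hypothetical helper: reserves one slot and writes value into it.
// Returns false if the buffer is already full.
bool PushElement(float value)
{
    // atomicAdd returns the value of count before the addition, so every
    // invocation that reaches this line receives a distinct index.
    int slot = atomicAdd(MyBuffer.count, 1);
    if (slot >= MyBuffer.data.length())
    {
        // Assumed overflow policy: undo the reservation and drop the element.
        atomicAdd(MyBuffer.count, -1);
        return false;
    }
    MyBuffer.data[slot] = value;
    return true;
}

void main()
{
    // Illustrative payload only: each invocation pushes its own global index.
    PushElement(float(gl_GlobalInvocationID.x));
}
```

The write to data[slot] needs no ordering against other invocations here, because no other invocation in the same dispatch reads or writes that slot.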

Another compute shader eventually pops values from this stack and uses them. glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT) is of course required between the two compute shader dispatches.
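
For illustration, the consumer side might look roughly like the sketch below. It assumes the shader runs in a separate dispatch, after glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT), and that nothing pushes while it pops. The PopElement name and the empty-stack handling are hypothetical.

```glsl
#version 430

layout(local_size_x = 64) in;

layout(std430, binding = 0) restrict buffer SMyBuffer
{
    int count;
    float data[];
} MyBuffer;

// Hypothetical consumer-side helper: pops the top element into value.
// Returns false once the stack is empty.
bool PopElement(out float value)
{
    // atomicAdd returns the count before the decrement; the top element
    // lives at index (previous count - 1).
    int previous = atomicAdd(MyBuffer.count, -1);
    if (previous <= 0)
    {
        // The stack was already empty: restore the counter.
        atomicAdd(MyBuffer.count, 1);
        value = 0.0;
        return false;
    }
    value = MyBuffer.data[previous - 1];
    return true;
}

void main()
{
    float value;
    if (PopElement(value))
    {
        // Use value here (placeholder for the real consumer work).
    }
}
```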

Question

All this works fine, but I'm wondering whether I'm just getting lucky with timing, and I want to validate my usage of the API.

So, is anything else required to make sure that the counter stored in the SSBO works (see [1] in the code)? I expect atomicAdd() to take care of the memory synchronization, because otherwise it makes little sense. What would be the point of an atomic operation whose effect is only visible in a single thread?

Regarding memory barriers, the OpenGL wiki states:

Note that atomic counters are different functionally from atomic image/buffer variable operations. The latter still need coherent qualifiers, barriers, and the like.

which leaves me wondering whether there's something I haven't understood properly and a memoryBarrierBuffer() is actually required. But if that is the case, what's to stop two threads from performing atomicAdd() before one of them gets to the subsequent memoryBarrierBuffer()?

Also, does the answer change depending on whether glDispatchCompute() dispatches a single workgroup or several?

Lonilonier answered 5/12, 2017 at 20:17 Comment(2)
An interesting related question: #17430943 - Lonilonier
Another one: #56340833 - Lonilonier

You do not need the memoryBarrierBuffer() call. The glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT) between the dispatches stalls any reads issued by your second shader (the consumer) until all the writes from your first shader have completed.
The number of workgroups dispatched does not change the answer, because all the writes from the glDispatchCompute() have to finish either way.

Sackbut answered 15/6, 2023 at 18:9 Comment(2)
Thanks for your answer. My question revolves around competing atomic read-write operations between parallel invocations of the first shader. The glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT) does indeed take care of ensuring that the second shader waits for all writes from the first shader to finish. - Lonilonier
atomicAdd also takes care of the barriers required to make the change visible to any other thread in the system. - Sackbut
