glGetBufferSubData and glMapBufferRange for GL_SHADER_STORAGE_BUFFER very slow on NVIDIA GTX960M

I've been having issues transferring a GPU buffer to the CPU in order to sort its contents. The buffer is a GL_SHADER_STORAGE_BUFFER holding 300,000 float values. Transferring it with glGetBufferSubData takes around 10 ms, and with glMapBufferRange it takes more than 100 ms.

The code I'm using is the following:

std::vector<GLfloat> viewRow;
unsigned int viewRowBuffer = -1;
int length = -1;

void bindRowBuffer(unsigned int buffer){
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, buffer);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 3, buffer);
}

void initRowBuffer(unsigned int &buffer, std::vector<GLfloat> &row, int lengthIn){
    // Generate and initialize buffer
    length = lengthIn;
    row.resize(length);
    memset(&row[0], 0, length*sizeof(float));
    glGenBuffers(1, &buffer);
    bindRowBuffer(buffer);
    glBufferStorage(GL_SHADER_STORAGE_BUFFER, row.size() * sizeof(float), &row[0], GL_DYNAMIC_STORAGE_BIT | GL_MAP_READ_BIT | GL_MAP_WRITE_BIT);

    glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
}

void cleanRowBuffer(unsigned int buffer) {
    float zero = 0.0;
    glClearNamedBufferData(buffer, GL_R32F, GL_RED, GL_FLOAT, &zero);
}

void readGPUbuffer(unsigned int buffer, std::vector<GLfloat> &row) {
    glGetBufferSubData(GL_SHADER_STORAGE_BUFFER,0,length *sizeof(float),&row[0]);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
}

void readGPUMapBuffer(unsigned int buffer, std::vector<GLfloat> &row) {
    float* data = (float*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, length*sizeof(float), GL_MAP_READ_BIT);
    memcpy(&row[0], data, length*sizeof(float));
    glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
    // Unbind only after unmapping: glUnmapBuffer acts on the buffer currently bound to the target.
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
}

The main is doing:

    bindRowBuffer(viewRowBuffer);
    cleanRowBuffer(viewRowBuffer);
    countPixs.bind();
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, gPatch);
    countPixs.setInt("gPatch", 0);
    countPixs.run(SCR_WIDTH/8, SCR_HEIGHT/8, 1);
    countPixs.unbind();
    readGPUbuffer(viewRowBuffer, viewRow);

Here countPixs is a compute shader, but I'm positive the problem is not there, because if I comment out the run command, the read takes exactly the same amount of time.

The weird thing is that if I execute a getbuffer of only 1 float:

glGetBufferSubData(GL_SHADER_STORAGE_BUFFER,0, 1 *sizeof(float),&row[0]);

It takes exactly the same time, so I'm guessing something is fundamentally wrong, maybe related to GL_SHADER_STORAGE_BUFFER?

Linnette answered 30/5, 2020 at 23:27 Comment(3)
Do you need write access to the data or just read only? – Altair
Read only is OK. – Linnette
A simpler way to achieve what you want at faster speeds might be to use a 1D texture array instead of an SSBO. – Altair

This is likely a delay caused by GPU-CPU synchronization, i.e. a round trip. Once you map your buffer, the previous GL commands that touched the buffer have to complete immediately, causing a pipeline stall. Note that drivers are lazy: it is very likely that those GL commands have not even started executing yet. That would also explain why reading a single float takes as long as reading the whole buffer: the wait dominates, not the copy.

If you can, create the buffer with glBufferStorage(..., GL_MAP_PERSISTENT_BIT) and map it persistently. This completely avoids re-mapping the buffer and allocating GPU memory on every read, and you can keep the mapped pointer across draw calls, with some caveats:

  • You likely also need GPU fences to detect, or wait until, the data is actually available from the GPU. (Unless you like reading garbage.)
  • The mapped buffer can't be resized. (Since you already use glBufferStorage(), you are OK.)
  • It is probably a good idea to combine GL_MAP_PERSISTENT_BIT with GL_MAP_COHERENT_BIT.

After reading the GL 4.5 docs a bit more, I found that glFenceSync is mandatory to guarantee the data has arrived from the GPU, even with GL_MAP_COHERENT_BIT:

If GL_MAP_COHERENT_BIT is set and the server does a write, the app must call glFenceSync with GL_SYNC_GPU_COMMANDS_COMPLETE (or glFinish). Then the CPU will see the writes after the sync is complete.
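
To make the persistent-mapping idea concrete, here is a minimal sketch of how it could look. It is an illustration under assumptions, not a drop-in implementation: PersistentReadback, createPersistentReadback, fenceAfterGpuWrites and readBack are hypothetical names, the glMemoryBarrier call is an addition for the case where a compute shader writes the buffer, and the block guarded by #ifndef GL_VERSION_4_4 only holds CPU stand-ins so the sketch compiles without a GL loader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

#ifndef GL_VERSION_4_4
// ASSUMPTION: no OpenGL loader header (glad, GLEW, ...) was included above.
// These CPU-only stand-ins copy the real signatures purely so the sketch
// compiles and runs without a GPU. Include your loader first in a real
// program and this whole block is skipped.
typedef unsigned int GLuint, GLenum, GLbitfield;
typedef int GLsizei;
typedef std::ptrdiff_t GLintptr, GLsizeiptr;
typedef std::uint64_t GLuint64;
struct StubSync {};
typedef StubSync* GLsync;
enum : GLenum {
    GL_MAP_READ_BIT = 0x0001, GL_MAP_PERSISTENT_BIT = 0x0040,
    GL_MAP_COHERENT_BIT = 0x0080, GL_SYNC_FLUSH_COMMANDS_BIT = 0x0001,
    GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT = 0x4000,
    GL_SHADER_STORAGE_BUFFER = 0x90D2, GL_SYNC_GPU_COMMANDS_COMPLETE = 0x9117,
    GL_ALREADY_SIGNALED = 0x911A, GL_CONDITION_SATISFIED = 0x911C
};
static std::vector<unsigned char> stubGpuMemory;  // pretend device memory
static StubSync stubFence;
static void glGenBuffers(GLsizei n, GLuint* ids) { for (GLsizei i = 0; i < n; ++i) ids[i] = GLuint(i + 1); }
static void glBindBuffer(GLenum, GLuint) {}
static void glBufferStorage(GLenum, GLsizeiptr size, const void*, GLbitfield) { stubGpuMemory.assign(static_cast<std::size_t>(size), 0); }
static void* glMapBufferRange(GLenum, GLintptr offset, GLsizeiptr, GLbitfield) { return stubGpuMemory.data() + offset; }
static void glMemoryBarrier(GLbitfield) {}
static GLsync glFenceSync(GLenum, GLbitfield) { return &stubFence; }
static GLenum glClientWaitSync(GLsync, GLbitfield, GLuint64) { return GL_ALREADY_SIGNALED; }
static void glDeleteSync(GLsync) {}
#endif

// Hypothetical helper: a buffer that stays mapped for its whole lifetime.
struct PersistentReadback {
    GLuint buffer = 0;
    const float* mapped = nullptr;  // stays valid until the buffer is deleted
    std::size_t count = 0;          // number of floats in the buffer
};

// Create immutable storage once and map it once; the same flags go to both calls.
PersistentReadback createPersistentReadback(std::size_t count) {
    PersistentReadback rb;
    rb.count = count;
    const GLsizeiptr bytes = static_cast<GLsizeiptr>(count * sizeof(float));
    const GLbitfield flags = GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
    glGenBuffers(1, &rb.buffer);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, rb.buffer);
    glBufferStorage(GL_SHADER_STORAGE_BUFFER, bytes, nullptr, flags);
    rb.mapped = static_cast<const float*>(
        glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, bytes, flags));
    return rb;
}

// Call right after issuing the GPU work that writes the buffer.
GLsync fenceAfterGpuWrites() {
    // Makes shader writes visible through the persistent mapping.
    glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);
    return glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
}

// Copies the data out once the fence has signaled. Returns false if the GPU
// was not done within timeoutNs; the fence is then kept so the caller can retry.
bool readBack(const PersistentReadback& rb, GLsync fence,
              std::vector<float>& out, GLuint64 timeoutNs) {
    assert(rb.mapped != nullptr);
    const GLenum status = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, timeoutNs);
    if (status != GL_ALREADY_SIGNALED && status != GL_CONDITION_SATISFIED) return false;
    glDeleteSync(fence);
    out.resize(rb.count);
    std::memcpy(out.data(), rb.mapped, rb.count * sizeof(float));
    return true;
}
```

In the question's main loop this would roughly mean calling fenceAfterGpuWrites() right after countPixs.run(...) and calling readBack(...) as late as possible, ideally a frame later, so the CPU is not left waiting for the GPU to catch up.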

Hipparch answered 5/6, 2020 at 7:36 Comment(0)
