Working around WebGL readPixels being slow
B

3

11

I'm trying to use WebGL to speed up computations in a simulation of a small quantum circuit, like what the Quantum Computing Playground does. The problem I'm running into is that readPixels takes ~10ms, but I want to call it several times per frame while animating in order to get information out of gpu-land and into javascript-land.

As an example, here's my exact use case. The following circuit animation was created by computing things about the state between each column of gates, in order to show the inline-with-the-wire probability-of-being-on graphing:

[Image: circuit animation]

The way I'm computing those things now, I'd need to call readPixels eight times for the above circuit (once after each column of gates). This is waaaaay too slow at the moment, easily taking 50ms when I profile it (bleh).

What are some tricks for speeding up readPixels in this kind of use case?

  • Are there configuration options that significantly affect the speed of readPixels? (e.g. the pixel format, the size, not having a depth buffer)
  • Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
  • Should I try to aggregate all the textures I'm reading into a single megatexture and sort things out after a single big read?
  • Should I be using a different method to get the information back out of the textures?
  • Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?
Bebel answered 2/2, 2015 at 17:7 Comment(3)
readPixels performance on jsperf. How many bytes are read back in total per call? How many values do you need in JS land to visualize stuff?Gamp
@StefanHanke Your link indicates the call should be going orders of magnitude faster than I'm measuring. As a WebGL newbie, I may be accidentally slowing it down due to poor configuration. The textures for an N-wire circuit are 2^(N/2)x2^(N/2). For 4 wires it's a negligible 4x4 and should apparently be running at 100 kHz instead of 100 Hz.Bebel
You may try texImage2D or texSubImage2D. Both have two different signatures, one of which looks the same as readPixels. I think it is worth testing whether it is faster.Glossy
B
9

Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?

Yes, yes, yes. readPixels is fundamentally a blocking, pipeline-stalling operation, and it is always going to kill your performance wherever it happens, because it's sending a request for data to the GPU and then waiting for it to respond, which normal draw calls don't have to do.

Do readPixels as few times as you can (use a single combined buffer to read from). Do it as late as you can. Everything else hardly matters.
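As a rough illustration of the "single combined buffer" idea, here is a minimal sketch of the CPU-side half: taking the result of one big read and sorting it back into per-stage tiles. The helper name `splitAtlas` and the grid layout (tiles packed left to right in rows `cols` tiles wide, in the same row order that `readPixels` returns) are assumptions for the example, not something from the question.

```javascript
// Hypothetical helper: split one merged RGBA/UNSIGNED_BYTE readback into tiles.
// Assumes tile i sits at grid column (i % cols) and grid row floor(i / cols),
// with rows counted in the order they appear in the `pixels` buffer.
function splitAtlas(pixels, tileW, tileH, cols, count) {
  const atlasW = tileW * cols;   // atlas width in pixels
  const rowBytes = tileW * 4;    // bytes in one tile row (4 bytes per RGBA pixel)
  const tiles = [];
  for (let i = 0; i < count; i++) {
    const tx = (i % cols) * tileW;            // tile's left edge in the atlas
    const ty = Math.floor(i / cols) * tileH;  // tile's first row in the atlas
    const tile = new Uint8Array(tileW * tileH * 4);
    for (let y = 0; y < tileH; y++) {
      const src = ((ty + y) * atlasW + tx) * 4;
      tile.set(pixels.subarray(src, src + rowBytes), y * rowBytes);
    }
    tiles.push(tile);
  }
  return tiles;
}
```

With this shape, all the render passes draw into different regions of one framebuffer, a single `readPixels` call fills `pixels` at the end of the frame, and the per-column data is recovered without further GPU round trips.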

Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?

This will get you immensely better performance.

If your graphics are all like you show above, you shouldn't need to do any “layout” at all (which is good, because it'd be very awkward to implement): everything but the text is some kind of color or boundary animation which could easily be done in a shader, and all the layout can be just a static vertex buffer (each vertex has attributes that point at the simulation-state texel it depends on).

The text will be more tedious merely because you need to load all the digits into a texture to use as a spritesheet and do the lookups into that, but that's a standard technique. (Oh, and divide/modulo to get the digits.)
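The divide/modulo step can be sketched as follows. It is written in JavaScript only to make the arithmetic easy to check; a fragment shader would do the same thing with `floor()` and `mod()`. The function names and the spritesheet layout (digits 0 to 9 in a single row, each taking one tenth of the texture width) are assumptions for the example.

```javascript
// Digit at a given decimal place of a non-negative integer value.
// place 0 = ones, 1 = tens, 2 = hundreds, ...
function digitAt(value, place) {
  return Math.floor(value / Math.pow(10, place)) % 10;
}

// Horizontal texture-coordinate range [u0, u1] of a digit's glyph,
// assuming a hypothetical one-row spritesheet holding the glyphs 0..9.
function glyphURange(digit) {
  return [digit / 10, (digit + 1) / 10];
}
```

For example, a value of 473 yields the digits 4, 7, and 3 for places 2, 1, and 0, and each digit selects a one-tenth-wide slice of the spritesheet.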

Bangs answered 3/2, 2015 at 4:18 Comment(1)
Alright, I experimented with merging everything into a single large texture so there's only one read. It looks like it's good enough. I made a hacky "overlay texture X over texture Y" shader, did 32 operations on 10 qubits (so 32 shaders with 64x64 outputs), read the merged output, and it benchmarked at ~50Hz. That might not be enough headroom, but it's 100x better and should be easier to integrate with the existing stuff than using shaders end to end.Bebel
F
1

The way to make readPixels 10x faster (or even 20x in my case) is to read asynchronously into a pixel-pack buffer, wait for the GPU to finish, and then copy the data out with getBufferSubData. This requires WebGL2.

Usage


    // Bind your framebuffer
    gl.bindFramebuffer(gl.READ_FRAMEBUFFER, fbo)
    
    // Provide output target
    const data = new Uint8Array(width * height * 4)

    await readPixelsAsync(gl, width, height, data)

Needed functions

    const clientWaitAsync = function (gl, sync, flags = 0, interval_ms = 10) {
        return new Promise(function (resolve, reject) {
            var check = function () {
                var res = gl.clientWaitSync(sync, flags, 0);
                if (res == gl.WAIT_FAILED) {
                    reject();
                    return;
                }
                if (res == gl.TIMEOUT_EXPIRED) {
                    setTimeout(check, interval_ms);
                    return;
                }
                resolve();
            };
            check();
        });
    };
    
    const readPixelsAsync = function (gl, width, height, buffer) {
        const bufpak = gl.createBuffer();
        gl.bindBuffer(gl.PIXEL_PACK_BUFFER, bufpak);
        gl.bufferData(gl.PIXEL_PACK_BUFFER, buffer.byteLength, gl.STREAM_READ);
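        // With a PIXEL_PACK_BUFFER bound, the last argument is a byte offset into it
        gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, 0);
        var sync = gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0);
        if (!sync) return null;
        gl.flush();
        // Poll the fence on a timer instead of blocking the main thread
        return clientWaitAsync(gl, sync, 0, 10).then(function () {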
            gl.deleteSync(sync);
            gl.bindBuffer(gl.PIXEL_PACK_BUFFER, bufpak);
            gl.getBufferSubData(gl.PIXEL_PACK_BUFFER, 0, buffer);
            gl.bindBuffer(gl.PIXEL_PACK_BUFFER, null);
            gl.deleteBuffer(bufpak);
            return buffer;
        });
    };

Fluoride answered 25/9, 2023 at 21:3 Comment(0)
D
0

I don't know enough about your use case, so I'm just guessing, but why do you need to call readPixels at all?

First, you don't need to draw the text or the static parts of your diagram in WebGL. Put another canvas, SVG, or img over the WebGL canvas and set the CSS so they overlap. The browser will composite them, so you don't have to.

Second, let's assume you have a texture that has your computed results in it. Can't you then just make some geometry that matches the places in your diagram that need colors, and use texture coords to look up the results from the correct places in the results texture? Then you don't need to call readPixels at all. That shader can use a ramp texture lookup or any other technique to convert the results to other colors to shade the animated parts of your diagram.
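To make the texture-coordinate lookup concrete, here is a minimal sketch of how the coordinate for one result value could be computed when building the static geometry. The helper name `texelCenterUV` and the row-major layout of the results texture are assumptions for the example; sampling at the texel center (the `+ 0.5`) avoids blending with neighbouring texels.

```javascript
// Hypothetical helper: texture coordinate of the center of texel `index`
// in a row-major results texture that is texW x texH texels in size.
function texelCenterUV(index, texW, texH) {
  const x = index % texW;              // texel column
  const y = Math.floor(index / texW);  // texel row
  return [(x + 0.5) / texW, (y + 0.5) / texH];
}
```

Each vertex of a diagram element would carry the coordinate of the result it displays as an attribute, and the fragment shader would sample the results texture there.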

If you want to draw numbers based on the result, you can use a technique like this, so you'd make a shader that references the results texture to look up a result value and then indexes glyphs from another texture based on that.

Am I making any sense?

Dividers answered 3/2, 2015 at 15:25 Comment(4)
Because I'm not used to doing it that way. A bad reason long term, but over the short term it means I can focus on learning one part instead of running into and solving problems like "How do I architect things so I can tell if they clicked on a box despite the coordinates being decided by the shader?" every hour.Bebel
Um, there's no reason to have coordinates decided by the shaderDividers
I'm trying to run a physics simulation by rendering to multiple framebuffer textures, then updating the results CPU side, then feeding it back in by swapping framebuffers. My FPS drops to 20-30 FPS when using readPixels() (after profiling it), so I'm wondering if WebGL2 somehow has better tricks on this I'm not aware of? Like perhaps something using copyTexImage2D() and readBuffer()? If drawArrays() is asynchronous, perhaps I need to read from the old framebuffer just after rendering to the new one instead?Symbol
Ok, it appears there is one other way that works (transfer to pixel pack buffers, then read the buffer) but it is worse (makes sense I guess). readPixels() cannot be used to quickly get results fast enough to maintain being called every frame for what I need. I'll need to switch to a multiple web worker idea instead and try that for the calculations. Still, for periodic gravitational force updates it may still be very useful. ;) Perhaps if canvas in WebWorkers (planned I think) is a reality something else can be done.Symbol
