I need to do a CPU-side, read-only processing pass on live camera data (just the Y plane), followed by rendering it on the GPU. Frames shouldn't be rendered until processing completes (so I don't always want to render the latest frame from the camera, just the latest one that the CPU side has finished processing). Rendering is decoupled from the camera processing and aims for 60 FPS even if camera frames arrive at a lower rate than that.
There's a related but higher-level question over at: Lowest overhead camera to CPU to GPU approach on android
To describe the current setup in a bit more detail: we have an app-side buffer pool for camera data where buffers are either "free", "in display", or "pending display". When a new frame arrives from the camera we grab a free buffer, store the frame in it (or a reference to it, if the actual data lives in some system-provided buffer pool), do the processing and stash the results in the buffer, then mark the buffer "pending display". On the renderer thread, if any buffer is "pending display" at the start of the render loop we latch it as the one "in display" instead, render the camera frame, and render the other content using the processed information calculated from that same camera frame.
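To make the states concrete, here's a minimal sketch of that pool; the names, the three-buffer count, and the processed-result payload are just illustrative placeholders:

```java
// Hypothetical sketch of the app-side pool described above.
enum BufferState { FREE, PENDING_DISPLAY, IN_DISPLAY }

final class FrameBuffer {
    BufferState state = BufferState.FREE;
    long timestampNs;   // camera frame timestamp, useful for sync checks
    float[] results;    // whatever the CPU-side pass produces (placeholder)
}

final class BufferPool {
    private final FrameBuffer[] buffers =
            { new FrameBuffer(), new FrameBuffer(), new FrameBuffer() };

    // Camera thread: grab a free buffer to process into (null => drop the frame).
    synchronized FrameBuffer acquireFree() {
        for (FrameBuffer b : buffers) {
            if (b.state == BufferState.FREE) return b;
        }
        return null;
    }

    // Camera thread: processing done; recycle any un-latched pending buffer and
    // make this one available to the renderer.
    synchronized void markPending(FrameBuffer done) {
        for (FrameBuffer b : buffers) {
            if (b.state == BufferState.PENDING_DISPLAY) b.state = BufferState.FREE;
        }
        done.state = BufferState.PENDING_DISPLAY;
    }

    // Render thread: latch the newest processed frame, or keep the current one.
    synchronized FrameBuffer latchForDisplay(FrameBuffer current) {
        for (FrameBuffer b : buffers) {
            if (b.state == BufferState.PENDING_DISPLAY) {
                if (current != null) current.state = BufferState.FREE;
                b.state = BufferState.IN_DISPLAY;
                return b;
            }
        }
        return current;  // nothing new; keep rendering the last latched frame
    }
}
```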
Thanks to @fadden's response on the question linked above I now understand that the "parallel output" feature of the Android camera2 API shares the buffers between the various output queues, so it shouldn't involve any copies of the data, at least on modern Android.
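For context, by "parallel output" I mean the standard camera2 pattern of adding both surfaces as targets of one repeating request; a rough sketch, with illustrative names and error handling elided:

```java
// Sketch: one capture session feeding both a SurfaceTexture (for GL) and an
// ImageReader (for the CPU-side read of the Y plane). The caller keeps the
// returned ImageReader alive and installs its OnImageAvailableListener.
ImageReader startParallelOutput(CameraDevice cameraDevice, SurfaceTexture surfaceTexture,
                                int width, int height, Handler cameraHandler)
        throws CameraAccessException {
    surfaceTexture.setDefaultBufferSize(width, height);
    Surface previewSurface = new Surface(surfaceTexture);

    ImageReader imageReader =
            ImageReader.newInstance(width, height, ImageFormat.YUV_420_888, /*maxImages=*/3);
    Surface readerSurface = imageReader.getSurface();

    cameraDevice.createCaptureSession(
            Arrays.asList(previewSurface, readerSurface),
            new CameraCaptureSession.StateCallback() {
                @Override public void onConfigured(CameraCaptureSession session) {
                    try {
                        CaptureRequest.Builder builder =
                                cameraDevice.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW);
                        builder.addTarget(previewSurface);
                        builder.addTarget(readerSurface);
                        session.setRepeatingRequest(builder.build(), null, cameraHandler);
                    } catch (CameraAccessException e) {
                        // handle failure
                    }
                }
                @Override public void onConfigureFailed(CameraCaptureSession session) { }
            },
            cameraHandler);
    return imageReader;
}
```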
In a comment there was a suggestion that I could latch the SurfaceTexture and ImageReader outputs at the same time and just "sit on the buffer" until the processing is complete. Unfortunately I don't think that works in my case, because of the decoupled rendering: it still needs to run at 60 FPS, and it still needs access to the previous frame whilst the new one is being processed so that things don't get out of sync.
One solution that has come to mind is having multiple SurfaceTextures, one in each of our app-side buffers (we currently use three). With that scheme, when a new camera frame arrives we would obtain a free buffer from our app-side pool, call acquireLatestImage() on an ImageReader to get the data for processing, and call updateTexImage() on the SurfaceTexture in that free buffer. At render time we just need to make sure the SurfaceTexture from the "in display" buffer is the one bound to GL, and everything should be in sync most of the time (as @fadden commented, there is a race between the updateTexImage() and acquireLatestImage() calls, but that window should be small enough to make mismatches rare, and they are perhaps detectable and fixable anyway using the timestamps in the buffers).
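A sketch of what I have in mind for the camera-thread side, assuming each app-side buffer (FrameBuffer above) also owns its own SurfaceTexture attached as a camera output, and that this thread has its own EGL context current (see below); processLuma() is a stand-in for our actual CPU pass:

```java
// Illustrative camera-thread handler; pool, imageReader and cameraHandler are
// the objects described above.
imageReader.setOnImageAvailableListener(reader -> {
    FrameBuffer buf = pool.acquireFree();
    if (buf == null) {                       // no free buffer: drop this frame
        Image dropped = reader.acquireLatestImage();
        if (dropped != null) dropped.close();
        return;
    }

    Image image = reader.acquireLatestImage();
    if (image == null) return;
    try {
        // CPU-side, read-only pass over the Y plane.
        Image.Plane yPlane = image.getPlanes()[0];
        buf.results = processLuma(yPlane.getBuffer(), image.getWidth(),
                                  image.getHeight(), yPlane.getRowStride());
        buf.timestampNs = image.getTimestamp();
    } finally {
        image.close();
    }

    // Latch the corresponding camera frame into this buffer's SurfaceTexture.
    buf.surfaceTexture.updateTexImage();
    if (buf.surfaceTexture.getTimestamp() != buf.timestampNs) {
        // The race @fadden mentioned: the texture and the Image came from
        // different frames. Detect it via timestamps, then drop or re-latch.
    }

    pool.markPending(buf);
}, cameraHandler);
```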
I note in the docs that updateTexImage() can only be called when the SurfaceTexture is bound to a GL context, which suggests I'll need a GL context on the camera processing thread too, so the camera thread can call updateTexImage() on the SurfaceTexture in the "free" buffer whilst the render thread is still able to render from the SurfaceTexture in the "in display" buffer.
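What I'm assuming that would look like (and it is only an assumption, not something I've validated): a second EGL context on the camera thread, created with the render thread's context as the share context so both threads can see the same texture names, and made current against a tiny pbuffer surface just so updateTexImage() has a context to work with.

```java
import android.opengl.EGL14;
import android.opengl.EGLConfig;
import android.opengl.EGLContext;
import android.opengl.EGLDisplay;
import android.opengl.EGLSurface;

// Hypothetical helper: give the camera thread its own EGL context, shared with
// the render thread's context, so updateTexImage() can be called there.
final class CameraThreadEgl {
    private EGLDisplay display;
    private EGLContext context;
    private EGLSurface pbuffer;

    // Call on the camera thread; shareContext is the render thread's EGLContext.
    void makeCurrent(EGLContext shareContext) {
        display = EGL14.eglGetDisplay(EGL14.EGL_DEFAULT_DISPLAY);
        int[] version = new int[2];
        EGL14.eglInitialize(display, version, 0, version, 1);

        int[] configAttribs = {
                EGL14.EGL_RENDERABLE_TYPE, EGL14.EGL_OPENGL_ES2_BIT,
                EGL14.EGL_SURFACE_TYPE, EGL14.EGL_PBUFFER_BIT,
                EGL14.EGL_NONE
        };
        EGLConfig[] configs = new EGLConfig[1];
        int[] numConfigs = new int[1];
        EGL14.eglChooseConfig(display, configAttribs, 0, configs, 0, 1, numConfigs, 0);

        int[] contextAttribs = { EGL14.EGL_CONTEXT_CLIENT_VERSION, 2, EGL14.EGL_NONE };
        context = EGL14.eglCreateContext(display, configs[0], shareContext, contextAttribs, 0);

        // A 1x1 pbuffer is enough; we never actually draw on this thread.
        int[] pbufferAttribs = { EGL14.EGL_WIDTH, 1, EGL14.EGL_HEIGHT, 1, EGL14.EGL_NONE };
        pbuffer = EGL14.eglCreatePbufferSurface(display, configs[0], pbufferAttribs, 0);

        EGL14.eglMakeCurrent(display, pbuffer, pbuffer, context);
    }
}
```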
So, to the questions:
- Does this seem like a sensible approach?
- Are SurfaceTextures basically a light wrapper around the shared buffer pool, or do they consume some limited hardware resource and so should be used sparingly?
- Are the SurfaceTexture calls all cheap enough that using multiple ones will still be a big win over just copying the data?
- Is the plan to have two threads with distinct GL contexts with a different SurfaceTexture bound in each likely to work or am I asking for a world of pain and buggy drivers?
It sounds promising enough that I'm going to give it a go, but I thought it worth asking here in case anyone (basically @fadden!) knows of any internal details I've overlooked that would make this a bad idea.