Lowest-overhead camera-to-CPU-to-GPU approach on Android

My application needs to do some processing on live camera frames on the CPU, before rendering them on the GPU. There's also some other stuff being rendered on the GPU which is dependent on the results of the CPU processing; therefore it's important to keep everything synchronised so we don't render the frame itself on the GPU until the results of the CPU processing for that frame are also available.

The question is: what's the lowest-overhead approach for this on Android?

The CPU processing in my case just needs a greyscale image, so a YUV format where the Y plane is packed is ideal (and tends to be a good match for the native format of camera devices too). NV12, NV21 or fully planar YUV would all give low-overhead access to the greyscale data, so any of those would be preferred on the CPU side.
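
As a concrete sketch of that packed-Y access, here's what it looks like with the NDK ImageReader bindings (media/NdkImageReader.h, which arrived in API 24, i.e. after this question was written; the Java Image/ImageReader flow is equivalent). Plane 0 of a YUV_420_888 image is the Y plane, and the returned pointer is the buffer itself rather than a copy:

```cpp
#include <media/NdkImageReader.h>
#include <cstdint>

// Sketch: read the Y plane of the most recent camera frame without a copy.
// Assumes 'reader' was created with AIMAGE_FORMAT_YUV_420_888.
void processLatestGreyscale(AImageReader* reader) {
    AImage* image = nullptr;
    if (AImageReader_acquireLatestImage(reader, &image) != AMEDIA_OK) return;

    uint8_t* y = nullptr;
    int yLen = 0;
    int32_t yStride = 0, width = 0, height = 0;
    AImage_getWidth(image, &width);
    AImage_getHeight(image, &height);
    AImage_getPlaneData(image, 0, &y, &yLen);      // plane 0 = Y, no copy
    AImage_getPlaneRowStride(image, 0, &yStride);  // stride may exceed width

    for (int32_t row = 0; row < height; ++row) {
        const uint8_t* line = y + row * yStride;
        // ... CPU greyscale processing on 'line' (width bytes) ...
        (void)line;
    }
    AImage_delete(image);  // hand the buffer back to the camera's queue
}
```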

With the original Camera API, setPreviewCallbackWithBuffer() was the only sensible way to get data onto the CPU for processing. It delivered the Y plane separately, so it was ideal for the CPU processing. Getting that frame to OpenGL for rendering in a low-overhead way was the more challenging aspect. In the end I wrote a NEON colour-conversion routine to output RGB565 and just used glTexSubImage2D to make it available on the GPU. This was first implemented in the Nexus One timeframe, where even a 320x240 glTexSubImage2D call took 50ms of CPU time (poor drivers trying to do texture swizzling, I presume - this was significantly improved in a later system update).
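
For reference, that upload path is roughly the following (a minimal GLES2 sketch; the NEON conversion itself is omitted). Allocating the texture storage once with glTexImage2D and then updating it per frame with glTexSubImage2D avoids per-frame reallocation:

```cpp
#include <GLES2/gl2.h>
#include <cstdint>

// Allocate an RGB565 texture once, up front.
GLuint createPreviewTexture(GLsizei width, GLsizei height) {
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
                 GL_RGB, GL_UNSIGNED_SHORT_5_6_5, nullptr);
    return tex;
}

// Per frame: upload the RGB565 output of the colour-conversion routine.
void uploadFrame(GLuint tex, GLsizei width, GLsizei height,
                 const uint16_t* rgb565) {
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGB, GL_UNSIGNED_SHORT_5_6_5, rgb565);
}
```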

Back in the day I looked into things like EGLImage extensions, but they don't seem to be available or well documented enough for user apps. I had a quick look at the internal Android GraphicBuffer classes, but I'd ideally like to stay in the world of supported public APIs.

The android.hardware.camera2 API looked promising, since it allows attaching both an ImageReader and a SurfaceTexture to a capture session. Unfortunately I can't see any way of ensuring the right sequential pipeline here - holding off calling updateTexImage() until the CPU has finished processing is easy enough, but if another frame has arrived during that processing then updateTexImage() will skip straight to the latest frame. It also seems that with multiple outputs there will be independent copies of each frame in the queues, which I'd ideally like to avoid.
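
For illustration, the dual-output setup looks roughly like this, sketched with the NDK camera2 bindings (camera/NdkCameraDevice.h, API 24+, which also postdate this question; the Java createCaptureSession() flow is equivalent). It assumes 'device' is already opened and the caller supplies the state callbacks; readerWindow would come from AImageReader_getWindow() and textureWindow from ANativeWindow_fromSurface() on the SurfaceTexture's Surface:

```cpp
#include <camera/NdkCameraDevice.h>

// Create one capture session with two outputs and start a repeating
// preview request that targets both, so every frame is queued to the
// ImageReader and to the SurfaceTexture.
bool startDualOutputPreview(ACameraDevice* device,
                            ANativeWindow* readerWindow,
                            ANativeWindow* textureWindow,
                            ACameraCaptureSession_stateCallbacks* callbacks,
                            ACameraCaptureSession** sessionOut) {
    ACaptureSessionOutputContainer* outputs = nullptr;
    ACaptureSessionOutputContainer_create(&outputs);

    ACaptureSessionOutput *readerOutput = nullptr, *textureOutput = nullptr;
    ACaptureSessionOutput_create(readerWindow, &readerOutput);
    ACaptureSessionOutput_create(textureWindow, &textureOutput);
    ACaptureSessionOutputContainer_add(outputs, readerOutput);
    ACaptureSessionOutputContainer_add(outputs, textureOutput);

    if (ACameraDevice_createCaptureSession(device, outputs, callbacks,
                                           sessionOut) != ACAMERA_OK)
        return false;

    ACaptureRequest* request = nullptr;
    ACameraDevice_createCaptureRequest(device, TEMPLATE_PREVIEW, &request);
    ACameraOutputTarget *readerTarget = nullptr, *textureTarget = nullptr;
    ACameraOutputTarget_create(readerWindow, &readerTarget);
    ACameraOutputTarget_create(textureWindow, &textureTarget);
    ACaptureRequest_addTarget(request, readerTarget);
    ACaptureRequest_addTarget(request, textureTarget);
    return ACameraCaptureSession_setRepeatingRequest(
               *sessionOut, nullptr, 1, &request, nullptr) == ACAMERA_OK;
}
```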

Ideally this is what I'd like:

  1. Camera driver fills some memory with the latest frame
  2. CPU obtains pointer to the data in memory, can read Y data without a copy being made
  3. CPU processes data and sets a flag in my code when frame is ready
  4. When beginning to render a frame, check if a new frame is ready
  5. Call some API to bind the same memory as a GL texture
  6. When a newer frame is ready, release the buffer holding the previous frame back into the pool

I can't see a way of doing exactly that in zero-copy style with the public APIs on Android, but what's the closest it's possible to get?

One crazy thing I tried that seems to work, but is not documented: the ANativeWindow NDK API can accept data in NV12 format, even though the appropriate format constant is not one of the ones in the public headers. That allows a SurfaceTexture to be filled with NV12 data by memcpy(), avoiding CPU-side colour conversion and any swizzling that happens driver-side in glTexImage2D. That is still an extra copy of the data, though, which feels like it should be unnecessary, and as it's undocumented it might not work on all devices. A supported, sequential, zero-copy Camera -> ImageReader -> SurfaceTexture (or equivalent) would be perfect.
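
A sketch of that trick is below. To be explicit about what's assumed: 'nv12Format' stands in for the undocumented constant (deliberately left as a parameter, since it isn't in the public headers), and the CbCr plane starting at stride * height is itself part of what's undocumented and may be device-dependent:

```cpp
#include <android/native_window.h>
#include <cstring>
#include <cstdint>

// Push one NV12 frame into a Surface/SurfaceTexture via ANativeWindow.
// 'nv12Format' is the undocumented pixel-format constant described above.
bool pushNv12Frame(ANativeWindow* window, int32_t nv12Format,
                   const uint8_t* nv12, int32_t width, int32_t height) {
    ANativeWindow_setBuffersGeometry(window, width, height, nv12Format);

    ANativeWindow_Buffer buffer;
    if (ANativeWindow_lock(window, &buffer, nullptr) != 0) return false;

    // buffer.stride is in pixels; for the 8-bit Y plane that equals bytes.
    auto* dst = static_cast<uint8_t*>(buffer.bits);
    for (int32_t row = 0; row < height; ++row)                 // Y plane
        memcpy(dst + row * buffer.stride, nv12 + row * width, width);

    // Assume the interleaved CbCr plane follows at stride * height
    // (this held on the device tested, but is itself undocumented).
    uint8_t* dstUv = dst + buffer.stride * height;
    const uint8_t* srcUv = nv12 + width * height;
    for (int32_t row = 0; row < height / 2; ++row)             // CbCr plane
        memcpy(dstUv + row * buffer.stride, srcUv + row * width, width);

    return ANativeWindow_unlockAndPost(window) == 0;
}
```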

Charmainecharmane answered 29/5, 2016 at 13:30

The most efficient way to process video is to avoid the CPU altogether, but it sounds like that's not an option for you. The public APIs are generally geared toward doing everything in hardware, since that's what the framework itself needs, though there are some paths for RenderScript. (I'm assuming you've seen the Grafika filter demo that uses fragment shaders.)

Accessing the data on the CPU used to mean slow Camera APIs or working with GraphicBuffer and relatively obscure EGL functions (e.g. this question). The point of ImageReader was to provide zero-copy access to YUV data from the camera.

You can't really serialize Camera -> ImageReader -> SurfaceTexture, as ImageReader doesn't have a "forward the buffer" API. That's unfortunate, as it would make this trivial. You could try to replicate what SurfaceTexture does, using EGL functions to package the buffer as an external texture, but again you're into non-public GraphicBuffer-land, and I worry about ownership/lifetime issues of the buffer.
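
To make the "package the buffer as an external texture" step concrete: the EGL half is public (the EGL_ANDROID_image_native_buffer and OES_EGL_image_external extensions) and looks roughly like the sketch below; the non-public part is obtaining the EGLClientBuffer for a camera frame, which is exactly the GraphicBuffer territory in question:

```cpp
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

// Wrap an Android native buffer as a GL_TEXTURE_EXTERNAL_OES texture.
// How you get 'clientBuffer' is the unsupported, non-public step.
GLuint wrapAsExternalTexture(EGLDisplay display, EGLClientBuffer clientBuffer) {
    auto eglCreateImageKHR = reinterpret_cast<PFNEGLCREATEIMAGEKHRPROC>(
        eglGetProcAddress("eglCreateImageKHR"));
    auto glEGLImageTargetTexture2DOES =
        reinterpret_cast<PFNGLEGLIMAGETARGETTEXTURE2DOESPROC>(
            eglGetProcAddress("glEGLImageTargetTexture2DOES"));

    const EGLint attrs[] = { EGL_IMAGE_PRESERVED_KHR, EGL_TRUE, EGL_NONE };
    EGLImageKHR image = eglCreateImageKHR(display, EGL_NO_CONTEXT,
                                          EGL_NATIVE_BUFFER_ANDROID,
                                          clientBuffer, attrs);
    if (image == EGL_NO_IMAGE_KHR) return 0;

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_EXTERNAL_OES, tex);
    glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, image);
    return tex;
}
```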

I'm not sure how the parallel paths help you (Camera2 -> ImageReader, Camera2 -> SurfaceTexture), as what's being sent to the SurfaceTexture wouldn't have your modifications. FWIW, it doesn't involve an extra copy -- in Lollipop or thereabouts, BufferQueue was updated to allow individual buffers to move through multiple queues.

It's entirely possible there are some fancy new APIs I haven't seen yet, but from what I know, your ANativeWindow approach is probably the winner. I suspect you'd be better off with one of the Camera formats (YV12 or NV21) than NV12, but I don't know for sure.

FWIW, you will drop frames if your processing takes too long, but unless your processing is uneven (some frames take much longer than others) you'll have to drop frames no matter what. Getting into the realm of non-public APIs again, you could switch the SurfaceTexture to "synchronous" mode, but if your buffers fill up you're still dropping frames.

Labialize answered 29/5, 2016 at 18:17
Excellent answer, thanks. The CPU side isn't actually writing to the buffer at all; it's basically an AR app that needs to calculate object motion in the frame on the CPU to correctly render the overlaid content in the right position on the GPU. So the parallel paths would be fine, but I think the part that's missing is getting the right buffer into the SurfaceTexture: the "latest the CPU has processed" rather than "the latest in the queue". - Charmainecharmane
I looked into the RenderScript docs a little yesterday. I doubt it will work, but if it's possible to have an Allocation that is USAGE_IO_INPUT | USAGE_IO_OUTPUT | USAGE_SCRIPT then that may allow it to be both the destination for the camera and the source for the SurfaceTexture, with the CPU in control of the timing of the buffer moving through the pipe. The RenderScript NDK headers have a getPointer() function on the Allocation, so I should be able to process in C++ NDK code (unfortunately those headers are not in the latest NDK release, so I'm not sure what the issue is there). - Charmainecharmane
I don't have a good mental picture of your parallel pipeline, but it seems like you'd want to call updateTexImage() and acquireLatestImage() at the same time, and then just sit on the latched buffer while the CPU does its work. It's not race-free, as a new camera frame could land between the two calls, but that should be rare, and can be mitigated by keeping an eye on the clock. - Labialize
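
(A sketch of that latch-both-at-once idea, written against the later NDK bindings from android/surface_texture.h (API 28+) purely for illustration; the Java SurfaceTexture/ImageReader calls are equivalent, and comparing the two frame timestamps catches the race described:)

```cpp
#include <android/surface_texture.h>
#include <media/NdkImageReader.h>

// Latch the newest frame on both consumers back to back, then verify that
// the GL texture and the CPU-visible image are the same camera frame.
// 'st' must already be attached to the current GL context.
bool latchMatchingFrame(ASurfaceTexture* st, AImageReader* reader,
                        AImage** imageOut) {
    if (ASurfaceTexture_updateTexImage(st) != 0) return false;
    if (AImageReader_acquireLatestImage(reader, imageOut) != AMEDIA_OK)
        return false;

    int64_t texNs = ASurfaceTexture_getTimestamp(st);
    int64_t imgNs = 0;
    AImage_getTimestamp(*imageOut, &imgNs);
    if (texNs != imgNs) {
        // A new frame landed between the two calls: drop the mismatched
        // image and retry on the next render tick.
        AImage_delete(*imageOut);
        *imageOut = nullptr;
        return false;
    }
    return true;
}
```
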
USAGE_IO_INPUT and USAGE_IO_OUTPUT cannot be used together in a single RenderScript Allocation. I think you have to use two Allocations here: Camera2 -> Allocation -> (processing) -> Allocation -> SurfaceTexture. Have you tried something like github.com/googlesamples/android-HdrViewfinder? - Barnacle
There's a follow-up question here outlining an approach that might work: #37593434 - Charmainecharmane
@MiaoWang thanks for the note on input/output not being possible, and for the suggestion. I have seen that sample but am ideally looking for a completely zero-copy solution. I'm not really a fan of the RenderScript programming model (I like explicit control for performance-critical stuff), but thought there might be a route just using Allocations to get the zero-copy pipeline set up. Seems not, but the parallel approach might just work, as discussed in the follow-up question. - Charmainecharmane
@Labialize Can you please help me with this problem? https://mcmap.net/q/1174814/-extract-raw-or-encoded-video-frames-from-any-gpu-memory-directly-for-current-program/10413749 - Intrigant
