It is most definitely worthwhile learning OpenGL ES 2.0 shaders:
- You can load-balance between the GPU and CPU (e.g. video decoding of subsequent frames while the GPU renders the current frame).
- Video frames need to go to the GPU in any case: using `YCbCr` saves you 25% bus bandwidth if your video has 4:2:0 sampled chrominance.
- You get 4:2:0 chrominance up-sampling for free, with the GPU hardware interpolator. (Your shader should be configured to use the same vertex coordinates for both the `Y` and `C{b,r}` textures, in effect stretching the chrominance texture out over the same area.)
- On iOS 5, pushing `YCbCr` textures to the GPU is fast (no data copy or swizzling) with the texture cache (see the `CVOpenGLESTextureCache*` API functions; a sketch follows this list). You will save 1-2 data copies compared to NEON.
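For the bi-planar case, the zero-copy path looks roughly like this. A minimal sketch, assuming `textureCache` was created earlier with `CVOpenGLESTextureCacheCreate()` against your `EAGLContext`, and `pixelBuffer` is a `kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange` (or `...FullRange`) buffer from e.g. `AVCaptureVideoDataOutput`; both names are placeholders for your own pipeline:

#include <CoreVideo/CVOpenGLESTextureCache.h>

// Map the full-resolution Y plane (plane 0) as GL_LUMINANCE -- no copy, no swizzle.
CVOpenGLESTextureRef luma = NULL;
CVOpenGLESTextureCacheCreateTextureFromImage(
    kCFAllocatorDefault, textureCache, pixelBuffer, NULL,
    GL_TEXTURE_2D, GL_LUMINANCE,
    (GLsizei)CVPixelBufferGetWidthOfPlane(pixelBuffer, 0),
    (GLsizei)CVPixelBufferGetHeightOfPlane(pixelBuffer, 0),
    GL_LUMINANCE, GL_UNSIGNED_BYTE,
    0,  // plane index
    &luma);

// Map the half-resolution interleaved CbCr plane (plane 1) as GL_LUMINANCE_ALPHA.
CVOpenGLESTextureRef chroma = NULL;
CVOpenGLESTextureCacheCreateTextureFromImage(
    kCFAllocatorDefault, textureCache, pixelBuffer, NULL,
    GL_TEXTURE_2D, GL_LUMINANCE_ALPHA,
    (GLsizei)CVPixelBufferGetWidthOfPlane(pixelBuffer, 1),
    (GLsizei)CVPixelBufferGetHeightOfPlane(pixelBuffer, 1),
    GL_LUMINANCE_ALPHA, GL_UNSIGNED_BYTE,
    1,  // plane index
    &chroma);

glBindTexture(CVOpenGLESTextureGetTarget(luma), CVOpenGLESTextureGetName(luma));
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
// ... set the same parameters on `chroma`, draw the frame, then CFRelease()
// both texture refs and call CVOpenGLESTextureCacheFlush(textureCache, 0).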
I am using these techniques to great effect in my super-fast iPhone camera app, SnappyCam.
You are on the right track for implementation: use a `GL_LUMINANCE` texture for `Y` and `GL_LUMINANCE_ALPHA` if your `CbCr` is interleaved. Otherwise use three `GL_LUMINANCE` textures if all of your `YCbCr` components are non-interleaved (fully planar).
Creating two textures for 4:2:0 bi-planar `YCbCr` (where `CbCr` is interleaved) is straightforward:
glBindTexture(GL_TEXTURE_2D, texture_y);
glTexImage2D(
    GL_TEXTURE_2D,
    0,
    GL_LUMINANCE,       // Texture format (8-bit)
    width,
    height,
    0,                  // No border
    GL_LUMINANCE,       // Source format (8-bit)
    GL_UNSIGNED_BYTE,   // Source data type
    NULL
);

glBindTexture(GL_TEXTURE_2D, texture_cbcr);
glTexImage2D(
    GL_TEXTURE_2D,
    0,
    GL_LUMINANCE_ALPHA, // Texture format (16-bit)
    width / 2,          // Chrominance is sub-sampled 2x in each dimension
    height / 2,
    0,                  // No border
    GL_LUMINANCE_ALPHA, // Source format (16-bit)
    GL_UNSIGNED_BYTE,   // Source data type
    NULL
);
where you would then use `glTexSubImage2D()` or the iOS 5 texture cache to update these textures.
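For the `glTexSubImage2D()` path, the per-frame update is just two calls. A sketch, assuming `y_plane` and `cbcr_plane` point to tightly packed plane data and `texture_y`/`texture_cbcr` are the textures allocated above; note that OpenGL ES 2.0 has no `GL_UNPACK_ROW_LENGTH`, so if your source rows are padded you'll need to repack or upload row by row:

// Upload the full-resolution Y plane.
glBindTexture(GL_TEXTURE_2D, texture_y);
glTexSubImage2D(GL_TEXTURE_2D, 0,
                0, 0,               // x/y offset
                width, height,
                GL_LUMINANCE, GL_UNSIGNED_BYTE, y_plane);

// Upload the half-resolution interleaved CbCr plane.
glBindTexture(GL_TEXTURE_2D, texture_cbcr);
glTexSubImage2D(GL_TEXTURE_2D, 0,
                0, 0,
                width / 2, height / 2,
                GL_LUMINANCE_ALPHA, GL_UNSIGNED_BYTE, cbcr_plane);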
I'd also recommend using a 2D `varying` that spans the texture coordinate space (x: [0,1], y: [0,1]) so that you avoid any dependent texture reads in your fragment shader. The end result is super-fast and doesn't load the GPU at all in my experience.
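To make that concrete, here is a minimal GLSL shader pair under those conventions. The names (`a_position`, `a_texCoord`, `s_y`, `s_cbcr`) are placeholders, and the conversion constants assume BT.601 video-range `YCbCr`; substitute the matrix that matches your source (e.g. BT.709 or full range):

// Vertex shader: pass a single set of [0,1]^2 coordinates to the fragment
// shader, so the same coordinates sample both the Y and CbCr textures.
attribute vec4 a_position;
attribute vec2 a_texCoord;
varying   vec2 v_texCoord;

void main() {
    gl_Position = a_position;
    v_texCoord  = a_texCoord;
}

// Fragment shader (separate source): non-dependent texture reads; the
// hardware interpolator up-samples the chrominance texture for free.
precision mediump float;
varying vec2 v_texCoord;
uniform sampler2D s_y;     // GL_LUMINANCE: Y in .r
uniform sampler2D s_cbcr;  // GL_LUMINANCE_ALPHA: Cb in .r, Cr in .a

void main() {
    // BT.601 video-range constants -- an assumption, adjust for your source.
    float y   = 1.1643 * (texture2D(s_y, v_texCoord).r - 0.0625);
    vec2 cbcr = texture2D(s_cbcr, v_texCoord).ra - vec2(0.5);
    gl_FragColor = vec4(y + 1.5958 * cbcr.y,
                        y - 0.3917 * cbcr.x - 0.8129 * cbcr.y,
                        y + 2.0172 * cbcr.x,
                        1.0);
}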