Video formats are an incredibly complex subject.
Some video streams store the pixels as interleaved bytes in RGBA, ARGB, ABGR, or several other orderings (with or without an alpha channel).
(In RGBA format, you'd have the red, green, blue, and alpha values of a pixel one right after another in memory, followed by another set of 4 bytes with the color values of the next pixel, and so on.) This is interleaved color information.
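Here's a minimal sketch of how you'd address those packed 4-byte groups, assuming a `kCVPixelFormatType_32RGBA` buffer (other interleaved formats just shuffle the byte order, so check your format first). The helper name is made up:

```swift
import CoreVideo

// Sketch, assuming kCVPixelFormatType_32RGBA; the helper name is mine.
func rgbaAt(x: Int, y: Int, in buffer: CVPixelBuffer) -> (r: UInt8, g: UInt8, b: UInt8, a: UInt8)? {
    // Lock before touching the memory; unlock when we leave the function.
    CVPixelBufferLockBaseAddress(buffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }

    guard let base = CVPixelBufferGetBaseAddress(buffer) else { return nil }
    // Rows can be padded, so always use bytes-per-row, not width * 4.
    let bytesPerRow = CVPixelBufferGetBytesPerRow(buffer)
    // Each pixel is 4 packed bytes: R, G, B, A, one pixel right after another.
    let p = base.advanced(by: y * bytesPerRow + x * 4)
        .assumingMemoryBound(to: UInt8.self)
    return (r: p[0], g: p[1], b: p[2], a: p[3])
}
```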
Other video streams separate out the color channels, so the red, blue, green, and alpha data are sent as separate "planes". You'd get a buffer with all the red information, then all the blue data, then all the green, and then the alpha, if alpha is included. (Think of color negatives, where there are separate layers of emulsion to capture the different colors. The layers of emulsion are planes of color information. It's the same idea with digital.)
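Core Video exposes that layout directly: you can ask a buffer how many planes it has and where each one lives. A rough sketch (the helper name is invented):

```swift
import CoreVideo

// Sketch: walk each plane of a planar buffer and print its geometry.
func describePlanes(of buffer: CVPixelBuffer) {
    CVPixelBufferLockBaseAddress(buffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }

    // A plane count of 0 means the buffer is interleaved (one packed plane).
    let planeCount = CVPixelBufferGetPlaneCount(buffer)
    for plane in 0..<planeCount {
        let width  = CVPixelBufferGetWidthOfPlane(buffer, plane)
        let height = CVPixelBufferGetHeightOfPlane(buffer, plane)
        let stride = CVPixelBufferGetBytesPerRowOfPlane(buffer, plane)
        print("plane \(plane): \(width)x\(height), \(stride) bytes per row")
    }
}
```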
There are also formats where the color data is in one or two planes and the luminance is in a separate plane. That's how old analog color TV worked: it started out as black and white (luminance), and broadcasters later added side-band signals to carry the color information (chroma).
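You can check which of these layouts you've actually been handed by inspecting the pixel format type. A quick sketch (the helper name is invented, and the constants listed are only a few of the many formats Core Video defines):

```swift
import CoreVideo

// Sketch of a format check; only a handful of common formats are listed here.
func describeLayout(of buffer: CVPixelBuffer) {
    let format = CVPixelBufferGetPixelFormatType(buffer)
    switch format {
    case kCVPixelFormatType_32RGBA, kCVPixelFormatType_32ARGB, kCVPixelFormatType_32BGRA:
        print("interleaved color, 4 bytes per pixel")
    case kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,
         kCVPixelFormatType_420YpCbCr8BiPlanarFullRange:
        print("biplanar: one luminance plane + one interleaved chroma plane")
    default:
        print("some other format: \(format)")
    }
}
```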
I don't muck around with CVPixelBuffers often enough to know the gory details of what you're asking, and I have to invest large amounts of time and copious amounts of coffee before I can "spin up" my brain enough to grasp those gory details.
Edit:
Since your debug information shows 2 planes, it seems likely that this pixel buffer has a luminance plane and a chroma plane, as mentioned in @zeh's answer.
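If that's the case, something like this would read the raw values. This is a sketch assuming a 4:2:0 biplanar format such as `kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange` (the helper name and the layout assumption are mine, so verify against `CVPixelBufferGetPixelFormatType` first):

```swift
import CoreVideo

// Sketch, assuming a 4:2:0 biplanar YCbCr buffer: plane 0 is luminance,
// plane 1 is interleaved Cb/Cr at half resolution in each dimension.
func yCbCrAt(x: Int, y: Int, in buffer: CVPixelBuffer) -> (y: UInt8, cb: UInt8, cr: UInt8)? {
    guard CVPixelBufferGetPlaneCount(buffer) == 2 else { return nil }
    CVPixelBufferLockBaseAddress(buffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }

    guard let lumaBase   = CVPixelBufferGetBaseAddressOfPlane(buffer, 0),
          let chromaBase = CVPixelBufferGetBaseAddressOfPlane(buffer, 1) else { return nil }

    // Plane 0: one luminance byte per pixel.
    let lumaStride = CVPixelBufferGetBytesPerRowOfPlane(buffer, 0)
    let luma = lumaBase.advanced(by: y * lumaStride + x)
        .assumingMemoryBound(to: UInt8.self).pointee

    // Plane 1: Cb and Cr bytes interleaved, one pair per 2x2 block of pixels.
    let chromaStride = CVPixelBufferGetBytesPerRowOfPlane(buffer, 1)
    let pair = chromaBase.advanced(by: (y / 2) * chromaStride + (x / 2) * 2)
        .assumingMemoryBound(to: UInt8.self)
    return (y: luma, cb: pair[0], cr: pair[1])
}
```

Note that these are raw Y'CbCr values, not RGB; converting between the two is its own rabbit hole involving color matrices and range (video vs. full).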