I understand how the size of an NV12 frame is computed, as described in the question.
What I am unsure about is how the UV plane is stored in this format. I have read two sources. The first is the Microsoft documentation: https://msdn.microsoft.com/en-us/library/windows/desktop/dd206750(v=vs.85).aspx
NV12
All of the Y samples appear first in memory as an array of unsigned char values with an even number of lines. The Y plane is followed immediately by an array of unsigned char values that contains packed U (Cb) and V (Cr) samples. When the combined U-V array is addressed as an array of little-endian WORD values, the LSBs contain the U values, and the MSBs contain the V values. NV12 is the preferred 4:2:0 pixel format for DirectX VA. It is expected to be an intermediate-term requirement for DirectX VA accelerators supporting 4:2:0 video. The following illustration shows the Y plane and the array that contains packed U and V samples.
What I understand from this is: in the UV plane, each U sample and each V sample is stored in its own byte, with the U and V bytes alternating.
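To make this reading concrete, here is a minimal sketch of how I would address one pixel if the UV plane really is byte-interleaved. The names `yuv_sample` and `nv12_sample_at` are hypothetical, and I am assuming a tightly packed buffer (row stride equal to the width) with even width and height; real DirectX surfaces may use a larger stride.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of any API. */
typedef struct {
    uint8_t y;
    uint8_t u;
    uint8_t v;
} yuv_sample;

/* Assumes stride == width and even width/height (an assumption, see above). */
static yuv_sample nv12_sample_at(const uint8_t *buf, size_t width, size_t height,
                                 size_t x, size_t y)
{
    const uint8_t *y_plane = buf;                   /* width * height bytes of Y */
    const uint8_t *uv_plane = buf + width * height; /* UV plane follows immediately */

    /* One chroma row serves two luma rows and holds width/2 (U, V) byte pairs,
       so it is also width bytes long. */
    size_t uv_index = (y / 2) * width + (x / 2) * 2;

    yuv_sample s;
    s.y = y_plane[y * width + x];
    s.u = uv_plane[uv_index];     /* lower address: LSB of the little-endian WORD */
    s.v = uv_plane[uv_index + 1]; /* higher address: MSB of the little-endian WORD */
    return s;
}
```

Under this reading, every pixel in a 2x2 block shares the same U byte and the same V byte.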
The second source is the VideoLAN wiki: https://wiki.videolan.org/YUV#NV12
It says:
NV12
Related to I420, NV12 has one luma "luminance" plane Y and one plane with U and V values interleaved. In NV12, chroma planes (blue and red) are subsampled in both the horizontal and vertical dimensions by a factor of 2. For a 2x2 group of pixels, you have 4 Y samples and 1 U and 1 V sample. It can be helpful to think of NV12 as I420 with the U and V planes interleaved.
Here is a graphical representation of NV12. Each letter represents one bit:
For 1 NV12 pixel: YYYYYYYY UVUV
For a 2-pixel NV12 frame: YYYYYYYYYYYYYYYY UVUVUVUV
For a 50-pixel NV12 frame: Y*8*50 (UV)*2*50
For a n-pixel NV12 frame: Y*8*n (UV)*2*n
What I understand from this is: U and V are interleaved bit by bit within each byte, so each byte of the UV plane would contain 4 U bits and 4 V bits interleaved.
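The one thing I can check is the bit budget. The diagram shows 8 Y bits plus two "UV" bit pairs per pixel, which is 12 bits per pixel, and that agrees with the frame size computed from the plane dimensions. A small sketch of that arithmetic follows; the function names are hypothetical and I assume even width and height. It only confirms the total size, not which of the two layouts is right.

```c
#include <stddef.h>

/* Bits per frame as counted in the wiki diagram:
   8 Y bits per pixel plus (UV) repeated twice per pixel. */
static size_t nv12_bits_from_diagram(size_t n_pixels)
{
    return 8 * n_pixels + 2 * 2 * n_pixels;
}

/* Bytes per frame from the plane sizes: a full-resolution Y plane plus a
   half-height plane of width bytes holding the U and V samples. */
static size_t nv12_bytes_from_planes(size_t width, size_t height)
{
    return width * height + width * (height / 2);
}
```

For a 4x2 frame, both counts describe the same amount of data: 12 bits per pixel over 8 pixels, against an 8-byte Y plane followed by a 4-byte UV plane.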
Can anyone clarify which of these two interpretations is correct?