NVIDIA NVENC (Media Foundation) encoded H.264 frames not decoded properly using VideoToolbox

I am facing the same problem as described here when trying to decode a frame on iPad Pro OS v14.3 (I am also using Olivia Stork's example):

25% of the picture data is decoded correctly, the rest of the picture is just green.

The decoded image on iPad Pro OS v14.3 looks like this (the image was converted and saved in the decoder callback as described here, so it's not just a displaying problem).

The original image looks like this.

The image is encoded with NVIDIA NVENC (Media Foundation) on Windows 10.

I searched the frame picture data for additional 4-Byte NALU start codes as described in the link, but there are only the three expected ones for SPS, PPS and IDR picture data.

I have another Media Foundation decoder application running on Windows 10 which decodes the frames from exactly the same source correctly.

I have been struggling for days to find the cause of the problem. Does anyone have any ideas?

Thanks in advance. Rob

- EDIT 2021-01-11:

I figured out that there are actually three additional 3-byte start codes (0x000001) within the IDR picture data block of NALU type 5.
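To illustrate how such a scan can find both start code lengths, here is a minimal sketch in Python (not the code used in the original project). The function name `find_start_codes` is hypothetical. It treats a 3-byte start code that is preceded by a zero byte as a 4-byte start code, which is the usual Annex B interpretation.

```python
def find_start_codes(data: bytes):
    """Return a list of (offset, length) for every Annex B start code in data.

    length is 4 for 0x00000001 and 3 for 0x000001.
    """
    hits = []
    i = 0
    while True:
        i = data.find(b"\x00\x00\x01", i)
        if i < 0:
            break
        # A 4-byte start code is a 3-byte one preceded by an extra zero byte.
        if i > 0 and data[i - 1] == 0:
            hits.append((i - 1, 4))
        else:
            hits.append((i, 3))
        i += 3
    return hits


# Made-up sample: SPS, PPS and one IDR slice with 4-byte start codes,
# followed by a second IDR slice with a 3-byte start code.
sample = (
    b"\x00\x00\x00\x01\x67\xAA"
    b"\x00\x00\x00\x01\x68\xBB"
    b"\x00\x00\x00\x01\x65\x11\x22"
    b"\x00\x00\x01\x65\x33"
)
codes = find_start_codes(sample)
```

Searching only for the 4-byte pattern would report three hits for this sample and miss the slice boundary at the end, which matches the situation described above.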

I tried to replace these start codes with the length of the following data block (big endian), as described here, but with the same result.

I also tried adding Emulation Prevention Bytes (0x000001 => 0x00000301) as described here, but that also made no difference.

Maybe I have been misled and these start codes have nothing to do with the issue. At least they are not just random image data, because they always appear at the same position (index) in the picture data block. I am currently running out of ideas. Any hints?

- EDIT 2021-01-14:

I figured out a few more things:

Out of sheer lack of ideas, I copied the picture data that follows the last 3-byte start code to the beginning of the block (right after the 4-byte NALU start code). I had expected, if that worked at all, to see the last quarter of the original image at the top of the decoded image, but to my surprise the decoded image looked like this.

I tried the same with the picture data coming after the second and third start code, and the decoded image looked like this and this: The image data is decoded correctly and it is even at the correct position (compare to original image).

Even if I strip off all 3-byte start codes and copy the concatenated picture data after the 4-byte start code, the result is the same: only 25% of the image is decoded. So the additional 3-byte start codes are apparently not the problem. There must be some setting somewhere which tells the decoder to only decode 25% of the image. My guess would be the CMVideoFormatDescription, but as far as I can see it looks okay.

I am also wondering how the decoder knows where to display the different picture data blocks. Either there is an offset defined somewhere within the picture data, or the encoder somehow adds the xy-position of every pixel.

I managed to find the cause of the problem: The 3-byte start codes in the IDR picture data block must be replaced by 4-byte start codes.

So first replace all 3-Byte start codes by 4-Byte start codes. Then replace the 4-Byte start codes with the length of the following data block (big endian). The slices should be arranged like this (as mentioned here by 'Blackie'):

[4byte slice1 size][slice1 data][4byte slice2 size][slice2 data]...[4byte slice4 size][slice4 data]

Remember not to include the start code length in the slice size.

After changing that, my frame was completely displayed.
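The conversion described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code from the original project: the function name `annexb_to_length_prefixed` is hypothetical, and the input is assumed to contain only the slice NAL units of one frame (with SPS and PPS already handed to the format description separately). It does both steps in one pass: every 3-byte or 4-byte start code is dropped and replaced by a 4-byte big-endian length of the NAL unit that follows.

```python
import struct


def annexb_to_length_prefixed(data: bytes) -> bytes:
    """Replace each Annex B start code with a 4-byte big-endian NALU length."""
    # Collect (start_code_offset, payload_offset) for every start code.
    bounds = []
    i = 0
    while True:
        i = data.find(b"\x00\x00\x01", i)
        if i < 0:
            break
        # An extra zero byte in front makes it a 4-byte start code.
        sc_start = i - 1 if i > 0 and data[i - 1] == 0 else i
        bounds.append((sc_start, i + 3))
        i += 3

    out = bytearray()
    for n, (_, payload_start) in enumerate(bounds):
        # The payload ends where the next start code begins (or at the end).
        payload_end = bounds[n + 1][0] if n + 1 < len(bounds) else len(data)
        nal = data[payload_start:payload_end]
        # The length counts only the NAL unit, not the start code.
        out += struct.pack(">I", len(nal)) + nal
    return bytes(out)


# Made-up frame: one slice behind a 4-byte start code, one behind a 3-byte one.
frame = b"\x00\x00\x00\x01\x65\xAA\xBB" + b"\x00\x00\x01\x65\xCC"
converted = annexb_to_length_prefixed(frame)
```

For the made-up frame, the result is `[00 00 00 03][65 AA BB][00 00 00 02][65 CC]`, i.e. the `[size][data]` layout shown above.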

By the way: The information where to display the different picture data blocks is specified in the header data of each NALU (parameter 'first_mb_in_slice').
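As a rough illustration of where that value lives, the sketch below (Python, with hypothetical helper names) reads `first_mb_in_slice` from a slice NAL unit. It assumes the start code or length prefix has already been removed, so the input begins with the 1-byte NAL header. In the H.264 slice header, `first_mb_in_slice` is the first syntax element and is coded as an unsigned Exp-Golomb value, ue(v).

```python
def strip_emulation_prevention(payload: bytes) -> bytes:
    """Drop the 0x03 byte that follows two consecutive zero bytes."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 3:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def read_ue(data: bytes, bitpos: int = 0):
    """Read one unsigned Exp-Golomb value; return (value, next_bit_position)."""
    def bit(p):
        return (data[p >> 3] >> (7 - (p & 7))) & 1

    leading_zeros = 0
    while bit(bitpos) == 0:
        leading_zeros += 1
        bitpos += 1
    bitpos += 1  # skip the terminating 1 bit
    suffix = 0
    for _ in range(leading_zeros):
        suffix = (suffix << 1) | bit(bitpos)
        bitpos += 1
    return (1 << leading_zeros) - 1 + suffix, bitpos


def first_mb_in_slice(nal: bytes) -> int:
    """Return first_mb_in_slice for a coded slice NAL unit (type 1 or 5)."""
    nal_type = nal[0] & 0x1F
    if nal_type not in (1, 5):
        raise ValueError("not a coded slice NAL unit")
    value, _ = read_ue(strip_emulation_prevention(nal[1:]))
    return value
```

A frame split into four slices should yield four different values here, one macroblock address per slice, which is how the decoder places each block at the correct position.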

There is a very good C# example here showing how to extract the NALU header data. You can copy it almost 1:1.
