Does a track run in a fragmented MP4 have to start with a key frame?

I'm ingesting an RTMP stream and converting it to a fragmented MP4 file in JavaScript. It took a week of work, but I'm almost finished. I'm generating valid ftyp, moov, and moof atoms, and the first frame of the video actually plays (with audio) before playback stalls in infinite buffering, with no errors listed in chrome://media-internals.

Plugging the video into ffprobe, I get an error similar to:

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x558559198080] Failed to add index entry
    Last message repeated 368 times
[h264 @ 0x55855919b300] Invalid NAL unit size (-619501801 > 966).
[h264 @ 0x55855919b300] Error splitting the input into NAL units.

This led me on a massive hunt for data alignment issues or invalid byte offsets in my tfhd and trun atoms. However, no matter where I looked or how I sliced the data, I couldn't find any problems in the moof atom.

I then took the original FLV file and converted it to an MP4 in ffmpeg with the following command:

ffmpeg -i ~/Videos/rtmp/big_buck_bunny.flv -c copy -ss 5 -t 10 -movflags frag_keyframe+empty_moov+faststart test.mp4

I opened both the MP4 I was creating and the MP4 output by ffmpeg in an atom parsing file and compared the two:

[Image: Comparing MP4 files with MP4A]

The first thing that jumped out at me was that the ffmpeg-generated file has multiple video samples per moof. Specifically, every moof starts with one key frame, then contains all the difference frames until the next key frame (which starts the following moof atom).

Contrast this with how I'm generating my MP4: I create a moof atom every time an FLV VIDEODATA packet arrives. This means my moof may not contain a key frame (and usually doesn't).
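
For comparison, the layout seen in the ffmpeg output (one key frame followed by its difference frames per moof) could be approximated with a grouping step like the sketch below. The sample shape (`isKeyFrame`) and the function name are hypothetical, not taken from my actual code, and this only mirrors what ffmpeg's frag_keyframe output looks like; it does not show that such grouping is required.

```javascript
// Minimal sketch: group demuxed video samples so that each fragment
// starts at a key frame. The sample objects are a hypothetical shape:
// { isKeyFrame: boolean, ... }.
function groupIntoFragments(samples) {
  const fragments = [];
  let current = null;
  for (const sample of samples) {
    // Open a new fragment at every key frame, and also for the very
    // first sample in case the stream does not begin with a key frame.
    if (sample.isKeyFrame || current === null) {
      current = [];
      fragments.push(current);
    }
    current.push(sample);
  }
  return fragments;
}

const demoSamples = [
  { isKeyFrame: true }, { isKeyFrame: false }, { isKeyFrame: false },
  { isKeyFrame: true }, { isKeyFrame: false },
];
// Two fragments, holding 3 and 2 samples respectively.
console.log(groupIntoFragments(demoSamples).map((f) => f.length));
```

Each returned group would then become one moof/mdat pair instead of one pair per FLV packet.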

Could this be why I'm having trouble? Or is there something else I'm missing?

The video files in question can be downloaded here:

Another issue I noticed was ffmpeg's prolific use of base_data_offset in the tfhd atom. However, when I tried tracking the total number of bytes appended and setting base_data_offset myself, I got an error in Chrome along the lines of "MSE doesn't support base_data_offset". Per the ISO/IEC 14496-12 spec:

If not provided, the base-data-offset for the first track in the movie fragment is the position of the first byte of the enclosing Movie Fragment Box, and for second and subsequent track fragments, the default is the end of the data defined by the preceding fragment.

This wording leads me to believe that the data_offset in the first trun atom should be equal to the size of the moof atom and the data_offset in the second trun atom should be 0 (0 bytes from the end of the data defined by the preceding fragment). However, when I tried this, I got an error that the video data couldn't be parsed. What did produce parsable data was the length of the moof atom plus the total length of the first track (as if the base offset were the first byte of the enclosing moof box, the same as for the first track).
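
The arithmetic that ended up parsing can be sketched as follows. It assumes that every track fragment is measured from the first byte of the enclosing moof, that a single mdat with an 8-byte header immediately follows the moof, and that each track's samples are stored contiguously in traf order. The function name and parameters are hypothetical.

```javascript
// Minimal sketch of trun data_offset values when every traf uses the
// start of the enclosing moof as its base (an assumption, see above).
const MDAT_HEADER_SIZE = 8; // 32-bit size + 'mdat' type, no 64-bit largesize

function trunDataOffsets(moofSize, trackPayloadSizes) {
  const offsets = [];
  // First payload byte sits just past the moof and the mdat header.
  let position = moofSize + MDAT_HEADER_SIZE;
  for (const size of trackPayloadSizes) {
    offsets.push(position);
    position += size; // the next track's data starts after this track's data
  }
  return offsets;
}

// A 200-byte moof, 1000 bytes of audio, then 5000 bytes of video:
// audio data_offset is 208, video data_offset is 1208.
console.log(trunDataOffsets(200, [1000, 5000]));
```

Note that the 8 bytes of the mdat header count toward the offset whenever data_offset is measured to the first payload byte rather than to the start of the mdat box.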

Lovejoy answered 14/12, 2018 at 18:26 Comment(1)
What is the software in the picture you named? – Nonperformance

No, the moof does not need to start with a key frame. The file you are generating produces "Invalid NAL unit size" errors because it contains invalid NAL sizes. Every NAL unit (in the mdat) must have its size prepended to it. Looking at your file, the first 4 bytes after the mdat header are 0x21180C68, which is WAY too large to be a valid size.
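
As an illustration of the length-prefixed layout described above, here is a minimal sketch that walks one H.264 sample and rejects sizes that run past the end of the sample. It assumes a 4-byte big-endian length field (the real width comes from lengthSizeMinusOne in the avcC box), and the function name is hypothetical. It must only be run on video samples, at the offset where a video sample actually begins.

```javascript
// Minimal sketch: split one AVC sample into NAL units, assuming each
// unit is preceded by a big-endian length field of `lengthSize` bytes.
function splitLengthPrefixedNalus(sample, lengthSize = 4) {
  const nalus = [];
  let pos = 0;
  while (pos < sample.byteLength) {
    if (pos + lengthSize > sample.byteLength) {
      throw new RangeError(`Truncated NAL length field at byte ${pos}`);
    }
    // Read the length with plain arithmetic to avoid signed 32-bit results.
    let size = 0;
    for (let i = 0; i < lengthSize; i++) {
      size = size * 256 + sample[pos + i];
    }
    pos += lengthSize;
    if (pos + size > sample.byteLength) {
      throw new RangeError(`Invalid NAL unit size ${size} at byte ${pos - lengthSize}`);
    }
    nalus.push(sample.subarray(pos, pos + size));
    pos += size;
  }
  return nalus;
}

// Two tiny made-up NAL units: a 2-byte one and a 1-byte one.
const demoSample = new Uint8Array([0, 0, 0, 2, 0x65, 0x88, 0, 0, 0, 1, 0x41]);
// nal_unit_type is the low 5 bits of the first byte: 5 (IDR slice) and 1.
console.log(splitLengthPrefixedNalus(demoSample).map((n) => n[0] & 0x1f));
```

A sample beginning with 0x21180C68 would be rejected here, since that value is far larger than any plausible sample.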

Ejection answered 15/12, 2018 at 5:20 Comment(4)
I have the audio track first, and the video track second. 0x21180C68 is AAC data, not H.264 data. – Lovejoy
Going to mark this as correct since "No, the moof does not need to start with a key frame." accurately answers the question. But the following sentences are incorrect (my NAL sizes were valid; 0x21180C68 wasn't a NAL size, since it was audio data and not video data). The reason my video was not playing was that my traf had a 0 duration. The misleading error message was due to a bug in FFmpeg. I ended up reading the entire FFmpeg source code to figure that one out. – Lovejoy
@Lovejoy Thank you for the clarification. Did you mean that you used the default-sample-duration-present flag in your Track Fragment Header Box and default_sample_duration was set to 0? – Drizzle
@Drizzle I don't have the original files anymore, nor do I remember precisely, but if I were to guess, it was a combination of having a default sample duration of zero (as you suggested) and not overriding this default in the Track Fragment Run (trun) box (either by not setting the sample-duration-present bit or by setting the sample duration to zero). – Lovejoy
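
For readers debugging the same symptom, the duration fallback discussed in these comments can be sketched as below. The flag values are the ones defined for tfhd and trun in ISO/IEC 14496-12; the parsed-box object shapes (`trex`, `tfhd`, `trun`) and the function name are hypothetical, and this is only a rough check, not a reconstruction of the original files.

```javascript
// Minimal sketch: resolve the duration each sample in a trun ends up with.
const TFHD_DEFAULT_SAMPLE_DURATION_PRESENT = 0x000008;
const TRUN_SAMPLE_DURATION_PRESENT = 0x000100;

// trex, tfhd and trun are hypothetical plain objects from a box parser.
function effectiveSampleDurations(trex, tfhd, trun) {
  // The tfhd default overrides the trex default only when its flag is set.
  let fallback = trex.defaultSampleDuration;
  if (tfhd.flags & TFHD_DEFAULT_SAMPLE_DURATION_PRESENT) {
    fallback = tfhd.defaultSampleDuration;
  }
  // Per-sample durations are used only when the trun flag is set.
  const perSample = (trun.flags & TRUN_SAMPLE_DURATION_PRESENT) !== 0;
  return trun.samples.map((s) => (perSample ? s.duration : fallback));
}

// A trun that carries sizes but no durations, with zero defaults everywhere.
const demoDurations = effectiveSampleDurations(
  { defaultSampleDuration: 0 },
  { flags: 0x020000 }, // default-base-is-moof only
  { flags: 0x000201, samples: [{ size: 100 }, { size: 200 }] },
);
// Every sample resolves to 0, so the whole fragment has zero duration.
console.log(demoDurations.every((d) => d === 0));
```

If every resolved duration is zero, the track fragment spans no time, which matches the cause described in the comment above.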
