I know it's a relatively old question, but I think I managed to implement the solution you're describing. To summarize, the idea is to provide a DASH manifest to the client, but only convert the segments when the client asks for them.
The steps to achieve that were:
- Convert a 10-second section of one stream of the original file using ffmpeg (or extract it with a stream copy if it is already H.264)
- Repackage it using MP4Box so MSE can consume it on the client side.
The command for step 1 would look like this (for the 3rd segment of stream 0):
ffmpeg -y -ss 30 -t 11 -threads 8 -copyts -start_at_zero -i "/path/to/original.mp4" -map 0:1 -c copy /tmp/output_segment.mp4
"-ss 30" tells ffmpeg to start 30 seconds after the start of the file. "-t 11" keeps 11 seconds of the track after that (the overlap avoids gaps in the playback). "-copyts" keeps the timestamps as they are, so the extracted segmented would start at 30s, not 0. "-c copy" copies the original stream and would be replaced by something like "-g 30 -c:v libx264 -crf 22 -profile:v high -level 3.1" if it had to be transcoded.
The second command, to repackage the stream, is:
MP4Box -dash 10000 -frag 500 -rap -single-file -segment-name segment_base_name_ -tfdt $TFDT_OFFSET /tmp/output_segment.mp4 -out /tmp/unused_output.mp4
The output file can be discarded, but the command also creates a file named segment_base_name_init.mp4, which is the actual segment you need. The -tfdt argument is the most important one, as it offsets the segment properly in the timeline. To get the right value, I use the following command (because keyframes are not exactly at the 10 s marks, the start of the segment may not be where we expect it to be):
ffprobe -print_format json -show_streams /tmp/output_segment.mp4
The right value is start_time * 1000 (-tfdt expects milliseconds).
I hope this helps. It took me a while to make it work, and I stumbled upon this question because MP4Box suddenly stopped working after its last update. Also note that you can achieve this with VP9 and Vorbis as well; in that case you don't need to repackage the streams.
EDIT
For anyone interested in this, there are some issues with the method described above, since MP4Box doesn't properly update the tfdt records since version 1.0 (?).
When creating a segment independently of the others, the segment has to be compliant with the DASH standard (which MP4Box handled in the previous solution, but FFmpeg can do too, using -f dash for the output). The options also have to ensure that segment boundaries are aligned with RAPs (or SAPs, or i-frames, I think). The command looks like this:
ffmpeg -y -ss 390 -to 400 -threads 6 -copyts -start_at_zero -noaccurate_seek -i input.mkv -map 0:1 -c copy -movflags frag_keyframe -single_file_name segment_39.mp4 -global_sidx 1 -min_frag_duration 500 -f dash unused.mpd
Then the problem is to ensure that each segment is properly placed in the timeline by MSE. In a fragmented MP4 file, three locations influence the position in the timeline:
- in the moov box (general information on the video), the elst box (in trak > edts) holds a list of edits. FFmpeg, when using -ss with -copyts, creates an empty edit before the video itself with the duration of -ss (in ms)
- in the sidx box (an index allowing segments to be located), the earliest_presentation_time field also defines an offset, in the track's timebase
- in each moof box (the header of a fragment), the tfdt box in traf has a base_media_decode_time field placing the fragment on the timeline, again in the track's timebase
The problem with FFmpeg is that it properly fills in the first two, but the tfdt times start from zero. Since I failed to find a way to make FFmpeg do this, I've written these simple functions to correct it. Note that they also neutralize the first edit (by zeroing its duration), since it's recognized by Firefox but not by Chrome; the resulting videos are then compatible with both.
const { open, readFile } = require('node:fs/promises');

async function adjustSegmentTimestamps(filename) {
  const file = await open(filename, 'r');
  const buffer = await readFile(file);
  await file.close();

  // Clear the first entry of the edit list: Firefox honours the empty edit
  // FFmpeg inserts, Chrome does not, so zeroing its duration keeps the
  // segment compatible with both.
  const moovOffset = seekBoxStart(buffer, 0, buffer.length, 'moov');
  if (moovOffset === -1) {
    throw new Error('Cannot find moov box');
  }
  const moovSize = buffer.readUInt32BE(moovOffset);
  const trakOffset = seekBoxStart(buffer, moovOffset + 8, moovSize - 8, 'trak');
  if (trakOffset === -1) {
    throw new Error('Cannot find trak box');
  }
  const trakSize = buffer.readUInt32BE(trakOffset);
  const edtsOffset = seekBoxStart(buffer, trakOffset + 8, trakSize - 8, 'edts');
  if (edtsOffset === -1) {
    throw new Error('Cannot find edts box');
  }
  const edtsSize = buffer.readUInt32BE(edtsOffset);
  const elstOffset = seekBoxStart(buffer, edtsOffset + 8, edtsSize - 8, 'elst');
  if (elstOffset === -1) {
    throw new Error('Cannot find elst box');
  }
  const numEntries = buffer.readUInt32BE(elstOffset + 12);
  if (numEntries === 2) {
    // First entry is the empty edit: set its duration to 0.
    buffer.writeUInt32BE(0, elstOffset + 16);
  }

  // Read earliest_presentation_time from the sidx box; this is the offset
  // each tfdt is missing. Kept as BigInt so 32- and 64-bit fields mix safely.
  let sidxOffset = seekBoxStart(buffer, 0, buffer.length, 'sidx');
  if (sidxOffset === -1) {
    throw new Error('Cannot find sidx box');
  }
  sidxOffset += 8; // skip size and type, point at the version byte
  const sidxVersion = buffer.readUInt8(sidxOffset);
  const earliestPresentationTime = sidxVersion
    ? buffer.readBigUInt64BE(sidxOffset + 12)
    : BigInt(buffer.readUInt32BE(sidxOffset + 12));

  // Add that offset to base_media_decode_time in the tfdt of every moof.
  let moofOffset = 0;
  while (moofOffset < buffer.length) {
    moofOffset = seekBoxStart(buffer, moofOffset, buffer.length - moofOffset, 'moof');
    if (moofOffset === -1) {
      break; // no more fragments
    }
    const moofSize = buffer.readUInt32BE(moofOffset);
    const trafOffset = seekBoxStart(buffer, moofOffset + 8, moofSize - 8, 'traf');
    if (trafOffset === -1) {
      throw new Error('Cannot find traf box');
    }
    const trafSize = buffer.readUInt32BE(trafOffset);
    const tfdtOffset = seekBoxStart(buffer, trafOffset + 8, trafSize - 8, 'tfdt');
    if (tfdtOffset === -1) {
      throw new Error('Cannot find tfdt box');
    }
    const tfdtVersion = buffer.readUInt8(tfdtOffset + 8);
    if (tfdtVersion) {
      const current = buffer.readBigUInt64BE(tfdtOffset + 12);
      buffer.writeBigUInt64BE(current + earliestPresentationTime, tfdtOffset + 12);
    } else {
      const current = buffer.readUInt32BE(tfdtOffset + 12);
      buffer.writeUInt32BE(current + Number(earliestPresentationTime), tfdtOffset + 12);
    }
    moofOffset += moofSize;
  }

  const outFile = await open(filename, 'w', 0o666);
  await outFile.write(buffer);
  await outFile.close();
}

// Scan the box sequence starting at `start` and return the offset of the
// first box of type `box`, or -1 if it is not found within `size` bytes.
function seekBoxStart(buffer, start, size, box) {
  let offset = start;
  while (offset - start < size) {
    const boxSize = buffer.readUInt32BE(offset);
    if (boxSize < 8) {
      break; // malformed (or 64-bit) size field, stop scanning
    }
    const boxType = buffer.toString('ascii', offset + 4, offset + 8);
    if (boxType === box) {
      return offset;
    }
    offset += boxSize;
  }
  return -1;
}