I know it's a relatively old question, but I think I managed to implement the solution you're describing. To summarize, the idea is to provide a DASH manifest to the client, but only convert the segments when the client asks for them.
The steps to achieve that were:
- Convert a 10-second section of one stream of the original file using ffmpeg (or extract it with a stream copy if it is already H.264)
- Repackage it using MP4Box so MSE can consume it on the client side.
The command for step 1 would look like this (for the 3rd segment of stream 0):
ffmpeg -y -ss 30 -t 11 -threads 8 -copyts -start_at_zero -i "/path/to/original.mp4" -map 0:1 -c copy /tmp/output_segment.mp4
"-ss 30" tells ffmpeg to start 30 seconds after the start of the file. "-t 11" keeps 11 seconds of the track after that (the overlap avoids gaps in the playback). "-copyts" keeps the timestamps as they are, so the extracted segmented would start at 30s, not 0. "-c copy" copies the original stream and would be replaced by something like "-g 30 -c:v libx264 -crf 22 -profile:v high -level 3.1" if it had to be transcoded.
The second command, to repackage the stream, is:
MP4Box -dash 10000 -frag 500 -rap -single-file -segment-name segment_base_name_ -tfdt $TFDT_OFFSET /tmp/output_segment.mp4 -out /tmp/unused_output.mp4
The output file can be discarded, but the command also creates a file named segment_base_name_init.mp4, which is the actual segment you need. The -tfdt argument is the most important one, as it offsets the segment properly in the timeline. To get the right value, I use the following command (because keyframes are not exactly at the 10 s marks, the start of the segment may not be where we expect it to be):
ffprobe -print_format json -show_streams /tmp/output_segment.mp4
The right value is start_time * 1000 (-tfdt expects milliseconds).
I hope this helps. It took me a while to make it work, and I stumbled upon this question because MP4Box suddenly stopped working after its last update. Also note that you can achieve this with VP9 and Vorbis as well; in that case you don't need to repackage the streams.
EDIT
For anyone interested in this, there are some issues with the method described above, since MP4Box doesn't properly update the tfdt records since version 1.0 (?).
When creating a segment independently of the others, the segment has to be compliant with the DASH standard (which MP4Box handled in the previous solution, but FFmpeg can do too, using -f dash for the output). The options also have to ensure that segment boundaries are aligned with RAPs (or SAPs, or i-frames, I think). The command looks like this:
ffmpeg -y -ss 390 -to 400 -threads 6 -copyts -start_at_zero -noaccurate_seek -i input.mkv -map 0:1 -c copy -movflags frag_keyframe -single_file_name segment_39.mp4 -global_sidx 1 -min_frag_duration 500 -f dash unused.mpd
Then the problem is to ensure that each segment is properly placed in the timeline by MSE. In a fragmented MP4 file, three locations influence the position in the timeline:
- in the moov box (general information on the video), the elst box (in trak > edts) holds a list of edits. FFmpeg, when using -ss with -copyts, creates an empty edit before the video itself with the duration of -ss (in ms)
- in the sidx box (an index allowing segments to be located), the earliest_presentation_time field also defines an offset, in the track's timebase
- in each moof box (the header of a fragment), the tfdt box in traf has a base_media_decode_time field placing the fragment on the timeline, again in the track's timebase
The problem with FFmpeg is that it properly fills in the first two, but the tfdt times start from zero. Since I failed to find a way to make FFmpeg do this, I've written these simple functions to correct it. Note that they also neutralize the first edit (by zeroing its duration), since it's recognized by Firefox but not by Chrome; the resulting videos are then compatible with both.
const { open, readFile } = require('node:fs/promises');

async function adjustSegmentTimestamps(filename) {
  const file = await open(filename, 'r');
  const buffer = await readFile(file);
  await file.close();

  // Clear the first entry of the edit list: Firefox honours the empty edit
  // FFmpeg inserts, Chrome does not, so zeroing its duration keeps the
  // segment compatible with both.
  const moovOffset = seekBoxStart(buffer, 0, buffer.length, 'moov');
  if (moovOffset === -1) {
    throw new Error('Cannot find moov box');
  }
  const moovSize = buffer.readUInt32BE(moovOffset);
  const trakOffset = seekBoxStart(buffer, moovOffset + 8, moovSize - 8, 'trak');
  if (trakOffset === -1) {
    throw new Error('Cannot find trak box');
  }
  const trakSize = buffer.readUInt32BE(trakOffset);
  const edtsOffset = seekBoxStart(buffer, trakOffset + 8, trakSize - 8, 'edts');
  if (edtsOffset === -1) {
    throw new Error('Cannot find edts box');
  }
  const edtsSize = buffer.readUInt32BE(edtsOffset);
  const elstOffset = seekBoxStart(buffer, edtsOffset + 8, edtsSize - 8, 'elst');
  if (elstOffset === -1) {
    throw new Error('Cannot find elst box');
  }
  const numEntries = buffer.readUInt32BE(elstOffset + 12);
  if (numEntries === 2) {
    // First entry is the empty edit: set its duration to 0.
    buffer.writeUInt32BE(0, elstOffset + 16);
  }

  // Read earliest_presentation_time from the sidx box; this is the offset
  // each tfdt is missing. Kept as BigInt so 32- and 64-bit fields mix safely.
  let sidxOffset = seekBoxStart(buffer, 0, buffer.length, 'sidx');
  if (sidxOffset === -1) {
    throw new Error('Cannot find sidx box');
  }
  sidxOffset += 8; // skip size and type, point at the version byte
  const sidxVersion = buffer.readUInt8(sidxOffset);
  const earliestPresentationTime = sidxVersion
    ? buffer.readBigUInt64BE(sidxOffset + 12)
    : BigInt(buffer.readUInt32BE(sidxOffset + 12));

  // Add that offset to base_media_decode_time in the tfdt of every moof.
  let moofOffset = 0;
  while (moofOffset < buffer.length) {
    moofOffset = seekBoxStart(buffer, moofOffset, buffer.length - moofOffset, 'moof');
    if (moofOffset === -1) {
      break; // no more fragments
    }
    const moofSize = buffer.readUInt32BE(moofOffset);
    const trafOffset = seekBoxStart(buffer, moofOffset + 8, moofSize - 8, 'traf');
    if (trafOffset === -1) {
      throw new Error('Cannot find traf box');
    }
    const trafSize = buffer.readUInt32BE(trafOffset);
    const tfdtOffset = seekBoxStart(buffer, trafOffset + 8, trafSize - 8, 'tfdt');
    if (tfdtOffset === -1) {
      throw new Error('Cannot find tfdt box');
    }
    const tfdtVersion = buffer.readUInt8(tfdtOffset + 8);
    if (tfdtVersion) {
      const current = buffer.readBigUInt64BE(tfdtOffset + 12);
      buffer.writeBigUInt64BE(current + earliestPresentationTime, tfdtOffset + 12);
    } else {
      const current = buffer.readUInt32BE(tfdtOffset + 12);
      buffer.writeUInt32BE(current + Number(earliestPresentationTime), tfdtOffset + 12);
    }
    moofOffset += moofSize;
  }

  const outFile = await open(filename, 'w', 0o666);
  await outFile.write(buffer);
  await outFile.close();
}

// Scan the box sequence starting at `start` and return the offset of the
// first box of type `box`, or -1 if it is not found within `size` bytes.
function seekBoxStart(buffer, start, size, box) {
  let offset = start;
  while (offset - start < size) {
    const boxSize = buffer.readUInt32BE(offset);
    if (boxSize < 8) {
      break; // malformed (or 64-bit) size field, stop scanning
    }
    const boxType = buffer.toString('ascii', offset + 4, offset + 8);
    if (boxType === box) {
      return offset;
    }
    offset += boxSize;
  }
  return -1;
}