FFMPEG's xstack command results in out of sync sound, is it possible to mix the audio in a single encoding?
Asked Answered
P

1

10

I wrote a python script that generates a xstack complex filter command. The video inputs is a mixture of several formats described here:

I have 2 commands generated, one for the xstack filter, and one for the audio mixing.

Here is the stack command: (sorry the text doesn't wrap!)

'c:/ydl/ffmpeg.exe',
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-filter_complex',
'[0]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf0];[rsclbf0]fps=24[rscl0];[1]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf1];[rsclbf1]fps=24[rscl1];[2]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf2];[rsclbf2]fps=24[rscl2];[3]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf3];[rsclbf3]fps=24[rscl3];[4]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf4];[rsclbf4]fps=24[rscl4];[5]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf5];[rsclbf5]fps=24[rscl5];[6]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf6];[rsclbf6]fps=24[rscl6];[7]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf7];[rsclbf7]fps=24[rscl7];[8]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf8];[rsclbf8]fps=24[rscl8];[9]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf9];[rsclbf9]fps=24[rscl9];[10]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf10];[rsclbf10]fps=24[rscl10];[11]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf11];[rsclbf11]fps=24[rscl11];[12]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf12];[rsclbf12]fps=24[rscl12];[13]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf13];[rsclbf13]fps=24[rscl13];[14]scale=480:270:force_original_aspect_ratio=decrease,pad=480:270:(ow-iw)/2:(oh-ih)/2, setsar=1[rsclbf14];[rsclbf14]fps=24[rscl14];[rscl0][rscl1][rscl2][rscl3][rscl4]concat=n=5[cct0];[rscl5][rscl6][rscl7]concat=n=3[cct1];[rscl8][rscl9][rscl10]concat=n=3[cct2];[rscl11][rscl12][rscl13][rscl14]concat=n=4[cct3];[cct0][cct1][cct2][cct3]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0',
'output.mp4',

Here is the mix_audio command:

'c:/ydl/ffmpeg.exe',
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-i', 'inputX.mp4'
'-filter_complex',
'[0:a][1:a][2:a][3:a][4:a]concat=n=5:v=0:a=1[cct_a0];[5:a][6:a][7:a]concat=n=3:v=0:a=1[cct_a1];[8:a][9:a][10:a]concat=n=3:v=0:a=1[cct_a2];[11:a][12:a][13:a][14:a]concat=n=4:v=0:a=1[cct_a3];[cct_a0][cct_a1][cct_a2][cct_a3]amix=inputs=4[all_aud]',
'-map',
'15:v',
'-map',
'[all_aud]',
'-c:v',
'copy',
'output.mp4',

Of course those are sample commands, I actually use many more videos as input, this sample is shorter for the sake or readability.

Here are the videos I use, with relevant ffprobe data, in some HTML table:

ffprobe data

I'm getting this warning:

[swscaler @ 0000020bac5a19c0] Warning: data is not aligned! This can lead to a speed loss

I think this is unrelated to audio desyncing this unaligned data is about x264 resolutions being multiple of 16, but my filter takes this into account already.

There is a perceptible audio desyncing, which is the main problem I am having. FFMPEG doesn't seem to get other errors. Is it because I use 2 commands to mix the audio after? How could I proceed to to the xstack stage and the audio mixing in a single stage?

I'm a bit confused as how FFMPEG handles diverse framerates. I was told to reencode all the video inputs before performing the xstack stage, but I would create some disk overhead, so I'd rather do it in a single ffmpeg job it possible.

Parochial answered 18/11, 2021 at 13:39 Comment(0)
P
3

I'm a bit confused as how FFMPEG handles diverse framerates

It doesn't, which would cause a misalignment in your case. The vast majority of filters (any which deal with multiple sources and make use of frames, essentially), including the Concatenate filter require that be the sources have the same framerate.

For the concat filter to work, the inputs have to be of the same frame dimensions (e.g., 1920⨉1080 pixels) and should have the same framerate.

(emphasis added)

The documentation also adds:

Therefore, you may at least have to add a ​scale or ​scale2ref filter before concatenating videos. A handful of other attributes have to match as well, like the stream aspect ratio. Refer to the documentation of the filter for more info.

You should convert your sources to the same framerate first.

Prowel answered 16/12, 2021 at 3:45 Comment(1)
Couldn't I use a filter to "convert the sources" ? I'm reading my command, and apparently I set all inputs to fps=24, before concatenating them... I rescale them, and put a correct aspect ratio, and then I concatenate them.Parochial

© 2022 - 2024 — McMap. All rights reserved.