Batch mixing audio, given timestamps. Multiple offsets, only two sounds. How to do it efficiently?

I have two stereo sounds, 1.wav and 2.wav, each less than 1 second long, and a list of timestamps (milliseconds from the start of the recording). The recording of pure video (recording.mp4) is several hours long, and there are thousands (20 000 - 30 000) of timestamps per sound.
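Two bits of arithmetic follow from the recording being several hours long. Offsets must be converted from milliseconds to frames with 64-bit integers, and a single plain WAV file caps out at a few hours because its size field is 32 bits wide. Below is a minimal C sketch. The helper names are hypothetical, and the 44.1 kHz / 48 kHz 16-bit stereo formats used to check it are assumptions, since the sample format is not stated.

```c
#include <stdint.h>

/* Convert a timestamp in milliseconds to a frame index (one frame holds
 * one sample per channel). 64-bit math is required: 7 h is 25 200 000 ms,
 * and 25 200 000 * 48 000 does not fit in 32 bits. Multiplying before
 * dividing keeps rates like 44 100 Hz exact; the result is truncated. */
static int64_t ms_to_frame(int64_t ms, int64_t sample_rate)
{
    return ms * sample_rate / 1000;
}

/* Longest duration, in seconds, that fits in one plain RIFF/WAV file:
 * the RIFF size field is an unsigned 32-bit value covering 36 header
 * bytes plus the sample data. */
static double wav_max_seconds(uint32_t rate, unsigned channels, unsigned bits)
{
    double max_data = (double)UINT32_MAX - 36.0;            /* bytes */
    double bytes_per_second = (double)rate * channels * (bits / 8);
    return max_data / bytes_per_second;
}
```

For 44.1 kHz 16-bit stereo this gives about 24 348 s, roughly 6.8 hours; at 48 kHz it drops to about 6.2 hours. A longer soundtrack therefore has to be split across files or written in a 64-bit container.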

I want to convert the list of timestamps and sounds into one audio recording and merge it with the video. Merging the audio with the video is easy with ffmpeg, so that part is not the question.

The list of timestamps is a TSV file, for example:

1201\t1.wav
1501\t2.wav
1603\t1.wav
and so on, up to 50 000 entries.

I can convert it to anything; I am generating this file myself.
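Because the file format is under the asker's control, it only needs to be cheap to parse. Here is a C sketch that parses one line of the format above. `event_t` and `parse_event` are hypothetical names, and only the two file names from the question are recognised.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One line of the event list: when, and which sound (hypothetical type). */
typedef struct {
    int64_t ms;   /* offset from the start of the recording */
    int sound;    /* 0 for 1.wav, 1 for 2.wav */
} event_t;

/* Parse "1201\t1.wav", optionally followed by "\n" or "\r\n".
 * Returns 0 on success, -1 on a malformed line. */
static int parse_event(const char *line, event_t *out)
{
    char *end;
    long long ms = strtoll(line, &end, 10);
    if (end == line || *end != '\t' || ms < 0)
        return -1;
    const char *name = end + 1;
    size_t len = strcspn(name, "\r\n");          /* ignore the line ending */
    if (len == 5 && strncmp(name, "1.wav", 5) == 0)
        out->sound = 0;
    else if (len == 5 && strncmp(name, "2.wav", 5) == 0)
        out->sound = 1;
    else
        return -1;
    out->ms = (int64_t)ms;
    return 0;
}
```

If the generator wrote the sound index (0 or 1) instead of the file name, the string comparison would disappear. Either way, parsing 50 000 lines is negligible next to the mixing itself.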

I have seen the questions on mixing sound with padding and mixing audio into an existing video, but I have to batch process a lot of samples, and running sox that many times is not feasible. Even constructing the input for ffmpeg or sox is a cumbersome task.

sox -M f2.wav f3.wav f1.wav out.wav delay 4 4 8 8 remix 1,3,5 2,4,6
(assuming stereo), or

sox -m f1.wav "|sox f2.wav -p pad 4" "|sox f3.wav -p pad 8" out.wav

That is fine for three files, but not feasible for 50 000+. The first command has to read a file once per occurrence (even when it is the same file) and remix the channels. The second executes 50 000 sox invocations, also reading the same two files (1.wav, 2.wav) over and over.

I do not apply any effects to the sounds. There is no explicit support in sox for taking one input and playing it multiple times (echo/echos destroys the material). Creating the padding or delay also takes a lot of time, and FFmpeg needs a very long command to do the same thing.
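Playing one input many times is simple in a program: decode each short clip once, then add its samples into the output at every offset. Here is a minimal C sketch. It assumes the clips and the output are interleaved 16-bit PCM with the same rate and channel count. `mix_at` and `sat16` are hypothetical helpers, and hard clipping on overflow is only one possible choice; mixing in float and normalising afterwards is another.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit sum back into the 16-bit sample range (hard clipping). */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Add `src` (interleaved frames) into `dst` starting at frame `at`.
 * The same decoded `src` buffer is reused for every occurrence, so each
 * .wav is read once no matter how often it is played. Frames that would
 * fall past the end of `dst` are dropped. */
static void mix_at(int16_t *dst, size_t dst_frames,
                   const int16_t *src, size_t src_frames,
                   size_t at, int channels)
{
    if (at >= dst_frames)
        return;
    size_t n = src_frames;
    if (n > dst_frames - at)
        n = dst_frames - at;
    int16_t *d = dst + at * (size_t)channels;
    for (size_t i = 0; i < n * (size_t)channels; i++)
        d[i] = sat16((int32_t)d[i] + src[i]);
}
```

The total work is bounded by the number of events times the clip length, which is under one second per event. That is a single in-memory addition pass, with no per-event process start-up and no rereading of files.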

Since muxing two files is easy, I have tried rendering the two sounds separately, but it still takes a lot of time to process.

Is there simpler / faster way?

Taking advice from fdcpp: since WAV is PCM-coded, I am also considering writing a C program to parse it. I will update the code when I am done.
This extends the question: is there a way to encode offsets in the WAV format?
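As far as common tools are concerned, a plain PCM WAV file stores every frame explicitly, so an offset is represented by zero-valued (silent) frames in the data chunk. The sketch below covers both ends of the C approach. It reads 16-bit PCM from a WAV file already loaded into memory, walking the RIFF chunks instead of assuming a 44-byte header, and it writes a canonical 44-byte header for the output. All names are hypothetical, and WAVE_FORMAT_EXTENSIBLE or non-16-bit files are deliberately rejected to keep it short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RIFF stores integers little-endian regardless of the host. */
static uint32_t get_le32(const unsigned char *p)
{
    return p[0] | p[1] << 8 | p[2] << 16 | (uint32_t)p[3] << 24;
}
static void put_le16(unsigned char *p, uint16_t v) { p[0] = v & 0xFF; p[1] = v >> 8; }
static void put_le32(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Find the 16-bit PCM samples in a WAV file held in memory. Files often
 * carry extra chunks (LIST, fact, ...), so the chunk list is walked.
 * Returns 0 and fills the outputs on success, -1 otherwise. */
static int wav_find_pcm16(const unsigned char *buf, size_t len,
                          const unsigned char **pcm, int *channels,
                          size_t *frames)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;
    int ch = 0;
    size_t pos = 12;
    while (pos + 8 <= len) {
        uint32_t size = get_le32(buf + pos + 4);
        const unsigned char *body = buf + pos + 8;
        if (size > len - pos - 8)
            return -1;                              /* truncated chunk */
        if (memcmp(buf + pos, "fmt ", 4) == 0 && size >= 16) {
            int format = body[0] | body[1] << 8;    /* 1 = integer PCM */
            int bits = body[14] | body[15] << 8;
            if (format != 1 || bits != 16)
                return -1;
            ch = body[2] | body[3] << 8;
        } else if (memcmp(buf + pos, "data", 4) == 0 && ch > 0) {
            *pcm = body;
            *channels = ch;
            *frames = size / (2u * (unsigned)ch);
            return 0;
        }
        pos += 8 + (size_t)size + (size & 1);       /* odd chunks are padded */
    }
    return -1;
}

/* Little-endian bytes -> host int16_t samples. */
static void decode16(int16_t *dst, const unsigned char *src, size_t samples)
{
    for (size_t i = 0; i < samples; i++)
        dst[i] = (int16_t)(uint16_t)(src[2 * i] | src[2 * i + 1] << 8);
}

/* Canonical 44-byte header for 16-bit PCM output; the caller must keep
 * frames * channels * 2 below 4 GiB. */
static void wav_header16(unsigned char h[44], uint32_t rate,
                         uint16_t channels, uint32_t frames)
{
    uint32_t block = channels * 2u;                 /* bytes per frame */
    uint32_t data = frames * block;
    memcpy(h, "RIFF", 4);
    put_le32(h + 4, 36 + data);
    memcpy(h + 8, "WAVEfmt ", 8);
    put_le32(h + 16, 16);                           /* fmt chunk size */
    put_le16(h + 20, 1);                            /* integer PCM */
    put_le16(h + 22, channels);
    put_le32(h + 24, rate);
    put_le32(h + 28, rate * block);                 /* byte rate */
    put_le16(h + 32, (uint16_t)block);              /* block align */
    put_le16(h + 34, 16);                           /* bits per sample */
    memcpy(h + 36, "data", 4);
    put_le32(h + 40, data);
}
```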

Crupper answered 2/8, 2021 at 16:45
Comments (10):
Have you considered scripting this? – Mishmash
If the video has no audio, I'd be tempted to generate the audio separately and then add it to the video. It would likely be neater to do this in a small script or program rather than entirely with sox/FFmpeg. At the very least you will need a shell script of some kind. That said, are you able to insert a sound at a single, hard-coded timestamp? – Alkaloid
My personal approach would be a program: 1. generate silent audio in memory, long enough to fill the video; 2. load both .wavs into memory; 3. iterate over the TSV file and insert the sounds at the correct sample positions; 4. create an audio file from the final result; 5. use FFmpeg to mux the new audio and video together. – Alkaloid
And yes, with 7 hours of video I'd expect FFmpeg to take a while to do the work. I'd be tempted to chop the video up and parallelise the process across a couple of machines for parity. – Alkaloid
It may be worthwhile adding your script for generating the audio; it sounds as though that is the most pertinent part of the process. – Alkaloid
AFAIK it's not possible with ffmpeg or sox alone. Use Bash, Python, Julia or whatever else you are comfortable with (or ready to learn) to join the audio samples according to your input file. – Mishmash
Try the method I suggested above: generate a .wav file with the correct length and all samples in the right place. The .wav format has a 6.8-hour limitation, so a little cleverness may be required (e.g. two WAV files!). Julia, Octave or MATLAB immediately strike me as the neatest and quickest ways to try this, but realistically use whatever makes sense. (JS may actually be a bit of a headache unless you are already comfortable with the relevant Node.js libraries.) – Alkaloid
Also, verifying this on all 7 hours straight away seems silly. Start with a couple of minutes, something you can easily verify by hand, then go for the full 7 hours. Don't forget to post your script here (and add a language tag). – Alkaloid
By “loading into memory” I mean getting to the point where you have two variables, one per file, each an array of the sample data from your wavs. – Alkaloid
Exactly. If it gives you trouble, add your new script to the question. If it works for you as an approach, feel free to add it as an answer. Remember to add a tag for whichever language you end up choosing. – Alkaloid
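The approach suggested in the comments is to start from a silent buffer, hold both clips in memory as sample arrays, and add each clip at its sample position. The same steps can also be run one window of the timeline at a time. This keeps memory small, since several hours of 16-bit stereo is several gigabytes, and it lets separate threads or machines render disjoint windows that are concatenated afterwards. Here is a minimal C sketch under these assumptions: offsets already converted to frames, interleaved 16-bit clips, and hard clipping. `render_window`, `event_t` and `clip_t` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { int64_t frame; int sound; } event_t;          /* hypothetical */
typedef struct { const int16_t *pcm; int64_t frames; } clip_t;  /* decoded clip */

static int16_t sat16(int32_t v)
{
    return v > INT16_MAX ? INT16_MAX : v < INT16_MIN ? INT16_MIN : (int16_t)v;
}

/* Render frames [start, start + len) of the full soundtrack into `out`
 * (len * channels samples). Each event is clipped to the window, so a
 * sound straddling a boundary is split between neighbouring windows and
 * concatenating consecutive windows reproduces the whole track. */
static void render_window(int16_t *out, int64_t start, int64_t len,
                          const event_t *ev, size_t n_ev,
                          const clip_t clips[2], int channels)
{
    memset(out, 0, (size_t)len * (size_t)channels * sizeof *out); /* silence */
    for (size_t e = 0; e < n_ev; e++) {
        const clip_t *c = &clips[ev[e].sound];
        int64_t lo = ev[e].frame > start ? ev[e].frame : start;
        int64_t end = ev[e].frame + c->frames;
        int64_t hi = end < start + len ? end : start + len;
        for (int64_t f = lo; f < hi; f++)           /* no iterations if disjoint */
            for (int k = 0; k < channels; k++) {
                int16_t *d = &out[(f - start) * channels + k];
                *d = sat16((int32_t)*d + c->pcm[(f - ev[e].frame) * channels + k]);
            }
    }
}
```

Each window scans every event, which is cheap for 50 000 entries. Sorting the events by frame would let a long run skip straight to the relevant ones. Writing the windows out in order behind one WAV header, and starting a new file before the size limit is reached, gives the soundtrack that FFmpeg then muxes.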

© 2022 - 2024 — McMap. All rights reserved.