I have two stereo sounds, 1.wav and 2.wav, each less than 1 second long, and a list of timestamps (milliseconds from the start of the recording). The video-only recording (recording.mp4) is several hours long, and there are thousands of timestamps (20 000 to 30 000) per sound.
I want to turn the list of timestamps and the two sounds into a single audio recording and then merge it with the video. Merging audio with video is easy with ffmpeg, so that part is not part of the question.
The list of timestamps is a TSV file, for example:
1201\t1.wav
1501\t2.wav
1603\t1.wav
and so on, up to 50 000 entries.
I can convert it to any format; I am generating this file myself.
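Since the list is generated anyway, the only conversion it really needs is from milliseconds to a sample (frame) offset. A minimal Python sketch, assuming one milliseconds<TAB>filename entry per line and a single sample rate shared by both sounds (44100 Hz is an assumed default; use the actual rate of 1.wav and 2.wav). The name parse_timestamps is hypothetical:

```python
def parse_timestamps(lines, sample_rate=44100):
    """Turn 'milliseconds<TAB>filename' lines into sorted (frame_offset, filename) pairs.

    sample_rate is an assumption: it must match the rate of the sound files.
    """
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        ms_text, name = line.split("\t")
        # Integer arithmetic avoids float rounding: frame = floor(ms * rate / 1000).
        events.append((int(ms_text) * sample_rate // 1000, name))
    events.sort()  # ascending by frame offset
    return events
```

For the three example entries, parse_timestamps(["1201\t1.wav", "1501\t2.wav", "1603\t1.wav"]) returns [(52964, '1.wav'), (66194, '2.wav'), (70692, '1.wav')].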
I have seen mixing sound with padding and mixing audio to existing video, but I have to batch-process a lot of samples, and running sox that many times is not feasible. Even constructing the input for ffmpeg or sox is cumbersome.
sox -M f2.wav f3.wav f1.wav out.wav delay 4 4 8 8 remix 1,3,5 2,4,6
(assuming stereo), or
sox -m f1.wav "|sox f2.wav -p pad 4" "|sox f3.wav -p pad 8" out.wav
That works for three files, but it is not feasible for 50 000+. The first command has to read a file once per occurrence (even when it is the same file) and remix the channels. The second runs 50 000 sox invocations, again reading the same two files (1.wav, 2.wav) over and over.
I do not apply any effects to the sounds. There is no explicit support in sox for taking one input and playing it multiple times (echo / echos destroys the material). Creating the padding or delay also takes a lot of time, and ffmpeg likewise needs a very long command to do the same.
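The operation that is missing here, reading each short sound once and adding it at many offsets, takes only a few lines in a general-purpose language. A minimal stdlib-only Python sketch, assuming 16-bit PCM, that both sounds share one channel count and sample rate, and that the events are (frame_offset, name) pairs; mix_to_wav is a hypothetical name. It sums overlapping sounds in unbounded Python ints, clips to 16 bits, and holds only one chunk of output in memory at a time. The per-sample loop is slow in CPython for tens of thousands of events, so treat this as a description of the algorithm; the same loop in C is the practical version.

```python
import sys
import wave
from array import array


def mix_to_wav(out_path, events, sounds, total_frames, channels=2,
               rate=44100, chunk_frames=None):
    """Stream-mix short sounds into a 16-bit PCM WAV, one chunk at a time.

    events: (frame_offset, name) pairs; sounds: name -> array('h') of
    interleaved samples with `channels` channels. Events that start at or
    after total_frames are ignored.
    """
    if chunk_frames is None:
        chunk_frames = rate * 60  # one minute of output per chunk
    events = sorted(events)
    longest = max((len(s) // channels for s in sounds.values()), default=0)
    first = 0  # first event that may still overlap the current chunk
    with wave.open(out_path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(2)
        out.setframerate(rate)
        for c0 in range(0, total_frames, chunk_frames):
            c1 = min(c0 + chunk_frames, total_frames)
            acc = [0] * ((c1 - c0) * channels)  # Python ints: overlaps cannot overflow
            # Events are sorted by start, so ones that ended before c0 are done for good.
            while first < len(events) and events[first][0] + longest <= c0:
                first += 1
            i = first
            while i < len(events) and events[i][0] < c1:
                start, name = events[i]
                snd = sounds[name]
                lo = max(start, c0)                            # overlap with chunk, in frames
                hi = min(start + len(snd) // channels, c1)
                base = (lo - c0) * channels
                for k, v in enumerate(snd[(lo - start) * channels:(hi - start) * channels]):
                    acc[base + k] += v
                i += 1
            pcm = array("h", (max(-32768, min(32767, v)) for v in acc))  # clip to int16
            if sys.byteorder == "big":
                pcm.byteswap()  # WAV sample data is little-endian
            out.writeframes(pcm.tobytes())
```

Each sound is decoded once, total_frames would come from the video length (milliseconds times rate // 1000), and the resulting file is what gets muxed with recording.mp4.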
Since muxing two files is easy, I have also tried producing a separate track for each of the two sounds, but it still takes a lot of time to process.
Is there a simpler / faster way?
Following advice from fdcpp: since WAV is PCM-coded, I am also considering writing a C program to parse it. I will update with the code when I am done.
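Because a PCM WAV is raw interleaved samples behind a short RIFF header, loading each sound into memory once is straightforward; Python's standard wave module already parses the header. A minimal sketch, assuming 16-bit PCM input (load_pcm16 is a hypothetical name):

```python
import sys
import wave
from array import array


def load_pcm16(path):
    """Read a 16-bit PCM WAV into (interleaved samples, channels, sample_rate)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM, got {8 * w.getsampwidth()}-bit")
        channels = w.getnchannels()
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    samples = array("h")  # signed 16-bit items
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, channels, rate
```

For a stereo file the samples alternate left, right, left, right, so frame n starts at index 2 * n.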
This extends the question: is there a way to encode offsets in the WAV format?
... .wav files to memory, 3. Iterate over the TSV file and insert the sounds at the correct sample positions, 4. Create an audio file from the final result, 5. Use FFmpeg to mux the new audio and video together. – Alkaloid
... ffmpeg or sox alone. Use Bash, Python, Julia or whatever else you are comfortable with (or ready to learn) to join the audio samples according to your input file. – Mishmash
... .wav file with the correct length and all samples in the right place. The .wav format has a 6.8 hour limitation, so a little cleverness may be required (e.g. two WAV files!). I immediately see Julia / Octave / MATLAB as the neatest / quickest ways to try this, but realistically use whatever makes sense. (JS may actually be a bit of a headache unless you are already comfortable with the relevant Node.js libraries.) – Alkaloid
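The 6.8-hour figure follows from the unsigned 32-bit size fields in the RIFF header. A small sketch of the arithmetic, assuming 44.1 kHz, 16-bit stereo output (176 400 bytes of audio per second) and a minimal 44-byte header; max_wav_seconds and files_needed are hypothetical helpers:

```python
import math


def max_wav_seconds(rate=44100, channels=2, bytes_per_sample=2):
    """Whole seconds of PCM that fit in one classic RIFF/WAV file.

    The RIFF size field (file size minus 8) is unsigned 32-bit; with a minimal
    44-byte header it equals 36 + data bytes, so data <= 2**32 - 1 - 36.
    """
    max_data_bytes = 2**32 - 1 - 36
    return max_data_bytes // (rate * channels * bytes_per_sample)


def files_needed(duration_seconds, **fmt):
    """How many WAV files a recording of the given length must be split into."""
    return math.ceil(duration_seconds / max_wav_seconds(**fmt))
```

With the defaults, max_wav_seconds() is 24347 s (about 6.76 hours), so a 10-hour recording gives files_needed(36000) == 2.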