Split audio files using silence detection
I have more than 200 MP3 files and I need to split each one of them using silence detection. I tried Audacity and WavePad, but they have no batch processing, and splitting the files one by one is very slow.

The scenario is as follows:

  • split the track wherever there is silence of 2 seconds or more
  • then add 0.5 s of silence at the start and end of each track and save it as .mp3
  • bitrate 192 kbps, stereo
  • normalize the volume to be sure that all files have the same volume and quality

I tried FFmpeg but no success.

Reposit answered 5/8, 2017 at 22:46 Comment(2)
Have a look at How can I split a mp3 file? Ninebark
I've used mp3DirectCut with reasonable success. Having said that, Stack Overflow is a Q&A site for programming; it's not a site for requesting recommendations for software or other off-site resources. Skein
I found pydub to be the easiest tool for this kind of audio manipulation: it does the job simply and with compact code.

You can install pydub with

pip install pydub

You may also need to install ffmpeg/avlib. See this link for more details.

Here is a snippet that does what you asked. Some of the parameters, such as silence_thresh and target_dBFS, may need tuning to match your requirements. Overall, I was able to split my MP3 files, although I had to try different values for silence_thresh.

Snippet

# Import the AudioSegment class for processing audio and the 
# split_on_silence function for separating out silent chunks.
from pydub import AudioSegment
from pydub.silence import split_on_silence

# Define a function to normalize a chunk to a target amplitude.
def match_target_amplitude(aChunk, target_dBFS):
    ''' Normalize given audio chunk '''
    change_in_dBFS = target_dBFS - aChunk.dBFS
    return aChunk.apply_gain(change_in_dBFS)

# Load your audio.
song = AudioSegment.from_mp3("your_audio.mp3")

# Split track where the silence is 2 seconds or more and get chunks using 
# the imported function.
chunks = split_on_silence(
    # Use the loaded audio.
    song,
    # Specify that a silent chunk must be at least 2 seconds (2000 ms) long.
    min_silence_len=2000,
    # Consider a chunk silent if it's quieter than -16 dBFS.
    # (You may want to adjust this parameter.)
    silence_thresh=-16
)

# Process each chunk with your parameters
for i, chunk in enumerate(chunks):
    # Create a silence chunk that's 0.5 seconds (or 500 ms) long for padding.
    silence_chunk = AudioSegment.silent(duration=500)

    # Add the padding chunk to beginning and end of the entire chunk.
    audio_chunk = silence_chunk + chunk + silence_chunk

    # Normalize the entire chunk.
    normalized_chunk = match_target_amplitude(audio_chunk, -20.0)

    # Export the audio chunk with new bitrate.
    print("Exporting chunk{0}.mp3.".format(i))
    normalized_chunk.export(
        ".//chunk{0}.mp3".format(i),
        bitrate = "192k",
        format = "mp3"
    )

If your original audio is stereo (2-channel), your chunks will also be stereo. You can check the original audio like this:

>>> song.channels
2
Rohr answered 1/9, 2017 at 13:53 Comment(8)
Note that split_on_silence() has keep_silence=100 by default, which already keeps 200 ms of what was detected as silence (100 ms at the start and 100 ms at the end). You could either add only 400 ms of silence at the beginning and end, or pass keep_silence=500 to reuse the silence from the file and avoid adding your own. Helluva
Note that this library does not support streaming, i.e. it attempts to load the whole sound file into memory. With big files on 32-bit systems it may raise a memory error. There are other libraries to consider, though, such as pyAudioAnalysis. Also, detecting silence is tricky, especially when it isn't completely soundless, and the parameters can be hard to tune. Easter
@Rohr how do you tune min_silence_len and silence_thresh? Chromatin
@AadityaUra The answer has sample min_silence_len / silence_thresh values. You will need to try different values to see which combination suits your requirements. Rohr
It should; try song = AudioSegment.from_file("your_mp4_audio.mp4", "mp4"). Rohr
I tried, but it still exports the audio file. Lerner
Just as a follow-up, I managed to split my speech audio (WAV format) into rough sentences using the params min_silence_len=1500 and silence_thresh=-30. Ruffian
I really had to tinker with these parameters; min_silence_len=600, silence_thresh=-55, seek_step=100 worked for me after de-noising the recording twice. Latinist
You can try the following to split audio on silence without having to experiment with the silence threshold:

from pydub import AudioSegment
from pydub.silence import split_on_silence

def split(filepath):
    sound = AudioSegment.from_file(filepath)
    chunks = split_on_silence(
        sound,
        min_silence_len=500,
        silence_thresh=sound.dBFS - 16,
        keep_silence=250,  # optional
    )
    return chunks

Note that silence_thresh rarely needs adjusting here: it is computed relative to the clip's own average loudness (16 dB below sound.dBFS), so it adapts to each file automatically.
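To see why: the threshold tracks each file's own loudness. With a hypothetical clip averaging -23.5 dBFS, the computed threshold would be:

```python
clip_dBFS = -23.5  # hypothetical average loudness of one clip
silence_thresh = clip_dBFS - 16  # 16 dB quieter than the clip's average
print(silence_thresh)  # -39.5
```

A quieter clip gets a correspondingly lower threshold, so the same `- 16` offset works across files.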

Additionally, if you want to enforce a minimum length for each audio chunk, you can add this after the code above.

# minimum chunk length
target_length = 25 * 1000 # 25 seconds

output_chunks = [chunks[0]]
for chunk in chunks[1:]:
    if len(output_chunks[-1]) < target_length:
        output_chunks[-1] += chunk
    else:
        # if the last output chunk
        # is longer than the target length,
        # we can start a new one
        output_chunks.append(chunk)

We then use output_chunks for further processing.
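The greedy merging rule above can be checked in isolation on plain chunk lengths (in milliseconds), without any audio; merge_short_chunks is a hypothetical helper that mirrors the loop:

```python
def merge_short_chunks(lengths, target_length):
    """Greedily merge consecutive chunk lengths (ms) until each
    merged chunk reaches at least target_length, as in the loop above."""
    merged = [lengths[0]]
    for n in lengths[1:]:
        if merged[-1] < target_length:
            merged[-1] += n
        else:
            merged.append(n)
    return merged

print(merge_short_chunks([10_000, 8_000, 12_000, 30_000, 5_000], 25_000))
# [30000, 30000, 5000]
```

Note that the final chunk can still come out shorter than the target, as the trailing 5000 ms chunk does here.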

Pentameter answered 24/4, 2019 at 10:13 Comment(2)
Just FWIW, can you delete the unused file argument to split? It will save someone else a minute scrunching their eyebrows wondering if that's used somewhere. Thanks for the post!Darin
Why does the silence_thresh value not need adjusting? I find no reason given in your post. Conn
Having tested all of these solutions without success, I found one that works for me and is relatively fast.

Prerequisites:

  1. It works with ffmpeg
  2. It is based on code by Vincent Berthiaume from this post (https://mcmap.net/q/470555/-importing-sound-files-into-python-as-numpy-arrays-alternatives-to-audiolab)
  3. It requires numpy (although it uses very little of numpy; a numpy-free version would probably be easy to write and further increase speed)

Mode of operation, rationale:

  1. The solutions provided here were either based on AI, or extremely slow, or loaded the entire audio into memory, which was not feasible for my purposes. (I wanted to split a recording of all of Bach's Brandenburg Concertos into individual pieces; the 2 LPs are 2 hours long, which at 44 kHz, 16-bit stereo is 1.4 GB in memory and very slow to process.) From the moment I stumbled upon this post I told myself there must be a simple way, as this is a mere threshold-filter operation that needs little overhead and can be done on tiny chunks of audio at a time. A couple of months later I came across https://mcmap.net/q/470555/-importing-sound-files-into-python-as-numpy-arrays-alternatives-to-audiolab, which gave me the idea to split audio relatively efficiently.
  2. The command-line arguments give the source mp3 (or whatever ffmpeg can read), the silence duration, and the noise threshold. For my Bach LP recording, 1-second chunks at 0.01 of full amplitude did the trick.
  3. It lets ffmpeg convert the input to lossless 16-bit 22 kHz PCM and pipe it back via subprocess.Popen, with the advantage that ffmpeg does this very quickly and in small chunks that do not occupy much memory.
  4. Back in Python, two temporary numpy arrays holding the last and second-to-last buffer are concatenated and checked against the given threshold. If nothing exceeds it, there is a block of silence, and the code (naively, I admit) simply counts the time during which there is "silence". If that time is at least as long as the given minimum silence duration, the middle of the current interval is (again naively) taken as the splitting moment.
  5. The program doesn't actually touch the source file; instead it creates a batch file which, when run, tells ffmpeg to take segments bounded by these "silences" and save them into separate files.
  6. The user can then run the output batch file, perhaps after filtering out some repeated micro-intervals caused by tiny chunks of silence within long pauses between songs.
  7. This solution both works and is fast (none of the other solutions in this thread worked for me).
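The per-chunk threshold check described in step 4 can be sketched in plain Python (chunk_is_silent is a hypothetical helper; unlike the code below, it compares absolute amplitudes so that negative peaks also count):

```python
def chunk_is_silent(samples, thr):
    """A chunk counts as silent when no sample's absolute
    amplitude exceeds the threshold."""
    return max(abs(s) for s in samples) <= thr

print(chunk_is_silent([3, -5, 7, -2], 300))    # True
print(chunk_is_silent([3, 20000, -7, 2], 300))  # False
```

The real code below does the same thing vectorized over the decoded PCM buffers, and additionally counts how long the signal stays under the threshold before deciding to split.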

The little code:

import subprocess as sp
import sys
import numpy

FFMPEG_BIN = "ffmpeg.exe"

print 'ASplit.py <src.mp3> <silence duration in seconds> <threshold amplitude 0.0 .. 1.0>'

src = sys.argv[1]
dur = float(sys.argv[2])
thr = int(float(sys.argv[3]) * 65535)

f = open('%s-out.bat' % src, 'wb')

tmprate = 22050
len2 = dur * tmprate
buflen = int(len2     * 2)
#            t * rate * 16 bits

oarr = numpy.arange(1, dtype='int16')
# just a dummy array for the first chunk

command = [ FFMPEG_BIN,
        '-i', src,
        '-f', 's16le',
        '-acodec', 'pcm_s16le',
        '-ar', str(tmprate), # output sampling rate
        '-ac', '1', # '1' for mono
        '-']        # - output to stdout

pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

tf = True
pos = 0
opos = 0
part = 0

while tf :

    raw = pipe.stdout.read(buflen)
    if raw == '' :
        tf = False
        break

    arr = numpy.fromstring(raw, dtype = "int16")

    rng = numpy.concatenate([oarr, arr])
    mx = numpy.amax(rng)
    if mx <= thr :
        # the peak in this range is less than the threshold value
        trng = (rng <= thr) * 1
        # effectively a pass filter with all samples <= thr set to 0 and > thr set to 1
        sm = numpy.sum(trng)
        # i.e. simply (naively) check how many 1's there were
        if sm >= len2 :
            part += 1
            apos = pos + dur * 0.5
            print mx, sm, len2, apos
            f.write('ffmpeg -i "%s" -ss %f -to %f -c copy -y "%s-p%04d.mp3"\r\n' % (src, opos, apos, src, part))
            opos = apos

    pos += dur

    oarr = arr

part += 1    
f.write('ffmpeg -i "%s" -ss %f -to %f -c copy -y "%s-p%04d.mp3"\r\n' % (src, opos, pos, src, part))
f.close()
Magically answered 20/7, 2019 at 15:27 Comment(5)
Thanks a lot! After reading in the raw file, I was able to use #24885592 to find the silencesCharcoal
How do you argue that this is performant, and that it does not load the whole audio file into memory? Flintlock
Well, I didn't put together a precise table of results (I was in a hurry), but the AI-based and pydub-based solutions I came across here loaded the entire audio into memory AT ONCE, which meant 2 GB of data for my long audio file and took ages just to decode the mp3. The solution I provided is very fast (on my setup), and only a small part of the audio is loaded at a time. @Flintlock Magically
Hey, it's not working for me; it gives this error: drive.google.com/file/d/1VelQaA_hHoeyaBuB5WTNFRIQDPBH2lIs/… Can you please help me out here, or if you have time, update the working code in your answer? Lerner
The problem is that numpy.fromstring was deprecated long ago, which I guess is why it shows this error on my laptop. Lerner
Further to the long answer above, I ended up doing the following in a pinch. You run it like split.py {input.wav or mp3} 1 .3, where the last two arguments are the minimum silence length (in seconds) and the threshold, respectively.

This is only tested on Windows, since the original uses ffmpeg.exe. YMMV.

It tends to just create chunks of your minimum silence length if the threshold is too high (or too low?), so you have to play with it and watch the length of the resulting .bat file for clues; shorter is usually better. There are likely better solutions using more modern libraries; I can think of one already, but have no time right now. This is just a fix for the answer above, in modern Python. I'll leave the previous answer up for old-Python users.

import subprocess as sp
import sys
import numpy

FFMPEG_BIN = "ffmpeg.exe"

print ('ASplit.py <src.mp3> <silence duration in seconds> <threshold amplitude 0.0 .. 1.0>')

src = sys.argv[1]
dur = float(sys.argv[2])
thr = int(float(sys.argv[3]) * 65535)

f = open('%s-out.bat' % src, 'wb')

tmprate = 16000
len2 = dur * tmprate
buflen = int(len2     * 2)
#            t * rate * 16 bits

oarr = numpy.arange(1, dtype='int16')
# just a dummy array for the first chunk

command = [ FFMPEG_BIN,
        '-i', src,
        '-f', 's16le',
        '-acodec', 'pcm_s16le',
        '-ar', str(tmprate), # output sampling rate
        '-ac', '1', # '1' for mono
        '-']        # - output to stdout

pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

tf = True
pos = 0
opos = 0
part = 0

try:
    while tf:

        raw = pipe.stdout.read(buflen)
        if not raw:  # read() returns bytes in Python 3; empty means EOF
            tf = False
            break

        arr = numpy.frombuffer(raw, dtype="int16")

        rng = numpy.concatenate([oarr, arr])
        mx = numpy.amax(rng)
        if mx <= thr:
            # the peak in this range is less than the threshold value
            trng = (rng <= thr) * 1

            # effectively a pass filter with all samples <= thr set to 0 and > thr set to 1
            sm = numpy.sum(trng)
            # i.e. simply (naively) check how many 1's there were
            # print(f"sm {sm} len2 {len2}")
            if sm >= len2:
                part += 1
                apos = pos + dur * 0.5
                #print( mx, sm, len2, apos)
                f.write(f'ffmpeg -i "{src}" -ss {opos} -to {apos} -c copy -y "{src}-p{part}.wav"\r\n'.encode() )
                opos = apos

        pos += dur

        oarr = arr

except OSError as err:
    print("OS error: {0}".format(err))
except ValueError:
    print("Could not convert data to an integer.")
except BaseException as err:
    print(f"Unexpected {err=}, {type(err)=}")

part += 1    
f.write(f'ffmpeg -i "{src}" -ss {opos} -to {pos} -c copy -y "{src}-p{part}.wav"\r\n'.encode())
f.close()
Cariole answered 4/7, 2022 at 19:41 Comment(0)
The following code worked best for me:

from pydub import AudioSegment
from pydub.silence import split_on_silence

def split_audio_by_silence(input_file, silence_threshold=-50, min_silence_duration=500):
    audio = AudioSegment.from_file(input_file)

    # Split the audio based on silence
    segments = split_on_silence(
        audio,
        min_silence_len=min_silence_duration,
        silence_thresh=silence_threshold
    )

    # Export each segment as a separate file
    for i, segment in enumerate(segments, start=1):
        output_file = f"chunk_{i}.mp3"
        segment.export(output_file, format="mp3")

        # split_on_silence does not preserve the original timestamps,
        # so report each chunk's duration instead (pydub lengths are in ms).
        print(f"Chunk {i}: {len(segment)} ms long")

# Example usage
input_file = "input.mp3"

split_audio_by_silence(input_file)

Remember to install the pydub library (pip install pydub) before running the code.

Repairman answered 2/6, 2023 at 20:52 Comment(0)
Adding argv support to Eric Reed's answer:

# Import the AudioSegment class for processing audio and the 
# split_on_silence function for separating out silent chunks.
import sys
from pydub import AudioSegment
from pydub.silence import split_on_silence
if len(sys.argv) < 2:
    print("No mp3 file in input.")
    exit(1)
audio_file = str(sys.argv[1])
# Define a function to normalize a chunk to a target amplitude.
def match_target_amplitude(aChunk, target_dBFS):
    ''' Normalize given audio chunk '''
    change_in_dBFS = target_dBFS - aChunk.dBFS
    return aChunk.apply_gain(change_in_dBFS)

# Load your audio.
song = AudioSegment.from_mp3(audio_file)

# Split track where the silence is 2 seconds or more and get chunks using 
# the imported function.
chunks = split_on_silence(
    # Use the loaded audio.
    song,
    # Specify that a silent chunk must be at least 2 seconds (2000 ms) long.
    min_silence_len=2000,
    # Consider a chunk silent if it's quieter than -16 dBFS.
    # (You may want to adjust this parameter.)
    silence_thresh=-16
)

# Process each chunk with your parameters
for i, chunk in enumerate(chunks):
    # Create a silence chunk that's 0.5 seconds (or 500 ms) long for padding.
    silence_chunk = AudioSegment.silent(duration=500)

    # Add the padding chunk to beginning and end of the entire chunk.
    audio_chunk = silence_chunk + chunk + silence_chunk

    # Normalize the entire chunk.
    normalized_chunk = match_target_amplitude(audio_chunk, -20.0)

    # Export the audio chunk with new bitrate.
    print("Exporting chunk{0}.mp3.".format(i))
    normalized_chunk.export(
        ".//chunk{0}.mp3".format(i),
        bitrate = "192k",
        format = "mp3"
    )
Esterify answered 21/6, 2023 at 1:4 Comment(0)
Smart Audio Splitter splits files by silence. It handles large audio files using pydub and multiprocessing: https://github.com/Tikhvinskiy/Smart-audio-splitter It works fine.

Forefront answered 30/7, 2024 at 12:22 Comment(2)
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From ReviewFulminous
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Keitel

© 2022 - 2025 — McMap. All rights reserved.