Silence regions of audio based on a list of timestamps, using SoX and Python

I have an audio file.
I have a bunch of [start, end] timestamp segments.

WHAT I WANT TO ACHIEVE: Say the audio is 6 minutes long.
The segments I have are: [[0.0,4.0], [8.0,12.0], [16.0,20.0], [24.0,28.0]]

After I pass these two to SoX + Python, the output should be audio that is still 6 minutes long but has sound only during the times covered by the segments.

That is, I want to pass the timestamps and the original audio to SoX + Python so that it generates audio in which everything is silenced except the portions corresponding to the passed segments.

I couldn't achieve the above, but after days of googling I came somewhat close to the opposite. Here is what I have:

UPDATED, MORE CONCISE CODE + EXAMPLE:
A SoX command template that takes padding and trimming, like this:

SOX__SILENCE = 'sox "{inputaudio}" -c 1 "{outputaudio}" {padding}{trimming}'
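If the template is filled from Python, one way to run it without going through a shell is to format it and split it into an argument list. A minimal sketch; build_sox_argv is a hypothetical helper, and it assumes the trimming string starts with a space, as the trimming function in the question produces:

```python
import shlex

SOX__SILENCE = 'sox "{inputaudio}" -c 1 "{outputaudio}" {padding}{trimming}'


def build_sox_argv(inputaudio, outputaudio, padding="", trimming=""):
    """Fill in SOX__SILENCE and split it into an argv list.

    padding looks like "pad 4.0@0.0 ..." and trimming like
    " trim 0 0.0 0 4.0 4.0 ..." (note the leading space).
    """
    command = SOX__SILENCE.format(inputaudio=inputaudio,
                                  outputaudio=outputaudio,
                                  padding=padding,
                                  trimming=trimming)
    # shlex.split honours the double quotes, so file names with spaces survive
    return shlex.split(command)
```

The resulting list can be passed to subprocess.run(argv, check=True). File names that themselves contain double quotes would still break the template.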

Random Segments for testing:

# random segments:
A= [[0.0,16.0]]
b=[[1.0,2.0]]
z= [[1.6, 8.3], [13.2, 33.7], [35.0,38.0], [42.0,51.0], [70.2,73.7], [90.0,99.2], [123.0,131.1]]
q= [[0.0,4.0], [8.0,12.0], [16.0,20.0], [24.0,28.0]]

A small Python script to generate the padding and trimming arguments.

PADDING:

def get_pad_pattern_from_timestamps(my_segments):
    # Build "pad length@position ..." with one entry per [start, end] segment
    padding = 'pad'
    for segment in my_segments:
        duration = str(segment[1] - segment[0])
        padding = padding + ' ' + duration + '@' + str(segment[0])
    return padding

print(get_pad_pattern_from_timestamps(A))
print(get_pad_pattern_from_timestamps(b))
print(get_pad_pattern_from_timestamps(z))
print(get_pad_pattern_from_timestamps(q))

OUTPUT from ^:

pad 16.0@0.0
pad 1.0@1.0
pad 6.7@1.6 20.5@13.2 3.0@35.0 9.0@42.0 3.5@70.2 9.2@90.0 8.1@123.0
pad 4.0@0.0 4.0@8.0 4.0@16.0 4.0@24.0 4.0@32.0 4.0@40.0
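This output also shows why padding alone cannot silence audio in place: SoX's pad effect inserts silence of the given length at the given position and pushes the rest of the audio later; it does not overwrite anything. A quick sketch of the resulting length (padded_duration is a hypothetical helper, not part of SoX):

```python
def padded_duration(original_seconds, segments):
    """Length after one `pad length@position` entry per [start, end] segment.

    Mirrors get_pad_pattern_from_timestamps: each segment contributes a pad
    of length (end - start), and every pad makes the output longer.
    """
    return original_seconds + sum(end - start for start, end in segments)


q = [[0.0, 4.0], [8.0, 12.0], [16.0, 20.0], [24.0, 28.0]]
print(padded_duration(360.0, q))  # prints 376.0: 6 minutes in, 6:16 out
```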

TRIMMING:

def get_trimm_pattern_from_timestamps(my_segments):
    # Build " trim 0 start 0 duration duration" for every [start, end] segment
    trimming = ''
    for segment in my_segments:
        duration = str(segment[1] - segment[0])
        trimming = trimming + ' trim 0 ' + str(segment[0]) + ' 0 ' + duration + ' ' + duration
    return trimming

print(get_trimm_pattern_from_timestamps(A))
print(get_trimm_pattern_from_timestamps(b))
print("\n")
print(get_trimm_pattern_from_timestamps(z))
print("\n")
print(get_trimm_pattern_from_timestamps(q))
print("\n")

OUTPUT FROM TRIMMING:

trim 0 0.0 0 16.0 16.0
 trim 0 1.0 0 1.0 1.0


 trim 0 1.6 0 6.7 6.7 trim 0 13.2 0 20.5 20.5 trim 0 35.0 0 3.0 3.0 trim 0 42.0 0 9.0 9.0 trim 0 70.2 0 3.5 3.5 trim 0 90.0 0 9.2 9.2 trim 0 123.0 0 8.1 8.1


 trim 0 0.0 0 4.0 4.0 trim 0 8.0 0 4.0 4.0 trim 0 16.0 0 4.0 4.0 trim 0 24.0 0 4.0 4.0 trim 0 32.0 0 4.0 4.0 trim 0 40.0 0 4.0 4.0
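For comparison, SoX's trim effect accepts any number of positions: nothing is output until the first position, after which it alternates between copying and discarding at each following position, and a position prefixed with = is absolute (the zsh answer below relies on the same = syntax, which needs a reasonably recent SoX). A minimal sketch of a single trim that keeps exactly the segments S, assuming the segments do not overlap; keep_only_trim_args is a hypothetical helper. Note that trim concatenates the kept parts, so the result is shorter than the input rather than silenced in place:

```python
def keep_only_trim_args(segments):
    """Build one SoX trim effect that keeps only the given segments.

    Positions alternate start-copying / start-discarding; '=' makes each
    position absolute (measured from the start of the input).
    """
    args = ["trim"]
    for start, end in sorted(segments):
        args += [f"={start}", f"={end}"]
    return args


z = [[1.6, 8.3], [13.2, 33.7], [35.0, 38.0]]
print(" ".join(keep_only_trim_args(z)))
# trim =1.6 =8.3 =13.2 =33.7 =35.0 =38.0
```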

RUNNING SOX FROM A TERMINAL USING THE ABOVE OUTPUTS:

Padding:  

    sox dinners.mp3 -c 1 testlongpad.mp3 pad 4.0@0.0 4.0@8.0 4.0@16.0 4.0@24.0

Trimming:  

    sox dinners.mp3 -c 1 testrim.mp3 trim 0 0.0 0 16.0 16.0

Pad and trim:

    sox dinners.mp3 -c 1 testlongpadtrim.mp3 pad 4.0@0.0 4.0@8.0 4.0@16.0 4.0@24.0 trim 0 0.0 0 4.0 4.0 trim 0 8.0 0 4.0 4.0 trim 0 16.0 0 4.0 4.0 trim 0 24.0 0 4.0 4.0

If S is my set of segments, then NS is everything else. In the approach above I'm passing NS, and NS is getting removed from the audio.

What I want to achieve is still the same, but the other way around: I want to pass S so that only the portions of the audio corresponding to S are retained and everything else is silenced.
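Because trim removes audio and pad inserts it, neither keeps the original length. One way to get exactly "keep S, silence NS, same duration" is to zero the samples outside S yourself. A minimal sketch using only Python's standard library (wave and array), assuming the MP3 is first converted to 16-bit PCM WAV, for example with sox dinners.mp3 -b 16 dinners.wav; silence_outside is a hypothetical name, not an existing API:

```python
import array
import sys
import wave


def silence_outside(src, dst, segments):
    """Copy a 16-bit PCM WAV from src to dst, zeroing every sample outside
    the [start, end] segments (seconds). The duration is unchanged.

    src and dst may be file names or binary file objects.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        raw = reader.readframes(params.nframes)
    if params.sampwidth != 2:
        raise ValueError("this sketch only handles 16-bit PCM WAV input")

    samples = array.array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    rate, channels, nframes = params.framerate, params.nchannels, params.nframes

    # Collect the frame ranges NOT covered by any segment (the "NS" regions)
    gaps = []
    cursor = 0  # first frame not yet known to be kept
    for start, end in sorted(segments):
        first = min(nframes, max(0, round(start * rate)))
        last = min(nframes, max(0, round(end * rate)))
        if first > cursor:
            gaps.append((cursor, first))
        cursor = max(cursor, last)
    if cursor < nframes:
        gaps.append((cursor, nframes))

    # Frames are interleaved: frame i holds samples i*channels .. i*channels+channels-1
    for lo, hi in gaps:
        count = (hi - lo) * channels
        samples[lo * channels:hi * channels] = array.array("h", [0]) * count

    if sys.byteorder == "big":
        samples.byteswap()
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)
        writer.writeframes(samples.tobytes())
```

Usage would look like silence_outside("dinners.wav", "dinners_silenced.wav", [[1.6, 8.3], [13.2, 33.7]]); the result can be converted back with sox if an MP3 is needed.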

PS: My question is very specific; I am new to audio processing and unsure how to proceed. Kindly don't close the question as too broad. I'd be happy to provide more details if anything needs clarification. Lastly, this is not a homework question; it is for a personal project.

Sample Audio : https://www.dropbox.com/s/1p27nfwney42ka2/LAZY_SALON_-03-_Hot_Dinners.mp3?dl=0

Sample segments ([[start, end], ...]): [[1.6, 8.3], [13.2, 33.7], [35.0,38.0], [42.0,51.0], [70.2,73.7], [90.0,99.2], [123.0,131.1]]

So when these timestamps are passed to SoX/Python along with the audio, everything in the audio except the portions in the supplied segments should be silenced.

Tatary asked 9/1, 2018 at 4:45

Can you provide an audio clip with associated time stamps for extraction or removal? – Azucenaazure 10/1, 2018 at 15:41

@Azucenaazure The problem is independent of the audio. Whatever arbitrary audio and timestamps I pass, SoX should silence the portions of the audio that are not covered by the supplied timestamps. – Tatary 10/1, 2018 at 19:10

A sample makes it easier to answer your question – Azucenaazure 10/1, 2018 at 19:23

OK, got it. I've updated the question and provided audio and time segments. – Tatary 10/1, 2018 at 19:46

I was able to implement this with a workaround.

See: create new list from list of lists in python by grouping

What I did was create a new list containing the regions between segments and then pass it on to sox. At the moment whatever I pass to sox gets removed. So I calculated regions to be removed and then passed it on to sox. It worked pretty well.
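The linked grouping code is not reproduced here, so the following is an independent sketch of the same idea: derive NS (the regions between and around the segments) from S and the total duration. It also returns the leading and trailing regions, which must be included if the whole file outside S is to be removed or silenced. gaps_between is a hypothetical name, and the total duration is an input you must supply (soxi -D prints a file's length in seconds):

```python
def gaps_between(segments, total):
    """Return the regions NOT covered by segments, within [0, total] seconds.

    Segments may be unsorted; overlapping segments are treated as one.
    """
    gaps = []
    cursor = 0.0
    for start, end in sorted(segments):
        if start > cursor:
            gaps.append([cursor, start])
        cursor = max(cursor, end)
    if cursor < total:
        gaps.append([cursor, total])
    return gaps


q = [[0.0, 4.0], [8.0, 12.0], [16.0, 20.0], [24.0, 28.0]]
print(gaps_between(q, 360.0))
# [[4.0, 8.0], [12.0, 16.0], [20.0, 24.0], [28.0, 360.0]]
```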

The solution is still inverted, but I don't have to change anything in the SoX command.

I won't accept my own answer. I'm hoping someone can come up with a solution that changes the SoX commands instead of recalculating the segments like I did.

Tatary answered 11/1, 2018 at 18:40

I would probably solve this with a zsh script and awk.

If the times are given in a file named bits, like this:

1.6 8.3
13.2 33.7
35.0 38.0
42.0 51.0
70.2 73.7
90.0 99.2
123.0 131.1

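Calculate the silent gaps like this (the third column is the length of the gap before each segment; p starts at 0, so the first gap is measured from the start of the file):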

awk '{ print $1, $2, $1 - p; p = $2 }' bits

Output:

1.6 8.3 1.6
13.2 33.7 4.9
35.0 38.0 1.3
42.0 51.0 4
70.2 73.7 19.2
90.0 99.2 16.3
123.0 131.1 23.8

You should now be able to generate the desired command line with something like this:

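args="sox "
m=file.mp3
# Each line: s = segment start, e = segment end, n = length of the gap before it
# (zsh runs the last stage of a pipeline in the current shell, so args survives the loop)
awk '{ print $1, $2, $1 - p; p = $2 }' bits |
while read s e n; do
  # n seconds of silence from the null input
  args+="\"|sox -n -p trim 0 $n\" "
  # the segment itself, from s to absolute position e, keeping only channel 1
  args+="\"|sox $m -p trim $s =$e remix 1\" "
done
args+="out.wav"
echo "$args"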

Pipe it into /bin/sh to execute:

... | sh

The output from sox should now be in out.wav.
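The same kind of command can also be built from Python, which avoids depending on zsh's pipeline behaviour. This sketch additionally appends trailing silence so the output keeps the original duration, which the script above does not do. build_concat_argv is a hypothetical helper; the -r/-c options on the null input are my addition so that the generated silence matches the extracted pieces, and the 44100 Hz default is an assumption (check your file with soxi -r). Segments are assumed sorted-able and non-overlapping:

```python
import shlex


def build_concat_argv(src, segments, total, out="out.wav", rate=44100):
    """Build a sox argv that concatenates silence and the kept segments.

    Each "|..." entry is an input that sox runs as a shell command, reading
    its stdout; -p writes SoX's internal format to stdout and -n is the
    null input, which trim turns into silence of the requested length.
    rate is an ASSUMPTION and must match the sample rate of src.
    """
    quoted = shlex.quote(src)
    silence = f"|sox -n -r {rate} -c 1 -p trim 0 {{:.3f}}"
    argv = ["sox"]
    cursor = 0.0
    for start, end in sorted(segments):
        if start > cursor:
            argv.append(silence.format(start - cursor))
        argv.append(f"|sox {quoted} -p trim {start} ={end} remix 1")
        cursor = end
    if total > cursor:
        # trailing silence, so the output keeps the original duration
        argv.append(silence.format(total - cursor))
    argv.append(out)
    return argv
```

The list can be run with subprocess.run(argv, check=True); no outer shell is needed because each entry is passed to sox as a single argument.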

Azucenaazure answered 14/1, 2018 at 3:33