Generate MFCCs for audio segments based on an annotated file

My main goal is to feed MFCC features to an ANN.

However, I am stuck at the data preprocessing step, and my question has two parts.
BACKGROUND:
I have an audio file and a txt file containing annotations with timestamps, like this:

0.0 2.5 Music  
2.5 6.05 silence  
6.05 8.34 notmusic  
8.34  12.0  silence  
12.0  15.5 music  

I know that for a single audio file, I can calculate the MFCCs using librosa like this:

import librosa

y, sr = librosa.load('abcd.wav')         # sr defaults to 22050
mfcc = librosa.feature.mfcc(y=y, sr=sr)  # shape: (n_mfcc, n_frames)

Part 1: I'm unable to wrap my head around how to calculate the MFCCs based on the segments from the annotation file.

Part 2: How best to store these MFCCs for passing to a Keras DNN, i.e. should all MFCCs calculated per audio segment be saved to a single list/dictionary, or is it better to save them to different dictionaries so that all MFCCs belonging to one label are in one place?

I'm new to audio processing and Python, so I'm open to recommendations regarding best practices.

More than happy to provide additional details. Thanks.

Bracelet asked 19/1, 2018 at 3:02

Part 1: MFCC to tag conversion

It's not obvious from the librosa documentation, but I believe the MFCCs are being calculated at about a 23 ms frame rate. With your code above, mfcc.shape will return (20, x), where 20 is the number of features and x is the number of frames. The default hop_length for mfcc is 512 samples, which means each MFCC frame spans about 23 ms (512 / sr, with librosa's default sr of 22050).

Using this, you can compute which frame goes with which tag in your text file. For example, the tag Music goes from 0.0 to 2.5 seconds, so that corresponds to MFCC frames 0 through 2.5 * sr / 512 ≈ 108. The segment boundaries will not fall exactly on frame boundaries, so you need to round the values.
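
A minimal sketch of that conversion (assuming librosa's defaults of sr = 22050 and hop_length = 512, and a hypothetical annotations.txt in the format shown in the question):

import librosa

y, sr = librosa.load('abcd.wav')  # default sr is 22050
hop_length = 512                  # librosa's default hop for mfcc
mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length)

segments = []  # list of (mfcc_slice, label) pairs, one per annotated segment
with open('annotations.txt') as f:
    for line in f:
        start, end, label = line.split()
        # convert seconds to frame indices, rounding to the nearest frame
        start_frame = int(round(float(start) * sr / hop_length))
        end_frame = int(round(float(end) * sr / hop_length))
        segments.append((mfcc[:, start_frame:end_frame], label))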

Part 2A: DNN Data Format

For the input (the MFCC data) you'll need to figure out what the input looks like. You'll have 20 features, but do you want to feed a single frame to your net, or are you going to submit a time series? Your MFCC data is already a numpy array; however, it's formatted as (feature, sample). You probably want to reverse that for input to Keras, which is a simple transpose.
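
That reversal is a one-liner (a sketch, reusing the mfcc array from the question):

x_train = mfcc.T  # (20, n_frames) -> (n_frames, 20); one row per frame
# note: mfcc.reshape((n_frames, 20)) would scramble features across frames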

For the output, you need to assign a numeric value to each tag in your text file. Typically you would store the tag-to-integer mapping in a dictionary. This will then be used to create the training output for the network. There should be one output integer for each input sample.
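
For example, a sketch of the tag dictionary and per-frame outputs (assuming the segments list from the Part 1 sketch above, and Keras's to_categorical for one-hot encoding):

import numpy as np
from keras.utils import to_categorical

label_to_int = {'music': 0, 'silence': 1, 'notmusic': 2}

frame_labels = []  # one integer label per MFCC frame
for seg_mfcc, label in segments:
    frame_labels.extend([label_to_int[label.lower()]] * seg_mfcc.shape[1])

y_train = to_categorical(np.array(frame_labels), num_classes=len(label_to_int))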

Part 2B: Saving the Data

The simplest way to do this is to use pickle to save the data and reload it later. I like to use a class to encapsulate the input, output, and dictionary data, but you can choose whatever works for you.
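
For instance (the file name mfcc_data.pkl is just a placeholder):

import pickle

# save the arrays and the tag dictionary together
with open('mfcc_data.pkl', 'wb') as f:
    pickle.dump({'x': x_train, 'y': y_train, 'labels': label_to_int}, f)

# reload later
with open('mfcc_data.pkl', 'rb') as f:
    data = pickle.load(f)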

Jeffers answered 19/1, 2018 at 5:13
Thanks for your response. I checked the shape of my MFCCs for a 1-second sample, and you are right, it was (20, 67). But for the entire audio it was (20, 1826). It is still unclear how to use the annotated file. - Bracelet
For the annotated file your steps are... 1. Enumerate the different tag names. 2. Use the timestamps in the file to assign a tag's enumerated value to each MFCC frame. This means that if you have 1826 MFCC frames, you'll have 1826 enumerated values as the output of your network. - Jeffers
In other words: I have generated the MFCCs and I have the annotations. How do I overlay the two together so that I can feed them to the neural network, like this: kaggle.com/louisong97/neural-network-approach-to-iris-dataset - Bracelet
You'll do something that looks a bit like... x_train = mfcc.T, y_train = keras.utils.to_categorical(annotations, num_classes=xx), model.fit(x_train, y_train, ...). Spend some effort trying to code this up, and if you get to a point where you can't get past the errors, post those, along with your code, in a new question. - Jeffers
@Jeffers "the tag Music goes from 0.0 to 2.5 seconds, so that corresponds to MFCC frames 0 through 2.5 * sr / 512 ≈ 108... you need to round the values" - sorry, I didn't understand what is meant by this. I am trying to solve a very similar question: #48389141. If you could please shed some light or assist with a small code snippet/pseudocode on how to accomplish it. Thank you. - Carola
Note that I posted code for the above comment at #48389141. - Jeffers
@Jeffers thanks for your answer; I made progress over the last few weeks. Now I have the training data and the model, but I am unable to connect the two. If possible, please take a look at my new question here: #48514822. Thanks. - Bracelet
