My main goal is to feed MFCC features to an ANN.
However, I am stuck at the data pre-processing step, and my question has two parts.
BACKGROUND:
I have an audio file.
I have a txt file with annotations and timestamps, like this:
0.0 2.5 Music
2.5 6.05 silence
6.05 8.34 notmusic
8.34 12.0 silence
12.0 15.5 music
I know that for a single audio file I can calculate the MFCCs using librosa like this:
import librosa
y, sr = librosa.load('abcd.wav')
mfcc = librosa.feature.mfcc(y=y, sr=sr)
Part 1: I'm unable to wrap my head around how to calculate the MFCCs for each segment given by the annotations.
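Here is a sketch of what I think this might look like; the file name 'annotations.txt' and the idea of converting the timestamps to sample indices before slicing the waveform are my own assumptions:

import librosa

y, sr = librosa.load('abcd.wav')

segments = []  # (label, mfcc) pairs, one per annotated segment
with open('annotations.txt') as f:
    for line in f:
        start, end, label = line.split()
        # Convert the timestamps (seconds) to sample indices and
        # compute the MFCCs on just that slice of the waveform.
        start_sample = int(float(start) * sr)
        end_sample = int(float(end) * sr)
        mfcc = librosa.feature.mfcc(y=y[start_sample:end_sample], sr=sr)
        segments.append((label, mfcc))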
Part 2: How best to store these MFCCs so they can be passed to a Keras DNN? Should all the MFCCs calculated per audio segment be saved to a single list/dictionary, or is it better to save them to separate dictionaries so that all MFCCs belonging to one label are in one place?
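As an illustration of the first option (one flat structure), here is a sketch of what I am considering; the averaging over time to get fixed-size inputs and the integer label encoding are my own assumptions:

import numpy as np

X, labels = [], []
for label, mfcc in segments:  # 'segments' from the sketch above
    # Each segment yields a different number of frames, so reduce each
    # (20, n_frames) matrix to a fixed-length (20,) vector by averaging
    # over time; keeping per-frame vectors would be an alternative.
    X.append(mfcc.mean(axis=1))
    labels.append(label)

X = np.array(X)  # shape: (num_segments, 20)
label_to_int = {l: i for i, l in enumerate(sorted(set(labels)))}
y_targets = np.array([label_to_int[l] for l in labels])

My understanding is that Keras ultimately needs arrays anyway, so the choice of list vs. per-label dictionary mostly affects bookkeeping, but I may be missing something.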
I'm new to audio processing and Python, so I'm open to recommendations regarding best practices.
More than happy to provide additional details. Thanks.
EDIT: For a single segment, the resulting MFCC array had shape (20, 67), but for the entire audio it was (20, 1826). It is still unclear to me how to use the annotation file.