I am slowly working on a project where it would be very useful if the computer could find where in an mp3 file a certain sample occurs. I would restrict this problem to a fairly exact snippet of the audio, not, for example, the chorus of a song on a different recording by the same band, which would become more of a machine learning problem. I am thinking that if the snippet has no noise added and comes from the same file, it should be possible to locate the time at which it occurs without machine learning, just as grep can find the lines in a text file where a word occurs.
In case you don't have an mp3 lying around, you can set up the problem with some music available on the net which is in the public domain, so nobody complains:
curl https://web.archive.org/web/20041019004300/http://www.navyband.navy.mil/anthems/ANTHEMS/United%20Kingdom.mp3 --output godsavethequeen.mp3
It's a minute long:
exiftool godsavethequeen.mp3 | grep Duration
Duration : 0:01:03 (approx)
Now cut out a bit between 30 and 33 seconds (the bit which goes la la la la..):
ffmpeg -ss 30 -to 33 -i godsavethequeen.mp3 gstq_sample.mp3
Both files are now in the folder:
$ ls -la
-rw-r--r-- 1 cardamom cardamom 48736 Jun 23 00:08 gstq_sample.mp3
-rw-r--r-- 1 cardamom cardamom 1007055 Jun 22 23:57 godsavethequeen.mp3
For some reason exiftool seems to overestimate the duration of the sample:
$ exiftool gstq_sample.mp3 | grep Duration
Duration : 6.09 s (approx)
...but I suppose it's only approximate, as it says.
This is what I am after:
$ findsoundsample gstq_sample.mp3 godsavethequeen.mp3
start 30 end 33
I am happy with a bash script or a Python solution, even one using some kind of Python library. Sometimes if you use the wrong tool, the solution might work but look horrible, so whichever tool is more suitable. This is a one-minute mp3; I have not thought about performance yet, just about getting it done at all, but I would like some scalability, e.g. finding ten seconds somewhere in half an hour.
I have been looking at the following resources as I try to solve this myself:
How to recognize a music sample using Python and Gracenote?
is a good candidate
start 30.0 end 32.999977324263035
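One way to get a result of this kind without machine learning is to decode both files to raw samples and look for the peak of their normalized cross-correlation. The following is only a minimal sketch under some assumptions: ffmpeg is on the PATH, numpy is installed, and the names `decode`, `find_offset`, `find_sound_sample` and the 8000 Hz analysis rate are made up for illustration. It is not necessarily the method that produced the output above.

```python
import subprocess

import numpy as np

RATE = 8000  # assumed analysis rate in Hz; lower is faster, higher is more precise


def decode(path, rate=RATE):
    """Decode any ffmpeg-readable file to mono float samples at `rate` Hz."""
    cmd = ["ffmpeg", "-v", "error", "-i", path,
           "-f", "s16le", "-ac", "1", "-ar", str(rate), "-"]
    raw = subprocess.run(cmd, capture_output=True, check=True).stdout
    return np.frombuffer(raw, dtype=np.int16).astype(np.float64)


def find_offset(haystack, needle):
    """Return the sample index in haystack where needle matches best."""
    haystack = np.asarray(haystack, dtype=np.float64)
    needle = np.asarray(needle, dtype=np.float64)
    m = len(needle)
    valid = len(haystack) - m + 1  # number of positions where needle fits
    if m == 0 or valid <= 0:
        raise ValueError("needle must be non-empty and no longer than haystack")

    # Cross-correlation via FFT, zero-padded so the circular result
    # does not wrap around within the positions we keep.
    size = 1 << (len(haystack) + m - 2).bit_length()
    spec = np.fft.rfft(haystack, size) * np.conj(np.fft.rfft(needle, size))
    corr = np.fft.irfft(spec, size)[:valid]

    # Divide by the energy of each haystack window so that loud passages
    # do not win just because they are loud.
    csum = np.concatenate(([0.0], np.cumsum(haystack ** 2)))
    energy = csum[m:] - csum[:-m]
    return int(np.argmax(corr / np.sqrt(energy + 1e-12)))


def find_sound_sample(sample_path, full_path, rate=RATE):
    """Return (start, end) in seconds of the best match of sample in full."""
    needle = decode(sample_path, rate)
    start = find_offset(decode(full_path, rate), needle)
    return start / rate, (start + len(needle)) / rate
```

It could be called as `start, end = find_sound_sample("gstq_sample.mp3", "godsavethequeen.mp3")`. The FFT keeps the cost near O(n log n), so half an hour of audio at this rate should stay manageable. Note that mp3 encoding adds a little padding, so the reported times may be off by a few tens of milliseconds.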
Very interesting.. Can see on your profile you know about signal processing and fft. It took 0.7 seconds on my machine, quick.. Will look at the intermediate results in your code on each line. – Joyless