Python: performing FFT on music file

I am trying to perform an FFT on a song (an audio file in WAV format, about 3 minutes long) which I created as follows, just in case it is relevant.

ffmpeg -i "$1" -vn -ab 128k -ar 44100 -y -ac 1 "${1%.webm}.wav"

Where $1 is the name of a webm file.

This is the code which is supposed to display a FFT of the given file:

import os

import numpy as np
import scipy.io.wavfile
import matplotlib.pyplot as plt

# presume the file has already been converted to wav.
file = os.path.join(temp_folder, file_name)

rate, aud_data = scipy.io.wavfile.read(file)

# wav file is mono.
channel_1 = aud_data[:]

fourier = np.fft.fft(channel_1)

plt.figure(1)
plt.plot(fourier)
plt.xlabel('n')
plt.ylabel('amplitude')
plt.show()

The problem is that it takes forever. It takes so long that I cannot even show the output: I have had plenty of time to research and write this post and it still has not finished.

I presume that the file is too long, since

print(aud_data.shape)

outputs (9218368,), but this looks like a real-world problem, so I hope there is a way to obtain an FFT of an audio file somehow.

What am I doing wrong? Thank you.

edit

A better formulation of the question would be: is the FFT of any use in music processing, for example for measuring the similarity of two pieces?

As pointed out in the comments, my plain approach is way too slow.

Thank you.

Rani answered 26/12, 2017 at 19:15 Comment(10)
Possible duplicate of plotting spectrogram in audio analysisSclater
Well, the FFT runs in O(n log n) time, and you have ~10^7 samples, so yeah, what did you expect?Kileykilgore
@Pavel. Closely related, but not quite the same. I don't think a dupe vote is warranted here.Kileykilgore
@Physicist makes sense. So using FFT on music is not a thing? I figured it would be widely used for finding similar songs for instance.Rani
Since the power spectrum of music is time-varying you would typically use an STFT to take the FFT of successive (usually overlapping) short windows, where any given window can be considered to be statistically stationary (10 ms to 20 ms is typical for window size). The resulting 3D data structure is called a spectrogram and has a wide range of applications (see the sketch after these comments).Dibru
Using an exact power-of-2 length will give the best speed results with the FFT; apparently there's also GPU-accelerated FFT: google.com/search?q=python+cuda&oq=python+cudaHallucinogen
@PaulR thank you for your answer. I'll have a closer look at spectrograms. If you post an answer I'll accept it. However, how can a bunch of repeated FFTs be this much faster than one FFT on the entire input? I don't understand.Rani
Two reasons: (i) FFT is O(n log n) - if you do the math then you will see that a number of small FFTs is more efficient than one large one; (ii) smaller FFTs are typically much more cache-friendly - the FFT makes log2(n) passes through the data, with a somewhat “random” access pattern, so it can make a huge difference if your n data points all fit in cache.Dibru
See also: this answer and this answer.Dibru
@Rani It might be a good idea to separate out the fft portion and the plotting portion. Plotting that many points will probably be incredibly slow even if the FFT is lightning fast.Cuticula
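
As the comments above suggest, the usual tool here is a spectrogram rather than one giant FFT. Here is a minimal sketch of that approach using scipy.signal.spectrogram; the random aud_data, the rate, and the window parameters (nperseg, noverlap) are placeholder example values, not taken from the original question:

import numpy as np
import scipy.signal
import matplotlib.pyplot as plt

# stand-ins for the samples and sample rate read from the wav file
rate = 44100
aud_data = np.random.random(9218368)

# Short-time Fourier analysis: FFTs of short, overlapping windows
# (2048 samples is roughly 46 ms at 44.1 kHz, here with 50% overlap).
freqs, times, Sxx = scipy.signal.spectrogram(aud_data, fs=rate, nperseg=2048, noverlap=1024)

plt.pcolormesh(times, freqs, 10 * np.log10(Sxx + 1e-12))  # power in dB
plt.xlabel('time [s]')
plt.ylabel('frequency [Hz]')
plt.show()

Each column of Sxx is the power spectrum of one short window; representations like this are what is typically compared when measuring similarity between pieces.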

To considerably speed up the FFT portion of your analysis, you can zero-pad your data out to a length that is a power of 2:

import scipy
import numpy as np
import matplotlib.pyplot as plt

# rate, aud_data = scipy.io.wavfile.read(file)
rate, aud_data = 44000, np.random.random((9218368,))

len_data = len(aud_data)

# zero-pad up to the next power-of-two length
channel_1 = np.zeros(2**(int(np.ceil(np.log2(len_data)))))
channel_1[0:len_data] = aud_data

fourier = np.fft.fft(channel_1)

Here is an example of plotting the real component of the Fourier transform of a few sine waves using the above method:

import numpy as np
import matplotlib.pyplot as plt

# rate, aud_data = scipy.io.wavfile.read(file)
rate = 44000
ii = np.arange(0, 9218368)
t = ii / rate
aud_data = np.zeros(len(t))
for w in [1000, 5000, 10000, 15000]:
    aud_data += np.cos(2 * np.pi * w * t)

# From here down, everything else can be the same
len_data = len(aud_data)

channel_1 = np.zeros(2**(int(np.ceil(np.log2(len_data)))))
channel_1[0:len_data] = aud_data

fourier = np.fft.fft(channel_1)
w = np.linspace(0, 44000, len(fourier))

# For a real-valued signal the FFT output is conjugate-symmetric,
# so only the first half (the non-redundant frequencies) is kept
fourier_to_plot = fourier[0:len(fourier)//2]
w = w[0:len(fourier)//2]

plt.figure(1)

plt.plot(w, np.real(fourier_to_plot))  # plot the real component
plt.xlabel('frequency')
plt.ylabel('amplitude')
plt.show()
Cuticula answered 28/12, 2017 at 22:39 Comment(1)
The second input argument to np.fft.fft is the transform length. Set that to your computed power of two length, and you don't need to explicitly zero-pad the data.Cruciate
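
For illustration, here is a minimal sketch of that variant; the random array is just a stand-in for the audio samples, and n is the power-of-two length computed as above:

import numpy as np

aud_data = np.random.random(9218368)  # stand-in for the real samples

# next power-of-two length >= the signal length
n = 2**int(np.ceil(np.log2(len(aud_data))))

# np.fft.fft zero-pads (or truncates) the input to length n internally
fourier = np.fft.fft(aud_data, n)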

The FFT is very useful in music, but not very useful when applied to a whole song. Typically a song is not so repetitive that lumping it all together would make sense.

About the computation time: 9218368 = 2^6 x 144037. That large prime factor makes the FFT very slow. Padding it or cropping it so that the length is a product of smaller integers will make it much faster (10 million points is really not all that much!).

Note that the length doesn't need to be a power of two to be efficient; lengths whose prime factors are small (3, 5 or even 7) can still be quite efficient. For example, use scipy.fft.next_fast_len() to find a good length to pad to.
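
As a minimal sketch of that approach (assuming SciPy's scipy.fft module is available; the sample count is just the figure from the question):

import numpy as np
import scipy.fft

aud_data = np.random.random(9218368)  # stand-in for the real samples

# smallest "fast" length >= the signal length (a product of small primes)
n = scipy.fft.next_fast_len(len(aud_data))

# scipy.fft.fft zero-pads the input to length n internally
fourier = scipy.fft.fft(aud_data, n)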

Cruciate answered 19/7, 2023 at 15:48 Comment(0)
