Getting the frequencies associated with STFT in Librosa

Asked 11/8, 2020 at 2:1 Answered 16/4, 2021 at 15:31

Solved python signal-processing fft librosa dft

When using librosa.stft() to calculate a spectrogram, how does one get back the associated frequency values? I am not interested in generating an image as in librosa.display.specshow, but rather I want to have those values in hand.

y, sr = librosa.load('../recordings/high_pitch.m4a')
stft = librosa.stft(y, n_fft=256, window=sig.windows.hamming)
spec = np.abs(stft)

spec gives me the 'amplitude' or 'power' of each frequency, but not the frequencies bins themselves. I have seen that there is a display.specshow function that will display these frequency values on the vertical axis of a heatmap, but not return the values themselves.

I'm looking for something similar to nn.fft.fttfreq() for a single FFT, but cannot find its equivalent in the librosa documentation.

Rocker answered 11/8, 2020 at 2:1 Comment(0)

I would like to point out this question and answer in particular: How do I obtain the frequencies of each value in an FFT?. In addition to consulting the documentation for the STFT from librosa, we know that the horizontal axis is the time axis while the vertical axis are the frequencies. Each column in the spectrogram is the FFT of a slice in time where the centre at this time point has a window placed with n_fft=256 components.

We also know that there is a hop length which tells us how many audio samples we need to skip over before we calculate the next FFT. This by default is n_fft / 4, so every 256 / 4 = 64 points in your audio, we calculate a new FFT centered at this time point of n_fft=256 points long. If you want to know the exact time point each window is centered at, that is simply i / Fs with i being the index of the audio signal which would be a multiple of 64.

Now, for each FFT window, for real signals the spectrum is symmetric so we only consider the positive side of the FFT. This is verified by the documentation where the number of rows and hence the number of frequency components is 1 + n_fft / 2 with 1 being the DC component. Since we have this now, consulting the post above the relationship from bin number to the corresponding frequency is i * Fs / n_fft with i being the bin number, Fs being the sampling frequency and n_fft=256 as the number of points in the FFT window. Since we are only looking at the half spectrum, instead of i spanning from 0 to n_fft, this spans from 0 up to 1 + n_fft / 2 instead as the bins beyond 1 + n_fft / 2 would simply be the reflected version of the half spectrum and so we do not consider the frequency components beyond Fs / 2 Hz.

If you wanted to generate a NumPy array of these frequencies, you could just do:

import numpy as np
freqs = np.arange(0, 1 + n_fft / 2) * Fs / n_fft

freqs would be an array that maps the bin number in the FFT to the corresponding frequency. As an illustrative example, suppose our sampling frequency is 16384 Hz, and n_fft = 256. Therefore:

In [1]: import numpy as np

In [2]: Fs = 16384

In [3]: n_fft = 256

In [4]: np.arange(0, 1 + n_fft / 2) * Fs / n_fft
Out[4]:
array([   0.,   64.,  128.,  192.,  256.,  320.,  384.,  448.,  512.,
        576.,  640.,  704.,  768.,  832.,  896.,  960., 1024., 1088.,
       1152., 1216., 1280., 1344., 1408., 1472., 1536., 1600., 1664.,
       1728., 1792., 1856., 1920., 1984., 2048., 2112., 2176., 2240.,
       2304., 2368., 2432., 2496., 2560., 2624., 2688., 2752., 2816.,
       2880., 2944., 3008., 3072., 3136., 3200., 3264., 3328., 3392.,
       3456., 3520., 3584., 3648., 3712., 3776., 3840., 3904., 3968.,
       4032., 4096., 4160., 4224., 4288., 4352., 4416., 4480., 4544.,
       4608., 4672., 4736., 4800., 4864., 4928., 4992., 5056., 5120.,
       5184., 5248., 5312., 5376., 5440., 5504., 5568., 5632., 5696.,
       5760., 5824., 5888., 5952., 6016., 6080., 6144., 6208., 6272.,
       6336., 6400., 6464., 6528., 6592., 6656., 6720., 6784., 6848.,
       6912., 6976., 7040., 7104., 7168., 7232., 7296., 7360., 7424.,
       7488., 7552., 7616., 7680., 7744., 7808., 7872., 7936., 8000.,
       8064., 8128., 8192.])

In [5]: freqs = _; len(freqs)
Out[5]: 129

We can see that we have generated a 1 + n_fft / 2 = 129 element array which tells us the frequencies for each corresponding bin number.

A word of caution

Take note that librosa.display.specshow has a default sampling rate of 22050 Hz, so if you don't set the sampling rate (sr) to the same sampling frequency as your audio signal, the vertical and horizontal axes will not be correct. Make sure you specify the sr input flag to match your sampling frequency of the incoming audio.

Olmos answered 11/8, 2020 at 4:1 Comment(6)

Thank you Ray! Great answer! – Rocker 11/8, 2020 at 14:47

Great explanation! PS default n_fft in librosa 0.8 is 2048 – Mat 19/2, 2021 at 15:17

I don't understand , I get only a vector, I expected to obtain a vector from every sample in the spectrogram ???, because when you plot the spectrogram you can see that you have multiple frecuencies on each point in time ?? If someone could explain me this it would be great :( – Damiondamita 25/2, 2021 at 15:29

@Damiondamita Please post a new question explaining your issues and errors you are facing. However, if I understand your question correctly this question in particular describes what the associated frequency bins are at each time point in the spectrogram. If you are using the above code in this answer, it just generates those bins. If you want the actual spectrogram you need to run librosa.stft as seen in the question above. – Olmos 25/2, 2021 at 15:38

Can someone explain the "If you want to know the exact time point each window is centered at, that is simply i / Fs", what is Fs in this context? – Newmarket 10/4, 2022 at 14:36

@LukePighetti Fs is the sampling frequency or sampling rate which is the inverse of the sampling interval. Each sound file has a sampling frequency. The sampling frequency is the total number of samples per second in the file. – Olmos 10/4, 2022 at 18:37

In addition to the excellent explanation by rayryeng, it should be noted that the direct equivalent of numpy.fft.fftfreq() in librosa would be librosa.fft_frequencies()

You can use it as follows:

y, sr = librosa.load('../recordings/high_pitch.m4a')
Nfft = 256
stft = librosa.stft(y, n_fft=Nfft, window=sig.windows.hamming)
freqs = librosa.fft_frequencies(sr=sr, n_fft=Nfft)

Brandonbrandt answered 11/8, 2020 at 5:51 Comment(4)

The more you know. Thanks! – Olmos 11/8, 2020 at 8:59

Thank you, that's very handy. – Rocker 11/8, 2020 at 14:47

@Damiondamita it looks like you have a specific problem that may warrant a specific SO question, including a MVCE that would allow us to reproduce your problem and provide an appropriate explanation. – Brandonbrandt 26/2, 2021 at 1:54

You can compute the accumulated energy as follows

samplerate = 48000
Nfft = 8192
freqs = librosa.fft_frequencies(sr=sr, n_fft=Nfft)
plt.loglog(freqs, np.mean(mag**2, axis=1)/(Nfft/2)**2)
plt.xlabel('freq [Hz]')

If you want to sum the energy in a frequency range you can use index mag on freqs, e.g.

np.sum(np.mean(mag[(freqs > 1000) & (freqs < 1480), :]**2, axis=1))/(Nfft/2)**2

More generally you can apply a filter gain(f), the result above is obtained with gain(f) a rectangle.

np.sum(np.mean(mag**2, axis=1)*gain(freq))/(Nfft/2)**2

Disclaimer: I don't know if these scale factors are the correct ones for you. Only the shapes.

Neither answered 16/4, 2021 at 15:31 Comment(1)

Make it clear and mention variable uses for better understanding – Portiaportico 9/3, 2023 at 13:34

A word of caution

Recommended topics

Hot tags