What is the conceptual purpose of librosa.amplitude_to_db?

I'm using the librosa library to get and filter spectrograms from audio data.

I mostly understand the math behind generating a spectrogram:

  1. Get the signal
  2. Window the signal
  3. For each window, compute the Fourier transform
  4. Create a matrix whose columns are the transforms
  5. Plot a heat map of this matrix
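
The steps above can be sketched in plain NumPy. This is only a minimal illustration, not how librosa implements it: the helper name manual_spectrogram, the Hann window, the frame length of 256 and the hop of 128 are all assumptions chosen for the example, and unlike librosa.stft it does no padding or centering.

```python
import numpy as np

def manual_spectrogram(signal, window, hop_length):
    """Magnitude spectrogram: one windowed FFT per column (hypothetical helper)."""
    n_fft = len(window)
    # Step 2: slice the signal into overlapping frames of length n_fft
    starts = range(0, len(signal) - n_fft + 1, hop_length)
    frames = np.stack([signal[s:s + n_fft] * window for s in starts])
    # Step 3: Fourier transform of each windowed frame (real input, so rfft)
    spectra = np.fft.rfft(frames, axis=1)
    # Step 4: transpose so that each column is one frame's transform
    return np.abs(spectra).T

# Step 1: a test signal, here a 1 kHz sine sampled at 8 kHz (assumed values)
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 1000 * t)

spec = manual_spectrogram(signal, np.hanning(256), hop_length=128)
print(spec.shape)  # (frequency bins, frames)
```

Step 5 would then be a heat map of spec, for example with matplotlib's imshow.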

So that's really easy with librosa:

spec = np.abs(librosa.stft(signal, n_fft=len(window), window=window))

Yay! I've got my matrix of FFTs. Now I see this function librosa.amplitude_to_db and I think this is where my ignorance of signal processing starts to show. Here is a snippet I found on Medium:

spec = np.abs(librosa.stft(y, hop_length=512))
spec = librosa.amplitude_to_db(spec, ref=np.max)

Why does the author use this amplitude_to_db function? Why not just plot the output of the STFT directly?

Notch answered 10/8, 2020 at 21:2 Comment(1)
Log scale just looks nicer on the graph. Linear is not informative; every spike will destroy your graph. - Sweetener

The range of perceivable sound pressure is very wide, from around 20 μPa (micropascals) to 20 Pa, a ratio of 1 million. Furthermore, human perception of sound levels is not linear, but is better approximated by a logarithm.

By converting to decibels (dB) the scale becomes logarithmic. This limits the numerical range to something like 0-120 dB instead. The intensity of colors when this is plotted corresponds more closely to what we hear than if one used a linear scale.
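
As a small worked example of that compression, the sketch below converts the two sound pressures mentioned above to decibels relative to 20 μPa, using the usual amplitude formula 20*log10(p / p_ref). The helper name pressure_to_db is made up for illustration.

```python
import math

P_REF = 20e-6  # 20 micropascals, the lower end of the range quoted above

def pressure_to_db(pressure_pa, ref=P_REF):
    """Hypothetical helper: level in dB of a pressure amplitude relative to ref."""
    return 20.0 * math.log10(pressure_pa / ref)

quietest = pressure_to_db(20e-6)  # ratio of 1
loudest = pressure_to_db(20.0)    # ratio of one million
print(quietest, loudest)
```

A ratio of one million in pressure shrinks to a span of about 120 dB, which is where the 0-120 dB figure comes from.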

Note that the reference (0 dB) point in decibels can be chosen freely. The default for librosa.amplitude_to_db is ref=1.0, but the snippet in the question passes ref=np.max, meaning that the max value of the input will be mapped to 0 dB. All other values will then be negative. The function also applies a threshold on the range of sounds (top_db), by default 80 dB. So anything more than 80 dB below the peak, here anything lower than -80 dB, will be clipped to -80 dB.
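
To make the reference point and the clipping concrete, here is a rough NumPy sketch of the calculation with the maximum used as the reference. It is an approximation for illustration, not librosa's actual source: the function name amplitude_to_db_sketch is hypothetical, and the small floor amin is there only to avoid taking the log of zero.

```python
import numpy as np

def amplitude_to_db_sketch(S, top_db=80.0, amin=1e-5):
    """Hypothetical stand-in: dB relative to the largest magnitude, clipped at -top_db."""
    magnitude = np.abs(S)
    ref = magnitude.max()  # the peak becomes the 0 dB point
    db = 20.0 * np.log10(np.maximum(magnitude, amin) / ref)
    # Nothing is allowed to fall more than top_db below the peak
    return np.maximum(db, db.max() - top_db)

S = np.array([1.0, 0.1, 0.001, 1e-6])
db = amplitude_to_db_sketch(S)
print(db)
```

The four values come out at roughly 0, -20, -60 and -80 dB: the largest input sits at 0 dB, and the smallest one is clipped at the -80 dB floor.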

Dappled answered 11/8, 2020 at 7:2 Comment(5)
Great. I had noticed the mapping to negative values, and I'm familiar with plotting on a logarithmic scale to save space, but I was confused by the dB moniker. Your answer explained a lot. Thank you. - Notch
Is there a specific name for this kind of plot (where amplitude is expressed in dB)? For example, the linear amplitude vs. time plot is often called an oscillogram (since it emulates the output of an oscilloscope). - Baksheesh
"Sound level" is typically expressed in dB. A standardized form is Sound Pressure Level (SPL), but that requires that the dB values represent actual physical pressure. In digital audio, dBFS is sometimes used as the unit, a form where 0 dB is the maximum representable value ("full scale" = FS). - Dappled
Piggybacking on this: librosa power_to_db() allows the user to specify the reference power for the dB ratio, e.g., np.max or some fixed value, where dB = 10*log10(S/ref). That option is not available in scipy.signal.spectrogram: there is no ability to specify the reference. What is the reference power (denominator) in the scipy decibel scaling? Similarly, the matplotlib specgram documentation simply specifies the dB scaling as 10*log10, but not the ratio over which that is computed. I.e., what reference does 0 dB refer to in a scipy or matplotlib spectrogram? - Laryngology
Please ask in a new SO question, Darren! Feel free to link it here. - Dappled
