I'm working with the librosa library, and I would like to know what information is returned by the librosa.load function when I read an audio (.wav) file.
Is it the instantaneous sound pressure in Pa, or just the instantaneous amplitude of the sound signal, with no units?
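To confirm the previous answer, librosa.load returns a time series, which the librosa glossary defines as:
"time series:
Typically an audio signal, denoted by y, and represented as a one-dimensional numpy.ndarray of floating-point values. y[t] corresponds to the amplitude of the waveform at sample t."
The amplitude is usually measured as a function of the change in pressure around the microphone or receiver device that originally picked up the audio (see more here). The values in y themselves carry no physical unit, though: they are floating-point numbers scaled to digital full scale (typically between -1.0 and 1.0), and converting them to pascals would require calibrating the microphone and recording chain.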
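As a rough illustration of why these values are unitless, here is a minimal, stdlib-only sketch (not librosa's own code) that writes a tiny 16-bit WAV file in memory and scales the integer samples to floats by dividing by 2**15 = 32768. That this is the convention librosa's file readers follow is stated here as an assumption, and the helper name `int16_to_float` is hypothetical.

```python
import io
import struct
import wave


def int16_to_float(samples):
    """Scale signed 16-bit PCM integers to floats in [-1.0, 1.0).

    Hypothetical helper: divides by 2**15 = 32768, the usual full-scale
    convention (assumed here to match what librosa's readers do).
    """
    return [s / 32768.0 for s in samples]


# A few 16-bit sample values: silence, +/- half scale, and the two extremes.
pcm = [0, 16384, -16384, 32767, -32768]

# Write a tiny mono 16-bit WAV file into memory (WAV sample data is little-endian).
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)      # mono
    wf.setsampwidth(2)      # 2 bytes per sample = 16-bit
    wf.setframerate(22050)  # samples per second
    wf.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))

# Read the raw integers back and convert them to dimensionless floats.
buf.seek(0)
with wave.open(buf, "rb") as wf:
    n = wf.getnframes()
    raw = struct.unpack(f"<{n}h", wf.readframes(n))

y = int16_to_float(raw)
print(y)  # [0.0, 0.5, -0.5, 0.999969482421875, -1.0]
```

A value of 0.5 here means "half of digital full scale", not 0.5 Pa; the same reading applies to the array returned by librosa.load.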
As far as I know, the amplitude measures the change in air pressure during recording. According to the librosa.load documentation here, this function returns two things:
- The audio time series y: a NumPy array of amplitude samples (proportional to the change of air pressure). By default librosa.load mixes the audio down to mono (mono=True), so y is one-dimensional with shape (n,). With mono=False, a multi-channel file gives shape (n_channels, n): the first axis indexes the channels and the last axis holds the samples.
- The sample rate sr: how many samples per second the returned signal contains.
Here is an example from the official documentation:
>>> import librosa
>>> filename = librosa.util.example_audio_file()  # older librosa; newer releases provide examples via librosa.ex()
>>> y, sr = librosa.load(filename)
>>> sr  # sample rate
22050
>>> y.shape  # mono (1 channel)
(1355168,)
>>> y.shape[0] / sr  # duration of the audio file in seconds
61.45886621315193
As we can see:
- The sample rate is 22050, which means the returned signal contains 22050 samples per second.
- y.shape = (1355168,), which means there are 1355168 samples on just one channel (mono) over the whole audio.
- Using simple math, you can calculate the duration of the audio file by dividing the total number of samples by the sample rate: 1355168 / 22050 ≈ 61.46 seconds.
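The arithmetic in the last point can be written as two tiny helpers. This is only a sketch with hypothetical function names, not part of librosa; the same division also gives the time stamp of any individual sample y[t] (librosa itself offers librosa.samples_to_time for that).

```python
def duration_seconds(n_samples: int, sr: int) -> float:
    """Duration in seconds of n_samples samples taken at sr samples per second."""
    return n_samples / sr


def sample_time(t: int, sr: int) -> float:
    """Time in seconds at which sample index t (i.e. y[t]) occurs."""
    return t / sr


# Numbers from the documentation example: 1355168 samples at 22050 Hz.
print(duration_seconds(1355168, 22050))  # 61.45886621315193
print(sample_time(22050, 22050))         # 1.0
```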
Added from comments
Do note that if you read the file as y, sr = librosa.load(filename), librosa will resample the signal to 22050 Hz by default. As stated in the documentation, if you want to get the native sampling rate, you should read the signal as y, sr = librosa.load(filename, sr=None).
– Beeline
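To see why the sample count and sr must always be read together, here is a back-of-the-envelope sketch of how the length changes when a hypothetical 3-second, 44.1 kHz file is loaded at librosa's default 22050 Hz. It assumes the output length is the input length scaled by target_sr / orig_sr and rounded up, which is how recent versions of librosa.resample size their output; the helper name is hypothetical, and real lengths may differ by a sample across versions or resamplers.

```python
def resampled_length(n_samples: int, orig_sr: int, target_sr: int) -> int:
    """Approximate length after resampling from orig_sr to target_sr.

    Assumption: length = ceil(n_samples * target_sr / orig_sr),
    computed here with exact integer arithmetic.
    """
    return (n_samples * target_sr + orig_sr - 1) // orig_sr


native_sr = 44100          # hypothetical native rate of the file
default_sr = 22050         # librosa.load's default target rate
n_native = 3 * native_sr   # a hypothetical 3-second recording

n_loaded = resampled_length(n_native, native_sr, default_sr)
print(n_native, n_loaded)                           # 132300 66150
print(n_native / native_sr, n_loaded / default_sr)  # 3.0 3.0
```

The duration is preserved while the number of samples halves, so after a default load, sr describes the returned array rather than the original recording.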
To add to the above answer, you may also use the librosa function librosa.get_duration(y=y, sr=sr) to get the duration of the audio file in seconds (recent librosa versions accept these arguments only by keyword).
Or you may use len(y)/sr to get the duration in seconds of a mono signal.
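One caveat on len(y)/sr: it only holds for the default mono output. With mono=False, librosa puts channels on the first axis, so len(y) counts channels instead of samples. A small sketch with NumPy arrays of silence standing in for loaded audio (the helper name is hypothetical) shows the difference; dividing y.shape[-1] by sr works for both layouts.

```python
import numpy as np

sr = 22050
# Stand-ins for loaded audio, 2 seconds of silence each:
mono = np.zeros(2 * sr, dtype=np.float32)         # shape (44100,), like the default mono=True
stereo = np.zeros((2, 2 * sr), dtype=np.float32)  # shape (2, 44100), like mono=False


def duration_seconds(y: np.ndarray, sr: int) -> float:
    """Duration in seconds; samples are on the last axis in both layouts."""
    return y.shape[-1] / sr


print(len(mono) / sr)   # 2.0
print(len(stereo))      # 2 (number of channels, not samples)
print(duration_seconds(mono, sr), duration_seconds(stereo, sr))  # 2.0 2.0
```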
© 2022 - 2024 — McMap. All rights reserved.