
【Pre-processing Method】Various ways to visualize audio data

Published on 2024/04/22

1. Waveform

A waveform is the most direct depiction of sound.
The x-axis represents time and the y-axis represents amplitude.
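As a small illustration of how the two axes relate to the raw samples, the sketch below builds a short synthetic tone and converts sample indices to seconds. The 440 Hz tone, the 8000 Hz sampling rate, and the `*_demo` names are assumptions made up for this example; they are not taken from the trumpet clip.

```python
import math

sr_demo = 8000      # assumed sampling rate in Hz (illustrative only)
freq_demo = 440.0   # assumed tone frequency in Hz
n_samples = 16

# y-axis: amplitude of each sample
y_demo = [math.sin(2 * math.pi * freq_demo * n / sr_demo) for n in range(n_samples)]

# x-axis: time of each sample in seconds (sample index / sampling rate)
t_demo = [n / sr_demo for n in range(n_samples)]

duration = n_samples / sr_demo  # total length of the clip in seconds
print(f"duration = {duration * 1000:.1f} ms")
for t, a in zip(t_demo[:4], y_demo[:4]):
    print(f"t = {t * 1000:.3f} ms, amplitude = {a:+.3f}")
```

A plotting function such as `librosa.display.waveshow` draws these same amplitude values against time.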

%%time
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load(librosa.ex('trumpet'))  # Load example audio
plt.figure(figsize=(10, 4))
librosa.display.waveshow(y, sr=sr)
plt.title('Waveform')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.show()

・Example

2. Spectrogram

A spectrogram visualizes how the frequency content of a signal changes over time: the waveform is split into short time windows and a Fourier transform is applied to each window.
Because it captures both time and frequency information, it is used in a wide range of applications. However, depending on the size of the time window, some detail is lost on one side or the other: a long window blurs timing, while a short window blurs frequency.
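The window-size trade-off can be made concrete with a little arithmetic. The sketch below assumes a sampling rate of 22050 Hz (the default of `librosa.load`) and three window lengths chosen only for illustration; 2048 is the default `n_fft` of `librosa.stft`.

```python
sr_demo = 22050  # assumed sampling rate in Hz

resolution = []
for n_fft in (256, 2048, 8192):
    window_ms = 1000 * n_fft / sr_demo   # time span covered by one window
    bin_hz = sr_demo / n_fft             # spacing between adjacent frequency bins
    resolution.append((n_fft, window_ms, bin_hz))
    print(f"n_fft={n_fft:5d}: window = {window_ms:6.1f} ms, bin width = {bin_hz:6.2f} Hz")
```

The product of window length (in seconds) and bin width (in Hz) is always 1, so improving one resolution necessarily worsens the other.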

%%time
import numpy as np

D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
plt.figure(figsize=(10, 4))
librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Spectrogram')
plt.show()

・Example

3. Mel spectrogram

A mel spectrogram is a type of spectrogram.
Its frequency axis is rescaled to match human hearing more closely, and it is obtained by applying some additional processing (a mel filter bank) to an ordinary spectrogram.

Simply put, it emphasizes the low-frequency components of the spectrogram by giving them finer resolution.
More details can be found here.
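To see why the mel scale favors low frequencies, the sketch below places points evenly on the mel axis and converts them back to Hz. It uses the HTK-style formula, which is an assumption for illustration; `librosa` uses the Slaney variant by default, so its exact band edges differ. The helper names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale (assumed here; not librosa's default variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Place 9 points evenly on the mel axis between 0 Hz and 8000 Hz.
n_points = 9
mel_max = hz_to_mel(8000.0)
centers_hz = [mel_to_hz(mel_max * i / (n_points - 1)) for i in range(n_points)]
gaps_hz = [b - a for a, b in zip(centers_hz, centers_hz[1:])]

print("centers (Hz):", [round(f) for f in centers_hz])
print("gaps (Hz):   ", [round(g) for g in gaps_hz])
```

The gaps grow with frequency, so equal steps in mel cover narrow bands at the low end and wide bands at the high end.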

%%time

S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_DB = librosa.power_to_db(S, ref=np.max)
plt.figure(figsize=(10, 4))
librosa.display.specshow(S_DB, sr=sr, x_axis='time', y_axis='mel')
plt.colorbar(format='%+2.0f dB')
plt.title('Mel Spectrogram')
plt.show()

・Example

4. Scalogram

A scalogram is the result of applying the CWT (Continuous Wavelet Transform).
Compared with a spectrogram, it provides a frequency representation with less loss of information along the time axis.

More details can be found here.
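A wavelet transform is indexed by scale rather than frequency, so it helps to see how the two are related. The sketch below uses the relation frequency = center frequency / (scale × sampling period); the helper names, the normalized center frequency of 1.0, and the sampling rate are assumptions for illustration.

```python
def frequency_to_scale(freq_hz, center_freq, sampling_rate):
    """Scale whose wavelet is centered on freq_hz: scale = fc * sr / f."""
    return center_freq * sampling_rate / freq_hz

def scale_to_frequency(scale, center_freq, sampling_rate):
    """Inverse mapping: f = fc * sr / scale."""
    return center_freq * sampling_rate / scale

fc_demo = 1.0     # assumed normalized center frequency of the mother wavelet
sr_demo = 22050   # assumed sampling rate in Hz

for f in (1, 10, 100):
    s = frequency_to_scale(f, fc_demo, sr_demo)
    print(f"{f:4d} Hz -> scale {s:8.1f}")
```

Low frequencies correspond to large scales (long, stretched wavelets) and high frequencies to small scales, which is why the analysis window effectively adapts to the frequency being examined.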

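%%time
import pywt

# Complex Morlet wavelet; the bandwidth (1.5) and center frequency (1.0) are an illustrative choice
wavelet = 'cmor1.5-1.0'
freqs_hz = np.linspace(1, 100, 100)  # analyze 1 to 100 Hz
scales = pywt.central_frequency(wavelet) * sr / freqs_hz  # scale = fc * sr / f
cwtmatr, freqs = pywt.cwt(y, scales, wavelet, sampling_period=1/sr)
plt.figure(figsize=(10, 4))
# Rows are ordered from 1 Hz to 100 Hz, so the y-axis is frequency, not scale
plt.imshow(np.abs(cwtmatr), aspect='auto', extent=[0, len(y) / sr, 1, 100], cmap='jet', origin='lower')
plt.colorbar()
plt.title('Scalogram')
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')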
plt.show()

・Example

5. Chromagram

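A chromagram provides 'rough' frequency information: every frequency is folded into one of the 12 pitch classes of the musical octave.
The pitch class is obtained by this formula.

$$
PC(f) = \left[ 12 \log_2 \dfrac{f}{f_{ref}} \right] \bmod 12
$$

Here $f$ is the frequency and $f_{ref}$ is a reference frequency, so notes that are an octave apart fall into the same class.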

More details can be found here.
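The pitch-class formula can be tried out directly. The sketch below assumes the square brackets mean rounding to the nearest integer and takes C4 (about 261.63 Hz) as the reference, so that class 0 is C; both choices, and the helper name, are assumptions for illustration.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(f_hz, f_ref=261.6256):
    """Pitch class index 0..11; f_ref is assumed to be C4 so that index 0 means C."""
    return round(12 * math.log2(f_hz / f_ref)) % 12

for f in (261.63, 440.0, 880.0, 523.25):
    print(f"{f:7.2f} Hz -> {NOTE_NAMES[pitch_class(f)]}")
```

440 Hz and 880 Hz both map to A, which shows how octave information is discarded and only the note name is kept.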

%%time

C = librosa.feature.chroma_cqt(y=y, sr=sr)
plt.figure(figsize=(10, 4))
librosa.display.specshow(C, sr=sr, x_axis='time', y_axis='chroma', cmap='coolwarm')
plt.colorbar()
plt.title('Chromagram')
plt.show()

・Example

6. Mel-Frequency Cepstral Coefficients (MFCCs)

MFCCs summarize the overall shape (envelope) of the mel spectrum in a small number of coefficients. Because fine harmonic structure is discarded, musical pitch is largely removed and the features are robust to bandwidth reduction.
They are useful for analyzing human voice data and similar signals.

More details can be found here.
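MFCCs are conventionally computed by taking a discrete cosine transform (DCT) of the log mel-band energies of each frame. The sketch below applies an unnormalized DCT-II to two made-up frames; the energy values and the helper name are hypothetical, and `librosa.feature.mfcc` applies an orthonormal scaling by default, so its numbers will differ.

```python
import math

def dct_ii(x):
    """Unnormalized DCT-II: c[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_total = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_total * (n + 0.5) * k) for n in range(n_total))
        for k in range(n_total)
    ]

# Hypothetical log mel-band energies (dB) for a single frame.
flat = [-20.0] * 8                             # flat spectral envelope
tilted = [-10.0 - 3.0 * n for n in range(8)]   # energy falling with frequency

for name, frame in (("flat", flat), ("tilted", tilted)):
    coeffs = dct_ii(frame)
    print(name, [round(c, 2) for c in coeffs[:4]])
```

The flat frame puts all of its energy in coefficient 0, while the tilted frame also produces a clearly nonzero coefficient 1: the low-order coefficients encode the overall level and slope of the spectrum.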

%%time

mfccs = librosa.feature.mfcc(y=y, sr=sr)
plt.figure(figsize=(10, 4))
librosa.display.specshow(mfccs, sr=sr, x_axis='time')
plt.ylabel('MFCC coeffs')
plt.colorbar()
plt.title('MFCC')
plt.show()

・Example

7. Spectral Contrast

Spectral contrast shows, for each frequency band and each time window, how strongly the spectral peaks stand out from the valleys. It helps in recognizing music genre, texture, complexity, and other fine details of the audio.

More details can be found here.
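The idea of comparing peaks with valleys inside one frequency band can be sketched in a few lines. This is a simplified illustration, not librosa's implementation: the helper name, the magnitude values, and the quantile of 0.25 are all assumptions chosen to keep the example small.

```python
import math

def band_contrast(magnitudes, quantile=0.25):
    """Gap in dB between the mean of the strongest and the weakest bins of one band.

    Magnitudes must be positive because a logarithm is taken.
    """
    ordered = sorted(magnitudes)
    k = max(1, round(quantile * len(ordered)))   # number of bins in each tail
    valley = sum(ordered[:k]) / k
    peak = sum(ordered[-k:]) / k
    return 20 * math.log10(peak / valley)

tonal = [0.01, 0.01, 1.0, 0.01, 0.01, 0.9, 0.01, 0.01]    # hypothetical: sharp peaks
noisy = [0.30, 0.35, 0.32, 0.28, 0.31, 0.33, 0.29, 0.34]  # hypothetical: flat noise

print(f"tonal band: {band_contrast(tonal):.1f} dB")
print(f"noisy band: {band_contrast(noisy):.1f} dB")
```

A band dominated by a few clear harmonics gives a high contrast, while a noise-like band gives a value close to zero.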

%%time

contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
plt.figure(figsize=(10, 4))
librosa.display.specshow(contrast, sr=sr, x_axis='time')
plt.colorbar()
plt.ylabel('Frequency bands')
plt.title('Spectral Contrast')
plt.show()

・Example

Summary

In this article, I summarized several ways to visualize audio data.
I hope it will be helpful to someone.

Reference

(1) birdclef-24-data-exploration
