🦙

# 【PreProcessing】How to Normalize the Mel Spectrogram

Published on 2024/08/07

This time, I'll explain how to normalize the Mel spectrogram.

# 1. What is a Mel spectrogram?

A Mel spectrogram is a kind of spectrogram. A spectrogram is an image-like representation of sequential data, such as audio, that shows how its frequency content changes over time; it is computed with the FFT.
I explained the Mel spectrogram here; please refer to it if you need more background.

# 2. Code

```python
import torch


def normalize_melspec(X, eps=1e-6):
    # X: [batch_size, frequency, time]
    # Standardize each spectrogram to zero mean and unit standard deviation.
    mean = X.mean((1, 2), keepdim=True)
    std = X.std((1, 2), keepdim=True)
    Xstd = (X - mean) / (std + eps)

    # Per-spectrogram minimum and maximum, each of shape [batch_size].
    norm_min, norm_max = (
        Xstd.min(-1)[0].min(-1)[0],
        Xstd.max(-1)[0].max(-1)[0],
    )
    # True where the value range is wide enough to divide by safely.
    fix_ind = (norm_max - norm_min) > eps * torch.ones_like(
        (norm_max - norm_min)
    )
    V = torch.zeros_like(Xstd)
    if fix_ind.sum():
        V_fix = Xstd[fix_ind]
        norm_max_fix = norm_max[fix_ind, None, None]
        norm_min_fix = norm_min[fix_ind, None, None]
        # Clamp to [min, max], then rescale to [0, 1].
        V_fix = torch.max(
            torch.min(V_fix, norm_max_fix),
            norm_min_fix,
        )
        V_fix = (V_fix - norm_min_fix) / (norm_max_fix - norm_min_fix)
        V[fix_ind] = V_fix
    return V
```

The function assumes the input has shape [batch_size, frequency, time].

`X`: The input Mel spectrogram.
`mean = X.mean((1, 2), keepdim=True)`: Calculates the mean over dimension 1 (frequency) and dimension 2 (time) for each spectrogram. `keepdim=True` keeps the result as [batch_size, 1, 1], which is useful for broadcasting.
`std = X.std((1, 2), keepdim=True)`: Calculates the standard deviation over the same dimensions, again keeping the shape for broadcasting.
`Xstd = (X - mean) / (std + eps)`: This standardization transforms the data so that each spectrogram has a mean of zero and a standard deviation of one. Adding a small constant ε (epsilon) to the denominator prevents division by zero if the standard deviation is zero.
`norm_min`: The minimum value of Xstd for each spectrogram, taken across the last dimension (time frames) and then across the remaining dimension (frequency bins).
`norm_max`: The maximum value of Xstd for each spectrogram, taken across the same two dimensions in the same order.
`fix_ind`: A boolean mask identifying which spectrograms have a valid range (where the difference between norm_max and norm_min is greater than eps).
`V`: An output tensor initialized to zeros with the same shape as Xstd.
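
The shapes involved in the standardization step can be checked on a small tensor. The following is a minimal sketch, assuming a made-up batch of two 3x4 "spectrograms" (the values in `X` are illustrative only); it also shows that the chained `min(-1)[0].min(-1)[0]` gives the same result as a single `amin` over both dimensions.

```python
import torch

# Hypothetical batch: 2 spectrograms, 3 mel bins, 4 time frames.
X = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)
eps = 1e-6

# keepdim=True keeps the reduced axes as size 1, so the result
# broadcasts against X in the subtraction and division below.
mean = X.mean((1, 2), keepdim=True)
std = X.std((1, 2), keepdim=True)
Xstd = (X - mean) / (std + eps)

print(mean.shape)          # shape is [2, 1, 1]
print(Xstd.mean((1, 2)))   # close to 0 for each spectrogram
print(Xstd.std((1, 2)))    # close to 1 for each spectrogram

# Minimum over time (last dim), then over frequency: one value per spectrogram.
norm_min = Xstd.min(-1)[0].min(-1)[0]
print(torch.equal(norm_min, Xstd.amin(dim=(1, 2))))
```

Without `keepdim=True` the mean would have shape [2], and `X - mean` would fail to broadcast against the [2, 3, 4] input.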

If there are any valid spectrograms (`fix_ind.sum()` is greater than 0):
`V_fix`: The subset of Xstd corresponding to the valid spectrograms.
`norm_max_fix`, `norm_min_fix`: The max and min values for the valid spectrograms, reshaped to [n_valid, 1, 1] for broadcasting.
`V_fix` is clamped to the range [norm_min_fix, norm_max_fix]. Because these bounds were computed from Xstd itself, the clamp acts as a safeguard and does not change the values.
`V_fix` is then rescaled to the range [0, 1].
The rescaled values are assigned back to the corresponding positions in `V`. Spectrograms that are not valid stay all zeros.
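
The masked update can be traced on a tiny hand-made batch. This is only a sketch: the `Xstd` values below are invented for illustration, with one varying sample and one constant sample, and the clamp is left out because it does not change the values.

```python
import torch

eps = 1e-6
# Hypothetical already-standardized batch of two 2x2 "spectrograms".
Xstd = torch.stack([
    torch.tensor([[-1.0, 0.0], [1.0, 3.0]]),  # varying values
    torch.zeros(2, 2),                         # constant, so max - min == 0
])

norm_min = Xstd.min(-1)[0].min(-1)[0]
norm_max = Xstd.max(-1)[0].max(-1)[0]
fix_ind = (norm_max - norm_min) > eps          # only the first sample is valid

V = torch.zeros_like(Xstd)
V_fix = Xstd[fix_ind]                          # keeps the valid samples only
norm_min_fix = norm_min[fix_ind, None, None]   # one value per valid sample
norm_max_fix = norm_max[fix_ind, None, None]
V[fix_ind] = (V_fix - norm_min_fix) / (norm_max_fix - norm_min_fix)

print(fix_ind)
print(V)
```

The constant sample is never divided by its zero range; it simply keeps the zeros that `V` was initialized with.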

The function returns the tensor V, which contains the normalized Mel spectrograms.

# 3. Summary

This function rescales each Mel spectrogram to the range [0, 1] (a spectrogram with no value range, such as a constant one, becomes all zeros), which can help machine learning models work effectively with spectrogram data.

This can also be applied to other image data, so please give it a try.
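
As one possible way to try this on image data, here is a sketch of a generalized variant. The helper name `normalize_per_sample` is hypothetical and not part of the code above: it reduces over every non-batch dimension and uses `torch.where` instead of mask indexing, under the assumption that each sample should be normalized as a whole (all channels together).

```python
import torch


def normalize_per_sample(X, eps=1e-6):
    """Standardize, then min-max scale each sample over all non-batch dims."""
    dims = tuple(range(1, X.dim()))
    mean = X.mean(dims, keepdim=True)
    std = X.std(dims, keepdim=True)
    Xstd = (X - mean) / (std + eps)

    lo = Xstd.amin(dim=dims, keepdim=True)
    hi = Xstd.amax(dim=dims, keepdim=True)
    span = hi - lo
    valid = span > eps
    # Divide by 1 where the range is too small, then zero those samples out.
    V = (Xstd - lo) / torch.where(valid, span, torch.ones_like(span))
    return torch.where(valid, V, torch.zeros_like(V))


torch.manual_seed(0)
# Hypothetical batch of 4 RGB images, 8x8 pixels, values in 0..255.
images = torch.randint(0, 256, (4, 3, 8, 8)).float()
out = normalize_per_sample(images)
print(out.shape, out.min().item(), out.max().item())
```

Whether to normalize per sample or per channel depends on the data; for per-channel scaling the reduction would run over the spatial dimensions only.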