【PreProcessing】How to Normalize the Mel Spectrogram
This time, I'll explain how to normalize the Mel spectrogram.
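1. What is a Mel spectrogram?
A Mel spectrogram is a kind of spectrogram whose frequency axis is mapped to the Mel scale. A spectrogram is an image-like, two-dimensional representation of sequential data such as audio: the signal is split into short frames, the FFT is applied to each frame, and the result shows how the frequency content changes over time.
I explained the Mel spectrogram in more detail here; please refer to it if needed.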
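To make the FFT-based idea above concrete, here is a minimal sketch that builds a plain power spectrogram with torch.stft. It is not a Mel spectrogram: mapping the frequency axis to the Mel scale needs an additional Mel filter bank (for example via torchaudio.transforms.MelSpectrogram). The test tone, sample rate, n_fft, and hop_length are all assumptions chosen for illustration.

```python
import math

import torch

# Assumed test signal: one second of a 440 Hz sine wave sampled at 16 kHz.
sample_rate = 16000
t = torch.arange(sample_rate) / sample_rate
signal = torch.sin(2 * math.pi * 440.0 * t)

# Assumed analysis settings: 400-sample frames, 160-sample hop, Hann window.
n_fft = 400
hop_length = 160
stft = torch.stft(
    signal,
    n_fft=n_fft,
    hop_length=hop_length,
    window=torch.hann_window(n_fft),
    return_complex=True,
)

# Power spectrogram with shape [frequency, time].
spec = stft.abs() ** 2
print(spec.shape)  # torch.Size([201, 101])

# The strongest frequency bin, converted back to Hz.
peak_bin = spec.mean(dim=1).argmax().item()
print(peak_bin * sample_rate / n_fft)  # 440.0
```

Each column of spec is the spectrum of one short frame, so the image shows a horizontal line at the bin that corresponds to 440 Hz.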
2. Code
The main focus of this article is normalizing the Mel spectrogram.
import torch


def normalize_melspec(X, eps=1e-6):
    # X: [batch_size, frequency, time]
    # Standardize each spectrogram to zero mean and unit standard deviation.
    mean = X.mean((1, 2), keepdim=True)
    std = X.std((1, 2), keepdim=True)
    Xstd = (X - mean) / (std + eps)
    # Per-spectrogram min and max, each of shape [batch_size].
    norm_min, norm_max = (
        Xstd.min(-1)[0].min(-1)[0],
        Xstd.max(-1)[0].max(-1)[0],
    )
    # Mask of spectrograms whose value range is large enough to rescale.
    fix_ind = (norm_max - norm_min) > eps * torch.ones_like(
        (norm_max - norm_min)
    )
    V = torch.zeros_like(Xstd)
    if fix_ind.sum():
        V_fix = Xstd[fix_ind]
        norm_max_fix = norm_max[fix_ind, None, None]
        norm_min_fix = norm_min[fix_ind, None, None]
        # Clamp to [min, max], then rescale to [0, 1].
        V_fix = torch.max(
            torch.min(V_fix, norm_max_fix),
            norm_min_fix,
        )
        V_fix = (V_fix - norm_min_fix) / (norm_max_fix - norm_min_fix)
        V[fix_ind] = V_fix
    return V
The input is assumed to have the shape [batch_size, frequency, time].
X
: The input Mel spectrogram.
mean = X.mean((1, 2), keepdim=True)
: Calculates the mean over dimensions 1 (frequency) and 2 (time) while keeping the reduced dimensions, so the result has the shape [batch_size, 1, 1]. (This is useful for broadcasting.)
Xstd = (X - mean) / (std + eps)
: This normalization is used to transform the data so that it has a mean of zero and a standard deviation of one. The addition of a small constant ε (epsilon) to the denominator prevents division by zero if the standard deviation is zero.
norm_min
: The minimum value of Xstd for each spectrogram, taken across the last dimension (time frames) and then across the remaining last dimension (frequency bins).
norm_max
: The maximum value of Xstd for each spectrogram, taken across the same two dimensions in the same order.
fix_ind
: A boolean mask identifying which spectrograms have a valid range (where the difference between norm_max and norm_min is greater than eps).
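V
: An output tensor initialized to zeros with the same shape as Xstd.
If there are any valid spectrograms (fix_ind.sum() is greater than 0):
V_fix
: The subset of Xstd corresponding to the valid spectrograms.
norm_max_fix, norm_min_fix
: The max and min values for the valid spectrograms, reshaped with two trailing dimensions of size 1 for broadcasting.
V_fix
: Is clamped to the range [norm_min_fix, norm_max_fix]. Because these bounds are each spectrogram's own minimum and maximum, the clamp acts as a safeguard and leaves the values unchanged.
V_fix
: Is then normalized to the range [0, 1] by subtracting norm_min_fix and dividing by (norm_max_fix - norm_min_fix).
The normalized values are assigned back to the appropriate positions in V. Spectrograms that are not valid keep their initial value of zero.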
The function returns the tensor V, which contains the normalized Mel spectrograms.
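The steps described above can be traced on a small batch. The following is a minimal sketch, assuming a hypothetical batch of three tiny random "spectrograms" where the last one is constant, so that the zero-range case is exercised. It repeats the same operations as the function but omits the clamp, which does not change the values.

```python
import torch

torch.manual_seed(0)

# Hypothetical batch: 3 "spectrograms" of 4 frequency bins x 5 time frames.
X = torch.randn(3, 4, 5) * 10.0 - 40.0
X[2] = -80.0  # constant example, so its value range is zero
eps = 1e-6

# Step 1: per-spectrogram standardization; keepdim=True gives shape [3, 1, 1].
mean = X.mean((1, 2), keepdim=True)
std = X.std((1, 2), keepdim=True)
Xstd = (X - mean) / (std + eps)

# Step 2: reduce over time (last dim), then over frequency -> shape [3].
norm_min = Xstd.min(-1)[0].min(-1)[0]
norm_max = Xstd.max(-1)[0].max(-1)[0]

# Step 3: mask of spectrograms whose range is large enough to rescale.
fix_ind = (norm_max - norm_min) > eps

# Step 4: min-max scale only the valid spectrograms; the rest stay zero.
V = torch.zeros_like(Xstd)
V[fix_ind] = (Xstd[fix_ind] - norm_min[fix_ind, None, None]) / (
    norm_max[fix_ind, None, None] - norm_min[fix_ind, None, None]
)

print(mean.shape)      # torch.Size([3, 1, 1])
print(norm_min.shape)  # torch.Size([3])
print(fix_ind)         # tensor([ True,  True, False])
print(V[0].min().item(), V[0].max().item())  # 0.0 1.0
```

The constant spectrogram is excluded by fix_ind, which avoids dividing by a zero range, and it stays all zeros in the output.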
3. Summary
This function scales each Mel spectrogram in the batch to the range [0, 1] (a spectrogram whose values are essentially constant becomes all zeros), which can help machine learning models work effectively with the spectrogram data.
This can also be applied to other image data, so please give it a try.
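As one way to try this on images, here is a minimal sketch that assumes a batch shaped [batch, channels, height, width]. The helper name minmax_per_image is hypothetical, and this is a simplified variant rather than the function above: it uses torch.amin and torch.amax and skips the standardization step, which does not affect a min-max result apart from floating-point rounding. Note that the eps threshold here is applied to the raw value range instead of the standardized one.

```python
import torch


def minmax_per_image(images, eps=1e-6):
    """Hypothetical helper: scale each image in a [B, C, H, W] batch to [0, 1]."""
    # Per-image extremes, kept as [B, 1, 1, 1] for broadcasting.
    lo = images.amin(dim=(1, 2, 3), keepdim=True)
    hi = images.amax(dim=(1, 2, 3), keepdim=True)
    span = hi - lo
    valid = span > eps
    # Replace tiny spans with 1 so the division is always safe.
    safe_span = torch.where(valid, span, torch.ones_like(span))
    scaled = (images - lo) / safe_span
    # Images with (almost) no range are returned as all zeros.
    return torch.where(valid, scaled, torch.zeros_like(scaled))


torch.manual_seed(0)
images = torch.rand(2, 3, 8, 8) * 255.0  # assumed pixel range 0..255
images[1] = 128.0                        # a flat image with zero range
out = minmax_per_image(images)

print(out.shape)                                 # torch.Size([2, 3, 8, 8])
print(out[0].min().item(), out[0].max().item())  # 0.0 1.0
print(out[1].abs().max().item())                 # 0.0
```

Using torch.where instead of boolean indexing keeps the whole computation in fixed-shape tensor operations, which is a design choice of this sketch and not something the original function requires.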