【PreProcessing Method】How to normalize images

Published on 2024/05/17

Python

Machine Learning

tech

1. Simple Normalization to 0-255

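import numpy as np

def normalize(data: np.ndarray) -> np.ndarray:
    # Shift the values so that the minimum becomes 0
    data = data - data.min()
    # Guard against a constant image (max is 0 after the shift)
    max_val = data.max()
    if max_val == 0:
        return np.zeros(data.shape, dtype=np.uint8)
    # Scale to 0-255 and convert to 8-bit integers
    data = (data / max_val * 255).astype(np.uint8)

    return data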

This simple normalization reduces memory usage, since each pixel is stored as an 8-bit integer, and it may help prevent overfitting to the training data.
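As a quick check of the memory point, here is a minimal sketch that applies the same min-max scaling to a made-up float32 image and compares the sizes in bytes. The random image and its shape are assumptions for illustration only, and the function is repeated so the snippet runs on its own.

```python
import numpy as np

def normalize(data: np.ndarray) -> np.ndarray:
    # Min-max scaling to 0-255, with a guard for constant images
    data = data - data.min()
    max_val = data.max()
    if max_val == 0:
        return np.zeros(data.shape, dtype=np.uint8)
    return (data / max_val * 255).astype(np.uint8)

# Hypothetical 64x64 grayscale image stored as float32 (values are made up)
rng = np.random.default_rng(0)
image = rng.normal(loc=100.0, scale=20.0, size=(64, 64)).astype(np.float32)

normalized = normalize(image)
print(image.dtype, image.nbytes)            # float32 16384
print(normalized.dtype, normalized.nbytes)  # uint8 4096
print(normalized.min(), normalized.max())   # 0 255
```

In this example the uint8 result takes one quarter of the memory of the float32 input, at the cost of quantizing the values into 256 levels.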

2. Standardization

import numpy as np

def standardize_array(data: np.ndarray, eps=1e-6):
    # Calculate the mean and standard deviation
    mean = np.mean(data)
    std = np.std(data)
    
    # Add epsilon to the standard deviation to avoid division by zero
    std += eps
    
    # Standardize the array
    standardized_arr = (data - mean) / std
    return standardized_arr

# Example usage
data = np.array([1, 2, 3, 4, 5])
standardized_data = standardize_array(data)
print("Before:", data)
print("Standardized:", standardized_data)
mean = np.mean(standardized_data)
std = np.std(standardized_data)
print('mean: ',mean) # almost 0
print('std: ',std) # almost 1

This is commonly used for normalization in machine learning pipelines, and it helps mitigate covariate shift (a situation where the distribution of the inputs (the covariates) differs between training and testing, which degrades model performance).

You can also think of it as a stronger normalization.
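Since covariate shift is about training and test inputs being distributed differently, one common pattern is to compute the mean and standard deviation on the training data only and reuse them for the test data. The sketch below illustrates that idea under stated assumptions: `fit_stats` and `apply_stats` are hypothetical helper names, and the random arrays stand in for real images.

```python
import numpy as np

def fit_stats(train: np.ndarray, eps: float = 1e-6):
    # Compute the statistics on the training data only
    return float(np.mean(train)), float(np.std(train)) + eps

def apply_stats(data: np.ndarray, mean: float, std: float) -> np.ndarray:
    # Reuse the training statistics for any other split
    return (data - mean) / std

# Hypothetical pixel values; the test split is deliberately brighter
rng = np.random.default_rng(0)
train = rng.uniform(0, 200, size=(100, 8, 8))
test = rng.uniform(50, 250, size=(20, 8, 8))

mean, std = fit_stats(train)
train_std = apply_stats(train, mean, std)
test_std = apply_stats(test, mean, std)

print(train_std.mean(), train_std.std())  # close to 0 and 1
print(test_std.mean(), test_std.std())    # mean is clearly above 0
```

Both splits end up on the same scale, and the non-zero mean of the test split makes the remaining difference between the two distributions easy to see.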

Summary

Many normalization methods exist.
In most cases, normalization improves model performance, so please try various methods.
