
【ML】List of Pooling Types


This article summarizes the main types of pooling used in machine learning.

1. Max Pooling

Selects the maximum value within each pooling window (filter).
Captures the most prominent features and is less sensitive to their exact location.
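
As a minimal sketch of the idea, the following pure-Python function slides a k x k window over a 2D feature map and keeps the largest value in each window. The function name `max_pool2d`, the defaults (2x2 window, stride 2, no padding) and the sample values are assumptions made for illustration only.

```python
def max_pool2d(x, k=2, stride=2):
    """Max-pool a 2D list of numbers with a k x k window (no padding)."""
    out_h = (len(x) - k) // stride + 1
    out_w = (len(x[0]) - k) // stride + 1
    return [
        [
            # Largest value inside the window whose top-left corner is (i*stride, j*stride).
            max(x[i * stride + di][j * stride + dj]
                for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 0, 3, 6],
]
print(max_pool2d(feature_map))  # [[4, 2], [2, 6]]
```

Moving the 4 to any other cell of the top-left window would leave the output unchanged, which is the location insensitivity mentioned above.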

2. Average (Mean) Pooling

Takes the average of the values within each window.
Provides a smoother representation, but can blur sharp features.
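
The smoothing effect can be seen in a small sketch. The function name `avg_pool2d`, its defaults and the sample feature map (a single spike of 9 surrounded by zeros) are hypothetical and chosen only to illustrate the point.

```python
def avg_pool2d(x, k=2, stride=2):
    """Average-pool a 2D list of numbers with a k x k window (no padding)."""
    out_h = (len(x) - k) // stride + 1
    out_w = (len(x[0]) - k) // stride + 1
    return [
        [
            # Mean of the k*k values inside the current window.
            sum(x[i * stride + di][j * stride + dj]
                for di in range(k) for dj in range(k)) / (k * k)
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A single sharp spike (9) in the top-left window, a flat patch in the bottom-right.
feature_map = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(avg_pool2d(feature_map))  # [[2.25, 0.0], [0.0, 1.0]]
```

The spike of 9 is diluted to 2.25, whereas the flat patch keeps its value of 1.0.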

3. Global Pooling

Applies the pooling operation (typically max or average) over all values in the feature map, so the output is one value per channel.
Typically used at the end of the network to convert a feature map into a single vector, which can be fed into a fully connected layer.
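
A minimal sketch, assuming the input is stored as a list of channels where each channel is a 2D list. The function names `global_avg_pool` and `global_max_pool` and the sample data are illustrative only.

```python
def global_avg_pool(feature_maps):
    """Return the mean of each channel: one value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def global_max_pool(feature_maps):
    """Return the maximum of each channel: one value per channel."""
    return [max(max(row) for row in channel) for channel in feature_maps]


x = [
    [[1, 2], [3, 4]],  # channel 0
    [[0, 0], [8, 0]],  # channel 1
    [[5, 5], [5, 5]],  # channel 2
]
print(global_avg_pool(x))  # [2.5, 2.0, 5.0]
print(global_max_pool(x))  # [4, 8, 5]
```

Three channels go in and a vector of length three comes out, whatever the height and width of each channel.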

4. Min Pooling

Selects the minimum value within each window.
Not commonly used, but it can help when low values carry the important information (for example, dark features on a bright background).
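
Since min pooling is the mirror image of max pooling, one way to sketch it is to negate the input, max-pool, and negate the result. The helper names and the sample values (dark spots on a bright background) below are assumptions for illustration.

```python
def max_pool2d(x, k=2, stride=2):
    """Max-pool a 2D list of numbers with a k x k window (no padding)."""
    out_h = (len(x) - k) // stride + 1
    out_w = (len(x[0]) - k) // stride + 1
    return [
        [
            max(x[i * stride + di][j * stride + dj]
                for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def min_pool2d(x, k=2, stride=2):
    """Min pooling expressed as -max_pool(-x)."""
    negated = [[-v for v in row] for row in x]
    return [[-v for v in row] for row in max_pool2d(negated, k, stride)]


# Bright background (9) with two dark spots (1 and 2).
feature_map = [
    [9, 9, 9, 9],
    [9, 1, 9, 9],
    [9, 9, 9, 9],
    [9, 9, 2, 9],
]
print(min_pool2d(feature_map))  # [[1, 9], [9, 2]]
```

The two dark spots survive the downsampling, while max pooling on the same input would return 9 everywhere.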

5. L2-norm Pooling

Computes the L2 norm of the values within each window.
・L2-norm Formula
||X||_2 = \sqrt{\sum^n_{i=1} x^2_i}

where x_i are the elements of the vector X.

Consider a 2x2 pooling window applied to the following values.

[[1, 2]
 [3, 4]]

The L2-norm of this window is computed as:
\sqrt{1^2 + 2^2 + 3^2 + 4^2} = \sqrt{1 + 4 + 9 + 16} = \sqrt{30} \simeq 5.48

Useful when the magnitude of the features matters, or when the sensitivity to outliers needs to be lower than with max pooling.
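
The worked example above can be checked with a short sketch. The function name `l2_pool2d` and its defaults are hypothetical; only the standard `math` module is used.

```python
import math


def l2_pool2d(x, k=2, stride=2):
    """Pool a 2D list by taking the L2 norm of each k x k window (no padding)."""
    out_h = (len(x) - k) // stride + 1
    out_w = (len(x[0]) - k) // stride + 1
    return [
        [
            # Square root of the sum of squares inside the window.
            math.sqrt(sum(x[i * stride + di][j * stride + dj] ** 2
                          for di in range(k) for dj in range(k)))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


window = [[1, 2],
          [3, 4]]
print(round(l2_pool2d(window)[0][0], 2))  # 5.48
```

For comparison, max pooling on the same window gives 4 and average pooling gives 2.5.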

6. Mixed Pooling

Combines max pooling and average pooling by taking a weighted average of both, with a mixing weight \alpha between 0 and 1.

MixedPool(X) = \alpha \cdot MaxPool(X) + (1 - \alpha) \cdot AvgPool(X)

Can potentially combine the strengths of both max pooling and average pooling.
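
A minimal sketch of the formula above, assuming a fixed \alpha passed as a plain argument. The function name `mixed_pool2d` and the defaults are illustrative only.

```python
def mixed_pool2d(x, alpha=0.5, k=2, stride=2):
    """Blend max and average pooling: alpha * max + (1 - alpha) * mean."""
    out = []
    for i in range(0, len(x) - k + 1, stride):
        row = []
        for j in range(0, len(x[0]) - k + 1, stride):
            window = [x[i + di][j + dj] for di in range(k) for dj in range(k)]
            mean = sum(window) / len(window)
            row.append(alpha * max(window) + (1 - alpha) * mean)
        out.append(row)
    return out


window = [[1, 2],
          [3, 4]]
print(mixed_pool2d(window, alpha=0.5))  # [[3.25]]
print(mixed_pool2d(window, alpha=1.0))  # [[4.0]]  (pure max pooling)
print(mixed_pool2d(window, alpha=0.0))  # [[2.5]]  (pure average pooling)
```

Setting \alpha to 1 or 0 recovers the two basic pooling types, so \alpha controls how far the result leans toward one or the other.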

7. Fractional Max Pooling

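Downsamples by a non-integer (fractional) ratio, for example shrinking the feature map by a factor of about 1.5 instead of 2. This is achieved by mixing pooling regions of different integer sizes, chosen randomly or pseudo-randomly.
Offers more flexibility and can be useful for more gradual, finer-grained downsampling.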
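
One way to sketch this is to split each axis into consecutive regions of size 1 or 2 in shuffled order, then take the maximum of each region. The function names, the disjoint-region choice and the fixed seed below are assumptions for illustration; real implementations differ in how they generate the regions.

```python
import random


def fractional_boundaries(n_in, n_out, rng):
    """Split range(n_in) into n_out consecutive regions of size 1 or 2, in random order."""
    if not n_out <= n_in <= 2 * n_out:
        raise ValueError("need n_out <= n_in <= 2 * n_out")
    # (n_in - n_out) regions of size 2 and the rest of size 1 add up to n_in.
    sizes = [2] * (n_in - n_out) + [1] * (2 * n_out - n_in)
    rng.shuffle(sizes)
    edges = [0]
    for size in sizes:
        edges.append(edges[-1] + size)
    return edges  # n_out + 1 edges, the last one equals n_in


def fractional_max_pool2d(x, out_h, out_w, seed=0):
    """Max-pool a 2D list down to out_h x out_w using randomly sized regions."""
    rng = random.Random(seed)
    rows = fractional_boundaries(len(x), out_h, rng)
    cols = fractional_boundaries(len(x[0]), out_w, rng)
    return [
        [
            max(x[r][c]
                for r in range(rows[i], rows[i + 1])
                for c in range(cols[j], cols[j + 1]))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# 6 x 6 input reduced to 4 x 4: a downsampling ratio of 1.5.
x = [[r * 6 + c for c in range(6)] for r in range(6)]
pooled = fractional_max_pool2d(x, 4, 4)
for row in pooled:
    print(row)
```

The exact output depends on the seed, because the positions of the size-1 and size-2 regions are random.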

8. Adaptive Pooling

Adjusts the window size and stride automatically so that the output has a fixed size, regardless of the input size.
Useful when the network needs to handle variable input sizes; often used in fully convolutional networks.
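
A minimal sketch of adaptive average pooling, assuming bin edges computed with a floor/ceil rule (start = floor(i * in / out), end = ceil((i + 1) * in / out)). This is one common convention, not the only one, and the function name `adaptive_avg_pool2d` is illustrative.

```python
def adaptive_avg_pool2d(x, out_h, out_w):
    """Average-pool a 2D list to exactly out_h x out_w, whatever the input size."""
    in_h, in_w = len(x), len(x[0])
    out = []
    for i in range(out_h):
        # Integer floor and ceil of the bin edges along the rows.
        r0 = (i * in_h) // out_h
        r1 = ((i + 1) * in_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = ((j + 1) * in_w + out_w - 1) // out_w
            vals = [x[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


small = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 0, 3, 6],
]                                                        # 4 x 4
large = [[r + c for c in range(7)] for r in range(5)]   # 5 x 7

print(adaptive_avg_pool2d(small, 2, 2))  # [[2.5, 1.0], [0.75, 4.0]]
out = adaptive_avg_pool2d(large, 2, 2)
print(len(out), len(out[0]))             # 2 2
```

When the input size is not a multiple of the output size, neighbouring bins overlap slightly under this rule.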

9. Stochastic Pooling

Selects one value from each window at random, with a probability proportional to the value itself.

Consider a 2x2 pooling window applied to the following values.

[[1, 2]
 [3, 4]]

Then the selection probabilities of the values are \frac{1}{10}, \frac{2}{10}, \frac{3}{10}, \frac{4}{10}, where 10 is the sum of the values.
For instance, the value 4 has a 40% chance of being selected, while the value 1 has a 10% chance.

Good: Adds a regularization effect, reducing overfitting by introducing randomness.
Bad: Compared to basic pooling, it is less stable (increased variance), more expensive to compute, and sensitive to hyperparameters (window size, stride).
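
The sampling step for the 2x2 example can be sketched with the standard `random` module. The function name `stochastic_pool_window`, the fixed seed and the handling of an all-zero window are assumptions, and the sketch assumes non-negative values (for example, activations after ReLU).

```python
import random


def stochastic_pool_window(values, rng):
    """Pick one value from a window with probability proportional to the value.

    Assumes non-negative values (e.g. activations after ReLU).
    """
    if sum(values) == 0:
        return 0  # all-zero window: nothing to sample from
    return rng.choices(values, weights=values, k=1)[0]


rng = random.Random(0)
window = [1, 2, 3, 4]  # the 2x2 window above, flattened

probs = [v / sum(window) for v in window]
print(probs)  # [0.1, 0.2, 0.3, 0.4]

# Empirical check: sample many times and compare the frequencies with probs.
samples = [stochastic_pool_window(window, rng) for _ in range(10_000)]
freq = {v: samples.count(v) / len(samples) for v in window}
print(freq)   # close to {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
```

Repeated calls on the same window return different values, which is the source of both the regularization effect and the increased variance.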

10. Spatial Pyramid Pooling (SPP)

Pools feature maps at multiple levels and combines the results.
Please see [1] for more details (it is incredibly easy to understand).

Allows the network to maintain spatial information at multiple scales, and produces a fixed-length output, so inputs of varying size can be handled without resizing.
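
A minimal single-channel sketch of the idea: max-pool the feature map into 1x1, 2x2 and 4x4 grids and concatenate the results. The pyramid levels (1, 2, 4), the function names and the floor/ceil bin rule are assumptions for illustration, not necessarily the configuration described in [1].

```python
def adaptive_max_pool2d(x, n):
    """Max-pool a 2D list into an n x n grid of bins, returned flattened."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(n):
        r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
        for j in range(n):
            c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
            out.append(max(x[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out  # length n * n


def spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Concatenate the pooled bins of every pyramid level into one vector."""
    features = []
    for n in levels:
        features.extend(adaptive_max_pool2d(x, n))
    return features


small = [[r * 6 + c for c in range(6)] for r in range(6)]    # 6 x 6
large = [[r * 13 + c for c in range(13)] for r in range(9)]  # 9 x 13

print(len(spatial_pyramid_pool(small)))  # 21
print(len(spatial_pyramid_pool(large)))  # 21
```

Both inputs yield 1 + 4 + 16 = 21 values per channel, so the layer that follows always sees a vector of the same length.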


[1] Spatial Pyramid Pooling Layer (SPP-net, Spatial Pyramid Pooling): Applications and Extensions (article in Japanese)
