
【Pooling Method】Adaptive Average Pooling explained

Published on 2024/05/31

1. Adaptive Average Pooling

Adaptive Average Pooling is a form of average pooling that produces an output of a specified shape regardless of the input shape.

1.1 How it works

Here's how 2D adaptive average pooling works:

  1. Input
    The input to the AdaptiveAvgPool2d module is a tensor of shape (batch_size, channels, height, width).
  2. Kernel size and stride calculation
    Based on the input size and the target output size, the module automatically calculates the appropriate kernel size and stride to divide the input into the specified number of regions. When the input size is not evenly divisible by the output size, the regions may vary slightly in size or overlap.
  3. Average pooling
    Average pooling is applied to the input with nn.AdaptiveAvgPool2d((output_height, output_width)).
    For each region, the module computes the average value of all the elements within that region. This is done independently for each channel.
  4. Output
    The resulting output tensor has the shape (batch_size, channels, output_height, output_width), where output_height and output_width are the specified target output sizes.
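The steps above can be reproduced by hand. PyTorch places the region for output index i at [floor(i * in / out), ceil((i + 1) * in / out)) along each spatial axis; the sketch below (the helper name `manual_adaptive_avg_pool2d` is mine, not a PyTorch API) implements that rule and checks it against nn.AdaptiveAvgPool2d:

```python
import torch
import torch.nn as nn

def manual_adaptive_avg_pool2d(x, out_h, out_w):
    """Illustrative reimplementation of 2D adaptive average pooling.

    For output index i, the pooled region spans
    [floor(i * in / out), ceil((i + 1) * in / out)) along each axis,
    so region sizes vary (and may overlap) when the input size is not
    divisible by the output size.
    """
    _, _, in_h, in_w = x.shape
    out = torch.empty(x.shape[0], x.shape[1], out_h, out_w)
    for i in range(out_h):
        h0 = (i * in_h) // out_h                       # floor
        h1 = ((i + 1) * in_h + out_h - 1) // out_h     # ceil
        for j in range(out_w):
            w0 = (j * in_w) // out_w
            w1 = ((j + 1) * in_w + out_w - 1) // out_w
            # Average every element in the region, per channel
            out[:, :, i, j] = x[:, :, h0:h1, w0:w1].mean(dim=(-2, -1))
    return out

x = torch.randn(1, 3, 7, 7)   # 7 is not divisible by 4, so regions overlap
expected = nn.AdaptiveAvgPool2d((4, 4))(x)
result = manual_adaptive_avg_pool2d(x, 4, 4)
print(torch.allclose(result, expected, atol=1e-6))
```

With a 7×7 input and a 4×4 output, the four regions along each axis are [0, 2), [1, 4), [3, 6), and [5, 7), which shows why there is no single fixed kernel size and stride in the general case.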

1.2 Implementation

Here's an example to illustrate the usage of AdaptiveAvgPool2d:

import torch
import torch.nn as nn

# Create an input tensor
input_tensor = torch.randn(1, 3, 8, 8)  # Batch size: 1, Channels: 3, Height: 8, Width: 8

# Create an AdaptiveAvgPool2d module with target output size (4, 4)
adaptive_avg_pool = nn.AdaptiveAvgPool2d((4, 4))

# Apply adaptive average pooling
output_tensor = adaptive_avg_pool(input_tensor)

print("Input size:", input_tensor.shape)
print("Output size:", output_tensor.shape)

# Output:
# Input size: torch.Size([1, 3, 8, 8])
# Output size: torch.Size([1, 3, 4, 4])

Of course, 1D and 3D adaptive pooling also exist (nn.AdaptiveAvgPool1d and nn.AdaptiveAvgPool3d); they work analogously to the 2D case.
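As a quick illustration of the 1D and 3D variants (the tensor sizes below are arbitrary, chosen only for the example):

```python
import torch
import torch.nn as nn

# 1D: (batch, channels, length) -> fixed length
x1 = torch.randn(2, 16, 50)
y1 = nn.AdaptiveAvgPool1d(10)(x1)
print(y1.shape)  # torch.Size([2, 16, 10])

# 3D: (batch, channels, depth, height, width) -> fixed volume
x3 = torch.randn(2, 16, 9, 12, 15)
y3 = nn.AdaptiveAvgPool3d((3, 4, 5))(x3)
print(y3.shape)  # torch.Size([2, 16, 3, 4, 5])
```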

2. Summary

Adaptive average pooling is commonly used in scenarios where you need to obtain a fixed-size representation of feature maps, regardless of the input size. It is often used in the final layers of convolutional neural networks to reduce the spatial dimensions before feeding the features into fully connected layers or other downstream tasks.

The adaptive nature of the pooling operation makes it flexible and convenient, as it automatically adjusts to different input sizes without requiring manual calculations of kernel sizes and strides.
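To make the "fixed-size representation" point concrete, here is a minimal sketch of a classifier head (the layer sizes are hypothetical) where AdaptiveAvgPool2d((1, 1)) acts as global average pooling, so the same fully connected layer works for any spatial input size:

```python
import torch
import torch.nn as nn

# Toy network: the Linear layer's input size depends only on the channel
# count (64), because adaptive pooling collapses H x W down to 1 x 1.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),   # (N, 64, H, W) -> (N, 64, 1, 1)
    nn.Flatten(),                   # (N, 64)
    nn.Linear(64, 10),              # 10 classes
)

# Different input resolutions all produce the same output shape
for size in (32, 64, 224):
    out = model(torch.randn(1, 3, size, size))
    print(out.shape)  # torch.Size([1, 10]) every time
```

This is why adaptive pooling is a common last spatial layer: without it, the Linear layer's input size would have to be recomputed for every input resolution.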
