【ML Paper】YOLOv2: part6
This time, I'll introduce YOLOv2, based on the paper by Joseph Redmon and Ali Farhadi. Let's focus on how it differs from YOLOv1.
This article is part 6. Part 5 is here.
Original Paper: https://arxiv.org/abs/1612.08242
Direct Location Prediction
In the context of using anchor boxes with YOLO, a significant challenge arises from model instability, particularly during the initial training iterations.
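This instability primarily stems from the prediction of the $(x, y)$ location of each box. In region proposal networks, the network predicts offsets $t_x$ and $t_y$, and the paper writes the $(x, y)$ center coordinates as:
$$x = (t_x \cdot w_a) - x_a$$
$$y = (t_y \cdot h_a) - y_a$$
Here, $(x_a, y_a)$ is the center of the anchor box, and $w_a$ and $h_a$ are its width and height. (Faster R-CNN defines $t_x = (x - x_a) / w_a$, which corresponds to a plus sign; either way, the offset is scaled by the anchor size.)
For instance, a prediction of $t_x = 1$ shifts the box to the right by the width of the anchor box, and $t_x = -1$ shifts it to the left by the same amount. Because this formulation is unconstrained, any anchor box can end up at any point in the image regardless of which location predicted it, and with random initialization the model takes a long time to stabilize.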
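To make the instability concrete, here is a minimal sketch of an anchor-relative center prediction. It assumes the Faster R-CNN convention $x = t_x \cdot w_a + x_a$; the function name `rpn_center_x` and the numbers (a 416-pixel-wide image, a 100-pixel-wide anchor) are illustrative assumptions, not values taken from the article.

```python
def rpn_center_x(t_x, anchor_x, anchor_w):
    """Anchor-relative box center along x (assumed convention: x = t_x * w_a + x_a)."""
    return t_x * anchor_w + anchor_x


# Hypothetical setup: the anchor is centered in a 416-pixel-wide image.
IMAGE_W = 416.0
ANCHOR_X = 208.0
ANCHOR_W = 100.0

# Nothing bounds t_x, so a poorly initialized network can push the center
# far away from the cell that predicted it, or even outside the image.
for t_x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    x = rpn_center_x(t_x, ANCHOR_X, ANCHOR_W)
    inside = 0.0 <= x <= IMAGE_W
    print(f"t_x={t_x:+.1f} -> x={x:.1f} (inside image: {inside})")
```

With these numbers, `t_x = 1` moves the center by exactly one anchor width (from 208 to 308), while `t_x = -3` or `t_x = 3` already places it outside the image.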
To mitigate this issue, YOLOv2 follows the original YOLO and, instead of predicting offsets, predicts location coordinates relative to the grid cell's position.
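Predicting coordinates relative to the grid cell bounds the ground truth between 0 and 1, and a logistic activation constrains the network's predictions to the same range. Specifically, the network predicts five bounding boxes per cell in the output feature map, each characterized by five parameters: $t_x$, $t_y$, $t_w$, $t_h$, and $t_o$.
Given a cell offset $(c_x, c_y)$ from the top-left corner of the image and a bounding box prior of width $p_w$ and height $p_h$, the predictions correspond to:
$$b_x = \sigma(t_x) + c_x$$
$$b_y = \sigma(t_y) + c_y$$
$$b_w = p_w e^{t_w}$$
$$b_h = p_h e^{t_h}$$
$$Pr(\text{object}) \cdot IOU(b, \text{object}) = \sigma(t_o)$$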
This constrained parameterization simplifies the learning process, enhancing the network's stability.
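The decoding step can be sketched in a few lines. This is a minimal illustration, assuming the YOLOv2 parameterization from the paper (a sigmoid on the center offsets and objectness, an exponential on the prior's width and height) with all quantities measured in grid-cell units; the names `sigmoid` and `decode_box` are hypothetical helpers, not from any particular implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_box(t_x, t_y, t_w, t_h, t_o, c_x, c_y, p_w, p_h):
    """Turn raw network outputs into a box, in grid-cell units.

    (c_x, c_y): offset of the cell from the top-left corner of the image.
    (p_w, p_h): width and height of the bounding box prior.
    """
    b_x = sigmoid(t_x) + c_x      # center stays inside the predicting cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)     # prior scaled by a positive factor
    b_h = p_h * math.exp(t_h)
    objectness = sigmoid(t_o)     # Pr(object) * IOU(b, object)
    return b_x, b_y, b_w, b_h, objectness


# All-zero outputs give the center of cell (6, 6) and the unscaled prior.
print(decode_box(0.0, 0.0, 0.0, 0.0, 0.0, c_x=6, c_y=6, p_w=3.0, p_h=2.0))
# -> (6.5, 6.5, 3.0, 2.0, 0.5)
```

However large or small `t_x` and `t_y` become, the decoded center cannot leave the cell that predicted it, which is the property that makes early training better behaved than with unconstrained offsets.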
Additionally, combining dimension clusters with direct prediction of the bounding box center location improves YOLO by almost 5% over the version with standard anchor boxes.
Fine-Grained Features
The modified YOLO architecture performs detections on a $13 \times 13$ feature map, which is sufficient for large objects.
However, accurately localizing smaller objects benefits from higher-resolution features.
Unlike Faster R-CNN and SSD, which utilize multiple feature maps at varying resolutions for their proposal networks, the enhanced YOLO incorporates a passthrough layer to integrate finer-grained features.
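This passthrough layer extracts features from an earlier layer with a $26 \times 26$ resolution and concatenates them with the low-resolution features.
Instead of merging spatial locations, adjacent features are stacked into different channels, analogous to the identity mappings in ResNet. Consequently, the $26 \times 26 \times 512$ feature map becomes a $13 \times 13 \times 2048$ feature map, which can be concatenated with the original features.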
The detector operates on this augmented feature map, gaining access to detailed information that aids in the localization of smaller objects. This modification yields a modest performance increase of approximately 1%.
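The stacking step can be sketched as a space-to-depth rearrangement. This is a minimal pure-Python sketch, assuming a feature map stored as nested lists indexed `[row][column][channel]`; the function name `passthrough` and the order of the stacked channels are assumptions, and real implementations may order the channels differently.

```python
def passthrough(fmap, stride=2):
    """Stack each stride x stride spatial block into the channel dimension.

    fmap is indexed [row][column][channel]; the result has
    height / stride rows, width / stride columns, and
    stride * stride times as many channels.
    """
    h, w = len(fmap), len(fmap[0])
    if h % stride or w % stride:
        raise ValueError("height and width must be divisible by stride")
    out = []
    for i in range(0, h, stride):
        row = []
        for j in range(0, w, stride):
            channels = []
            for di in range(stride):
                for dj in range(stride):
                    # Neighbouring positions become extra channels.
                    channels.extend(fmap[i + di][j + dj])
            row.append(channels)
        out.append(row)
    return out


# Tiny 4 x 4 x 1 example whose values are the flat pixel index.
small = [[[r * 4 + c] for c in range(4)] for r in range(4)]
stacked = passthrough(small)
print(len(stacked), len(stacked[0]), len(stacked[0][0]))  # -> 2 2 4
print(stacked[0][0])  # -> [0, 1, 4, 5]
```

Applied to a 26 x 26 x 512 map, the same function yields a 13 x 13 x 2048 map: no values are discarded, they are only moved from spatial positions into channels so that the result lines up with the 13 x 13 detection features.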