【ML Paper】YOLOv2: part7
This time, I'll introduce YOLOv2, from the paper by Joseph Redmon and Ali Farhadi. Let's focus on how it differs from YOLOv1.
This article is part 7. Part 6 is here.
Original Paper: https://arxiv.org/abs/1612.08242
Multi-Scale Training
YOLOv2 improves robustness to varying image sizes through a multi-scale training approach. Unlike the original YOLO, which used a fixed input resolution of 448 × 448, YOLOv2 changes its input size during training.
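Because the network uses only convolutional and pooling layers, it can be resized on the fly during training. Every 10 batches, the network randomly selects a new image dimension from multiples of 32 within the range of 320 to 608, so the smallest input is 320 × 320 and the largest is 608 × 608. Multiples of 32 are used because the network downsamples its input by a factor of 32.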
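As a rough illustration of this schedule, here is a minimal Python sketch. It only generates the sequence of input sizes and does not train anything. The 320 to 608 range and the stride of 32 follow the YOLOv2 paper; the names `input_size_schedule` and `grid_size` are hypothetical helpers made up for this example, and picking a size at batch 0 is an assumption.

```python
import random

STRIDE = 32                    # the network downsamples its input by this factor
MIN_SIZE, MAX_SIZE = 320, 608  # smallest and largest training resolutions
RESIZE_EVERY = 10              # number of batches between resolution changes

# All multiples of 32 from 320 to 608 inclusive.
CANDIDATE_SIZES = list(range(MIN_SIZE, MAX_SIZE + 1, STRIDE))


def input_size_schedule(num_batches, rng=None):
    """Return the square input size used for each batch (hypothetical helper).

    A new size is drawn uniformly at random every RESIZE_EVERY batches,
    including at batch 0 (an assumption of this sketch).
    """
    if rng is None:
        rng = random.Random()
    sizes = []
    size = None
    for batch_idx in range(num_batches):
        if batch_idx % RESIZE_EVERY == 0:
            size = rng.choice(CANDIDATE_SIZES)
        sizes.append(size)
    return sizes


def grid_size(input_size):
    """Side length of the output feature map for a given input size."""
    return input_size // STRIDE
```

For example, `grid_size(416)` gives 13, so a 416 × 416 input produces a 13 × 13 output grid. In a real training loop, each batch of images would be resized to the scheduled size before the forward pass.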
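At lower resolutions, such as 288 × 288, YOLOv2 works as a cheap, fairly accurate detector: it runs at about 90 FPS with a mAP close to that of Fast R-CNN (69.0 versus 70.0 in the table below).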
At higher resolutions, YOLOv2 reaches a state-of-the-art mAP of 78.6 on the PASCAL VOC 2007 dataset while maintaining real-time processing speeds. This flexibility allows users to balance speed and accuracy based on their specific needs.
Further Experiments
YOLOv2 was also evaluated on additional datasets. On the PASCAL VOC 2012 dataset, YOLOv2 achieved a mAP of 73.4 while running much faster than competing detection systems. On the COCO dataset, YOLOv2 reached a mAP of 44.0 at an Intersection over Union (IOU) threshold of 0.5, comparable to SSD and Faster R-CNN models.
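To make the IOU threshold concrete, here is a small Python sketch of how the overlap between a predicted box and a ground-truth box can be computed. This is not code from the paper; the function names and the corner format `(x1, y1, x2, y2)` are assumptions for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_threshold(pred_box, gt_box, threshold=0.5):
    """True if the prediction overlaps the ground truth enough to count as a match."""
    return iou(pred_box, gt_box) >= threshold
```

With a threshold of 0.5, a prediction of `(0, 0, 10, 6)` against a ground-truth box of `(0, 0, 10, 10)` has an IOU of 0.6 and counts as a match. The full mAP computation also involves ranking detections by confidence and averaging precision over classes, which this sketch leaves out.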
The table below compares YOLOv2's performance with other detection frameworks on the PASCAL VOC 2007 dataset:
Detection Framework | Train | mAP | FPS |
---|---|---|---|
Fast R-CNN | 2007+2012 | 70.0 | 0.5 |
Faster R-CNN VGG-16 | 2007+2012 | 73.2 | 7 |
Faster R-CNN ResNet | 2007+2012 | 76.4 | 5 |
YOLO | 2007+2012 | 63.4 | 45 |
SSD300 | 2007+2012 | 74.3 | 46 |
SSD500 | 2007+2012 | 76.8 | 19 |
YOLOv2 288 × 288 | 2007+2012 | 69.0 | 91 |
YOLOv2 352 × 352 | 2007+2012 | 73.7 | 81 |
YOLOv2 416 × 416 | 2007+2012 | 76.8 | 67 |
YOLOv2 480 × 480 | 2007+2012 | 77.8 | 59 |
YOLOv2 544 × 544 | 2007+2012 | 78.6 | 40 |
All metrics are measured on a GeForce GTX Titan X GPU (original model, not Pascal). At 544 × 544, YOLOv2 has the highest mAP in the table (78.6) while still running at 40 FPS, and at every listed resolution it is faster than the Faster R-CNN variants and SSD500.
Because the same trained model can run at different resolutions, YOLOv2 offers a simple trade-off between speed and accuracy while still running in real time, which makes it versatile for applications like real-time video processing and deployment on resource-constrained hardware.
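As a simple illustration of this trade-off, the Python sketch below picks the most accurate YOLOv2 resolution that still meets a required frame rate. The numbers are copied from the table above, so they only apply to the GPU used there. The helper name `pick_resolution` is hypothetical.

```python
# Input size -> (mAP on VOC 2007, FPS), taken from the table above.
YOLOV2_VOC2007 = {
    288: (69.0, 91),
    352: (73.7, 81),
    416: (76.8, 67),
    480: (77.8, 59),
    544: (78.6, 40),
}


def pick_resolution(min_fps):
    """Return the input size with the highest mAP that runs at >= min_fps.

    Returns None when no listed resolution is fast enough.
    """
    feasible = [
        (m_ap, size)
        for size, (m_ap, fps) in YOLOV2_VOC2007.items()
        if fps >= min_fps
    ]
    if not feasible:
        return None
    # max() compares mAP first, so the most accurate feasible size wins.
    return max(feasible)[1]
```

For example, `pick_resolution(60)` returns 416, since 480 × 480 runs at 59 FPS and falls just short. On different hardware the FPS values would need to be measured again.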
Discussion