【ML Paper】YOLOv2: part9

Published 2024/11/10

This time, I'll introduce YOLOv2 through the paper by Joseph Redmon and Ali Farhadi. Let's focus on the differences from YOLOv1.

This article is part 9. Part 8 is here.

Original Paper: https://arxiv.org/abs/1612.08242

Classification Training

The network was trained on the standard ImageNet 1000-class classification dataset for 160 epochs using stochastic gradient descent (SGD) with an initial learning rate of 0.1.
A polynomial learning rate decay with a power of 4, weight decay of 0.0005, and momentum of 0.9 were applied within the Darknet neural network framework. Standard data augmentation techniques, including random crops, rotations, and hue, saturation, and exposure shifts, were utilized during training.
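The polynomial decay described above can be sketched as follows. This is my own minimal illustration, not the paper's code; the function name and the per-epoch granularity are assumptions.

```python
def poly_lr(epoch: int, base_lr: float = 0.1,
            max_epochs: int = 160, power: int = 4) -> float:
    """Polynomial learning-rate decay: base_lr * (1 - epoch/max_epochs)**power."""
    return base_lr * (1.0 - epoch / max_epochs) ** power

# The rate starts at 0.1 and decays smoothly toward zero by epoch 160.
for epoch in (0, 40, 80, 120, 160):
    print(epoch, poly_lr(epoch))
```

With power 4 the rate drops off sharply early in training (already ~1/16 of the initial rate at the halfway point), which is the main practical difference from the more common step-decay schedules.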

After the initial training phase, the network was fine-tuned at a larger resolution of $448 \times 448$ for an additional 10 epochs with a reduced learning rate of $10^{-3}$, maintaining the same weight decay and momentum parameters.

Detection Training

For detection tasks, the network architecture was modified by removing the last convolutional layer and adding three $3 \times 3$ convolutional layers with 1024 filters each, followed by a final $1 \times 1$ convolutional layer tailored to the detection requirements.
Specifically, for the VOC dataset, the model was configured to predict 5 bounding boxes per grid cell, each with 5 coordinates and 20 class probabilities, resulting in $5 \times (5 + 20) = 125$ filters.
Additionally, a passthrough layer was integrated from the final $3 \times 3 \times 512$ layer to the second-to-last convolutional layer to leverage fine-grained features. The detection network was trained for 160 epochs starting with a learning rate of $10^{-3}$, which was reduced by a factor of 10 at the 60th and 90th epochs.
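Two of the numbers above can be checked with a small sketch (my own code, not the paper's): the $1 \times 1$ detection-head filter count, and the passthrough layer's stride-2 space-to-depth rearrangement, which is commonly described as turning the $26 \times 26 \times 512$ fine-grained feature map into $13 \times 13 \times 2048$ so it can be concatenated with the lower-resolution output features.

```python
import numpy as np

def detection_filters(num_boxes: int, num_classes: int,
                      num_coords: int = 5) -> int:
    # Each box carries num_coords values (x, y, w, h, objectness)
    # plus one probability per class.
    return num_boxes * (num_coords + num_classes)

def passthrough(x: np.ndarray, stride: int = 2) -> np.ndarray:
    # Stack each stride x stride spatial block into the channel axis
    # (space-to-depth), halving H and W and multiplying C by stride**2.
    h, w, c = x.shape
    x = x.reshape(h // stride, stride, w // stride, stride, c)
    x = x.transpose(0, 2, 1, 3, 4)  # -> (h', w', stride, stride, c)
    return x.reshape(h // stride, w // stride, stride * stride * c)

print(detection_filters(5, 20))                 # 125 for VOC
fine = np.zeros((26, 26, 512), dtype=np.float32)
print(passthrough(fine).shape)                  # (13, 13, 2048)
```

The reshape/transpose pair keeps every activation; it only moves the fine spatial positions into channels so the detector can use fine-grained features without changing its output grid size.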

Discussion