【ML Paper】Explanation of all of YOLO series Part 7
This is a summary of a paper that explains YOLOv1 through YOLOv8 in one place.
Let's trace the history of YOLO with this paper.
This article is part 7; part 6 is here.
Original Paper: https://arxiv.org/pdf/2304.00501
4.3 YOLOv1 Training
Pre-training and Fine-tuning
The authors initially pre-trained the first 20 layers of YOLOv1 at a resolution of
Data Augmentation
To improve the model's robustness and generalization, several data augmentation techniques were employed. These included random scaling and translations of up to 20% of the input image size. Additionally, the authors applied random exposure and saturation adjustments with an upper-end factor of 1.5 in the HSV color space. These augmentations helped the model become invariant to variations in object size, position, and lighting conditions.
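The augmentations described above can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, "up to 20%" is read as a scale in [0.8, 1.2] and a shift of at most 20% of the image width or height, and the lower end of the color factor is assumed to be the reciprocal of the upper end (1/1.5).

```python
import colorsys
import random


def sample_geometry(rng, img_w, img_h, max_frac=0.2):
    """Draw a random scale and translation, each bounded by max_frac of the image size."""
    scale = 1.0 + rng.uniform(-max_frac, max_frac)   # assumed range: [0.8, 1.2]
    dx = rng.uniform(-max_frac, max_frac) * img_w    # horizontal shift in pixels
    dy = rng.uniform(-max_frac, max_frac) * img_h    # vertical shift in pixels
    return scale, dx, dy


def transform_box(box, scale, dx, dy):
    """Apply the same scale-then-shift to a (cx, cy, w, h) box given in pixels."""
    cx, cy, w, h = box
    return (cx * scale + dx, cy * scale + dy, w * scale, h * scale)


def sample_color_factor(rng, upper=1.5):
    """Draw a factor in [1/upper, upper]; the reciprocal lower bound is an assumption."""
    f = rng.uniform(1.0, upper)
    return f if rng.random() < 0.5 else 1.0 / f


def jitter_pixel(rgb, sat_factor, exp_factor):
    """Scale saturation and value (exposure) of one RGB pixel with channels in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    s = min(1.0, s * sat_factor)   # clamp so the result stays a valid color
    v = min(1.0, v * exp_factor)
    return colorsys.hsv_to_rgb(h, s, v)
```

In a real pipeline the geometric transform would be applied to the image and its bounding boxes together, and the color jitter would be vectorized over all pixels; the per-pixel version here only makes the HSV step explicit.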
Loss Function
YOLOv1 utilized a composite loss function composed of multiple sum-squared errors, designed to optimize different aspects of object detection:
- Localization Loss: The first two terms of the loss function account for the errors in the predicted bounding box locations ($x$, $y$) and sizes ($w$, $h$). These errors are calculated only for boxes that contain objects, as indicated by the presence of an object in the corresponding grid cell ($1_{\text{obj}_{ij}}$). A scale factor of $\lambda_{\text{coord}} = 5$ is applied to emphasize the importance of accurate bounding box predictions.
- Confidence Loss: The third and fourth terms measure the confidence scores of the bounding boxes. The third term evaluates the confidence error for boxes that contain objects ($1_{\text{obj}_{ij}}$), while the fourth term assesses the confidence error for boxes that do not contain objects ($1_{\text{noobj}_{ij}}$). To account for the majority of boxes being empty, the confidence loss for non-object boxes is scaled down by a factor of $\lambda_{\text{noobj}} = 0.5$.
- Classification Loss: The final component of the loss function measures the squared error of the class conditional probabilities for each class, but only for grid cells that contain objects ($1_{\text{obj}_{i}}$). This ensures that the model focuses on accurately classifying objects where they are present.
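For reference, the five terms described above can be written out in full. The form below follows the loss as given in the original YOLOv1 paper; the symbols $S$ (grid size), $B$ (boxes per cell), $C$ (confidence), and $p(c)$ (class probability) are not defined in this section and are taken from that paper, with hats marking predictions. Note that the size term compares square roots of $w$ and $h$, which makes an error of a given size count for more on a small box than on a large one.

$$
\begin{aligned}
L ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{\text{obj}_{ij}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{\text{obj}_{ij}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{\text{obj}_{ij}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{\text{noobj}_{ij}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} 1_{\text{obj}_{i}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
$$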
By integrating these loss components, YOLOv1 effectively balances the need for precise localization, confident detection, and accurate classification, while mitigating the impact of numerous empty bounding boxes during training.
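To make the weighting concrete, here is a minimal pure-Python sketch of such a composite loss. It is an illustration under stated assumptions, not the authors' code: the dictionary layout and the name `yolo_v1_loss` are hypothetical, the index of the responsible predictor and the target confidence are assumed to be precomputed, every predictor that is not responsible for an object is treated as a "no object" box, and box sizes are compared through square roots as in the original YOLOv1 paper.

```python
import math

LAMBDA_COORD = 5.0   # weight on the localization terms
LAMBDA_NOOBJ = 0.5   # weight on the confidence of boxes without objects


def yolo_v1_loss(preds, targets):
    """Sum-squared-error loss over a list of grid cells.

    preds[i]   = {"boxes": [(x, y, w, h, conf), ...], "classes": [p_0, ...]}
    targets[i] = None for an empty cell, otherwise
                 {"box": (x, y, w, h), "conf": c, "classes": [...], "resp": j}
    where "resp" is the index of the responsible predictor (hypothetical layout).
    Widths and heights are assumed to be non-negative.
    """
    loc = conf_obj = conf_noobj = cls = 0.0
    for pred, tgt in zip(preds, targets):
        for j, (x, y, w, h, c) in enumerate(pred["boxes"]):
            if tgt is not None and j == tgt["resp"]:
                tx, ty, tw, th = tgt["box"]
                # Terms 1 and 2: box center and (square-rooted) box size.
                loc += (x - tx) ** 2 + (y - ty) ** 2
                loc += (math.sqrt(w) - math.sqrt(tw)) ** 2
                loc += (math.sqrt(h) - math.sqrt(th)) ** 2
                # Term 3: confidence of the responsible box.
                conf_obj += (c - tgt["conf"]) ** 2
            else:
                # Term 4: confidence of a box with no object (target is 0).
                conf_noobj += c ** 2
        if tgt is not None:
            # Term 5: class probabilities, only for cells that contain an object.
            cls += sum((p - t) ** 2 for p, t in zip(pred["classes"], tgt["classes"]))
    return LAMBDA_COORD * loc + conf_obj + LAMBDA_NOOBJ * conf_noobj + cls
```

With these weights, a confidence error on an empty box contributes half as much as the same error on a box that holds an object, and a coordinate error contributes five times as much, so the many empty cells in a typical image do not dominate the gradient.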
Discussion