【ML Paper】YOLO: Unified Real-Time Object Detection part2
In this post, I'll continue explaining the YOLO object detection model, following the original paper.
This is part 2; part 3 will be published soon.
Original paper: https://arxiv.org/abs/1506.02640
3. Unified Detection
3.1 Introduction to Unified Object Detection
In this approach, we unify various components of object detection into a single neural network. The network predicts bounding boxes and class probabilities for the entire image simultaneously. This unified system allows for global reasoning about all objects in an image and enables end-to-end training with real-time speeds, while maintaining high precision in object detection.
3.2 Image Grid Division and Responsibility for Detection
The input image is divided into an S × S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object.
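To make the responsibility rule concrete, here is a minimal Python sketch (my own illustration, not the paper's code) that maps a normalized object center to the grid cell responsible for it; the function name and the S = 7 default are assumptions for this example.

```python
# Minimal sketch (not the paper's code): map a normalized object center to the
# grid cell responsible for detecting that object, assuming coordinates in [0, 1).
def responsible_cell(center_x, center_y, S=7):
    """Return (row, col) of the S x S grid cell containing the object center."""
    col = int(center_x * S)  # cell index along x
    row = int(center_y * S)  # cell index along y
    return row, col

# Example: an object centered at (0.52, 0.31) falls into cell (row 2, col 3).
print(responsible_cell(0.52, 0.31))  # -> (2, 3)
```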
3.3 Bounding Box Predictions
Each bounding box prediction contains five values:
- x and y coordinates, which represent the center of the box relative to the grid cell,
- w (width) and h (height), relative to the entire image,
- A confidence score, which is the product of the probability of an object being present and the Intersection Over Union (IOU) between the predicted and actual bounding boxes.
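As a rough illustration of this encoding, the sketch below packs one ground-truth box into the five predicted values. The helper name `encode_box` and the sample numbers are hypothetical, and the input coordinates are assumed to be normalized relative to the image.

```python
# Illustrative sketch of the five-value box encoding described above.
# Assumes box coordinates are already normalized to [0, 1] relative to the image;
# the helper name is hypothetical, not from the official YOLO code.
def encode_box(cx, cy, w, h, p_object, iou, S=7):
    """Encode a box with image-relative center (cx, cy) and size (w, h)."""
    # x, y: offset of the center inside its grid cell, in [0, 1)
    x = cx * S - int(cx * S)
    y = cy * S - int(cy * S)
    # w, h stay relative to the whole image
    confidence = p_object * iou  # Pr(Object) * IOU(predicted, truth)
    return [x, y, w, h, confidence]

# Example with made-up numbers: roughly [0.64, 0.17, 0.2, 0.35, 0.8]
print(encode_box(cx=0.52, cy=0.31, w=0.20, h=0.35, p_object=1.0, iou=0.8))
```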
3.4 Class Probability Predictions
Each grid cell predicts class probabilities conditioned on the presence of an object. Regardless of the number of bounding boxes, each grid cell predicts only one set of class probabilities, reflecting the likelihood of different classes within that cell.
3.5 Final Prediction
During testing, the system multiplies the class probabilities with the confidence of the bounding boxes to compute class-specific confidence scores. These scores encode both the probability of a specific class appearing in the box and how well the predicted box matches the object.
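A small sketch of that multiplication, combining the conditional class probabilities from 3.4 with the box confidences; the probabilities and confidences below are made up for a single grid cell with B = 2 boxes and three classes.

```python
import numpy as np

# Sketch of the test-time score combination: class-specific confidence
# = conditional class probability * box confidence. Values are illustrative.
class_probs = np.array([0.7, 0.2, 0.1])   # Pr(Class_i | Object) for one grid cell
box_confidences = np.array([0.9, 0.4])    # one confidence per predicted box (B = 2)

# Outer product gives a (B, C) matrix of class-specific confidence scores,
# i.e. Pr(Class_i) * IOU for every (box, class) pair in this cell.
class_specific_scores = box_confidences[:, None] * class_probs[None, :]
print(class_specific_scores)
# [[0.63 0.18 0.09]
#  [0.28 0.08 0.04]]
```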
3.6 Model Structure and Tensor Representation
The model treats object detection as a single regression problem. The image is divided into an S × S grid, and each grid cell predicts B bounding boxes with their confidence scores plus C class probabilities. These predictions are encoded as an S × S × (B × 5 + C) tensor.
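The shape arithmetic can be written out directly; the helper below is only an illustration, and the S, B, C values in the example are arbitrary rather than tied to any dataset.

```python
# Illustrative shape helper; the example values of S, B, C are arbitrary.
def output_shape(S, B, C):
    """Each cell predicts B boxes (5 values each) plus C class probabilities."""
    return (S, S, B * 5 + C)

print(output_shape(S=4, B=2, C=3))  # -> (4, 4, 13)
```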
3.7 Example Application (PASCAL VOC)
For evaluation on the PASCAL VOC dataset, the parameters are set to S = 7, B = 2, and C = 20 (PASCAL VOC has 20 labelled classes), so the final prediction is a 7 × 7 × 30 tensor.
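Plugging those numbers into the tensor shape from 3.6 (the snippet is mine; the parameter values are the paper's):

```python
# PASCAL VOC setting from the paper: S = 7, B = 2, C = 20.
S, B, C = 7, 2, 20
depth = B * 5 + C      # 2 * 5 + 20 = 30 values per grid cell
print((S, S, depth))   # (7, 7, 30) output tensor
print(S * S * depth)   # 1470 predicted values per image
```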
Discussion