【ML Paper】Explanation of all of YOLO series Part 4
This is a summary of a paper that explains YOLOv1 through YOLOv8 in one place.
Let's trace the history of YOLO through this paper.
This article is part 4; part 3 is here.
Original Paper: https://arxiv.org/pdf/2304.00501
3.2 Computing AP
Average Precision (AP) computation differs between the VOC and COCO datasets due to variations in object categories and evaluation techniques. The VOC dataset comprises 20 object categories, while the COCO dataset includes 80 categories and employs a more sophisticated AP calculation method.
For the VOC dataset, AP is determined by first generating a precision-recall curve for each category by varying the model's confidence threshold. Each category's AP is then calculated using an interpolated 11-point sampling of its curve, at recall levels 0, 0.1, ..., 1.0. The final AP is the mean of the APs across all 20 categories.
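As a rough illustration of the 11-point sampling, here is a minimal sketch in plain Python. It assumes the precision-recall curve is already available as two parallel lists, and it uses the common convention that the interpolated precision at a recall level is the highest precision observed at any recall greater than or equal to that level. The function name `voc_ap_11_point` is hypothetical and is not taken from the paper or from any library.

```python
def voc_ap_11_point(recalls, precisions):
    """Sketch of 11-point interpolated AP for one category.

    recalls, precisions: parallel lists describing points on the
    precision-recall curve (hypothetical input format).
    """
    total = 0.0
    for i in range(11):
        level = i / 10  # recall levels 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision at recall >= level,
        # or 0 if the curve never reaches this recall level.
        reachable = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(reachable) if reachable else 0.0
    return total / 11


# Toy curve with two operating points.
ap = voc_ap_11_point(recalls=[0.5, 1.0], precisions=[1.0, 0.5])
print(round(ap, 4))
```

For the VOC-style score, this per-category value would then be averaged over the 20 categories.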
In contrast, the COCO dataset utilizes a 101-point interpolation approach, computing precision at 101 recall thresholds ranging from 0 to 1 in increments of 0.01. Additionally, AP is averaged over multiple Intersection over Union (IoU) values, typically from 0.5 to 0.95 in steps of 0.05. The overall AP is obtained by averaging the AP values across these IoU thresholds.
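The COCO-style averaging can be sketched in the same way. This is a simplified illustration, not the official evaluation code: it assumes one precision-recall curve per IoU threshold is already available, and the names `ap_101_point`, `IOU_THRESHOLDS`, and `coco_style_ap` are hypothetical.

```python
# IoU thresholds 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [(50 + 5 * k) / 100 for k in range(10)]


def ap_101_point(recalls, precisions):
    """Sketch of AP sampled at 101 recall levels (0.00, 0.01, ..., 1.00)."""
    total = 0.0
    for i in range(101):
        level = i / 100
        # Best precision at recall >= level, or 0 if unreachable.
        reachable = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(reachable) if reachable else 0.0
    return total / 101


def coco_style_ap(curves_by_iou):
    """Average the 101-point AP over IoU thresholds.

    curves_by_iou: list of (recalls, precisions) pairs, one per IoU
    threshold (assumed input format for this sketch).
    """
    aps = [ap_101_point(r, p) for r, p in curves_by_iou]
    return sum(aps) / len(aps)


# Toy example with two thresholds instead of the full ten.
curves = [([1.0], [1.0]), ([0.5, 1.0], [1.0, 0.5])]
print(round(coco_style_ap(curves), 4))
```

Stricter IoU thresholds usually yield fewer true positives, so the per-threshold AP tends to fall as the threshold rises; averaging over the range rewards well-localized boxes.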
3.3 Non-Maximum Suppression (NMS)
Non-Maximum Suppression (NMS) is a post-processing technique in object detection algorithms designed to reduce the number of overlapping bounding boxes, thereby enhancing detection quality. Object detection models often generate multiple bounding boxes around the same object, each with different confidence scores. NMS filters out redundant and less relevant bounding boxes, retaining only the most accurate ones.
The NMS algorithm operates by first filtering bounding boxes based on a confidence threshold. The remaining boxes are sorted in descending order of confidence score. The algorithm then iteratively selects the highest-scoring box and removes any overlapping boxes whose Intersection over Union (IoU) with it exceeds a predefined threshold, repeating until no candidate boxes remain.
Algorithm 1 describes the procedure. Figure 4 shows the typical output of an object detection model containing multiple overlapping bounding boxes and the output after NMS.
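The steps described above can be sketched in plain Python as follows. This is a minimal illustration rather than the paper's Algorithm 1 verbatim: the `(x1, y1, x2, y2)` box format, the default thresholds of 0.5, and the function names `iou` and `nms` are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, conf_threshold=0.5, iou_threshold=0.5):
    """Return indices of the boxes kept after non-maximum suppression."""
    # 1. Drop low-confidence boxes (threshold values are assumptions).
    candidates = [i for i, s in enumerate(scores) if s >= conf_threshold]
    # 2. Sort the rest by descending confidence.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    # 3. Keep the best box, discard boxes overlapping it too much, repeat.
    while candidates:
        best = candidates.pop(0)
        kept.append(best)
        candidates = [
            i for i in candidates
            if iou(boxes[best], boxes[i]) <= iou_threshold
        ]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.3]
print(nms(boxes, scores))  # → [0, 2]
```

In the example, box 1 overlaps box 0 with an IoU of about 0.68 and is suppressed, box 3 is dropped by the confidence filter, and box 2 survives because it does not overlap the kept box. In multi-class detection this procedure is commonly run separately per class.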