
【ML Paper】YOLOv2: part5

Published on 2024/11/06

This time, I'll introduce YOLOv2, based on the paper by Joseph Redmon and Ali Farhadi. Let's focus on how it differs from YOLOv1.

This article is part 5. Part 4 is here.

Original Paper: https://arxiv.org/abs/1612.08242

3.6 Dimension Clusters

Hand-picking anchor box dimensions is one of the problems the authors run into when using anchor boxes with YOLO. Instead of choosing the priors by hand, they run k-means clustering on the training set's bounding boxes to find good priors automatically.
Standard k-means uses Euclidean distance, under which larger boxes generate more error than smaller boxes, so the clustering is biased by box size. Because the goal is priors that lead to good IOU scores regardless of box size, the authors use a distance metric based on Intersection over Union (IOU) instead.

Specifically, the distance between a box and a centroid is defined as:

$$
d(\text{box}, \text{centroid}) = 1 - \text{IOU}(\text{box}, \text{centroid})
$$
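To make the size independence concrete, here is a minimal sketch of this distance. It assumes each box is reduced to a (width, height) pair and that the two boxes share the same center, which is a common convention for anchor clustering rather than something stated in this article. The function names `iou_wh`, `iou_distance`, and `euclidean_distance` are hypothetical, chosen only for illustration.

```python
import math

def iou_wh(box, centroid):
    """IOU of two (width, height) boxes, assuming they share the same center."""
    w1, h1 = box
    w2, h2 = centroid
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union

def iou_distance(box, centroid):
    """d(box, centroid) = 1 - IOU(box, centroid)."""
    return 1.0 - iou_wh(box, centroid)

def euclidean_distance(box, centroid):
    """Plain Euclidean distance in (width, height) space, for comparison."""
    return math.hypot(box[0] - centroid[0], box[1] - centroid[1])

# The same 20% relative mismatch at two different scales.
small = ((10.0, 10.0), (12.0, 12.0))
large = ((100.0, 100.0), (120.0, 120.0))

for name, (box, centroid) in (("small", small), ("large", large)):
    print(name,
          round(euclidean_distance(box, centroid), 2),
          round(iou_distance(box, centroid), 3))
# small 2.83 0.306
# large 28.28 0.306
```

The Euclidean distance grows tenfold with the box size, while the IOU-based distance is identical for both pairs.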

This metric ensures that the selection of priors focuses on achieving high IOU scores irrespective of box sizes. The clustering process is executed for various values of $k$, and the average IOU with the closest centroid is evaluated to determine the optimal number of clusters.
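The procedure described above can be sketched roughly as follows. This is not the authors' implementation: the initialization (random sampling of boxes), the centroid update (mean width and height of each cluster), the synthetic box data, and the names `kmeans_iou` and `avg_iou` are all assumptions made for illustration.

```python
import random

def iou_wh(box, centroid):
    """IOU of two (width, height) boxes, assuming they share the same center."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    return inter / (box[0] * box[1] + centroid[0] * centroid[1] - inter)

def avg_iou(boxes, centroids):
    """Mean over all boxes of the IOU with the best-matching centroid."""
    return sum(max(iou_wh(b, c) for c in centroids) for b in boxes) / len(boxes)

def kmeans_iou(boxes, k, iters=100, seed=0):
    """k-means over (width, height) pairs using d = 1 - IOU for assignment."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # assumption: initialize from random boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # The smallest 1 - IOU is the same as the largest IOU.
            j = max(range(k), key=lambda i: iou_wh(b, centroids[i]))
            clusters[j].append(b)
        new_centroids = []
        for j, members in enumerate(clusters):
            if not members:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
            else:
                # Assumption: update with the mean width and height.
                new_centroids.append((
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                ))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

# Synthetic normalized (width, height) pairs standing in for real annotations.
data_rng = random.Random(42)
boxes = [(data_rng.uniform(0.05, 0.9), data_rng.uniform(0.05, 0.9)) for _ in range(300)]

for k in (1, 3, 5, 9):
    print(k, round(avg_iou(boxes, kmeans_iou(boxes, k)), 3))
```

On real data, the boxes would come from the training annotations, and the printed average IOU per $k$ is the quantity used to pick the number of priors.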

3.6.1 Results

The experiments show that $k = 5$ offers a good tradeoff between model complexity and recall, achieving an average IOU of 61.0. This is comparable to using 9 hand-picked anchor boxes, which yield an average IOU of 60.9.
Moreover, increasing the number of centroids to $k = 9$ raises the average IOU to 67.2, well above the 60.9 of the same number of hand-picked anchor boxes, indicating that the clustered priors fit the bounding box distribution much better.
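For reference, the average IOU quoted here can be written out as below. The notation is introduced only for this note and is an assumption about presentation: $b_1, \dots, b_N$ are the ground-truth boxes, $c_1, \dots, c_k$ are the priors (cluster centroids or hand-picked anchors), and the value appears to be reported as a percentage.

$$
\text{Avg IOU} = \frac{1}{N} \sum_{i=1}^{N} \max_{1 \le j \le k} \text{IOU}(b_i, c_j)
$$

Each box contributes only its best-matching prior, so adding priors should not lower the score as long as the existing priors are kept.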

The cluster centroids derived from this method differ markedly from the manually selected anchor boxes, characterized by fewer short and wide boxes and a greater prevalence of tall and thin boxes.

3.6.2 Discussion

Using k-means clustering with an IOU-based distance metric provides a more effective initialization for anchor boxes in YOLO.
Starting from these priors gives the network a better initial representation and makes the task easier to learn. Note that the average IOU figures measure how well the priors fit the ground-truth boxes; they are not detection accuracy themselves, although better-fitting priors are expected to help detection.

3.6.3 Limitations

While the clustering approach shows promise, the choice of $k$ remains a hyperparameter that requires careful tuning. Additionally, the method's effectiveness may vary with different datasets, necessitating further validation across diverse scenarios.


Table 1 in the paper summarizes these results: clustering with the IOU metric, particularly at $k = 9$ (67.2), achieves a higher average IOU than the 9 hand-picked anchor boxes (60.9), highlighting the efficacy of the proposed approach.
