【ML Paper】YOLOv2: part5

This time, I'll introduce YOLOv2, based on the paper by Joseph Redmon and Ali Farhadi. Let's focus on how it differs from YOLOv1.

This article is part 5. Part 4 is here.

Original Paper: https://arxiv.org/abs/1612.08242

3.6 Dimension Clusters

To address the challenges associated with manually selecting anchor box dimensions in YOLO, the authors propose an automated approach using k-means clustering on the training set's bounding boxes.
Standard k-means uses Euclidean distance, under which larger boxes generate more error than smaller boxes, so the resulting clusters are biased by box size. To avoid this, the authors introduce a distance metric based on Intersection over Union (IOU).

Specifically, the distance between a box and a centroid is defined as:

$$
d(\text{box}, \text{centroid}) = 1 - \text{IOU}(\text{box}, \text{centroid})
$$
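To make the size bias concrete, here is a minimal sketch that compares Euclidean distance with the IOU-based distance on two box/centroid pairs that differ only in scale. It assumes boxes are described by (width, height) alone and compared as if they shared a corner, which is a common convention for anchor clustering but is not spelled out above. The helper names `euclidean`, `iou_wh`, and `iou_distance` are illustrative, not taken from the paper.

```python
import math


def euclidean(box, centroid):
    """Euclidean distance between two (width, height) pairs."""
    return math.hypot(box[0] - centroid[0], box[1] - centroid[1])


def iou_wh(box, centroid):
    """IOU of two boxes given as (width, height).

    Assumption: both boxes are aligned at a shared corner, so only
    their shapes matter, not their positions in the image.
    """
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def iou_distance(box, centroid):
    """d(box, centroid) = 1 - IOU(box, centroid)."""
    return 1.0 - iou_wh(box, centroid)


# Same relative mismatch (centroid 20% larger per side), two different scales.
small_box, small_centroid = (10.0, 10.0), (12.0, 12.0)
large_box, large_centroid = (100.0, 100.0), (120.0, 120.0)

print("Euclidean, small pair:", euclidean(small_box, small_centroid))
print("Euclidean, large pair:", euclidean(large_box, large_centroid))
print("1 - IOU,   small pair:", iou_distance(small_box, small_centroid))
print("1 - IOU,   large pair:", iou_distance(large_box, large_centroid))
```

In both pairs the centroid is 20% larger than the box in each dimension. The Euclidean distance grows tenfold with the scale, while the IOU distance is the same for both pairs.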

This metric ensures that the selection of priors focuses on achieving high IOU scores irrespective of box sizes. The clustering process is executed for various values of $k$, and the average IOU with the closest centroid is evaluated to determine the optimal number of clusters.
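The procedure described above can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: boxes are (width, height) pairs compared as if they shared a corner, centroids are initialized from randomly sampled boxes, and each centroid is updated to the mean width and height of its members (the text above does not specify the update rule). The data comes from a hypothetical `synthetic_boxes` helper, and all function names are illustrative.

```python
import random


def iou_wh(a, b):
    """IOU of two (width, height) boxes, assumed to share a corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_iou(boxes, k, iters=100, seed=0):
    """k-means where the nearest centroid is the one with the highest IOU,
    i.e. the smallest d = 1 - IOU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # assumption: random boxes as initial centroids
    assignment = None
    for _ in range(iters):
        new_assignment = [
            max(range(k), key=lambda j: iou_wh(box, centroids[j]))
            for box in boxes
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the clustering has converged
        assignment = new_assignment
        for j in range(k):
            members = [b for b, a in zip(boxes, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                )
    return centroids


def average_iou(boxes, centroids):
    """Mean over all boxes of the IOU with the closest centroid."""
    return sum(max(iou_wh(b, c) for c in centroids) for b in boxes) / len(boxes)


def synthetic_boxes(n=300, seed=1):
    """Hypothetical stand-in for a training set: three base shapes with jitter."""
    rng = random.Random(seed)
    shapes = [(0.1, 0.3), (0.3, 0.15), (0.6, 0.8)]
    boxes = []
    for _ in range(n):
        w, h = rng.choice(shapes)
        boxes.append((w * rng.uniform(0.8, 1.2), h * rng.uniform(0.8, 1.2)))
    return boxes


# Sweep k and report the average IOU with the closest centroid.
boxes = synthetic_boxes()
for k in (1, 2, 3, 5, 9):
    centroids = kmeans_iou(boxes, k)
    print(f"k={k}: average IOU = {average_iou(boxes, centroids):.3f}")
```

On a real dataset, `boxes` would hold the ground-truth widths and heights from the training annotations, and the sweep would be used to pick the point where adding centroids stops paying for the extra model complexity.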

3.6.1 Results

The experiments reveal that selecting $k = 5$ offers a balanced tradeoff between model complexity and recall, achieving an average IOU of 61.0. This performance is comparable to using 9 hand-picked anchor boxes, which yield an average IOU of 60.9.
Moreover, increasing the number of centroids to $k = 9$ significantly enhances the average IOU to 67.2, indicating a superior alignment with the bounding box distribution.

The cluster centroids derived from this method differ markedly from the manually selected anchor boxes: there are fewer short, wide boxes and more tall, thin boxes.

3.6.2 Discussion

Using k-means clustering with an IOU-based distance metric provides a more effective initialization for anchor boxes in YOLO.
It gives the network a better starting representation, which makes good detections easier to learn, and the average IOU between the priors and the ground-truth boxes rises as $k$ increases.

3.6.3 Limitations

While the clustering approach shows promise, the choice of $k$ remains a hyperparameter that requires careful tuning. Additionally, the method's effectiveness may vary with different datasets, necessitating further validation across diverse scenarios.


Table 1 of the paper illustrates that clustering-based methods, particularly with $k = 9$, achieve a higher average IOU than traditional hand-picked anchor boxes, highlighting the efficacy of the proposed approach.
