【ML Paper】YOLOv2: part2

Published on 2024/11/02

This time, I'll introduce YOLOv2, based on the paper by Joseph Redmon and Ali Farhadi, focusing on how it differs from YOLOv1.
This article is part 2. Part 1 is here.
Original Paper: https://arxiv.org/abs/1612.08242

 2. Introduction
 2.1 Introduction

General-purpose object detection should be fast and accurate while recognizing a wide variety of objects. Although advances in neural networks have improved detection frameworks, most existing methods still recognize only a relatively small set of objects.
This limitation comes mainly from the small size of current object detection datasets compared with those available for classification and tagging.

 2.2 Dataset Limitations

Object detection datasets typically contain thousands to hundreds of thousands of images with dozens to hundreds of object categories.

In contrast, classification datasets encompass millions of images with tens or hundreds of thousands of categories.
Labeling images for detection is far more expensive than labeling them for classification or tagging, so detection datasets are unlikely to reach the scale of classification datasets in the near future.

 2.3 Proposed Methodology

To overcome the limitations of existing detection datasets, the paper proposes a method that leverages the large amount of classification data already available. It takes a hierarchical view of object classification, which makes it possible to combine distinct datasets.
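
To make the hierarchical view more concrete, here is a minimal sketch. It assumes the labels of different datasets are placed in one tree and that the model predicts a conditional probability for each node given its parent. The toy hierarchy, the probability values, and the function names are hypothetical and only for illustration; the actual construction is described in later sections of the paper.

```python
from math import prod

# Hypothetical toy hierarchy: child -> parent (None marks the root).
# Coarse labels such as "dog" are typical of a detection dataset, while
# fine-grained ones such as "Norfolk terrier" are typical of a classification dataset.
PARENT = {
    "physical object": None,
    "animal": "physical object",
    "dog": "animal",
    "cat": "animal",
    "Norfolk terrier": "dog",
}


def path_to_root(label):
    """Return the labels from `label` up to the root, inclusive."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path


def absolute_probability(label, conditional):
    """Multiply the conditional probabilities P(node | parent) along the path to the root."""
    return prod(conditional[node] for node in path_to_root(label))


# Made-up conditional probabilities for a single prediction.
conditional = {
    "physical object": 1.0,
    "animal": 0.9,
    "dog": 0.8,
    "cat": 0.2,
    "Norfolk terrier": 0.5,
}
print(absolute_probability("Norfolk terrier", conditional))
```

Because every label lives in the same tree, a coarse label from one dataset and a fine-grained label from another can be handled by the same model.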

Additionally, a joint training algorithm is introduced to train object detectors using both detection and classification data.
This method allows the model to precisely localize objects using labeled detection images while expanding its vocabulary and enhancing robustness through classification images.
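
The joint training idea can be sketched roughly as follows. This is a simplified illustration, assuming the total loss splits into localization, objectness, and classification terms: detection images contribute all terms, while classification images contribute only the classification term. The function name, the `sample_kind` strings, and the three-way split are assumptions made for this example, not the authors' code.

```python
def joint_loss(sample_kind, localization_loss, objectness_loss, classification_loss):
    """Select which loss terms are backpropagated for one training image.

    sample_kind is "detection" for an image with bounding-box labels,
    or "classification" for an image with only a class label.
    """
    if sample_kind == "detection":
        # Box labels are available, so every term of the loss is used.
        return localization_loss + objectness_loss + classification_loss
    if sample_kind == "classification":
        # No box labels, so only the classification term is used.
        return classification_loss
    raise ValueError(f"unknown sample kind: {sample_kind!r}")
```

With this routing, a batch that mixes both kinds of images can be trained on in a single loop, which is what lets classification data extend the detector's vocabulary.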

 2.4 Implementation: YOLO9000

Using this method, the authors train YOLO9000, a real-time object detector that can recognize over 9000 different object categories.
First, the base YOLO detection system is improved to produce YOLOv2, a state-of-the-art real-time detector. Then the dataset combination method and the joint training algorithm are used to train a model on more than 9000 classes from ImageNet together with detection data from COCO.

