🦏
【ML Paper】YOLOv2: part11

2024/11/12に公開
This time, I'll introduce the YOLOv2 with the paper by Joseph Redmon and Ali Farhadi. Let's focus and see the difference from yolov1.
This article is part 11. Part 10 is here.
Original Paper: https://arxiv.org/abs/1612.08242

 Hierarchical ClassificationThe study leverages the hierarchical structure of WordNet to enhance classification within ImageNet.

Unlike traditional flat classification approaches, the authors construct a hierarchical tree, termed WordTree, from the visual nouns in ImageNet by tracing their paths through the WordNet graph to the root node, "physical object." Given that WordNet is a directed graph with multiple paths for some synsets, the tree construction prioritizes shorter paths to minimize complexity. During classification, WordTree predicts conditional probabilities at each node, such as P(\text{Norfolk terrier}|\text{terrier}), and computes absolute probabilities by multiplying these conditional probabilities along the path from the target node to the root. The assumption P(\text{physical object}) = 1 simplifies the probability computation by affirming the presence of an object in the image.
For validation, the Darknet-19 model is trained on a modified version of WordTree, referred to as WordTree1k, which includes 1,369 labels by incorporating intermediate nodes from the original 1,000 ImageNet classes. During training, ground truth labels are propagated up the tree, ensuring that an image labeled as a "Norfolk terrier" is also associated with broader categories like "dog" and "mammal." The model predicts a vector of 1,369 values, and a softmax function is applied over synsets that are hyponyms of the same concept to determine the conditional probabilities.

 ResultsThe hierarchical Darknet-19 model, trained using the WordTree1k structure, achieved a top-1 accuracy of 71.9% and a top-5 accuracy of 90.4%. This performance was attained despite expanding the label space by 369 additional concepts and maintaining the hierarchical tree structure. The marginal drop in accuracy indicates the robustness of the hierarchical approach compared to flat classification methods.

 DiscussionImplementing a hierarchical classification system offers several advantages. The model exhibits graceful degradation when encountering new or unknown object categories. For instance, if the network identifies an image as a "dog" but is uncertain about the specific breed, it maintains high confidence in the general category "dog" while distributing lower confidence levels among its hyponyms. This capability ensures that the model remains reliable even when faced with ambiguous or unfamiliar inputs.
Furthermore, the hierarchical approach extends to object detection tasks. By integrating with YOLOv2’s objectness predictor, the system can determine P(\text{physical object}) and predict bounding boxes along with the hierarchical tree of probabilities. The detection process involves traversing the tree by selecting the highest confidence path at each split until a predefined threshold is met, thereby accurately predicting the object class.

 InferenceThe hierarchical classification framework presented effectively utilizes the structured relationships within WordNet to enhance image classification and detection. By systematically predicting and combining conditional probabilities along a hierarchical tree, the model achieves high accuracy while accommodating an expanded and nuanced label set. This approach not only maintains performance levels comparable to flat classification models but also provides enhanced interpretability and adaptability in diverse classification scenarios.