🦌

【CV】What is an object detection model? Part 1

Published on 2024/10/11

An explanation of object detection, part 1.

1. Object detection

Object detection is the task of detecting "what is where".

Structure

Many object detection models have a backbone, neck, and head structure.

Backbone (ResNet, EfficientNet, etc.): Feature extraction

Neck (FPN, BiFPN, etc.): Feature transformation

Head: Predicts object positions (bounding boxes) and classes based on the neck output
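The three-stage structure above can be sketched as a simple function pipeline. This is only a shape-level illustration, not a real detector: the strides (8, 16, 32), the channel counts, and the "4 box values plus one score per class" output layout are assumptions chosen for the example, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeatureMap:
    stride: int    # downsampling factor relative to the input image
    channels: int
    height: int
    width: int


def backbone(image_h, image_w):
    """Feature extraction: one feature map per stride (illustrative values)."""
    spec = [(8, 512), (16, 1024), (32, 2048)]
    return [FeatureMap(s, c, image_h // s, image_w // s) for s, c in spec]


def neck(features, out_channels=256):
    """Feature transformation: same resolutions, unified channel count."""
    return [FeatureMap(f.stride, out_channels, f.height, f.width) for f in features]


def head(features, num_classes):
    """Per-location predictions: (number of locations, 4 box values + class scores)."""
    return [(f.height * f.width, 4 + num_classes) for f in features]


def detector(image_h, image_w, num_classes):
    # The whole model is just the composition head(neck(backbone(image))).
    return head(neck(backbone(image_h, image_w)), num_classes)


outputs = detector(640, 640, num_classes=80)
for locations, values in outputs:
    print(locations, values)
```

Because each stage only depends on the output of the previous one, a backbone or neck can be swapped without touching the head, which is why the models below are usually discussed per stage.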

Backbone

The backbone is the foundation of the model and extracts features. A pre-trained model is typically used.
Famous models:

・ResNet (2015):

AlexNet (2012) was the first model to use a deep neural network to achieve a top score in a global image recognition competition (ILSVRC). It has 8 layers; simply stacking many more layers makes training difficult because gradients vanish.

ResNet made much deeper networks trainable by using residual connections.
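A minimal sketch of the idea, assuming a block computes `f(x) + x` on a plain list of numbers (real ResNet blocks use convolutions, normalization, and activations, which are omitted here). The helper names are hypothetical. The point it illustrates is that even if every layer contributes nothing, the skip path still carries the input through 100 stacked blocks unchanged.

```python
def residual_block(x, f):
    """Return f(x) + x element-wise; the '+ x' term is the residual (skip) connection."""
    fx = f(x)
    return [a + b for a, b in zip(fx, x)]


def zero_layer(x):
    # Stand-in for a layer that has learned nothing useful yet (outputs zeros).
    return [0.0 for _ in x]


x = [1.0, -2.0, 3.0]
y = x
for _ in range(100):  # stack 100 residual blocks
    y = residual_block(y, zero_layer)

print(y == x)
```

Without the `+ x` term, the same stack would output all zeros after the first block, which is a rough picture of why very deep plain networks are hard to train.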

・EfficientNet (2019):

By scaling the network's depth, width, and input resolution together in a balanced way, it achieves good results with fewer parameters.
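EfficientNet's scaling rule can be written as a short calculation. The sketch below assumes the compound scaling formulation from the EfficientNet paper, where depth, width, and resolution grow as α^φ, β^φ, and γ^φ with α = 1.2, β = 1.1, γ = 1.15; the function name is hypothetical and the FLOPs figure is the approximate relative cost, not a measured value.

```python
# Base coefficients for depth, width, and resolution (values reported in the EfficientNet paper).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Return (depth, width, resolution, approximate FLOPs) multipliers for a given phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # Cost grows linearly with depth and quadratically with width and resolution.
    flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, flops = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{flops:.2f}")
```

The coefficients are chosen so that α·β²·γ² is close to 2, so each increment of φ roughly doubles the compute budget.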

・ViT:

Splits the image into small patches, treats each patch as one token, and applies a Transformer to the resulting sequence.
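The patch step can be sketched in a few lines. This toy version assumes a single-channel image stored as nested lists and a hypothetical helper name; the actual model additionally projects each flattened patch with a linear layer and adds position embeddings, which are left out here.

```python
def image_to_patches(image, patch):
    """Split an H x W image into flattened patch x patch tokens, in row-major order."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append(
                [image[top + i][left + j] for i in range(patch) for j in range(patch)]
            )
    return tokens


# Toy 8x8 image whose pixel value encodes its position (row * 8 + col).
image = [[row * 8 + col for col in range(8)] for row in range(8)]
tokens = image_to_patches(image, patch=4)
print(len(tokens), len(tokens[0]))
```

With the same logic, a 224x224 image and 16x16 patches gives 14 x 14 = 196 tokens, which is the sequence the Transformer then processes.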

・DeiT:

ViT needed a very large number of training images to exceed the performance of CNNs; DeiT addressed this by using knowledge distillation.

・SwinTransformer:

Introduced patch merging and shifted windows to support various resolutions and reduce the amount of computation.
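The computational saving can be checked with simple arithmetic. The sketch below counts query-key pairs for global attention versus attention restricted to non-overlapping windows, and shows the shape change from patch merging. The 56x56 token grid, window size 7, and 96 channels are illustrative assumptions, the function names are hypothetical, and the window shifting itself (which lets information cross window borders between layers) is not implemented.

```python
def global_attention_pairs(h, w):
    """Every token attends to every token: (h*w)^2 pairs."""
    n = h * w
    return n * n


def window_attention_pairs(h, w, window):
    """Tokens attend only within their own window x window block."""
    assert h % window == 0 and w % window == 0
    num_windows = (h // window) * (w // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


def patch_merging_shape(h, w, c):
    """Merge each 2x2 group of tokens: half the resolution, double the channels."""
    return h // 2, w // 2, 2 * c


full = global_attention_pairs(56, 56)
local = window_attention_pairs(56, 56, window=7)
print(full, local, full // local)
print(patch_merging_shape(56, 56, 96))
```

Global attention grows quadratically with the number of tokens, while windowed attention grows linearly for a fixed window size, which is what makes higher resolutions practical.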
・MaxViT:

Improved by combining local and global attention to extract better features.

・ConvNeXt:

Improved performance by revisiting the design of conventional CNNs and incorporating design ideas from Transformers.

BottleNeck

・FPN (2017):

Improved multi-scale performance (the ability to detect objects of various sizes) by using a top-down pathway with lateral (skip) connections, similar to an encoder-decoder structure.
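The way FPN combines scales can be sketched with small 2D grids. This is a simplified illustration under stated assumptions: each level is a single-channel nested list half the size of the previous one, upsampling is nearest-neighbour, and the 1x1 and 3x3 convolutions used in the real FPN are omitted. The function names and the `c3`/`c4`/`c5` toy inputs are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D map stored as nested lists."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out


def merge(lateral, coarser):
    """Add the upsampled coarser map to the lateral (skip) feature map."""
    up = upsample2x(coarser)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lateral, up)]


def fpn_top_down(laterals):
    """laterals: fine-to-coarse list of maps, each half the size of the previous one."""
    outputs = [laterals[-1]]                       # start from the coarsest level
    for lateral in reversed(laterals[:-1]):
        outputs.insert(0, merge(lateral, outputs[0]))
    return outputs


c3 = [[1] * 4 for _ in range(4)]   # fine, 4x4
c4 = [[10] * 2 for _ in range(2)]  # medium, 2x2
c5 = [[100]]                       # coarse, 1x1
p3, p4, p5 = fpn_top_down([c3, c4, c5])
print(p3[0], p4[0], p5[0])
```

Every output level ends up containing information from all coarser levels, so fine-resolution maps also carry the stronger semantics of the deep layers.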
・EfficientDet (2020):

Uses EfficientNet as the backbone and effectively mixes feature maps of multiple scales with BiFPN.

・DETR (2020):

The first model to use a Transformer as the neck.

・DINO:

A model that addresses the problems of DETR (difficulty in scaling performance, slow convergence) by improving and combining the methods of Deformable DETR and DN-DETR.
