
【CV】What is an object detection model? Part 2

Published on 2024/10/12

This is part 2 of an explanation of object detection.

2. Object detection tasks and approaches

BBox detection & classification

BBox detection & classification methods fall mainly into two-stage and one-stage detectors.
Two-stage (R-CNN, Fast R-CNN, Faster R-CNN, etc.): First propose candidate object regions, then infer the class within each region
One-stage (anchor-based) (SSD, RetinaNet, YOLOv2 to YOLOv5): Estimate object position and class in a single pass, as offsets relative to predefined anchor boxes placed on a grid over the image
One-stage (anchor-free) (YOLOv1, FCOS, YOLOX, YOLOv8): Estimate object position and class directly, without using anchor boxes
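To make the anchor idea concrete, here is a minimal sketch in plain Python. It assumes square images, one anchor per (scale, ratio) pair at every grid-cell center, and boxes in (x1, y1, x2, y2) format. The names `make_anchors` and `iou` and all the numbers are illustrative and not taken from any particular detector.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def make_anchors(image_size, grid_size, scales, ratios):
    """Place one anchor per (scale, ratio) pair at the center of every grid cell."""
    stride = image_size / grid_size
    anchors = []
    for row in range(grid_size):
        for col in range(grid_size):
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = width / height
                    w = scale * ratio ** 0.5
                    h = scale / ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# Hypothetical setup: 128x128 image, 4x4 grid, one scale, three aspect ratios.
anchors = make_anchors(image_size=128, grid_size=4, scales=[32.0], ratios=[0.5, 1.0, 2.0])
ground_truth = (40.0, 40.0, 72.0, 72.0)

# An anchor-based detector would typically train the best-overlapping anchor
# to regress offsets toward this ground-truth box.
best_anchor = max(anchors, key=lambda a: iou(a, ground_truth))
print(len(anchors), best_anchor, iou(best_anchor, ground_truth))
```

An anchor-free detector skips `make_anchors` and predicts box coordinates (or center points and sizes) directly at each location.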

Segmentation

Semantic segmentation

The task of assigning a class label to every pixel
Pixels of the same class receive the same label even when they belong to different objects.
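As a rough illustration, the final step of semantic segmentation can be sketched as a per-pixel argmax over class scores. The scores and class names below are made up, and the helper name `semantic_labels` is hypothetical.

```python
def semantic_labels(scores):
    """Turn per-pixel class scores [H][W][C] into a label map [H][W] by argmax."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Hypothetical scores for a 2x3 image and 3 classes (0=background, 1=person, 2=car).
scores = [
    [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1]],
    [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7], [0.1, 0.8, 0.1]],
]
label_map = semantic_labels(scores)
print(label_map)
```

Every "person" pixel gets label 1 here, whether or not the pixels belong to the same person, which is exactly the limitation that instance segmentation addresses.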

Instance segmentation

Object detection + segmentation
Predict each object's position as a bbox and also predict the pixel region that the object occupies, separately for each object
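The relationship between the two outputs can be sketched as follows. This assumes an instance map in which 0 means background and every other integer identifies one object; the helper `boxes_from_instance_map` is hypothetical. Note that many real models (Mask R-CNN, for example) go the other way, predicting the box first and then a mask inside it, so this only illustrates how the two outputs relate.

```python
def boxes_from_instance_map(instance_map):
    """Return {instance_id: (x_min, y_min, x_max, y_max)} for every nonzero ID."""
    boxes = {}
    for y, row in enumerate(instance_map):
        for x, inst_id in enumerate(row):
            if inst_id == 0:  # 0 is assumed to mean background
                continue
            if inst_id not in boxes:
                boxes[inst_id] = (x, y, x, y)
            else:
                x1, y1, x2, y2 = boxes[inst_id]
                boxes[inst_id] = (min(x1, x), min(y1, y), max(x2, x), max(y2, y))
    return boxes


# Hypothetical 3x5 instance map with two objects of possibly the same class.
instance_map = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]
boxes = boxes_from_instance_map(instance_map)
print(boxes)
```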

Panoptic segmentation

Semantic + instance segmentation
Assign a class label to every pixel and, for countable objects, an instance ID that separates individual objects
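One way to picture the combined output is a single integer per pixel that packs both values. The sketch below assumes the encoding `class_id * 1000 + instance_id`; the divisor and the helper names `merge_panoptic` and `decode` are assumptions chosen for illustration, and real datasets define their own encodings.

```python
LABEL_DIVISOR = 1000  # assumed encoding: panoptic_id = class_id * 1000 + instance_id


def merge_panoptic(semantic_map, instance_map):
    """Combine a class map and an instance-ID map into one panoptic map."""
    return [
        [cls * LABEL_DIVISOR + inst for cls, inst in zip(sem_row, inst_row)]
        for sem_row, inst_row in zip(semantic_map, instance_map)
    ]


def decode(panoptic_id):
    """Split a panoptic ID back into (class_id, instance_id)."""
    return divmod(panoptic_id, LABEL_DIVISOR)


# Hypothetical maps: class 0 = sky (uncountable), class 1 = person (countable).
semantic_map = [[0, 1, 1], [0, 1, 1]]
instance_map = [[0, 1, 2], [0, 1, 2]]  # 0 = no instance

panoptic = merge_panoptic(semantic_map, instance_map)
print(panoptic, decode(panoptic[0][2]))
```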

Open-Vocabulary Object Detection (2021)

Detect objects of classes specified by text, including classes not seen during training. Detection is not limited to a predefined set of object classes.
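One common way to realize this is to compare an embedding of each detected region with embeddings of the text prompts and pick the closest prompt. The sketch below shows only that matching step. The 3-dimensional vectors are made up, the helper `classify_region` is hypothetical, and in practice the embeddings would come from trained image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def classify_region(region_embedding, text_embeddings):
    """Return the text prompt whose embedding is most similar to the region."""
    return max(
        text_embeddings,
        key=lambda name: cosine(region_embedding, text_embeddings[name]),
    )


# Hypothetical embeddings; real ones come from trained encoders.
text_embeddings = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a skateboard": [0.0, 0.2, 0.9],
}
region_embedding = [0.1, 0.3, 0.8]

label = classify_region(region_embedding, text_embeddings)
print(label)
```

Because the class list is just a dictionary of prompts, adding a new class means adding a new text entry rather than retraining a fixed classifier head.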

Pose estimation

Estimate the positions of human joints and facial parts from images. Applications include sports commentary and human recognition in VR.
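Many pose estimators output one heatmap per joint and take the peak as the joint position. A minimal sketch of that decoding step, assuming tiny made-up heatmaps and a hypothetical helper `keypoints_from_heatmaps`:

```python
def keypoints_from_heatmaps(heatmaps):
    """For each joint heatmap [H][W], return (x, y, score) of its highest peak."""
    keypoints = {}
    for joint, heatmap in heatmaps.items():
        score, x, y = max(
            (value, x, y)
            for y, row in enumerate(heatmap)
            for x, value in enumerate(row)
        )
        keypoints[joint] = (x, y, score)
    return keypoints


# Hypothetical 3x3 heatmaps for two joints.
heatmaps = {
    "nose": [[0.0, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.1, 0.0]],
    "left_wrist": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.2], [0.0, 0.3, 0.8]],
}
keypoints = keypoints_from_heatmaps(heatmaps)
print(keypoints)
```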

3D object detection

Detect objects in 3D data, such as point clouds, instead of in 2D images
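To give a feel for the data involved, the sketch below represents a detection as an axis-aligned 3D box and collects the points that fall inside it. This is a simplification: real 3D detectors usually also predict a heading angle for each box, and the coordinates and the helper `points_in_box` here are made up.

```python
def points_in_box(points, center, size):
    """Return the points inside an axis-aligned 3D box.

    center = (cx, cy, cz), size = (length, width, height).
    """
    inside = []
    for point in points:
        if all(abs(p - c) <= s / 2 for p, c, s in zip(point, center, size)):
            inside.append(point)
    return inside


# Hypothetical point cloud (x, y, z in meters) and one car-sized box.
points = [(1.0, 0.5, 0.2), (1.5, 0.0, 0.8), (10.0, 3.0, 0.1), (0.9, -0.4, 1.0)]
car_points = points_in_box(points, center=(1.0, 0.0, 0.75), size=(4.0, 2.0, 1.5))
print(len(car_points))
```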

Object tracking

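Track objects across video frames, either online or offline.
Online: Only the current and past frames are available; future frames cannot be used
Offline: The whole video is available, so future frames can also be used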
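The online setting can be sketched as a greedy matcher that links each new detection to the track from the previous frame with the highest IoU. It never looks at future frames. The function `update_tracks`, the threshold, and the boxes are illustrative assumptions, and this is far simpler than a practical tracker (no motion model, and unmatched tracks are dropped immediately).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def update_tracks(tracks, detections, next_id, iou_threshold=0.3):
    """Greedy online association using only the previous frame's tracks.

    tracks: {track_id: box}; detections: list of boxes.
    Returns (new_tracks, next_id).
    """
    pairs = sorted(
        (
            (iou(box, det), track_id, det_idx)
            for track_id, box in tracks.items()
            for det_idx, det in enumerate(detections)
        ),
        reverse=True,
    )
    new_tracks, used_dets = {}, set()
    for score, track_id, det_idx in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap too little to be the same object
        if track_id in new_tracks or det_idx in used_dets:
            continue
        new_tracks[track_id] = detections[det_idx]
        used_dets.add(det_idx)
    # Unmatched detections start new tracks; unmatched tracks are dropped.
    for det_idx, det in enumerate(detections):
        if det_idx not in used_dets:
            new_tracks[next_id] = det
            next_id += 1
    return new_tracks, next_id


# Hypothetical detections for three consecutive frames.
frames = [
    [(10, 10, 50, 50), (100, 100, 140, 140)],
    [(14, 12, 54, 52), (102, 98, 142, 138)],
    [(18, 14, 58, 54)],
]
tracks, next_id, history = {}, 0, []
for detections in frames:
    tracks, next_id = update_tracks(tracks, detections, next_id)
    history.append(dict(tracks))
print(history)
```

An offline tracker, by contrast, could process all of `frames` at once and use later frames to repair an ID that was lost earlier.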
