
【Method for ML】How to Use MMDetection

Published on 2024/09/06

1. What is MMDetection?

MMDetection is an open-source object detection toolbox based on PyTorch. It provides a wide range of tools and models for tasks like object detection and instance segmentation. The framework is highly modular, allowing for flexible customization of components like backbones (feature extractors), necks (feature pyramids), and heads (classification and regression layers).

https://mmdetection.readthedocs.io/en/latest/

It can be used both for object detection, which localizes objects in images or videos with bounding boxes, and for instance segmentation, which additionally predicts a per-object mask.

The available pre-trained models can be browsed in the model zoo linked from the documentation.

2. How to use

2.1 Install

Install mmcv-full using the prebuilt wheel that matches your CUDA and PyTorch versions. The example below assumes CUDA 11.7 and PyTorch 1.13; adjust the cu117/torch1.13 part of the URL for your environment:

pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13/index.html
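
If you are unsure which wheel to pick, the following minimal check prints the versions that determine the cuXXX/torchX.Y part of the index URL:

import torch

# Print the PyTorch version and the CUDA version it was built against,
# so you can choose the matching mmcv-full wheel URL.
print(torch.__version__)   # e.g. 1.13.1
print(torch.version.cuda)  # e.g. 11.7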

Then clone and install MMDetection:

git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -r requirements/build.txt
pip install -v -e .
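
To confirm that everything installed correctly, a quick sanity check is to print the installed versions and the CUDA version mmcv was compiled with:

import mmcv
import mmdet
from mmcv.ops import get_compiling_cuda_version, get_compiler_version

# A mismatch between the compiled CUDA version and your runtime CUDA
# usually shows up here.
print(mmdet.__version__)
print(mmcv.__version__)
print(get_compiling_cuda_version())
print(get_compiler_version())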

2.2 Load Configuration and Model

You can use pre-trained models to perform inference on images.

from mmdet.apis import init_detector, inference_detector

# Load the config file and checkpoint
config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'

# Initialize the detector
model = init_detector(config_file, checkpoint_file, device='cuda:0')
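
Note that init_detector does not download the checkpoint for you; the file must already exist locally. A minimal sketch for fetching it, assuming the URL pattern used by the MMDetection model zoo for this checkpoint:

import os
import torch

# Download the pre-trained checkpoint from the MMDetection model zoo.
# The URL follows the zoo's naming pattern; check the model zoo page
# if the file has moved.
url = ('https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/'
       'faster_rcnn_r50_fpn_1x_coco/'
       'faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth')
os.makedirs('checkpoints', exist_ok=True)
torch.hub.download_url_to_file(
    url, 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth')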

2.3 Perform Inference

# Inference on an image
img = 'test.jpg'  # the path to your image
result = inference_detector(model, img)

# Visualize the result
model.show_result(img, result, out_file='result.jpg')
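
For a standard detector like Faster R-CNN (no masks), result is a list with one entry per class, each an array of detections in [x1, y1, x2, y2, score] format. Continuing from the snippet above, a minimal sketch for printing detections above a confidence threshold:

# Iterate over the per-class detection arrays and print confident boxes.
score_thr = 0.5
for class_id, dets in enumerate(result):
    for x1, y1, x2, y2, score in dets:
        if score >= score_thr:
            print(f'{model.CLASSES[class_id]}: score={score:.2f}, '
                  f'box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})')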

3. Custom model

This section explains how to train a custom model on your own dataset.

3.1 Configuration

MMDetection provides configuration files for different models. You'll need to customize them for your task (dataset paths, number of classes, batch size, and so on).
Make sure your dataset is in COCO format or another supported format.

・Example

# Modify dataset paths
dataset_type = 'CocoDataset'
data_root = 'data/coco/'

# Modify the number of classes in the dataset
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=your_num_classes)))
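
In practice, the cleanest way to apply such overrides is a small config file that inherits from an existing one via _base_ and changes only what differs. A minimal sketch, where the file path, class names, and annotation paths are placeholders for illustration:

# configs/custom/faster_rcnn_r50_fpn_1x_custom.py (hypothetical path)
_base_ = '../faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'

# Override the number of classes for the custom dataset
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=3)))

# Point the dataset at custom COCO-format annotations
classes = ('cat', 'dog', 'bird')  # placeholder class names
data = dict(
    train=dict(
        classes=classes,
        ann_file='data/custom/annotations/train.json',
        img_prefix='data/custom/train/'),
    val=dict(
        classes=classes,
        ann_file='data/custom/annotations/val.json',
        img_prefix='data/custom/val/'))

# Start from COCO-pre-trained weights for transfer learning
load_from = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'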

3.2 Train/Infer

・Training

python tools/train.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py

If you want to resume training from a checkpoint:

python tools/train.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py --resume-from <path_to_checkpoint>

・Evaluation
After training, you can evaluate the model's performance using:

python tools/test.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py <path_to_checkpoint> --eval bbox
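
Once training finishes, the trained weights can be used just like the pre-trained ones in section 2. A minimal sketch, assuming the default work_dirs output location and a placeholder folder of test images:

import glob
from mmdet.apis import init_detector, inference_detector

# work_dirs/<config name>/ is the default output directory, and
# latest.pth points to the most recent checkpoint.
config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'work_dirs/faster_rcnn_r50_fpn_1x_coco/latest.pth'
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference over a folder of images and save the visualizations.
for img in glob.glob('demo_images/*.jpg'):  # placeholder folder
    result = inference_detector(model, img)
    model.show_result(img, result, out_file=img.replace('.jpg', '_det.jpg'))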

4. Advanced Features

MMDetection is highly customizable, allowing you to tweak network architectures, optimizers, learning rates, and more.
You can also switch between detection models such as Faster R-CNN, YOLO, and RetinaNet simply by changing the configuration file.

Please note that MMDetection uses .py configuration files that define model parameters, dataset paths, and training details. Always check these files before training.
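
Because configs are Python files that can inherit from each other, it is often useful to inspect the fully resolved config before training. A minimal sketch using mmcv's Config API:

from mmcv import Config

# Load the config; _base_ inheritance is resolved at load time.
cfg = Config.fromfile('configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py')

# Inspect (or tweak) resolved values programmatically...
print(cfg.model.roi_head.bbox_head.num_classes)

# ...and print the full merged configuration.
print(cfg.pretty_text)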

5. Configuration file example

Configuration files are plain Python (.py) files. The following example configures Faster R-CNN with a ResNet-101 backbone for a 10-class custom dataset.

# Model configuration example (only the main sections are shown; a complete
# Faster R-CNN config also defines rpn_head, train_cfg, and test_cfg)
model = dict(
    type='FasterRCNN',
    backbone=dict(
        type='ResNet',
        depth=101,  # Use ResNet-101 instead of the default ResNet-50
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',  # Using Feature Pyramid Network (FPN) as the neck
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    roi_head=dict(
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=10,  # Number of classes for your custom dataset
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]
            ),
            loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)
        )
    )
)

# Dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/custom_coco/'
classes = ('class_1', 'class_2', 'class_3', ..., 'class_10')  # List of class names

# Data augmentation pipeline for training
# (defined before the data dict below, which references it)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),  # Randomly flip the image
    dict(type='Normalize',  # Apply image normalization
         mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='MultiScaleFlipAug',
         img_scale=(1333, 800),
         flip=False,
         transforms=[
             dict(type='Resize', keep_ratio=True),
             dict(type='RandomFlip'),
             dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
             dict(type='Pad', size_divisor=32),
             dict(type='ImageToTensor', keys=['img']),
             dict(type='Collect', keys=['img']),
         ])
]

data = dict(
    samples_per_gpu=4,  # Batch size per GPU
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        classes=classes,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        classes=classes,
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_test2017.json',
        img_prefix=data_root + 'test2017/',
        classes=classes,
        pipeline=test_pipeline)
)

# Optimizer settings
optimizer = dict(
    type='AdamW',  # Using AdamW optimizer
    lr=0.0001,  # Learning rate
    weight_decay=0.01
)

optimizer_config = dict(
    grad_clip=dict(max_norm=35, norm_type=2)  # Gradient clipping to stabilize training
)

# Learning rate schedule (cosine annealing)
lr_config = dict(
    policy='CosineAnnealing',
    min_lr=1e-6
)

# Runner type and total epochs
runner = dict(type='EpochBasedRunner', max_epochs=24)

# Evaluation settings
evaluation = dict(interval=1, metric='bbox')  # Evaluate every epoch using mAP on bounding boxes

# Logging and checkpointing
checkpoint_config = dict(interval=1)  # Save checkpoint every epoch
log_config = dict(
    interval=50,  # Log every 50 iterations
    hooks=[
        dict(type='TextLoggerHook'),
        # Optional: Uncomment to use TensorBoard for logging
        # dict(type='TensorboardLoggerHook')
    ]
)

# Pre-trained model weights for transfer learning
load_from = 'checkpoints/faster_rcnn_r101_fpn_1x_coco.pth'  # path to a downloaded ResNet-101 checkpoint

6. Summary

In this post, I explained the basic usage of MMDetection, from installation and inference with pre-trained models to training on a custom dataset.
Thank you for reading.
