【Method for ML】How to use MMDetection
1. What is MMDetection?
MMDetection is an open-source object detection toolbox based on PyTorch. It provides a wide range of tools and models for tasks like object detection and instance segmentation. The framework is highly modular, allowing for flexible customization of components like backbones (feature extractors), necks (feature pyramids), and heads (classification and regression layers).
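It is used to locate objects in images or videos with bounding boxes (object detection) and to segment each object with a pixel-level mask in addition to its box (instance segmentation).
The available models and their pre-trained checkpoints are listed in the model zoo of the MMDetection documentation.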
2. How to use
2.1 Install
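Install mmcv-full first, choosing the wheel index that matches both your CUDA and your PyTorch version. The command below is for CUDA 11.3 with PyTorch 1.10 (PyTorch 1.10 has no CUDA 11.7 build); replace cu113 and torch1.10 with your own versions:
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10/index.html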
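The wheel index URL encodes the CUDA and PyTorch versions. The helper below is a small sketch that builds it, assuming the `cu{major}{minor}/torch{major}.{minor}` pattern used in the command above and `cpu` for CPU-only builds. The function name `mmcv_full_index_url` is hypothetical; check the mmcv installation page to confirm that a wheel exists for your combination.

```python
from typing import Optional


def mmcv_full_index_url(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the mmcv-full wheel index URL (hypothetical helper).

    torch_version: e.g. "1.10.2+cu113", as reported by torch.__version__
    cuda_version:  e.g. "11.3", as reported by torch.version.cuda, or None for CPU-only
    """
    # Drop the local build suffix ("+cu113") and keep only major.minor
    major, minor = torch_version.split("+")[0].split(".")[:2]
    torch_tag = f"torch{major}.{minor}"
    if cuda_version is None:
        cuda_tag = "cpu"
    else:
        cuda_major, cuda_minor = cuda_version.split(".")[:2]
        cuda_tag = f"cu{cuda_major}{cuda_minor}"
    return f"https://download.openmmlab.com/mmcv/dist/{cuda_tag}/{torch_tag}/index.html"


print(mmcv_full_index_url("1.10.2+cu113", "11.3"))
# → https://download.openmmlab.com/mmcv/dist/cu113/torch1.10/index.html
```

In practice you would pass `torch.__version__` and `torch.version.cuda`. Alternatively, `pip install -U openmim` followed by `mim install mmcv-full` selects a matching wheel automatically.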
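Clone and install MMDetection. The code in this article uses the MMDetection 2.x API (mmcv-full, show_result, --resume-from, --eval), while the default branch of the repository now holds 3.x, which changed these interfaces, so clone the 2.x branch:
git clone -b 2.x https://github.com/open-mmlab/mmdetection.git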
cd mmdetection
pip install -r requirements/build.txt
pip install -v -e .
2.2 Load Configuration and Model
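You can run inference on images with a pre-trained model. First download the checkpoint that matches the config from the model zoo and save it in a checkpoints/ directory inside mmdetection/.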
from mmdet.apis import init_detector, inference_detector
# Load the config file and checkpoint
config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
# Initialize the detector
model = init_detector(config_file, checkpoint_file, device='cuda:0')
2.3 Perform Inference
# Inference on an image
img = 'test.jpg' # the path to your image
result = inference_detector(model, img)
# Visualize the result
model.show_result(img, result, out_file='result.jpg')
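In MMDetection 2.x, `inference_detector` returns, for a box-only model such as Faster R-CNN, a list with one array of shape (N, 5) per class, where each row is `[x1, y1, x2, y2, score]`; models with a mask head return a `(bbox_result, segm_result)` tuple instead. The helper below is a hypothetical sketch (`filter_detections` is not an MMDetection function) that flattens this structure into a sorted list of detections; the 0.3 threshold is an arbitrary example value.

```python
from typing import List, Sequence, Tuple

# One detection: (class_name, score, (x1, y1, x2, y2))
Detection = Tuple[str, float, Tuple[float, float, float, float]]


def filter_detections(result, class_names: Sequence[str], score_thr: float = 0.3) -> List[Detection]:
    """Flatten an MMDetection 2.x style result into (class, score, box) tuples."""
    # Models with a mask head return (bbox_result, segm_result); keep only the boxes
    if isinstance(result, tuple):
        result = result[0]
    detections: List[Detection] = []
    # result[i] holds the detections of class i, one row [x1, y1, x2, y2, score] each
    for class_id, class_dets in enumerate(result):
        for x1, y1, x2, y2, score in class_dets:
            if score >= score_thr:
                box = (float(x1), float(y1), float(x2), float(y2))
                detections.append((class_names[class_id], float(score), box))
    # Highest-confidence detections first
    detections.sort(key=lambda d: d[1], reverse=True)
    return detections
```

With the model loaded above, `filter_detections(result, model.CLASSES)` should work, because `init_detector` reads the class names from the checkpoint metadata.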
3. Custom model
This section explains how to train a model on your own dataset.
3.1 Configuration
MMDetection provides configuration files for many models. You will need to adapt one to your task, for example the dataset paths, the number of classes, and the batch size.
Ensure your dataset is in COCO format or another supported format.
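For reference, a COCO-format annotation file is a JSON object with `images`, `annotations`, and `categories` lists. The sketch below builds a minimal one in Python; the file name, image size, box, and category name are hypothetical placeholders. Boxes use `[x, y, width, height]` with a top-left origin, `area` should be positive (zero-area annotations are skipped when loading), and the category names should match the `classes` you set in the config.

```python
import json

# Minimal COCO-style annotation structure: one image, one box, one category.
# All concrete values below are hypothetical placeholders.
coco = {
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,                       # refers to images[].id
            "category_id": 1,                    # refers to categories[].id
            "bbox": [100.0, 120.0, 50.0, 80.0],  # [x, y, width, height] in pixels
            "area": 50.0 * 80.0,                 # box area, used here because there are no masks
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 1, "name": "class_1"},            # must match an entry in `classes`
    ],
}

# Serialize; in practice write this to e.g. data_root + 'annotations/instances_train2017.json'
text = json.dumps(coco, indent=2)
print(text)
```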
・Example
# Modify dataset paths
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
# Modify the number of classes in the dataset
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=your_num_classes)))  # replace your_num_classes with an integer
3.2 Train and Evaluate
・Training
python tools/train.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py
By default, logs and checkpoints are written to work_dirs/<config name>/ (use --work-dir to change this). To resume training from a checkpoint:
python tools/train.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py --resume-from <path_to_checkpoint>
・Evaluation
After training, you can evaluate the model's performance using:
python tools/test.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py <path_to_checkpoint> --eval bbox
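The `bbox` metric is COCO-style mAP: a predicted box counts as a match for a ground-truth box when their intersection over union (IoU) exceeds a threshold, and AP is averaged over IoU thresholds from 0.50 to 0.95. The sketch below only shows the IoU computation for two boxes in `[x1, y1, x2, y2]` format; `box_iou` is an illustrative helper, not MMDetection's evaluation code.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # Union = sum of areas minus the double-counted intersection
    return inter / (area_a + area_b - inter)


print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # → 0.14285714285714285
```

Two 10x10 boxes shifted by half their size overlap in a 5x5 square, so IoU = 25 / 175 = 1/7, which would not count as a match even at the loosest 0.50 threshold.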
4. Advanced Features
MMDetection is highly customizable, allowing you to tweak network architectures, optimizers, learning rates, etc.
You can also switch between various detection models like Faster R-CNN, YOLO, RetinaNet, etc., by changing the configuration files.
Please note that MMDetection configuration files are .py files that define the model, the dataset paths, and the training schedule. Always check them before training.
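Switching models this way works because a config can inherit from another one through `_base_`: nested dicts in the child are merged key by key into the inherited ones, and `_delete_=True` replaces an inherited dict instead of merging into it. The function below is a deliberately simplified, pure-Python illustration of that behaviour; the name `merge_config` is hypothetical and this is not mmcv's `Config` implementation.

```python
def merge_config(base: dict, override: dict) -> dict:
    """Simplified illustration of merging a child config into its _base_ config."""
    merged = dict(base)  # shallow copy, so the base dict is not modified
    for key, value in override.items():
        if isinstance(value, dict):
            # _delete_=True means "replace the inherited dict instead of merging into it"
            delete = value.get('_delete_', False)
            value = {k: v for k, v in value.items() if k != '_delete_'}
            if not delete and isinstance(merged.get(key), dict):
                merged[key] = merge_config(merged[key], value)
                continue
        merged[key] = value
    return merged


# Example values resembling a ResNet-50 / SGD base config
base = {'model': {'backbone': {'type': 'ResNet', 'depth': 50}},
        'optimizer': {'type': 'SGD', 'lr': 0.02, 'momentum': 0.9, 'weight_decay': 0.0001}}
child = {'model': {'backbone': {'depth': 101}},
         'optimizer': {'_delete_': True, 'type': 'AdamW', 'lr': 0.0001, 'weight_decay': 0.01}}
cfg = merge_config(base, child)
print(cfg['model']['backbone'])  # → {'type': 'ResNet', 'depth': 101}
print(cfg['optimizer'])  # → {'type': 'AdamW', 'lr': 0.0001, 'weight_decay': 0.01}
```

The `_delete_` case matters when switching optimizers on top of an inherited config: without it, the SGD `momentum` key would survive the merge and be passed to AdamW, which does not accept it.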
5. Configuration file example
Configuration files are plain Python (.py) files. The example below shows the main parts of a Faster R-CNN config adapted to a custom 10-class dataset. It is an excerpt rather than a complete file: settings such as rpn_head, the RoI feature extractor, train_cfg, and test_cfg are omitted here and are normally inherited from a base config through _base_.
# Model settings
model = dict(
    type='FasterRCNN',
    backbone=dict(
        type='ResNet',
        depth=101,  # ResNet-101 instead of the default ResNet-50
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',  # Feature Pyramid Network (FPN) as the neck
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    roi_head=dict(
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=10,  # number of classes in your custom dataset
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))))
# Dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/custom_coco/'
# List of class names; replace ... with the remaining names so the tuple has exactly num_classes (10) entries
classes = ('class_1', 'class_2', 'class_3', ..., 'class_10')
# Data augmentation pipeline for training (defined before data so that data can reference it)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RandomFlip', flip_ratio=0.5),  # randomly flip the image horizontally
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='Normalize',  # normalize with ImageNet mean/std after converting BGR to RGB
         mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
# Pipeline for validation and testing (no augmentation)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='MultiScaleFlipAug',
         img_scale=(1333, 800),
         flip=False,
         transforms=[
             dict(type='Resize', keep_ratio=True),
             dict(type='RandomFlip'),
             dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
             dict(type='Pad', size_divisor=32),
             dict(type='ImageToTensor', keys=['img']),
             dict(type='Collect', keys=['img']),
         ])
]
data = dict(
    samples_per_gpu=4,  # batch size per GPU
    workers_per_gpu=2,  # data-loading worker processes per GPU
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        classes=classes,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        classes=classes,
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_test2017.json',
        img_prefix=data_root + 'test2017/',
        classes=classes,
        pipeline=test_pipeline))
# Optimizer settings
optimizer = dict(
    type='AdamW',  # AdamW optimizer
    lr=0.0001,  # learning rate
    weight_decay=0.01)
optimizer_config = dict(
    grad_clip=dict(max_norm=35, norm_type=2))  # gradient clipping to stabilize training
# Learning rate schedule (cosine annealing)
lr_config = dict(
    policy='CosineAnnealing',
    min_lr=1e-6)
# Runner type and total number of epochs
runner = dict(type='EpochBasedRunner', max_epochs=24)
# Evaluation settings
evaluation = dict(interval=1, metric='bbox')  # evaluate bounding-box mAP after every epoch
# Logging and checkpointing
checkpoint_config = dict(interval=1)  # save a checkpoint every epoch
log_config = dict(
    interval=50,  # log every 50 iterations
    hooks=[
        dict(type='TextLoggerHook'),
        # Optional: uncomment to also log to TensorBoard
        # dict(type='TensorboardLoggerHook')
    ])
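# COCO-pretrained weights used as the starting point (transfer learning);
# copy the exact checkpoint filename from the model zoo, since the hash suffix differs per model
load_from = 'checkpoints/faster_rcnn_r101_fpn_1x_coco_20200130-047c8118.pth'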
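The `CosineAnnealing` policy decays the learning rate from its initial value to `min_lr` along half a cosine wave, roughly lr(t) = min_lr + 0.5 * (lr - min_lr) * (1 + cos(pi * t / T)), where t is the current epoch (or iteration) and T the total. The function below is a sketch of that standard formula, not mmcv's hook itself; warmup, which this config does not set, is ignored.

```python
import math


def cosine_annealing_lr(base_lr: float, min_lr: float, progress: float, max_progress: float) -> float:
    """Learning rate after `progress` out of `max_progress` steps (epochs or iterations)."""
    # cos goes from 1 to -1, so (1 + cos) / 2 decays smoothly from 1 to 0
    factor = 0.5 * (1 + math.cos(math.pi * progress / max_progress))
    return min_lr + (base_lr - min_lr) * factor


# Values from the config above: lr=0.0001, min_lr=1e-6, max_epochs=24
for epoch in (0, 6, 12, 18, 24):
    print(epoch, cosine_annealing_lr(1e-4, 1e-6, epoch, 24))
```

Halfway through training (epoch 12 of 24) the learning rate sits exactly midway between lr and min_lr; compared with the step schedule, the decay is gradual rather than a few sudden drops.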
6. Summary
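In this article, I explained how to install MMDetection, run inference with a pre-trained model, and train and evaluate a model on a custom dataset.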
Thank you for reading.