Trying out YOLO-NAS on Google Colab

Published on 2023/05/15

yolo

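What is YOLO-NAS?

YOLO-NAS is an object detection model newly released by Deci AI, presented as a new SOTA that outperforms YOLOv8.

Judging from the GitHub repository linked below, it looks like they have managed to make the model lighter, faster, and more accurate.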

Links

Colab
github

Preparation

Open Google Colab and, from the menu, choose "Runtime → Change runtime type" and set the runtime to "GPU".

Environment setup

Install the required packages.

!pip install -q super-gradients==3.1.1
!pip install -q roboflow
!pip install -q supervision

Inference

(1) Loading the model

import torch

DEVICE = 'cuda' if torch.cuda.is_available() else "cpu"
MODEL_ARCH = 'yolo_nas_l'
from super_gradients.training import models

model = models.get(MODEL_ARCH, pretrained_weights="coco").to(DEVICE)

(2) Getting the test data
Download a soccer image.

%cd /content

!wget https://i.cbc.ca/1.6585809.1663352819!/cpImage/httpImage/image.jpg_gen/derivatives/16x9_780/panama-canada-wcup-soccer.jpg -O soccer.jpg

(3) Running inference
The code to run inference is as follows.

import cv2

SOURCE_IMAGE_PATH = "/content/soccer.jpg"
image = cv2.imread(SOURCE_IMAGE_PATH)
result = list(model.predict(image, conf=0.35))[0]
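
The `conf=0.35` argument asks `predict` to drop detections whose confidence score falls below 0.35. As a rough illustration of what such a threshold does (not the library's actual implementation), the sketch below filters a hand-written list of detections. The `Detection` type, the `filter_by_confidence` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical container for one detection (illustration only)."""
    label: str
    confidence: float


def filter_by_confidence(detections, conf):
    """Keep only the detections whose confidence is at least `conf`."""
    return [d for d in detections if d.confidence >= conf]


# Made-up sample values, not real model output.
sample = [
    Detection("person", 0.91),
    Detection("sports ball", 0.40),
    Detection("person", 0.20),
]

kept = filter_by_confidence(sample, conf=0.35)
print([d.label for d in kept])
```

Raising the threshold trades recall for precision: fewer false positives survive, but weak true detections (small or occluded objects such as the ball) are more likely to be dropped.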

(4) Visualizing the inference results

import supervision as sv

detections = sv.Detections(
    xyxy=result.prediction.bboxes_xyxy,
    confidence=result.prediction.confidence,
    class_id=result.prediction.labels.astype(int)
)

box_annotator = sv.BoxAnnotator()

labels = [
    f"{result.class_names[class_id]} {confidence:0.2f}"
    for _, _, confidence, class_id, _
    in detections
]

annotated_frame = box_annotator.annotate(
    scene=image.copy(),
    detections=detections,
    labels=labels
)

%matplotlib inline
sv.plot_image(annotated_frame, (12, 12))

The inference result is shown below.

Advanced Application

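Let's try fine-tuning on a custom dataset.
(1) Using Roboflow
We will use a service called Roboflow.
https://roboflow.com/

Go to the service above, create an account, and sign in.

Then open the football-players-detection dataset on Roboflow Universe (its URL appears in the data.yaml shown in step (3)) and download it as a zip in the YOLOv5 PyTorch format.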

(2) Uploading to Google Drive
Upload the zip you downloaded in (1) to your own Google Drive, then run the following code.

from google.colab import drive
drive.mount('/content/drive')
%cd /content
!mkdir /content/Football-Players-Detection
%cd /content/Football-Players-Detection
!unzip /content/drive/MyDrive/football-players-detection.v1i.yolov5pytorch.zip
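
Before moving on, it can help to confirm that the archive unpacked into the layout the later steps expect. The sketch below is only a convenience check. The `missing_dirs` helper is hypothetical, and it assumes the YOLOv5 PyTorch export contains `images` and `labels` folders under `train`, `valid`, and `test`, which is the layout used by the dataset parameters later in this article.

```python
from pathlib import Path


def missing_dirs(root, splits=("train", "valid", "test")):
    """Return the expected images/labels directories that are missing under `root`."""
    root = Path(root)
    expected = [root / split / kind for split in splits for kind in ("images", "labels")]
    return [str(path) for path in expected if not path.is_dir()]


# Path taken from the step above; adjust it if you unzipped somewhere else.
print(missing_dirs("/content/Football-Players-Detection"))
```

An empty list means every expected folder is present. Any path that is printed points to a folder that was not found, for example because the zip file name or the unzip location differs.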

(3) Running fine-tuning
Before running fine-tuning, let's check data.yaml in the dataset directory.

data.yaml

train: ../train/images
val: ../valid/images
test: ../test/images

nc: 2
names: ['ball', 'player']

roboflow:
  workspace: roboflow-jvuqo
  project: football-players-detection-bzlaf
  version: 1
  license: CC BY 4.0
  url: https://universe.roboflow.com/roboflow-jvuqo/football-players-detection-bzlaf/dataset/1

It lists the classes and where each folder is located. We will build the training parameters based on this data.
This time there are two classes: ball and player.
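
Instead of typing the class list by hand, it could also be read from data.yaml. The following is a minimal sketch that avoids extra dependencies by pulling `nc` and `names` out with a regular expression. It assumes `names` is written as an inline list on a single line, as in the file above. The `read_classes` helper is hypothetical, and a proper YAML parser such as PyYAML's `yaml.safe_load` would be more robust.

```python
import ast
import re

# Excerpt of the data.yaml shown above.
DATA_YAML = """\
train: ../train/images
val: ../valid/images
test: ../test/images

nc: 2
names: ['ball', 'player']
"""


def read_classes(yaml_text):
    """Extract `nc` and `names` from a YOLOv5-style data.yaml with an inline names list."""
    nc = int(re.search(r"^nc:\s*(\d+)\s*$", yaml_text, re.MULTILINE).group(1))
    names_text = re.search(r"^names:\s*(\[.*\])\s*$", yaml_text, re.MULTILINE).group(1)
    names = ast.literal_eval(names_text)
    # Sanity check: the declared class count should match the list length.
    if len(names) != nc:
        raise ValueError(f"nc={nc} but {len(names)} names were listed")
    return nc, names


nc, classes = read_classes(DATA_YAML)
print(nc, classes)  # 2 ['ball', 'player']
```

The order of `names` matters: the position of each name is the class id used in the label files, so `ball` is 0 and `player` is 1.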

(3-1) Input variables

%cd /content
MODEL_ARCH = 'yolo_nas_l'
BATCH_SIZE = 8
MAX_EPOCHS = 25
CHECKPOINT_DIR = '/content/checkpoints'
EXPERIMENT_NAME = "Football-Players-Detection"
LOCATION = "/content/Football-Players-Detection"
CLASSES = ['ball', 'player']

(3-2) Creating the Trainer instance

from super_gradients.training import Trainer

trainer = Trainer(experiment_name=EXPERIMENT_NAME, ckpt_root_dir=CHECKPOINT_DIR)

(3-3) Dataset params

dataset_params = {
    'data_dir': LOCATION,
    'train_images_dir':'train/images',
    'train_labels_dir':'train/labels',
    'val_images_dir':'valid/images',
    'val_labels_dir':'valid/labels',
    'test_images_dir':'test/images',
    'test_labels_dir':'test/labels',
    'classes': CLASSES
}

(3-4) Creating the DataLoaders

from super_gradients.training.dataloaders.dataloaders import (
    coco_detection_yolo_format_train, coco_detection_yolo_format_val)

train_data = coco_detection_yolo_format_train(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['train_images_dir'],
        'labels_dir': dataset_params['train_labels_dir'],
        'classes': dataset_params['classes']
    },
    dataloader_params={
        'batch_size': BATCH_SIZE,
        'num_workers': 2
    }
)

val_data = coco_detection_yolo_format_val(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['val_images_dir'],
        'labels_dir': dataset_params['val_labels_dir'],
        'classes': dataset_params['classes']
    },
    dataloader_params={
        'batch_size': BATCH_SIZE,
        'num_workers': 2
    }
)

test_data = coco_detection_yolo_format_val(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['test_images_dir'],
        'labels_dir': dataset_params['test_labels_dir'],
        'classes': dataset_params['classes']
    },
    dataloader_params={
        'batch_size': BATCH_SIZE,
        'num_workers': 2
    }
)
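
The `labels_dir` folders referenced above hold YOLO-format text files: one line per object in the form `class_id x_center y_center width height`, with the coordinates normalized to the range 0 to 1. As a small illustration of that format, the sketch below converts one label line into pixel corner coordinates. The `yolo_to_xyxy` helper, the label line, and the image size are made up for the example.

```python
def yolo_to_xyxy(line, image_width, image_height):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    class_id, xc, yc, w, h = line.split()
    # Scale the normalized center and size back to pixel units.
    xc, w = float(xc) * image_width, float(w) * image_width
    yc, h = float(yc) * image_height, float(h) * image_height
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# Made-up label: class 1 ('player'), centered, a quarter of the image wide, half as tall.
print(yolo_to_xyxy("1 0.5 0.5 0.25 0.5", image_width=640, image_height=480))
# (1, 240.0, 120.0, 400.0, 360.0)
```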

(3-5) Preparing the model

from super_gradients.training import models

model = models.get(
    MODEL_ARCH, 
    num_classes=len(dataset_params['classes']), 
    pretrained_weights="coco"
)

(3-6) Setting the training parameters

from super_gradients.training.losses import PPYoloELoss
from super_gradients.training.metrics import DetectionMetrics_050
from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback

train_params = {
    'silent_mode': False,
    "average_best_models":True,
    "warmup_mode": "linear_epoch_step",
    "warmup_initial_lr": 1e-6,
    "lr_warmup_epochs": 3,
    "initial_lr": 5e-4,
    "lr_mode": "cosine",
    "cosine_final_lr_ratio": 0.1,
    "optimizer": "Adam",
    "optimizer_params": {"weight_decay": 0.0001},
    "zero_weight_decay_on_bias_and_bn": True,
    "ema": True,
    "ema_params": {"decay": 0.9, "decay_type": "threshold"},
    "max_epochs": MAX_EPOCHS,
    "mixed_precision": True,
    "loss": PPYoloELoss(
        use_static_assigner=False,
        num_classes=len(dataset_params['classes']),
        reg_max=16
    ),
    "valid_metrics_list": [
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            num_cls=len(dataset_params['classes']),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01,
                nms_top_k=1000,
                max_predictions=300,
                nms_threshold=0.7
            )
        )
    ],
    "metric_to_watch": 'mAP@0.50'
}
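
To get a feel for the learning-rate settings above, the sketch below approximates the schedule they describe: a linear warmup from `warmup_initial_lr` to `initial_lr` over the first 3 epochs, followed by a cosine decay down to `initial_lr * cosine_final_lr_ratio`. This is a simplified per-epoch approximation written for illustration, not the library's own scheduler, which may update the rate at a finer granularity. The `approx_lr` helper is hypothetical.

```python
import math

# Values copied from train_params above.
WARMUP_INITIAL_LR = 1e-6
INITIAL_LR = 5e-4
WARMUP_EPOCHS = 3
MAX_EPOCHS = 25
COSINE_FINAL_LR_RATIO = 0.1


def approx_lr(epoch):
    """Approximate per-epoch learning rate: linear warmup, then cosine decay."""
    if epoch < WARMUP_EPOCHS:
        step = (INITIAL_LR - WARMUP_INITIAL_LR) / WARMUP_EPOCHS
        return WARMUP_INITIAL_LR + epoch * step
    # Cosine decay from INITIAL_LR down to INITIAL_LR * COSINE_FINAL_LR_RATIO.
    progress = (epoch - WARMUP_EPOCHS) / (MAX_EPOCHS - WARMUP_EPOCHS)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))
    final_lr = INITIAL_LR * COSINE_FINAL_LR_RATIO
    return final_lr + (INITIAL_LR - final_lr) * cosine


for epoch in (0, 1, 3, 14, 25):
    print(epoch, f"{approx_lr(epoch):.2e}")
```

Under these assumptions the rate starts at 1e-6, peaks at 5e-4 once warmup ends, and finishes at 5e-5.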

(3-7) Running fine-tuning

trainer.train(
    model=model, 
    training_params=train_params, 
    train_loader=train_data, 
    valid_loader=val_data
)

The way the training log is displayed looks cool and gets me more excited than the YOLOv8 one.

Here are the results after fine-tuning (epoch 25).

SUMMARY OF EPOCH 25
├── Training
│   ├── Ppyoloeloss/loss = 1.2786
│   │   ├── Best until now = 1.2937 (↘ -0.015)
│   │   └── Epoch N-1      = 1.2937 (↘ -0.015)
│   ├── Ppyoloeloss/loss_cls = 0.5955
│   │   ├── Best until now = 0.6029 (↘ -0.0074)
│   │   └── Epoch N-1      = 0.6029 (↘ -0.0074)
│   ├── Ppyoloeloss/loss_dfl = 0.5681
│   │   ├── Best until now = 0.5695 (↘ -0.0014)
│   │   └── Epoch N-1      = 0.5712 (↘ -0.0031)
│   └── Ppyoloeloss/loss_iou = 0.1596
│       ├── Best until now = 0.1621 (↘ -0.0024)
│       └── Epoch N-1      = 0.1621 (↘ -0.0024)
└── Validation
    ├── F1@0.50 = 0.2491
    │   ├── Best until now = 0.283  (↘ -0.0339)
    │   └── Epoch N-1      = 0.283  (↘ -0.0339)
    ├── Map@0.50 = 0.6016
    │   ├── Best until now = 0.639  (↘ -0.0374)
    │   └── Epoch N-1      = 0.635  (↘ -0.0335)
    ├── Ppyoloeloss/loss = 1.2068
    │   ├── Best until now = 1.1884 (↗ 0.0184)
    │   └── Epoch N-1      = 1.1884 (↗ 0.0184)
    ├── Ppyoloeloss/loss_cls = 0.5928
    │   ├── Best until now = 0.5828 (↗ 0.01)
    │   └── Epoch N-1      = 0.5828 (↗ 0.01)
    ├── Ppyoloeloss/loss_dfl = 0.5478
    │   ├── Best until now = 0.5468 (↗ 0.001)
    │   └── Epoch N-1      = 0.5468 (↗ 0.001)
    ├── Ppyoloeloss/loss_iou = 0.136
    │   ├── Best until now = 0.1329 (↗ 0.0032)
    │   └── Epoch N-1      = 0.1329 (↗ 0.0032)
    ├── Precision@0.50 = 0.1514
    │   ├── Best until now = 0.1753 (↘ -0.0239)
    │   └── Epoch N-1      = 0.1753 (↘ -0.0239)
    └── Recall@0.50 = 0.7139
        ├── Best until now = 0.7576 (↘ -0.0437)
        └── Epoch N-1      = 0.7572 (↘ -0.0432)

Here is also the code to visualize the training results with TensorBoard.

%load_ext tensorboard
%tensorboard --logdir {CHECKPOINT_DIR}/{EXPERIMENT_NAME}

(3-8) Inference with the trained model
Load the trained model.

best_model = models.get(
    MODEL_ARCH,
    num_classes=len(dataset_params['classes']),
    checkpoint_path=f"{CHECKPOINT_DIR}/{EXPERIMENT_NAME}/average_model.pth"
).to(DEVICE)

Let's evaluate the trained model.

trainer.test(
    model=best_model,
    test_loader=test_data,
    test_metrics_list=DetectionMetrics_050(
        score_thres=0.1, 
        top_k_predictions=300, 
        num_cls=len(dataset_params['classes']), 
        normalize_targets=True, 
        post_prediction_callback=PPYoloEPostPredictionCallback(
            score_threshold=0.01, 
            nms_top_k=1000, 
            max_predictions=300,
            nms_threshold=0.7
        )
    )
)

Evaluation results

{'PPYoloELoss/loss_cls': 0.59212464,
 'PPYoloELoss/loss_iou': 0.13618927,
 'PPYoloELoss/loss_dfl': 0.54733866,
 'PPYoloELoss/loss': 1.2062671,
 'Precision@0.50': tensor(0.1690),
 'Recall@0.50': tensor(0.7122),
 'mAP@0.50': tensor(0.6652),
 'F1@0.50': tensor(0.2715)}
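
The `@0.50` suffix on these metrics refers to an IoU (intersection over union) threshold: a predicted box is matched to a ground-truth box only when their overlap ratio is at least 0.5. The same measure underlies the `nms_threshold` used to suppress duplicate boxes. As a minimal sketch of how IoU is computed for boxes in `(x_min, y_min, x_max, y_max)` form (the `iou_xyxy` helper and the box values are made up for the example):

```python
def iou_xyxy(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so boxes that do not overlap give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up boxes: the prediction is shifted 20 px to the right of the ground truth.
ground_truth = (100, 100, 200, 200)
prediction = (120, 100, 220, 200)
print(iou_xyxy(ground_truth, prediction))
```

Here the intersection is 80 x 100 = 8000 and the union is 12000, so the IoU is about 0.67 and the prediction would count as a match at the 0.50 threshold.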

Finally, let's run inference with the fine-tuned model.

import supervision as sv

root_dir = "/content/Football-Players-Detection"

ds = sv.Dataset.from_yolo(
    images_directory_path=f"{root_dir}/test/images",
    annotations_directory_path=f"{root_dir}/test/labels",
    data_yaml_path=f"{root_dir}/data.yaml",
    force_masks=False
)

CONFIDENCE_THRESHOLD = 0.5

# Run the fine-tuned model on every test image and store the detections by image name.
predictions = {}

for image_name, image in ds.images.items():
    result = list(best_model.predict(image, conf=CONFIDENCE_THRESHOLD))[0]
    detections = sv.Detections(
        xyxy=result.prediction.bboxes_xyxy,
        confidence=result.prediction.confidence,
        class_id=result.prediction.labels.astype(int)
    )
    predictions[image_name] = detections

Here is the code to visualize the inference results.

import random  # needed for random.sample below

import supervision as sv

MAX_IMAGE_COUNT = 5

n = min(MAX_IMAGE_COUNT, len(ds.images))

keys = list(ds.images.keys())
keys = random.sample(keys, n)

box_annotator = sv.BoxAnnotator()

images = []
titles = []

for key in keys:
    frame_with_annotations = box_annotator.annotate(
        scene=ds.images[key].copy(),
        detections=ds.annotations[key],
        skip_label=True
    )
    images.append(frame_with_annotations)
    titles.append('annotations')
    frame_with_predictions = box_annotator.annotate(
        scene=ds.images[key].copy(),
        detections=predictions[key],
        skip_label=True
    )
    images.append(frame_with_predictions)
    titles.append('predictions')

%matplotlib inline
sv.plot_images_grid(images=images, titles=titles, grid_size=(n, 2), size=(2 * 4, n * 4))

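Inference results of the fine-tuned model

Conclusion

In this article I tried inference and fine-tuning on a custom dataset with YOLO-NAS, the new SOTA in object detection, on Google Colab. To be honest, I could not really tell the difference from YOLOv8, but it is fairly easy to try. It may be worth trying both YOLOv8 and YOLO-NAS and choosing the one you prefer.

I plan to keep posting hands-on articles about LLMs, diffusion models, image analysis, and 3D, so stay tuned.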
