
Converting COCO-Hand dataset annotations to YOLO format (Gold-YOLO)

PINTO
  • COCO-Hand
# Format
[image_name, xmin, xmax, ymin, ymax, x1, y1, x2, y2, x3, y3, x4, y4, class_name]

# Actual values
000000001098.jpg,134,190,160,188,138,160,134,178,186,188,190,169,hand
000000001098.jpg,234,288,187,242,265,187,234,200,257,242,288,230,hand
000000000036.jpg,215,257,208,257,230,257,257,254,242,208,215,210,hand
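  1. Extract the fields from each line.
  2. Get the image width and height. These are needed to normalize the bounding box for YOLO format.
  3. Convert the values to YOLO format: compute the center point (cx, cy) and the bounding box width and height (bw, bh), then normalize them by the image width and height.
  4. Write the result to the output file.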
import os
from PIL import Image

def convert_to_yolo_format(annotation_file, output_folder, image_folder):
    os.makedirs(output_folder, exist_ok=True)

    with open(annotation_file, 'r') as f:
        lines = f.readlines()

    for line in lines:
        fields = line.strip().split(',')
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        image_name = fields[0]
        # Note the column order in the source file: xmin, xmax, ymin, ymax
        xmin, xmax, ymin, ymax = map(int, fields[1:5])
        # The class name is assumed to be 'hand'
        class_id = 0

        # Get the image size
        with Image.open(os.path.join(image_folder, image_name)) as img:
            width, height = img.size

        # YOLO format: x_center y_center width height
        x_center = (xmin + xmax) / 2.0
        y_center = (ymin + ymax) / 2.0
        b_width = xmax - xmin
        b_height = ymax - ymin

        # Normalize by the image size
        x_center /= width
        y_center /= height
        b_width /= width
        b_height /= height

        # Append one line per box to the per-image label file.
        # The file is opened in append mode, so clear output_folder before re-running.
        label_name = os.path.splitext(image_name)[0] + '.txt'
        with open(os.path.join(output_folder, label_name), 'a') as out_file:
            out_file.write(f"{class_id} {x_center} {y_center} {b_width} {b_height}\n")

# Usage example
annotation_file = 'COCO-Hand-S_annotations.txt'
output_folder = 'yolo_annotations'
image_folder = 'path_to_images'  # Path to the image folder
convert_to_yolo_format(annotation_file, output_folder, image_folder)
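As a quick check of the conversion arithmetic, the sketch below applies the same formulas to the first sample row. The function name `corners_to_yolo` and the 640x480 image size are assumptions made only for illustration; the real size has to be read from the image file.

```python
def corners_to_yolo(xmin, xmax, ymin, ymax, img_w, img_h):
    """Convert pixel corner coordinates to normalized YOLO (cx, cy, bw, bh)."""
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    bw = (xmax - xmin) / img_w
    bh = (ymax - ymin) / img_h
    return cx, cy, bw, bh


# First sample row: 000000001098.jpg,134,190,160,188,...
# img_w=640 and img_h=480 are hypothetical values, not the real image size.
cx, cy, bw, bh = corners_to_yolo(134, 190, 160, 188, img_w=640, img_h=480)
print(f"0 {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
# prints: 0 0.253125 0.362500 0.087500 0.058333
```

All four values should fall in the range 0 to 1; a value outside that range usually means the xmin/xmax/ymin/ymax column order was misread.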
  • GOLD-YOLO
    Data loader section
    @staticmethod
    def get_data_loader(args, cfg, data_dict):
        train_path, val_path = data_dict['train'], data_dict['val']
        # check data
        nc = int(data_dict['nc'])
        class_names = data_dict['names']
        assert len(class_names) == nc, f'the length of class names does not match the number of classes defined'
        grid_size = max(int(max(cfg.model.head.strides)), 32)
        # create train dataloader
        train_loader = create_dataloader(train_path, args.img_size, args.batch_size // args.world_size, grid_size,
                                         hyp=dict(cfg.data_aug), augment=True, rect=False, rank=args.local_rank,
                                         workers=args.workers, shuffle=True, check_images=args.check_images,
                                         check_labels=args.check_labels, data_dict=data_dict, task='train')[0]
        # create val dataloader
        val_loader = None
        if args.rank in [-1, 0]:
            val_loader = create_dataloader(val_path, args.img_size, args.batch_size // args.world_size * 2, grid_size,
                                           hyp=dict(cfg.data_aug), rect=True, rank=-1, pad=0.5,
                                           workers=args.workers, check_images=args.check_images,
                                           check_labels=args.check_labels, data_dict=data_dict, task='val')[0]
        
        return train_loader, val_loader

https://github.com/huawei-noah/Efficient-Computing/tree/master/Detection/Gold-YOLO

  • Training YAML - follows the YOLOv6 specification
  • The annotation JSON is generated automatically when training starts
  • The JSON is written directly under the annotations folder
dataset.yaml
# Please ensure that your custom_dataset is placed in the same parent dir as YOLOv6_DIR
train: ./coco/COCO-Hand-S/images/train # train images
val: ./coco/COCO-Hand-S/images/val # val images
test: ./coco/COCO-Hand-S/images/test # test images (optional)

# whether it is coco dataset, only coco dataset should be set to True.
is_coco: False
# Classes
nc: 1  # number of classes
names: ['hand']  # class names
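Before starting a training run it can help to confirm that every image has a label file. The sketch below is one way to do that; the helper name `find_images_without_labels` is hypothetical, and the assumption that labels live in a `labels/<split>` folder next to `images/<split>` is taken from the data preparation script in this scrap, not from the Gold-YOLO documentation.

```python
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_images_without_labels(images_dir, labels_dir):
    """Return sorted image file names that have no same-named .txt in labels_dir."""
    missing = []
    for name in sorted(os.listdir(images_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in IMAGE_EXTS:
            continue  # ignore non-image files
        if not os.path.isfile(os.path.join(labels_dir, stem + ".txt")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The images path mirrors dataset.yaml; the labels path is an assumption.
    images_dir = "./coco/COCO-Hand-S/images/train"
    labels_dir = "./coco/COCO-Hand-S/labels/train"
    if os.path.isdir(images_dir) and os.path.isdir(labels_dir):
        for name in find_images_without_labels(images_dir, labels_dir):
            print(f"No label file for {name}")
```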
pip install \
mmcv==1.6.1 \
addict \
pycocotools \
tensorboard \
opencv-python-headless \
gdown

pip3 install \
torch \
torchvision \
torchaudio \
--index-url https://download.pytorch.org/whl/cu118
  • N size
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.417
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.709
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.426
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.359
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.633
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.838
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.174
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.468
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.540
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.492
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.733
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.858
Results saved to runs/train/gold_yolo-n1
Epoch: 361 | mAP@0.5: 0.7092690168491806 | mAP@0.50:0.95: 0.41735040829118536
  • S size
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.429
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.718
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.432
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.368
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.653
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.823
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.176
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.475
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.544
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.497
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.733
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.851
Results saved to runs/train/gold_yolo-s1
Epoch: 201 | mAP@0.5: 0.7175394644960713 | mAP@0.50:0.95: 0.4286192100279552
  • M size
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.477
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.767
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.499
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.415
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.699
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.867
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.187
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.515
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.583
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.536
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.769
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.888
Results saved to runs/train/gold_yolo-m1
Epoch: 347 | mAP@0.5: 0.7670340058835509 | mAP@0.50:0.95: 0.47673873875288386
  • L size
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.490
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.785
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.523
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.428
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.711
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.902
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.192
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.527
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.592
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.544
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.781
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.923
Results saved to runs/train/gold_yolo-l1
Epoch: 271 | mAP@0.5: 0.7852822540107776 | mAP@0.50:0.95: 0.4895539540328813
python tools/infer.py \
--weights gold_yolo_n_hand_0.2295.pt

Processing after downloading the annotations and images created in CVAT in YOLO 1.0 format

  • Delete images that have no annotations
01_del_empty_yolo_data.py
import os

def delete_empty_txt_and_png_pairs(directory_path):
    # Get every file in the given directory
    all_files = os.listdir(directory_path)

    # Keep only the .txt files
    txt_files = sorted([f for f in all_files if f.endswith('.txt')])

    for txt_file in txt_files:
        # Full path of the .txt file
        txt_full_path = os.path.join(directory_path, txt_file)

        # Read the .txt file contents
        with open(txt_full_path, 'r') as f:
            content = f.read().strip()  # strip whitespace and newlines

        # If the .txt file is empty, delete it and the matching .PNG / .jpg image
        if not content:
            stem = os.path.splitext(txt_file)[0]
            for ext in ('.PNG', '.jpg'):
                image_full_path = os.path.join(directory_path, stem + ext)
                if os.path.exists(image_full_path):
                    os.remove(image_full_path)
                    print(f"Deleted: {image_full_path}")

            os.remove(txt_full_path)
            print(f"Deleted: {txt_full_path}")

# Example run
dir_path = './hand_yolo/obj_train_data'  # Set this to the appropriate directory path
delete_empty_txt_and_png_pairs(dir_path)
  • Display the annotations on the remaining images to check them visually
02_view_yolo_data.py
import os
import cv2

def draw_bounding_boxes_from_yolo(directory_path):
    all_files = os.listdir(directory_path)
    image_files = sorted([f for f in all_files if f.endswith(('.PNG', '.jpg'))])

    for image_file in image_files:
        img_path = os.path.join(directory_path, image_file)
        txt_path = os.path.splitext(img_path)[0] + '.txt'

        if not os.path.exists(txt_path):
            print(f"No .txt file found for {image_file}")
            continue

        # Load the image
        image = cv2.imread(img_path)
        if image is None:
            print(f"Could not read {image_file}")
            continue
        img_h, img_w = image.shape[:2]

        # Read the bounding boxes from the YOLO-format .txt file
        with open(txt_path, 'r') as f:
            for line in f:
                data = line.strip().split()
                if len(data) == 5:
                    # YOLO values: class_id x_center y_center width height (normalized)
                    _, x_center, y_center, width, height = map(float, data)

                    # Convert normalized YOLO values to pixel corner coordinates for OpenCV
                    x_center, y_center = x_center * img_w, y_center * img_h
                    width, height = width * img_w, height * img_h
                    x1, y1 = int(x_center - width / 2), int(y_center - height / 2)
                    x2, y2 = int(x_center + width / 2), int(y_center + height / 2)

                    # Draw the bounding box
                    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Show the image; ESC stops, any other key moves on to the next image
        cv2.imshow(f"Image with Bounding Boxes - {image_file}", image)
        key = cv2.waitKey(0)
        cv2.destroyAllWindows()
        if key == 27: # ESC
            break

# Run
dir_path = './hand_yolo/obj_train_data'  # Set this to the appropriate directory path
draw_bounding_boxes_from_yolo(dir_path)
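The coordinate conversion used by the viewer can be isolated into a small pure function, which makes it easy to check without opening any windows. This is a minimal sketch; the name `yolo_to_pixel_box` and the 640x480 image size in the example are assumptions for illustration.

```python
def yolo_to_pixel_box(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box to integer pixel corners (x1, y1, x2, y2)."""
    box_w = w * img_w
    box_h = h * img_h
    center_x = cx * img_w
    center_y = cy * img_h
    x1 = int(center_x - box_w / 2)
    y1 = int(center_y - box_h / 2)
    x2 = int(center_x + box_w / 2)
    y2 = int(center_y + box_h / 2)
    return x1, y1, x2, y2


# A box centered in a hypothetical 640x480 image, a quarter as wide and half as tall.
print(yolo_to_pixel_box(0.5, 0.5, 0.25, 0.5, img_w=640, img_h=480))
# prints: (240, 120, 400, 360)
```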
  • Split the data into train, val, and test for YOLO training
03_prepare_data.py
import argparse
import os
import shutil
from glob import glob
from sklearn.model_selection import train_test_split


def copy_files_to_folder(list_of_files, destination_folder):
    # Files are copied, not moved, so the source folder stays intact
    for f in list_of_files:
        try:
            shutil.copy(f, destination_folder)
        except OSError:
            print(f)
            raise


def get_annotations(path):
    annotations = []
    images = []
    for txt_file in glob(path + '/*.txt'):
        annotations.append(txt_file)
        # Swap only the extension; str.replace('txt', ...) would also hit 'txt' elsewhere in the path
        stem = os.path.splitext(txt_file)[0]
        image = stem + '.PNG'
        if not os.path.exists(image):
            image = stem + '.jpg'
        images.append(image)

    return annotations, images


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--path', type=str, help='Path to images', default='hand_yolo/obj_train_data')
    opt = parser.parse_args()

    annotations, images = get_annotations(opt.path)

    # Split the dataset into train/val/test: 20% is held out, then halved into val and test
    train_images, val_images, train_annotations, val_annotations = train_test_split(images, annotations, test_size = 0.2, random_state = 1)
    val_images, test_images, val_annotations, test_annotations = train_test_split(val_images, val_annotations, test_size = 0.5, random_state = 1)

    for split in ('train', 'val', 'test'):
        os.makedirs(f'data/images/{split}', exist_ok=True)
        os.makedirs(f'data/annotations/{split}', exist_ok=True)

    copy_files_to_folder(train_images, 'data/images/train')
    copy_files_to_folder(val_images, 'data/images/val/')
    copy_files_to_folder(test_images, 'data/images/test/')
    copy_files_to_folder(train_annotations, 'data/annotations/train/')
    copy_files_to_folder(val_annotations, 'data/annotations/val/')
    copy_files_to_folder(test_annotations, 'data/annotations/test/')

    # Duplicate the annotations as data/labels (portable replacement for `cp -r`; needs Python 3.8+)
    shutil.copytree('data/annotations', 'data/labels', dirs_exist_ok=True)


if __name__ == "__main__":
    main()
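The two-stage split holds out 20% of the data and then halves that hold-out set, which gives roughly 80/10/10 for train/val/test. The sketch below reproduces only the ratio logic with the standard library; it uses `random.Random.shuffle` rather than scikit-learn's `train_test_split`, so the actual assignment of files will differ from the script. The function name `split_80_10_10` is hypothetical.

```python
import random


def split_80_10_10(items, seed=1):
    """Shuffle items and split them into train/val/test at roughly 80/10/10."""
    items = list(items)
    random.Random(seed).shuffle(items)

    # Stage 1: hold out 20% of the data
    n_holdout = round(len(items) * 0.2)
    holdout, train = items[:n_holdout], items[n_holdout:]

    # Stage 2: split the hold-out set in half into test and val
    n_test = round(len(holdout) * 0.5)
    test, val = holdout[:n_test], holdout[n_test:]
    return train, val, test


# Keeping (image, label) pairs together guarantees they land in the same split.
pairs = [(f"img_{i:04d}.jpg", f"img_{i:04d}.txt") for i in range(100)]
train, val, test = split_80_10_10(pairs)
print(len(train), len(val), len(test))
# prints: 80 10 10
```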

This produces the final layout. From that directory level, compress the three folders into a single file named data.zip.
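The compression step could be scripted roughly as follows. This is a sketch under assumptions: the three folders are taken to be `images`, `annotations`, and `labels` inside `data/`, as created by the preparation script, and the helper name `zip_folders` is hypothetical.

```python
import os
import zipfile


def zip_folders(base_dir, folder_names, zip_path):
    """Zip the named folders so that archive paths are relative to base_dir."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for folder in folder_names:
            root_dir = os.path.join(base_dir, folder)
            for dirpath, _dirnames, filenames in os.walk(root_dir):
                for filename in sorted(filenames):
                    full_path = os.path.join(dirpath, filename)
                    # Store e.g. images/train/a.jpg rather than data/images/train/a.jpg
                    zf.write(full_path, os.path.relpath(full_path, base_dir))


if __name__ == "__main__":
    # Assumed folder names; adjust them to match the actual layout.
    if os.path.isdir("data"):
        zip_folders("data", ["images", "annotations", "labels"], "data.zip")
```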


To reduce the cost of re-annotating COCO-Hand in CVAT, the following logic re-annotates COCO-Hand using the M-size model trained above.

annotation.py
#!/usr/bin/env python

import os
import copy
import cv2
import time
import numpy as np
import onnxruntime
from argparse import ArgumentParser
from typing import Tuple, Optional, List
from tqdm import tqdm


class GoldYOLOONNX(object):
    def __init__(
        self,
        model_path: Optional[str] = 'gold_yolo_l_hand_post_0231_0.4841_1x3x384x480.onnx',
        class_score_th: Optional[float] = 0.35,
        providers: Optional[List] = [
            'CUDAExecutionProvider',
            'CPUExecutionProvider',
        ],
    ):
        """GoldYOLOONNX

        Parameters
        ----------
        model_path: Optional[str]
            ONNX file path for Gold-YOLO

        class_score_th: Optional[float]
            Score threshold. Default: 0.35

        providers: Optional[List]
            Name of onnx execution providers
            Default:
            [
                'CUDAExecutionProvider',
                'CPUExecutionProvider',
            ]
        """
        # Threshold
        self.class_score_th = class_score_th

        # Model loading
        session_option = onnxruntime.SessionOptions()
        session_option.log_severity_level = 3
        self.onnx_session = onnxruntime.InferenceSession(
            model_path,
            sess_options=session_option,
            providers=providers,
        )
        self.providers = self.onnx_session.get_providers()

        self.input_shapes = [
            input.shape for input in self.onnx_session.get_inputs()
        ]
        self.input_names = [
            input.name for input in self.onnx_session.get_inputs()
        ]
        self.output_names = [
            output.name for output in self.onnx_session.get_outputs()
        ]


    def __call__(
        self,
        image: np.ndarray,
    ) -> Tuple[np.ndarray, np.ndarray]:
        """GoldYOLOONNX

        Parameters
        ----------
        image: np.ndarray
            Entire image

        Returns
        -------
        boxes: np.ndarray
            Predicted boxes: [N, 4] as [x_min, y_min, x_max, y_max]

        scores: np.ndarray
            Predicted box scores: [N, score]
        """
        temp_image = copy.deepcopy(image)

        # PreProcess
        resized_image = self.__preprocess(
            temp_image,
        )

        # Inference
        inference_image = np.asarray([resized_image], dtype=np.float32)
        boxes = self.onnx_session.run(
            self.output_names,
            {input_name: inference_image for input_name in self.input_names},
        )[0]

        # PostProcess
        result_boxes, result_scores = \
            self.__postprocess(
                image=temp_image,
                boxes=boxes,
            )

        return result_boxes, result_scores


    def __preprocess(
        self,
        image: np.ndarray,
        swap: Optional[Tuple[int,int,int]] = (2, 0, 1),
    ) -> np.ndarray:
        """__preprocess

        Parameters
        ----------
        image: np.ndarray
            Entire image

        swap: tuple
            HWC to CHW: (2,0,1)
            CHW to HWC: (1,2,0)
            HWC to HWC: (0,1,2)
            CHW to CHW: (0,1,2)

        Returns
        -------
        resized_image: np.ndarray
            Resized and normalized image.
        """
        # Normalization + BGR->RGB
        resized_image = cv2.resize(
            image,
            (
                int(self.input_shapes[0][3]),
                int(self.input_shapes[0][2]),
            )
        )
        resized_image = np.divide(resized_image, 255.0)
        resized_image = resized_image[..., ::-1]
        resized_image = resized_image.transpose(swap)
        resized_image = np.ascontiguousarray(
            resized_image,
            dtype=np.float32,
        )
        return resized_image



    def __postprocess(
        self,
        image: np.ndarray,
        boxes: np.ndarray,
    ) -> Tuple[np.ndarray, np.ndarray]:
        """__postprocess

        Parameters
        ----------
        image: np.ndarray
            Entire image.

        boxes: np.ndarray
            float32[N, 7]

        Returns
        -------
        result_boxes: np.ndarray
            Predicted boxes: [N, 4] as [x_min, y_min, x_max, y_max]

        result_scores: np.ndarray
            Predicted box confs: [N, score]
        """
        image_height = image.shape[0]
        image_width = image.shape[1]

        """
        Detector is
            N -> Number of boxes detected
            batchno -> always 0: BatchNo.0

        batchno_classid_y1x1y2x2_score: float32[N,7]
        """
        result_boxes = []
        result_scores = []
        if len(boxes) > 0:
            scores = boxes[:, 6:7]
            keep_idxs = scores[:, 0] > self.class_score_th
            scores_keep = scores[keep_idxs, :]
            boxes_keep = boxes[keep_idxs, :]

            if len(boxes_keep) > 0:
                for box, score in zip(boxes_keep, scores_keep):
                    x_min = int(max(box[2], 0) * image_width / self.input_shapes[0][3])
                    y_min = int(max(box[3], 0) * image_height / self.input_shapes[0][2])
                    x_max = int(min(box[4], self.input_shapes[0][3]) * image_width / self.input_shapes[0][3])
                    y_max = int(min(box[5], self.input_shapes[0][2]) * image_height / self.input_shapes[0][2])

                    result_boxes.append(
                        [x_min, y_min, x_max, y_max]
                    )
                    result_scores.append(
                        score
                    )

        return np.asarray(result_boxes), np.asarray(result_scores)


def get_sorted_image_paths(directory):
    image_exts = {".jpg", ".jpeg", ".png", ".JPG", ".PNG", ".JPEG"}
    files = [os.path.join(directory, f) for f in os.listdir(directory)]
    image_files = [f for f in files if os.path.splitext(f)[1].lower() in image_exts]
    return sorted(image_files, key=lambda x: os.path.basename(x))


def write_list_to_file(data: List, filename: str):
    with open(filename, 'w') as f:
        for row in data:
            f.write(' '.join(row) + '\n')


def main():
    parser = ArgumentParser()
    parser.add_argument(
        '-m',
        '--model',
        type=str,
        default='gold_yolo_l_hand_post_0231_0.4841_1x3x384x480.onnx',
    )
    parser.add_argument(
        '-i',
        '--images_folder_path',
        type=str,
        default='./coco/COCO-Hand-S/images/re_anno',
    )
    parser.add_argument(
        '-a',
        '--annotations_txt_output_folder_path',
        type=str,
        default='./coco/COCO-Hand-S/re_annotations',
    )
    args = parser.parse_args()

    model = GoldYOLOONNX(
        model_path=args.model,
    )

    sorted_image_paths = get_sorted_image_paths(args.images_folder_path)

    for image_file_path in tqdm(sorted_image_paths, dynamic_ncols=True):
        image = cv2.imread(image_file_path)
        image_height = float(image.shape[0])
        image_width = float(image.shape[1])
        model_input_height = model.input_shapes[0][2]
        model_input_width = model.input_shapes[0][3]
        height_scale = model_input_height / image_height
        width_scale = model_input_width / image_width

        debug_image = copy.deepcopy(image)

        start_time = time.time()
        boxes, scores = model(debug_image)
        elapsed_time = time.time() - start_time
        fps = 1 / elapsed_time
        cv2.putText(
            debug_image,
            f'{fps:.1f} FPS (inference + post-process)',
            (10, 30),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.7,
            (255, 255, 255),
            2,
            cv2.LINE_AA,
        )
        cv2.putText(
            debug_image,
            f'{fps:.1f} FPS (inference + post-process)',
            (10, 30),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.7,
            (0, 0, 255),
            1,
            cv2.LINE_AA,
        )

        annotations = []
        for box, score in zip(boxes, scores):
            x1 = int(box[0])
            y1 = int(box[1])
            x2 = int(box[2])
            y2 = int(box[3])

            cx = float((x1 + x2) / 2 / image_width)
            cy = float((y1 + y2) / 2 / image_height)
            w = float((x2 - x1) / image_width)
            h = float((y2 - y1) / image_height)

            cv2.rectangle(
                debug_image,
                (x1, y1),
                (x2, y2),
                (255,255,255),
                2,
            )
            cv2.rectangle(
                debug_image,
                (x1, y1),
                (x2, y2),
                (0,0,255),
                1,
            )
            cv2.putText(
                debug_image,
                f'{score[0]:.2f}',
                (
                    x1,
                    y1-10 if y1-10 > 0 else 10
                ),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.7,
                (255, 255, 255),
                2,
                cv2.LINE_AA,
            )
            cv2.putText(
                debug_image,
                f'{score[0]:.2f}',
                (
                    x1,
                    y1-10 if y1-10 > 0 else 10
                ),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.7,
                (0, 0, 255),
                1,
                cv2.LINE_AA,
            )


            annotations.append(['0', f'{cx:.3f}', f'{cy:.3f}', f'{w:.3f}', f'{h:.3f}'])

        """
        0 0.647 0.560 0.065 0.042
        0 0.727 0.647 0.079 0.039
        0 0.306 0.508 0.079 0.078
        0 0.555 0.582 0.085 0.086
        0 0.190 0.704 0.071 0.058
        0 0.232 0.697 0.060 0.050
        """
        os.makedirs(args.annotations_txt_output_folder_path, exist_ok=True)
        anno_file_name = os.path.splitext(os.path.basename(image_file_path))[0]
        write_list_to_file(annotations, f'{args.annotations_txt_output_folder_path}/{anno_file_name}.txt')

        cv2.imshow("Auto annotation", debug_image)

        key = cv2.waitKey(0)
        if key == 27: # ESC
            break

if __name__ == "__main__":
    main()
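The post-processing step clamps each box to the model input area and then rescales it to the original image size. The sketch below isolates that arithmetic as a pure function. The name `scale_box_to_image` is hypothetical, the 480x384 input size is read off the model file name, and the 960x768 image size is an assumption for the example.

```python
def scale_box_to_image(box, input_w, input_h, image_w, image_h):
    """Clamp a box given in model-input pixels and rescale it to image pixels."""
    x1, y1, x2, y2 = box
    x_min = int(max(x1, 0) * image_w / input_w)
    y_min = int(max(y1, 0) * image_h / input_h)
    x_max = int(min(x2, input_w) * image_w / input_w)
    y_max = int(min(y2, input_h) * image_h / input_h)
    return x_min, y_min, x_max, y_max


# A box that spills past the left and right edges of a 480x384 model input,
# mapped onto a hypothetical 960x768 source image (scale factor 2 on both axes).
print(scale_box_to_image((-5.0, 10.0, 500.0, 200.0), 480, 384, 960, 768))
# prints: (0, 20, 960, 400)
```

Because the image is resized directly to the model input without letterboxing, the x and y scale factors are independent and no padding offset needs to be removed.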
  • Manual re-annotation
000000000036_000000094760, frame2_000184.txt - frame_001158.txt
  • Head - N - step.1
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.500
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.721
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.558
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.399
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.779
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.860
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.121
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.367
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.544
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.451
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.827
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.885
Results saved to runs/train/gold_yolo-n
Epoch: 179 | mAP@0.5: 0.7206811895977326 | mAP@0.50:0.95: 0.4995256459926346
  • Head - N - step.2
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.507
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.727
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.567
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.405
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.790
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.861
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.123
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.372
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.551
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.458
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.829
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.890
Results saved to runs/train/gold_yolo-n1
Epoch: 277 | mAP@0.5: 0.7271872958824573 | mAP@0.50:0.95: 0.5071289726974357
  • Head - S - step.1
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.512
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.732
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.578
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.413
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.788
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.860
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.122
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.374
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.553
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.462
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.828
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.884
Results saved to runs/train/gold_yolo-s
Epoch: 141 | mAP@0.5: 0.7316947395645746 | mAP@0.50:0.95: 0.5118589534285837
  • Head - S - step.2
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.514
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.732
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.584
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.415
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.791
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.867
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.123
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.372
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.555
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.464
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.830
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.890
Results saved to runs/train/gold_yolo-s1
Epoch: 260 | mAP@0.5: 0.7321360683976307 | mAP@0.50:0.95: 0.5143983289490999
  • Head - M - step.1
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.529
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.740
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.608
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.428
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.804
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.897
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.125
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.380
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.568
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.476
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.844
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.916
Results saved to runs/train/gold_yolo-m
Epoch: 195 | mAP@0.5: 0.7398006744289353 | mAP@0.50:0.95: 0.5290161368821595
  • Head - M - step.2
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.533
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.745
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.612
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.432
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.810
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.887
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.125
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.383
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.573
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.481
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.849
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.913
Results saved to runs/train/gold_yolo-m1
Epoch: 230 | mAP@0.5: 0.7454760403763417 | mAP@0.50:0.95: 0.5326515119531181
  • Head - L - step.1
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.538
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.747
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.622
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.442
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.805
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.874
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.125
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.385
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.577
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.488
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.846
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.903
Results saved to runs/train/gold_yolo-l
Epoch: 178 | mAP@0.5: 0.7469358990429336 | mAP@0.50:0.95: 0.5380476982385607
  • Head - L - step.2
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.535
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.742
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.610
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.436
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.814
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.901
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.125
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.388
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.575
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.481
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.857
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.922
Results saved to runs/train/gold_yolo-l1
Epoch: 277 | mAP@0.5: 0.7423018825144603 | mAP@0.50:0.95: 0.5352949339627351
PINTOPINTO
  • Merging the head annotations with the hand annotations
head_hand_merge.py
import os

def combine_files(head_folder, hand_cleaned_folder, output_folder):
    # Create the output folder if it does not exist
    os.makedirs(output_folder, exist_ok=True)

    # Process every .txt file in the COCO-Head folder.
    # Note: a hand label file with no matching head file is never copied.
    for filename in os.listdir(head_folder):
        if filename.endswith(".txt"):
            head_file = os.path.join(head_folder, filename)
            hand_cleaned_file = os.path.join(hand_cleaned_folder, filename)
            output_file = os.path.join(output_folder, filename)

            combined_lines = []

            # Read the COCO-Head file; its class IDs are kept unchanged
            with open(head_file, 'r') as file:
                lines = file.readlines()
                # Make sure every line ends with exactly one newline
                combined_lines.extend(line.rstrip('\n') + '\n' for line in lines)

            # Read the COCO-Hand-Cleaned file, if one exists for this image
            if os.path.exists(hand_cleaned_file):
                with open(hand_cleaned_file, 'r') as file:
                    lines = file.readlines()
                    for line in lines:
                        parts = line.strip().split()
                        if not parts:
                            continue  # skip blank lines, which would otherwise raise IndexError
                        parts[0] = '1'  # change the class label to 1 (hand)
                        combined_lines.append(' '.join(parts) + '\n')

            # Write the combined content to the output file
            with open(output_file, 'w') as file:
                file.writelines(combined_lines)

# Usage example
combine_files(
    head_folder='COCO-Head',
    hand_cleaned_folder='COCO-Hand-Cleaned',
    output_folder='COCO-Head-Hand'
)
PINTOPINTO
  • Head-Hand - N step.1
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.435
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.694
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.464
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.354
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.687
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.831
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.143
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.398
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.517
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.445
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.764
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.864
Results saved to runs/train/gold_yolo-n
Epoch: 199 | mAP@0.5: 0.6937148862793383 | mAP@0.50:0.95: 0.43497443881089093
PINTOPINTO
  • Total number of annotated images
body_label_count: 30,729 labels
head_label_count: 26,268 labels
hand_label_count: 18,087 labels
===============================
           Total: 66,903 labels
           Total: 14,667 images
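
A tally like the one above can be produced by walking a folder of YOLO label files. This is a minimal sketch, assuming each `.txt` file holds one label per line with the class ID as the first token. The function name `count_yolo_labels` is hypothetical and is not taken from the original tooling.

```python
from collections import Counter
from pathlib import Path


def count_yolo_labels(label_folder):
    """Count labels per class ID and the number of label files that contain at least one label."""
    per_class = Counter()
    image_count = 0
    for label_file in sorted(Path(label_folder).glob("*.txt")):
        # Each non-blank line is one label: "<class_id> <cx> <cy> <w> <h>"
        rows = [line.split() for line in label_file.read_text().splitlines() if line.strip()]
        if rows:
            image_count += 1
        for row in rows:
            per_class[int(row[0])] += 1
    return per_class, image_count
```

Calling `count_yolo_labels('COCO-Head-Hand')` on the merged folder would return the per-class label counts and the number of annotated images. Which class ID maps to body, head, or hand depends on how the dataset was assembled.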

  • Images for checking
109 2299
419 9420
1206 24972
3380 70347
  • Current tail
4523

000000000036_frame_001158_body
PINTOPINTO
  • N - body - 22%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.470
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.657
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.506
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.045
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.534
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.753
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.174
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.493
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.582
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.186
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.692
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.836
Results saved to runs/train/gold_yolo-n
Epoch: 178 | mAP@0.5: 0.656626965176813 | mAP@0.50:0.95: 0.47047062716690025
  • N - body - 37%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.483
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.663
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.515
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.545
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.771
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.175
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.497
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.595
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.202
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.716
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.848
Results saved to runs/train/gold_yolo-n1
Epoch: 335 | mAP@0.5: 0.6633302098111067 | mAP@0.50:0.95: 0.4829405374208939
  • N - body - 44%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.464
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.657
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.488
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.083
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.520
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.776
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.163
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.462
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.576
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.223
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.692
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.850
Results saved to runs/train/gold_yolo-n1
Epoch: 247 | mAP@0.5: 0.6571595669951118 | mAP@0.50:0.95: 0.46380854493373824
  • N - body - 51%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.469
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.656
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.497
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.086
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.531
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.776
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.160
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.469
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.581
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.233
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.697
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.854
Results saved to runs/train/gold_yolo-n1
Epoch: 324 | mAP@0.5: 0.6560160443800531 | mAP@0.50:0.95: 0.4688607070167569
  • N - body - 57%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.466
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.659
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.485
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.088
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.528
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.764
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.161
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.462
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.583
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.256
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.688
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.845
Results saved to runs/train/gold_yolo-n1
Epoch: 404 | mAP@0.5: 0.6594722687661295 | mAP@0.50:0.95: 0.4659762504052017
  • N - body - 64%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.426
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.626
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.444
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.139
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.528
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.766
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.131
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.389
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.516
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.219
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.676
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.842
Results saved to runs/train/gold_yolo-n1
Epoch: 353 | mAP@0.5: 0.6256951848232611 | mAP@0.50:0.95: 0.42580869837563295
  • N - body - 71%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.426
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.632
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.450
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.154
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.527
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.772
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.127
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.378
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.513
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.231
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.669
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.846
Results saved to runs/train/gold_yolo-n1
Epoch: 346 | mAP@0.5: 0.63163065240628 | mAP@0.50:0.95: 0.42634700808308984
  • N - body - 77%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.421
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.638
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.433
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.160
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.537
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.761
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.123
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.364
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.508
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.237
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.669
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.843
Results saved to runs/train/gold_yolo-n1
Epoch: 363 | mAP@0.5: 0.6377116420439236 | mAP@0.50:0.95: 0.42147748638998517
  • N - body - 84%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.423
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.642
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.435
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.164
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.541
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.758
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.119
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.353
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.509
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.245
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.671
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.839
Results saved to runs/train/gold_yolo-n1
Epoch: 319 | mAP@0.5: 0.6418912179375459 | mAP@0.50:0.95: 0.42306445106152857
  • N - body - 91%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.427
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.646
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.443
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.178
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.556
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.768
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.115
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.350
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.510
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.255
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.679
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.845
Results saved to runs/train/gold_yolo-n1
Epoch: 387 | mAP@0.5: 0.6464056146254991 | mAP@0.50:0.95: 0.4268788007021869
  • N - body - 97%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.416
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.639
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.425
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.178
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.526
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.758
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.113
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.343
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.502
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.255
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.659
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.837
Results saved to runs/train/gold_yolo-n1
Epoch: 403 | mAP@0.5: 0.6387236111886924 | mAP@0.50:0.95: 0.41601151150066806
  • N - body - 100%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.433
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.656
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.451
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.191
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.549
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.775
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.113
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.347
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.511
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.264
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.668
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.847
Results saved to runs/train/gold_yolo-n1
Epoch: 459 | mAP@0.5: 0.6558626353793654 | mAP@0.50:0.95: 0.43279318035818726
  • S - body - 100%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.447
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.669
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.459
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.212
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.565
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.785
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.115
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.353
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.521
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.278
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.679
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.849
Results saved to runs/train/gold_yolo-s1
Epoch: 269 | mAP@0.5: 0.6692641980128146 | mAP@0.50:0.95: 0.4465093742123356
  • M - body - 100%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.483
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.687
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.506
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.232
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.628
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.818
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.119
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.373
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.552
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.300
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.728
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.875
Results saved to runs/train/gold_yolo-m1
Epoch: 308 | mAP@0.5: 0.6867363194731014 | mAP@0.50:0.95: 0.4825374917392089
  • L - body - 100%
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.495
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.693
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.516
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.239
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.641
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.830
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.119
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.380
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.556
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.306
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.731
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.877
Results saved to runs/train/gold_yolo-l1
Epoch: 315 | mAP@0.5: 0.6928551569441443 | mAP@0.50:0.95: 0.49458101763724416
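
To compare runs side by side, the `Epoch: ... | mAP@0.5: ... | mAP@0.50:0.95: ...` summary lines can be pulled out of the logs. This is a rough sketch that assumes the line format is exactly as pasted above. The helper name `parse_summary` is hypothetical.

```python
import re

# Matches the summary line printed at the end of each run in the logs above
SUMMARY_RE = re.compile(
    r"Epoch:\s*(\d+)\s*\|\s*mAP@0\.5:\s*([0-9.]+)\s*\|\s*mAP@0\.50:0\.95:\s*([0-9.]+)"
)


def parse_summary(line):
    """Return (epoch, mAP@0.5, mAP@0.50:0.95) or None if the line is not a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), float(match.group(3))


# Example with the "L - body - 100%" summary line from the log above
epoch, map50, map5095 = parse_summary(
    "Epoch: 315 | mAP@0.5: 0.6928551569441443 | mAP@0.50:0.95: 0.49458101763724416"
)
print(f"epoch={epoch} mAP@0.5={map50:.3f} mAP@0.50:0.95={map5095:.3f}")
```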
PINTOPINTO
  • Cleansing 0%
  • Cleansing 20%
  • Cleansing 33%
  • Cleansing 44%
  • Cleansing 51%
  • Cleansing 57%
  • Cleansing 64%
  • Cleansing 71%
  • Cleansing 77%
  • Cleansing 84%
  • Cleansing 91%
  • Cleansing 97%
PINTOPINTO

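COCO mAP (Mean Average Precision) is an evaluation metric used in computer vision,
in particular for assessing the performance of object detection models.
COCO stands for Common Objects in Context, and models are evaluated on the dataset of the same name.
mAP measures how accurately a model detects objects.

COCO mAP also evaluates objects in three size categories: small, medium, and large.
These categories are based on the area an object occupies in the image.

Small: objects whose area is smaller than 32x32 pixels.
Medium: objects whose area is at least 32x32 pixels and at most 96x96 pixels.
Large: objects whose area is larger than 96x96 pixels.

When COCO mAP is computed, accuracy is calculated separately for each of these size categories.
This shows how accurately the model detects objects of different sizes.
For example, small objects are hard to detect, so a high mAP score in the small
category indicates that the model is highly accurate.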
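
The size buckets described above can be written as a small helper. This is a sketch only: it uses the bounding-box area in pixels, whereas the official COCO evaluation reads the `area` field of each annotation, so results may differ for datasets that store segmentation areas. The function name is hypothetical.

```python
def coco_size_category(box_width, box_height):
    """Classify a box (width and height in pixels) into the COCO size buckets described above."""
    area = box_width * box_height
    if area < 32 ** 2:
        return "small"
    if area <= 96 ** 2:
        return "medium"
    return "large"


# A YOLO label stores normalized sizes, so convert to pixels first.
# Example: a box with normalized width 0.05 and height 0.04 in a 640x480 image.
print(coco_size_category(0.05 * 640, 0.04 * 480))
```

Because the bucket depends on pixel area, the same object can move from medium to small when the image is downscaled before evaluation.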

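To compute mAP, each predicted bounding box is compared with the ground-truth
bounding box, and a prediction counts as a correct detection when the overlap exceeds a
given threshold (for example, an IoU threshold). Average precision (AP) is then computed for
each object category, and the mean of these APs is the mAP. In the COCO protocol the headline
figure is additionally averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05,
which is the `IoU=0.50:0.95` row in the logs.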
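
The matching criterion can be illustrated with a plain IoU function. This is a minimal sketch assuming boxes are given as `(xmin, ymin, xmax, ymax)` in pixels. It shows only the overlap test, not the full AP computation, and the helper names are hypothetical.

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# The ten thresholds behind the "IoU=0.50:0.95" rows: 0.50, 0.55, ..., 0.95
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou):
    """Return the COCO IoU thresholds at which a prediction with this IoU counts as a match."""
    return [t for t in COCO_IOU_THRESHOLDS if iou >= t]


prediction = (0, 0, 10, 10)
ground_truth = (0, 0, 10, 8)
iou = box_iou(prediction, ground_truth)
print(iou, thresholds_passed(iou))
```

A loosely drawn ground-truth box lowers the IoU of an otherwise good prediction, so it stops counting as a match at the stricter thresholds. That is one way sloppy annotations can pull down `mAP@0.50:0.95`.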

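COCO mAP helps in understanding how a model behaves across objects of different sizes.
A model can reach high accuracy on objects of one size while performing poorly on
others, and this metric is important for assessing that balance.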
PINTOPINTO

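What I imagine about the state of the COCO dataset, judging from the cleansing progress and the point at which accuracy flips:

1. small: mAP rises from 0% cleansing onward. Most of the annotations are inadequate.
2. medium: mAP rises from 80% onward. About 20% of the annotations are missing, or the IoU is extremely sloppy.
3. large: mAP rises from 90% onward. About 10% of the annotations are missing, or the IoU is extremely sloppy.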

https://x.com/PINTO03091/status/1727109661172896059?s=20

PINTOPINTO

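The real aim of this hellish annotation effort is to find out how the final mAP is affected when three classes, Body / Head / Hand, are trained at the same time. All three are parts of the same base class, Person, and the features of Head and Hand are almost entirely contained in Body. Given that aim, there is no success or failure.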

PINTOPINTO
  • Body-Head-Hand-N
# 640x640
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.443
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.689
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.467
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.303
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.654
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.830
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.135
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.389
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.515
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.739
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.872
Results saved to runs/train/gold_yolo-n
Epoch: 462 | mAP@0.5: 0.6892104619015829 | mAP@0.50:0.95: 0.4427396559181031
  • Body-Head-Hand-S
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.460
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.704
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.491
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.327
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.665
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.838
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.137
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.399
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.526
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.397
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.739
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.874
Results saved to runs/train/gold_yolo-s
Epoch: 456 | mAP@0.5: 0.7040425163160517 | mAP@0.50:0.95: 0.46049785564440426
  • Body-Head-Hand-M
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.500
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.738
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.540
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.359
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.722
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.864
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.143
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.427
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.562
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.430
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.788
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.892
Results saved to runs/train/gold_yolo-m
Epoch: 488 | mAP@0.5: 0.7378339081274632 | mAP@0.50:0.95: 0.5004409472223532
  • Body-Head-Hand-L
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.509
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.739
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.556
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.367
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.729
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.869
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.146
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.432
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.567
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.434
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.792
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.903
Results saved to runs/train/gold_yolo-l
Epoch: 339 | mAP@0.5: 0.7393661924683652 | mAP@0.50:0.95: 0.5093183767567647
PINTOPINTO

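Looking at this inference result, the correlation between Body and Head features does seem to have been learned to some extent. The mannequin has no head at that spot. Or rather, when only the features of a head on its own are considered, that spot does not respond. As I predicted beforehand, the features of the individual parts appear to be linked.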

  • N / S size

  • M size

PINTOPINTO

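At a quick glance:
1. Classification performance for Body drops
2. The model learns the correlation between body parts to some extent
3. Hands and bare feet still get confused (the features of fingers and toes are too similar)
4. Classification performance of the model as a whole drops slightly
5. Because classification performance is lower, over-detection decreases slightly
6. Over-detection of hands and heads outside a Body drops sharply

That is roughly how it looks. These are only casual impressions.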

PINTOPINTO

○: training in progress
●: training finished

                N  S  M  L  Note
Hand                        A100 40GB 32batch
Head                        A100 40GB 32batch
Body                        A100 40GB 32batch
Hand/Head/Body              A100 80GB 32batch
PINTOPINTO

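What I took away from this experiment, which I tried half as a joke, is the following. Because raising the quality of the dataset greatly improved the detection rate for small regions, the model can still deliver reasonable performance at a very small input resolution, as long as its ability to detect small objects holds up. It is interesting that improving the dataset contributes so much to inference speed.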

  • Test with CPU inference at an input resolution of 160x128
  • 2.9 ms / inference

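Put the other way around, I suspect the conventional wisdom that raising the resolution barely improves performance only reflects datasets that are too rotten. Gains in detecting Large regions plateau early in training, so the model's real performance is never drawn out.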

PINTOPINTO
  • A quick look at whether anything interesting happens when the three classes Body, Head, and Hand of the base class Person are merged and trained together

  • Gold-YOLO N

    • 1 class body
      mAP@0.5	mAP@0.50:0.95
        0.655         0.432
      
    • 1 class head
      mAP@0.5	mAP@0.50:0.95
        0.727         0.507
      
    • 1 class hand
      mAP@0.5	mAP@0.50:0.95
        0.692         0.404
      
    • 3 classes Body + Head + Hand
      Class Labeled_images Labels P@.5iou R@.5iou F1@.5iou mAP@.5 mAP@.5:.95
      all              486   8858   0.856    0.62    0.719  0.689      0.443
      body             486   3747   0.857    0.60    0.706  0.662      0.440
      head             475   3269   0.912    0.68    0.779  0.726      0.497
      hand             483   1842   0.842    0.59    0.694  0.680      0.391
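
The numbers above can be cross-checked with a few lines of arithmetic. The sketch below recomputes F1 from the listed precision and recall, then prints the per-class difference between the 3-class run and the 1-class runs. It assumes the runs were evaluated on comparable validation data, which the notes do not confirm.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (P@.5iou, R@.5iou) copied from the 3-class table above
three_class_pr = {"all": (0.856, 0.62), "body": (0.857, 0.60),
                  "head": (0.912, 0.68), "hand": (0.842, 0.59)}
for name, (precision, recall) in three_class_pr.items():
    print(f"{name:5s} F1={f1_score(precision, recall):.3f}")

# (mAP@.5, mAP@.5:.95) for the 1-class runs and the matching rows of the 3-class run
single = {"body": (0.655, 0.432), "head": (0.727, 0.507), "hand": (0.692, 0.404)}
merged = {"body": (0.662, 0.440), "head": (0.726, 0.497), "hand": (0.680, 0.391)}
for name in single:
    delta_50 = merged[name][0] - single[name][0]
    delta_5095 = merged[name][1] - single[name][1]
    print(f"{name:5s} d_mAP@.5={delta_50:+.3f} d_mAP@.5:.95={delta_5095:+.3f}")
```

On these figures, body gains 0.007 / 0.008, head loses 0.001 / 0.010, and hand loses 0.012 / 0.013 when moving from the single-class runs to the merged run.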
      
This scrap was closed on 2023/12/16