Converting COCO-Hand dataset annotations to YOLO format (Gold-YOLO)
- COCO-Hand
# Format
[image_name, xmin, xmax, ymin, ymax, x1, y1, x2, y2, x3, y3, x4, y4]
# Actual values
000000001098.jpg,134,190,160,188,138,160,134,178,186,188,190,169,hand
000000001098.jpg,234,288,187,242,265,187,234,200,257,242,288,230,hand
000000000036.jpg,215,257,208,257,230,257,257,254,242,208,215,210,hand
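- Extract the fields from each line.
- Get the image width and height. These are needed to normalize the bounding box for the YOLO format.
- Convert the values to the YOLO format: compute the center point (cx, cy) and the bounding-box width and height (bw, bh), then normalize them by the image width and height.
- Write the result to the output file.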
import os
from PIL import Image

def convert_to_yolo_format(annotation_file, output_folder, image_folder):
    os.makedirs(output_folder, exist_ok=True)

    with open(annotation_file, 'r') as f:
        lines = f.readlines()

    for line in lines:
        line = line.strip().split(',')
        if len(line) < 5:
            continue  # skip blank or malformed lines
        image_name = line[0]
        xmin, xmax, ymin, ymax = map(int, line[1:5])

        # The class name is assumed to be 'hand'
        class_id = 0

        # Get the image size
        with Image.open(os.path.join(image_folder, image_name)) as img:
            width, height = img.size

        # YOLO format: x_center y_center width height
        x_center = (xmin + xmax) / 2.0
        y_center = (ymin + ymax) / 2.0
        b_width = xmax - xmin
        b_height = ymax - ymin

        # Normalize by the image size
        x_center /= width
        y_center /= height
        b_width /= width
        b_height /= height

        # Write to the output file. Append mode is used because one image can
        # have several boxes; clear output_folder before re-running to avoid duplicates.
        label_name = os.path.splitext(image_name)[0] + '.txt'
        with open(os.path.join(output_folder, label_name), 'a') as out_file:
            out_file.write(f"{class_id} {x_center} {y_center} {b_width} {b_height}\n")

# Usage example
annotation_file = 'COCO-Hand-S_annotations.txt'
output_folder = 'yolo_annotations'
image_folder = 'path_to_images'  # set the path to the images
convert_to_yolo_format(annotation_file, output_folder, image_folder)
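As a worked example of the normalization described above, the sketch below applies the same arithmetic to the first sample row (xmin=134, xmax=190, ymin=160, ymax=188). The 640x480 image size is an assumption made only for illustration, since the real script reads the size from the image file, and the helper name `to_yolo_box` is hypothetical.

```python
def to_yolo_box(xmin, xmax, ymin, ymax, img_w, img_h):
    """Convert pixel corner coordinates to normalized YOLO (cx, cy, w, h)."""
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    bw = (xmax - xmin) / img_w
    bh = (ymax - ymin) / img_h
    return cx, cy, bw, bh


# First sample row from the annotation file; 640x480 is an assumed image size.
cx, cy, bw, bh = to_yolo_box(134, 190, 160, 188, img_w=640, img_h=480)
print(f"0 {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
# -> 0 0.253125 0.362500 0.087500 0.058333
```

All four values fall in the range 0 to 1, which is a quick sanity check that the xmin, xmax, ymin, ymax column order was read correctly.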
- GOLD-YOLO
Data loader section
@staticmethod
def get_data_loader(args, cfg, data_dict):
    train_path, val_path = data_dict['train'], data_dict['val']
    # check data
    nc = int(data_dict['nc'])
    class_names = data_dict['names']
    assert len(class_names) == nc, f'the length of class names does not match the number of classes defined'
    grid_size = max(int(max(cfg.model.head.strides)), 32)
    # create train dataloader
    train_loader = create_dataloader(train_path, args.img_size, args.batch_size // args.world_size, grid_size,
                                     hyp=dict(cfg.data_aug), augment=True, rect=False, rank=args.local_rank,
                                     workers=args.workers, shuffle=True, check_images=args.check_images,
                                     check_labels=args.check_labels, data_dict=data_dict, task='train')[0]
    # create val dataloader
    val_loader = None
    if args.rank in [-1, 0]:
        val_loader = create_dataloader(val_path, args.img_size, args.batch_size // args.world_size * 2, grid_size,
                                       hyp=dict(cfg.data_aug), rect=True, rank=-1, pad=0.5,
                                       workers=args.workers, check_images=args.check_images,
                                       check_labels=args.check_labels, data_dict=data_dict, task='val')[0]

    return train_loader, val_loader
- Training YAML (YOLOv6 specification)
- The annotation JSON is generated automatically when training starts
- The JSON is written directly under the annotations folder
# Please ensure that your custom_dataset is put in the same parent dir as YOLOv6_DIR
train: ./coco/COCO-Hand-S/images/train # train images
val: ./coco/COCO-Hand-S/images/val # val images
test: ./coco/COCO-Hand-S/images/test # test images (optional)
# whether it is coco dataset, only coco dataset should be set to True.
is_coco: False
# Classes
nc: 1 # number of classes
names: ['hand'] # class names
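Before starting a training run with a YAML like the one above, it can help to confirm that every image has a label file. The sketch below assumes the common YOLO layout in which labels for `<root>/images/<split>` live in `<root>/labels/<split>`. That layout, and the helper names `label_dir_for` and `images_without_labels`, are assumptions for illustration and not something taken from the Gold-YOLO code.

```python
import os
import tempfile

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def label_dir_for(image_dir):
    # Assumed convention: <root>/images/<split> -> <root>/labels/<split>
    image_dir = os.path.normpath(image_dir)
    root = os.path.dirname(os.path.dirname(image_dir))
    return os.path.join(root, "labels", os.path.basename(image_dir))


def images_without_labels(image_dir):
    """Return the image file names that have no matching .txt label file."""
    label_dir = label_dir_for(image_dir)
    missing = []
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in IMAGE_EXTS:
            continue
        if not os.path.exists(os.path.join(label_dir, stem + ".txt")):
            missing.append(name)
    return missing


# Self-contained demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    img_dir = os.path.join(root, "images", "train")
    lbl_dir = os.path.join(root, "labels", "train")
    os.makedirs(img_dir)
    os.makedirs(lbl_dir)
    for name in ("a.jpg", "b.jpg"):
        open(os.path.join(img_dir, name), "w").close()
    open(os.path.join(lbl_dir, "a.txt"), "w").close()
    demo_missing = images_without_labels(img_dir)
    print(demo_missing)  # ['b.jpg']
```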
pip install \
mmcv==1.6.1 \
addict \
pycocotools \
tensorboard \
opencv-python-headless \
gdown
pip3 install \
torch \
torchvision \
torchaudio \
--index-url https://download.pytorch.org/whl/cu118
- N size
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.417
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.709
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.426
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.359
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.633
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.838
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.174
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.468
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.540
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.492
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.733
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.858
Results saved to runs/train/gold_yolo-n1
Epoch: 361 | mAP@0.5: 0.7092690168491806 | mAP@0.50:0.95: 0.41735040829118536
- S size
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.429
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.718
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.432
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.368
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.653
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.823
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.176
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.475
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.544
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.497
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.733
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.851
Results saved to runs/train/gold_yolo-s1
Epoch: 201 | mAP@0.5: 0.7175394644960713 | mAP@0.50:0.95: 0.4286192100279552
- M size
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.477
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.767
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.499
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.415
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.699
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.867
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.187
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.515
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.583
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.536
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.769
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.888
Results saved to runs/train/gold_yolo-m1
Epoch: 347 | mAP@0.5: 0.7670340058835509 | mAP@0.50:0.95: 0.47673873875288386
- L size
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.490
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.785
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.523
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.428
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.711
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.902
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.192
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.527
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.592
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.544
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.781
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.923
Results saved to runs/train/gold_yolo-l1
Epoch: 271 | mAP@0.5: 0.7852822540107776 | mAP@0.50:0.95: 0.4895539540328813
- Export ONNX Model
python tools/infer.py \
--weights gold_yolo_n_hand_0.2295.pt
Post-processing after downloading the annotations and images created in CVAT in YOLO 1.0 format
- Delete images that have no annotations
import os

def delete_empty_txt_and_png_pairs(directory_path):
    # Get every file in the given directory
    all_files = os.listdir(directory_path)

    # Keep only the .txt files
    txt_files = sorted([f for f in all_files if f.endswith('.txt')])

    for txt_file in txt_files:
        # Full path of the .txt file
        txt_full_path = os.path.join(directory_path, txt_file)
        stem = os.path.splitext(txt_file)[0]

        # Read the .txt file, stripping whitespace and newlines
        with open(txt_full_path, 'r') as f:
            content = f.read().strip()

        # If the .txt file is empty, delete it together with the matching .PNG / .jpg
        if not content:
            png_full_path = os.path.join(directory_path, stem + '.PNG')
            if os.path.exists(png_full_path):
                os.remove(png_full_path)
                print(f"Deleted: {png_full_path}")

            jpg_full_path = os.path.join(directory_path, stem + '.jpg')
            if os.path.exists(jpg_full_path):
                os.remove(jpg_full_path)
                print(f"Deleted: {jpg_full_path}")

            os.remove(txt_full_path)
            print(f"Deleted: {txt_full_path}")

# Usage example
dir_path = './hand_yolo/obj_train_data'  # set the appropriate directory path here
delete_empty_txt_and_png_pairs(dir_path)
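Because the deletion above cannot be undone, a dry run that only lists the empty label files can be useful first. The following is a minimal sketch of that idea. The function name `find_empty_label_files` is hypothetical, and the demo runs on a temporary directory rather than on real data.

```python
import os
import tempfile


def find_empty_label_files(directory_path):
    """Return the names of .txt files that contain only whitespace."""
    empty = []
    for name in sorted(os.listdir(directory_path)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(directory_path, name), "r") as f:
            if not f.read().strip():
                empty.append(name)
    return empty


# Self-contained demo: one annotated file and one empty file.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "frame_000001.txt"), "w") as f:
        f.write("0 0.5 0.5 0.1 0.1\n")
    with open(os.path.join(d, "frame_000002.txt"), "w") as f:
        f.write("\n")
    demo_empty = find_empty_label_files(d)
    print(demo_empty)  # ['frame_000002.txt']
```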
- Display the annotations on screen for the remaining images to verify them
import os
import cv2

def draw_bounding_boxes_from_yolo(directory_path):
    all_files = os.listdir(directory_path)
    png_files = sorted([f for f in all_files if f.endswith('.PNG') or f.endswith('.jpg')])

    for png_file in png_files:
        img_path = os.path.join(directory_path, png_file)
        txt_path = os.path.join(directory_path, os.path.splitext(png_file)[0] + '.txt')

        if not os.path.exists(txt_path):
            print(f"No .txt file found for {png_file}")
            continue

        # Load the image
        image = cv2.imread(img_path)
        if image is None:
            print(f"Failed to read {img_path}")
            continue

        # Read the bounding boxes from the YOLO-format .txt file
        with open(txt_path, 'r') as f:
            lines = f.readlines()
            for line in lines:
                data = line.strip().split()
                if len(data) == 5:
                    # Parse the YOLO-format values
                    _, x_center, y_center, width, height = map(float, data)

                    # Convert from normalized YOLO values to OpenCV pixel coordinates
                    x_center, y_center, width, height = x_center * image.shape[1], y_center * image.shape[0], width * image.shape[1], height * image.shape[0]
                    x1, y1, x2, y2 = int(x_center - width / 2), int(y_center - height / 2), int(x_center + width / 2), int(y_center + height / 2)

                    # Draw the bounding box
                    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Show the image
        cv2.imshow(f"Image with Bounding Boxes - {png_file}", image)
        key = cv2.waitKey(0)
        if key == 27: # ESC
            break

    cv2.destroyAllWindows()

# Run
dir_path = './hand_yolo/obj_train_data'  # set the appropriate directory path here
draw_bounding_boxes_from_yolo(dir_path)
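The coordinate conversion used for drawing is the inverse of the normalization step, and it can be checked in isolation without OpenCV. The sketch below uses a hypothetical helper name, `yolo_to_pixel_box`, and made-up input values chosen so the result is easy to verify by hand.

```python
def yolo_to_pixel_box(cx, cy, w, h, img_w, img_h):
    """Convert normalized YOLO (cx, cy, w, h) to integer pixel corners."""
    cx, cy, w, h = cx * img_w, cy * img_h, w * img_w, h * img_h
    x1, y1 = int(cx - w / 2), int(cy - h / 2)
    x2, y2 = int(cx + w / 2), int(cy + h / 2)
    return x1, y1, x2, y2


# A box centered in a 640x480 image, a quarter as wide and half as tall.
box = yolo_to_pixel_box(0.5, 0.5, 0.25, 0.5, img_w=640, img_h=480)
print(box)  # (240, 120, 400, 360)
```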
- Split into train, val, and test for YOLO training
import argparse
import os
import shutil
from glob import glob
from sklearn.model_selection import train_test_split

def move_files_to_folder(list_of_files, destination_folder):
    # Note: despite the name, the files are copied, not moved
    for f in list_of_files:
        try:
            shutil.copy(f, destination_folder)
        except OSError:
            print(f)
            raise

def get_annotations(path):
    annotations = []
    images = []
    # Sorted so that the split is reproducible for a fixed random_state
    for txt_file in sorted(glob(os.path.join(path, '*.txt'))):
        annotations.append(txt_file)
        # Swap only the extension; str.replace('txt', ...) would also hit 'txt' elsewhere in the path
        stem = os.path.splitext(txt_file)[0]
        image = stem + '.PNG'
        if not os.path.exists(image):
            image = stem + '.jpg'
        images.append(image)
    return annotations, images

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--path', type=str, help='Path to images', default='hand_yolo/obj_train_data')
    opt = parser.parse_args()

    annotations, images = get_annotations(opt.path)

    # Split the dataset into train-valid-test splits (80% / 10% / 10%)
    train_images, val_images, train_annotations, val_annotations = train_test_split(images, annotations, test_size = 0.2, random_state = 1)
    val_images, test_images, val_annotations, test_annotations = train_test_split(val_images, val_annotations, test_size = 0.5, random_state = 1)

    os.makedirs('data/images/train', exist_ok=True)
    os.makedirs('data/images/val', exist_ok=True)
    os.makedirs('data/images/test', exist_ok=True)
    os.makedirs('data/annotations/train', exist_ok=True)
    os.makedirs('data/annotations/val', exist_ok=True)
    os.makedirs('data/annotations/test', exist_ok=True)

    move_files_to_folder(train_images, 'data/images/train')
    move_files_to_folder(val_images, 'data/images/val/')
    move_files_to_folder(test_images, 'data/images/test/')
    move_files_to_folder(train_annotations, 'data/annotations/train/')
    move_files_to_folder(val_annotations, 'data/annotations/val/')
    move_files_to_folder(test_annotations, 'data/annotations/test/')

    # Portable replacement for `cp -r data/annotations data/labels` (Python 3.8+)
    shutil.copytree('data/annotations', 'data/labels', dirs_exist_ok=True)

if __name__ == "__main__":
    main()
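The two chained `train_test_split` calls (first `test_size=0.2`, then `test_size=0.5` on the held-out part) give an 80/10/10 split. The sketch below illustrates the resulting proportions with the standard library only. It does not reproduce scikit-learn's exact shuffling, and the function name `split_80_10_10` and the file names are made up for the example.

```python
import random


def split_80_10_10(items, seed=1):
    """Shuffle items and split them into train/val/test (80/10/10)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = (len(items) - n_train) // 2
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


names = [f"frame_{i:06d}" for i in range(100)]
train, val, test = split_80_10_10(names)
print(len(train), len(val), len(test))  # 80 10 10
```

Images and annotations have to be split together, as the script does by passing both lists to `train_test_split`, so that each image stays in the same subset as its label file.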
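The final layout is the data directory produced by the script above. Starting from that level, compress the three folders (images, annotations, labels) into a single file named data.zip.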
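One way to build data.zip without relying on an external zip command is Python's `zipfile` module. The sketch below is only an illustration: it assumes the three folders are `images`, `annotations`, and `labels`, as created by the split script, and the helper name `zip_folders` is hypothetical. The demo runs on a temporary directory.

```python
import os
import tempfile
import zipfile


def zip_folders(base_dir, folders, zip_path):
    """Write the given sub-folders of base_dir into zip_path, keeping relative paths."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for folder in folders:
            for dirpath, _dirnames, filenames in os.walk(os.path.join(base_dir, folder)):
                for filename in sorted(filenames):
                    full_path = os.path.join(dirpath, filename)
                    zf.write(full_path, os.path.relpath(full_path, base_dir))


# Self-contained demo with one placeholder file per folder.
with tempfile.TemporaryDirectory() as root:
    data_dir = os.path.join(root, "data")
    for folder, name in (("images", "a.jpg"), ("annotations", "a.txt"), ("labels", "a.txt")):
        os.makedirs(os.path.join(data_dir, folder, "train"))
        open(os.path.join(data_dir, folder, "train", name), "w").close()
    zip_path = os.path.join(root, "data.zip")
    zip_folders(data_dir, ["images", "annotations", "labels"], zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        demo_names = sorted(zf.namelist())
    print(demo_names)
```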
To reduce the cost of re-annotating COCO-Hand in CVAT, the script below re-annotates COCO-Hand automatically using the M-size model trained above.
#!/usr/bin/env python
import os
import copy
import cv2
import time
import numpy as np
import onnxruntime
from argparse import ArgumentParser
from typing import Tuple, Optional, List
from tqdm import tqdm
class GoldYOLOONNX(object):
    def __init__(
        self,
        model_path: Optional[str] = 'gold_yolo_l_hand_post_0231_0.4841_1x3x384x480.onnx',
        class_score_th: Optional[float] = 0.35,
        providers: Optional[List] = [
            'CUDAExecutionProvider',
            'CPUExecutionProvider',
        ],
    ):
        """GoldYOLOONNX

        Parameters
        ----------
        model_path: Optional[str]
            ONNX file path for Gold-YOLO

        class_score_th: Optional[float]
            Score threshold. Default: 0.35

        providers: Optional[List]
            Name of onnx execution providers
            Default:
            [
                'CUDAExecutionProvider',
                'CPUExecutionProvider',
            ]
        """
        # Threshold
        self.class_score_th = class_score_th

        # Model loading
        session_option = onnxruntime.SessionOptions()
        session_option.log_severity_level = 3
        self.onnx_session = onnxruntime.InferenceSession(
            model_path,
            sess_options=session_option,
            providers=providers,
        )
        self.providers = self.onnx_session.get_providers()

        self.input_shapes = [
            input.shape for input in self.onnx_session.get_inputs()
        ]
        self.input_names = [
            input.name for input in self.onnx_session.get_inputs()
        ]
        self.output_names = [
            output.name for output in self.onnx_session.get_outputs()
        ]

    def __call__(
        self,
        image: np.ndarray,
    ) -> Tuple[np.ndarray, np.ndarray]:
        """GoldYOLOONNX

        Parameters
        ----------
        image: np.ndarray
            Entire image

        Returns
        -------
        boxes: np.ndarray
            Predicted boxes: [N, 4] as [x1, y1, x2, y2] in source-image pixels

        scores: np.ndarray
            Predicted box scores: [N, 1]
        """
        temp_image = copy.deepcopy(image)

        # PreProcess
        resized_image = self.__preprocess(
            temp_image,
        )

        # Inference
        inference_image = np.asarray([resized_image], dtype=np.float32)
        boxes = self.onnx_session.run(
            self.output_names,
            {input_name: inference_image for input_name in self.input_names},
        )[0]

        # PostProcess
        result_boxes, result_scores = \
            self.__postprocess(
                image=temp_image,
                boxes=boxes,
            )

        return result_boxes, result_scores

    def __preprocess(
        self,
        image: np.ndarray,
        swap: Optional[Tuple[int,int,int]] = (2, 0, 1),
    ) -> np.ndarray:
        """__preprocess

        Parameters
        ----------
        image: np.ndarray
            Entire image

        swap: tuple
            HWC to CHW: (2,0,1)
            CHW to HWC: (1,2,0)
            HWC to HWC: (0,1,2)
            CHW to CHW: (0,1,2)

        Returns
        -------
        resized_image: np.ndarray
            Resized and normalized image.
        """
        # Resize to the model input size (width, height)
        resized_image = cv2.resize(
            image,
            (
                int(self.input_shapes[0][3]),
                int(self.input_shapes[0][2]),
            )
        )
        # Normalization + BGR->RGB
        resized_image = np.divide(resized_image, 255.0)
        resized_image = resized_image[..., ::-1]
        resized_image = resized_image.transpose(swap)
        resized_image = np.ascontiguousarray(
            resized_image,
            dtype=np.float32,
        )
        return resized_image

    def __postprocess(
        self,
        image: np.ndarray,
        boxes: np.ndarray,
    ) -> Tuple[np.ndarray, np.ndarray]:
        """__postprocess

        Parameters
        ----------
        image: np.ndarray
            Entire image.

        boxes: np.ndarray
            float32[N, 7]

        Returns
        -------
        result_boxes: np.ndarray
            Predicted boxes: [N, 4] as [x1, y1, x2, y2] in source-image pixels

        result_scores: np.ndarray
            Predicted box confs: [N, 1]
        """
        image_height = image.shape[0]
        image_width = image.shape[1]

        # Detector output layout, as indexed by the code below:
        #   N -> Number of boxes detected
        #   batchno -> always 0: BatchNo.0
        #   batchno_classid_x1y1x2y2_score: float32[N,7]
        result_boxes = []
        result_scores = []
        if len(boxes) > 0:
            scores = boxes[:, 6:7]
            keep_idxs = scores[:, 0] > self.class_score_th
            scores_keep = scores[keep_idxs, :]
            boxes_keep = boxes[keep_idxs, :]

            if len(boxes_keep) > 0:
                for box, score in zip(boxes_keep, scores_keep):
                    # Clamp to the model input size, then rescale to the source image size
                    x_min = int(max(box[2], 0) * image_width / self.input_shapes[0][3])
                    y_min = int(max(box[3], 0) * image_height / self.input_shapes[0][2])
                    x_max = int(min(box[4], self.input_shapes[0][3]) * image_width / self.input_shapes[0][3])
                    y_max = int(min(box[5], self.input_shapes[0][2]) * image_height / self.input_shapes[0][2])

                    result_boxes.append(
                        [x_min, y_min, x_max, y_max]
                    )
                    result_scores.append(
                        score
                    )

        return np.asarray(result_boxes), np.asarray(result_scores)

def get_sorted_image_paths(directory):
    # Extensions are compared in lowercase, so upper-case variants also match
    image_exts = {".jpg", ".jpeg", ".png"}
    files = [os.path.join(directory, f) for f in os.listdir(directory)]
    image_files = [f for f in files if os.path.splitext(f)[1].lower() in image_exts]
    return sorted(image_files, key=lambda x: os.path.basename(x))

def write_list_to_file(data: List, filename: str):
    with open(filename, 'w') as f:
        for row in data:
            f.write(' '.join(row) + '\n')

def main():
    parser = ArgumentParser()
    parser.add_argument(
        '-m',
        '--model',
        type=str,
        default='gold_yolo_l_hand_post_0231_0.4841_1x3x384x480.onnx',
    )
    parser.add_argument(
        '-i',
        '--images_folder_path',
        type=str,
        default='./coco/COCO-Hand-S/images/re_anno',
    )
    parser.add_argument(
        '-a',
        '--annotations_txt_output_folder_path',
        type=str,
        default='./coco/COCO-Hand-S/re_annotations',
    )
    args = parser.parse_args()

    model = GoldYOLOONNX(
        model_path=args.model,
    )

    sorted_image_paths = get_sorted_image_paths(args.images_folder_path)
    for image_file_path in tqdm(sorted_image_paths, dynamic_ncols=True):
        image = cv2.imread(image_file_path)
        if image is None:
            continue  # skip files that OpenCV cannot read
        image_height = float(image.shape[0])
        image_width = float(image.shape[1])

        debug_image = copy.deepcopy(image)

        start_time = time.time()
        boxes, scores = model(debug_image)
        elapsed_time = time.time() - start_time
        fps = 1 / elapsed_time

        # Thick white text first, then thin red text on top, for readability
        fps_text = f'{fps:.1f} FPS (inference + post-process)'
        cv2.putText(debug_image, fps_text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2, cv2.LINE_AA)
        cv2.putText(debug_image, fps_text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 1, cv2.LINE_AA)

        annotations = []
        for box, score in zip(boxes, scores):
            x1 = int(box[0])
            y1 = int(box[1])
            x2 = int(box[2])
            y2 = int(box[3])

            # Normalized YOLO values relative to the source image size
            cx = float((x1 + x2) / 2 / image_width)
            cy = float((y1 + y2) / 2 / image_height)
            w = float((x2 - x1) / image_width)
            h = float((y2 - y1) / image_height)

            cv2.rectangle(debug_image, (x1, y1), (x2, y2), (255,255,255), 2)
            cv2.rectangle(debug_image, (x1, y1), (x2, y2), (0,0,255), 1)

            score_text = f'{score[0]:.2f}'
            score_pos = (x1, y1-10 if y1-10 > 0 else 10)
            cv2.putText(debug_image, score_text, score_pos, cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2, cv2.LINE_AA)
            cv2.putText(debug_image, score_text, score_pos, cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 1, cv2.LINE_AA)
            annotations.append(['0', f'{cx:.3f}', f'{cy:.3f}', f'{w:.3f}', f'{h:.3f}'])

        # Example of the lines written per image:
        #   0 0.647 0.560 0.065 0.042
        #   0 0.727 0.647 0.079 0.039
        #   0 0.306 0.508 0.079 0.078
        #   0 0.555 0.582 0.085 0.086
        #   0 0.190 0.704 0.071 0.058
        #   0 0.232 0.697 0.060 0.050
        os.makedirs(args.annotations_txt_output_folder_path, exist_ok=True)
        anno_file_name = os.path.splitext(os.path.basename(image_file_path))[0]
        write_list_to_file(annotations, f'{args.annotations_txt_output_folder_path}/{anno_file_name}.txt')

        # Show the image before waiting for a key; ESC stops the loop
        cv2.imshow("Auto annotation", debug_image)
        key = cv2.waitKey(0)
        if key == 27: # ESC
            break

if __name__ == "__main__":
    main()
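The post-processing step maps boxes from model-input coordinates back to the source image by clamping to the input size and rescaling each axis. The sketch below isolates that arithmetic in plain Python. The 480x384 input size is taken from the "384x480" (height x width) suffix of the default ONNX file name, while the helper name `scale_box_to_image`, the 640x480 source image, and the box values are assumptions for illustration.

```python
def scale_box_to_image(box, input_w, input_h, image_w, image_h):
    """Clamp a box given in model-input pixels and rescale it to the source image."""
    x1, y1, x2, y2 = box
    x_min = int(max(x1, 0) * image_w / input_w)
    y_min = int(max(y1, 0) * image_h / input_h)
    x_max = int(min(x2, input_w) * image_w / input_w)
    y_max = int(min(y2, input_h) * image_h / input_h)
    return x_min, y_min, x_max, y_max


# A made-up detection in a 480x384 model input, mapped to a 640x480 source image.
scaled = scale_box_to_image((120.0, 96.0, 240.0, 192.0), 480, 384, 640, 480)
print(scaled)  # (160, 120, 320, 240)
```

Because the image is resized without letterboxing, the x and y scale factors differ whenever the source aspect ratio differs from the model input, which is why each axis is rescaled separately.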
- Manual re-annotation
000000000036_000000094760, frame2_000184.txt - frame_001158.txt
- Head - N - step.1
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.500
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.721
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.558
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.399
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.779
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.860
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.121
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.367
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.544
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.451
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.827
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.885
Results saved to runs/train/gold_yolo-n
Epoch: 179 | mAP@0.5: 0.7206811895977326 | mAP@0.50:0.95: 0.4995256459926346
- Head - N - step.2
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.507
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.727
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.567
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.405
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.790
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.861
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.123
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.372
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.551
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.458
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.829
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.890
Results saved to runs/train/gold_yolo-n1
Epoch: 277 | mAP@0.5: 0.7271872958824573 | mAP@0.50:0.95: 0.5071289726974357
- Head - S - step.1
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.512
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.732
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.578
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.413
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.788
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.860
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.122
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.374
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.553
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.462
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.828
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.884
Results saved to runs/train/gold_yolo-s
Epoch: 141 | mAP@0.5: 0.7316947395645746 | mAP@0.50:0.95: 0.5118589534285837
- Head - S - step.2
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.514
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.732
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.584
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.415
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.791
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.867
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.123
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.372
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.555
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.464
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.830
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.890
Results saved to runs/train/gold_yolo-s1
Epoch: 260 | mAP@0.5: 0.7321360683976307 | mAP@0.50:0.95: 0.5143983289490999
- Head - M - step.1
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.529
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.740
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.608
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.428
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.804
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.897
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.125
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.380
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.568
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.476
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.844
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.916
Results saved to runs/train/gold_yolo-m
Epoch: 195 | mAP@0.5: 0.7398006744289353 | mAP@0.50:0.95: 0.5290161368821595
- Head - M - step.2
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.533
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.745
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.612
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.432
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.810
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.887
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.125
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.383
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.573
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.849
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.913
Results saved to runs/train/gold_yolo-m1
Epoch: 230 | mAP@0.5: 0.7454760403763417 | mAP@0.50:0.95: 0.5326515119531181
- Head - L - step.1
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.538
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.747
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.622
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.442
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.805
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.874
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.125
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.385
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.577
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.488
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.846
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.903
Results saved to runs/train/gold_yolo-l
Epoch: 178 | mAP@0.5: 0.7469358990429336 | mAP@0.50:0.95: 0.5380476982385607
- Head - L - step.2
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.535
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.742
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.610
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.436
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.814
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.901
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.125
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.388
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.575
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.481
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.857
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.922
Results saved to runs/train/gold_yolo-l1
Epoch: 277 | mAP@0.5: 0.7423018825144603 | mAP@0.50:0.95: 0.5352949339627351
- Merging the head annotations with the hand annotations
import os

def combine_files(head_folder, hand_cleaned_folder, output_folder):
    # Create the output folder if it does not exist
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    # Process every .txt file in the COCO-Head folder
    for filename in os.listdir(head_folder):
        if filename.endswith(".txt"):
            head_file = f"{head_folder}/{filename}"
            hand_cleaned_file = f"{hand_cleaned_folder}/{filename}"
            output_file = f"{output_folder}/{filename}"

            combined_lines = []

            # Read the COCO-Head file
            with open(head_file, 'r') as file:
                lines = file.readlines()
                # Make sure every line ends with exactly one newline
                combined_lines.extend(line.rstrip('\n') + '\n' for line in lines)

            # Read the COCO-Hand-Cleaned file, if there is one for this image
            if os.path.exists(hand_cleaned_file):
                with open(hand_cleaned_file, 'r') as file:
                    lines = file.readlines()
                    for line in lines:
                        parts = line.strip().split()
                        if not parts:
                            continue  # skip blank lines
                        parts[0] = '1'  # change the class label to 1 (hand)
                        # Append with a trailing newline
                        combined_lines.append(' '.join(parts) + '\n')

            # Write the combined content to the output file
            with open(output_file, 'w') as file:
                file.writelines(combined_lines)

# Usage example
combine_files(
    head_folder='COCO-Head',
    hand_cleaned_folder='COCO-Hand-Cleaned',
    output_folder='COCO-Head-Hand'
)
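The relabeling rule used in the merge (head lines are kept as they are, hand lines have their class id rewritten to 1) can be illustrated on in-memory lines without touching the file system. The sketch below is only an illustration. The function name `merge_labels` is hypothetical, and the label values are borrowed from the sample output shown earlier.

```python
def merge_labels(head_lines, hand_lines, hand_class_id="1"):
    """Keep head lines as-is and append hand lines with their class id rewritten."""
    merged = [line.strip() for line in head_lines if line.strip()]
    for line in hand_lines:
        parts = line.strip().split()
        if not parts:
            continue  # skip blank lines
        parts[0] = hand_class_id
        merged.append(" ".join(parts))
    return merged


head = ["0 0.647 0.560 0.065 0.042\n"]
hand = ["0 0.306 0.508 0.079 0.078\n", "\n"]
merged = merge_labels(head, hand)
print(merged)
# -> ['0 0.647 0.560 0.065 0.042', '1 0.306 0.508 0.079 0.078']
```

With this class assignment, the training YAML for the merged dataset would need two classes, with head at index 0 and hand at index 1.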
- Head-Hand - N step.1
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.435
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.694
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.464
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.354
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.687
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.831
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.143
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.398
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.517
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.445
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.764
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.864
Results saved to runs/train/gold_yolo-n
Epoch: 199 | mAP@0.5: 0.6937148862793383 | mAP@0.50:0.95: 0.43497443881089093
- Fatal bug (Nano and S only)
- Fatal bug: the loss is abnormal
- Total number of annotated images
body_label_count: 30,729 labels
head_label_count: 26,268 labels
hand_label_count: 18,087 labels
===============================
Total: 66,903 labels
Total: 14,667 images
- Images for checking
109 2299
419 9420
1206 24972
3380 70347
- Current last entry
4523
000000000036_frame_001158_body
- N - body - 22%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.470
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.657
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.506
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.045
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.534
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.753
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.174
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.493
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.582
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.186
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.692
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.836
Results saved to runs/train/gold_yolo-n
Epoch: 178 | mAP@0.5: 0.656626965176813 | mAP@0.50:0.95: 0.47047062716690025
- N - body - 37%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.483
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.663
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.515
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.063
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.545
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.771
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.175
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.497
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.595
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.202
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.716
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.848
Results saved to runs/train/gold_yolo-n1
Epoch: 335 | mAP@0.5: 0.6633302098111067 | mAP@0.50:0.95: 0.4829405374208939
- N - body - 44%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.464
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.657
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.488
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.083
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.520
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.776
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.163
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.462
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.576
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.223
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.692
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.850
Results saved to runs/train/gold_yolo-n1
Epoch: 247 | mAP@0.5: 0.6571595669951118 | mAP@0.50:0.95: 0.46380854493373824
- N - body - 51%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.469
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.656
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.497
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.086
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.531
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.776
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.160
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.469
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.581
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.233
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.697
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.854
Results saved to runs/train/gold_yolo-n1
Epoch: 324 | mAP@0.5: 0.6560160443800531 | mAP@0.50:0.95: 0.4688607070167569
- N - body - 57%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.466
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.659
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.485
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.088
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.528
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.764
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.161
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.462
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.583
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.256
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.688
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.845
Results saved to runs/train/gold_yolo-n1
Epoch: 404 | mAP@0.5: 0.6594722687661295 | mAP@0.50:0.95: 0.4659762504052017
- N - body - 64%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.426
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.626
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.444
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.139
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.528
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.766
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.131
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.389
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.516
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.219
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.676
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.842
Results saved to runs/train/gold_yolo-n1
Epoch: 353 | mAP@0.5: 0.6256951848232611 | mAP@0.50:0.95: 0.42580869837563295
- N - body - 71%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.426
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.632
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.450
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.154
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.527
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.772
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.127
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.378
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.513
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.231
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.669
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.846
Results saved to runs/train/gold_yolo-n1
Epoch: 346 | mAP@0.5: 0.63163065240628 | mAP@0.50:0.95: 0.42634700808308984
- N - body - 77%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.421
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.638
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.433
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.160
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.537
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.761
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.123
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.364
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.508
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.237
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.669
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.843
Results saved to runs/train/gold_yolo-n1
Epoch: 363 | mAP@0.5: 0.6377116420439236 | mAP@0.50:0.95: 0.42147748638998517
- N - body - 84%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.423
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.642
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.435
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.164
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.541
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.758
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.119
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.353
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.509
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.245
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.671
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.839
Results saved to runs/train/gold_yolo-n1
Epoch: 319 | mAP@0.5: 0.6418912179375459 | mAP@0.50:0.95: 0.42306445106152857
- N - body - 91%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.427
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.646
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.443
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.178
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.556
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.768
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.115
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.350
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.510
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.255
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.679
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.845
Results saved to runs/train/gold_yolo-n1
Epoch: 387 | mAP@0.5: 0.6464056146254991 | mAP@0.50:0.95: 0.4268788007021869
- N - body - 97%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.416
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.639
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.425
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.178
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.526
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.758
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.113
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.343
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.502
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.255
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.659
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.837
Results saved to runs/train/gold_yolo-n1
Epoch: 403 | mAP@0.5: 0.6387236111886924 | mAP@0.50:0.95: 0.41601151150066806
- N - body - 100%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.433
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.656
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.451
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.191
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.549
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.775
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.113
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.347
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.511
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.264
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.668
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.847
Results saved to runs/train/gold_yolo-n1
Epoch: 459 | mAP@0.5: 0.6558626353793654 | mAP@0.50:0.95: 0.43279318035818726
- S - body - 100%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.447
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.669
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.459
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.212
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.565
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.785
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.115
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.353
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.521
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.278
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.679
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.849
Results saved to runs/train/gold_yolo-s1
Epoch: 269 | mAP@0.5: 0.6692641980128146 | mAP@0.50:0.95: 0.4465093742123356
- M - body - 100%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.483
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.687
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.506
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.232
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.628
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.818
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.119
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.373
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.552
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.300
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.728
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.875
Results saved to runs/train/gold_yolo-m1
Epoch: 308 | mAP@0.5: 0.6867363194731014 | mAP@0.50:0.95: 0.4825374917392089
- L - body - 100%
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.495
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.693
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.516
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.239
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.641
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.830
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.119
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.380
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.556
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.306
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.731
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.877
Results saved to runs/train/gold_yolo-l1
Epoch: 315 | mAP@0.5: 0.6928551569441443 | mAP@0.50:0.95: 0.49458101763724416
- Cleansing 0%
- Cleansing 20%
- Cleansing 33%
- Cleansing 44%
- Cleansing 51%
- Cleansing 57%
- Cleansing 64%
- Cleansing 71%
- Cleansing 77%
- Cleansing 84%
- Cleansing 91%
- Cleansing 97%
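COCO mAP (mean Average Precision) is an evaluation metric used in computer vision,
mainly for assessing the performance of object detection models.
COCO stands for Common Objects in Context, and models are evaluated on the dataset of the same name.
mAP measures how accurately a model detects objects.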
COCO mAP evaluates objects in three size categories: small, medium, and large.
The categories are based on the area each object occupies in the image.
Small: objects with an area smaller than 32x32 pixels.
Medium: objects with an area of at least 32x32 pixels and at most 96x96 pixels.
Large: objects with an area larger than 96x96 pixels.
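When COCO mAP is computed, detections are evaluated separately for each size category
and the precision within each category is reported. This shows how accurately a model
detects objects of different sizes. For example, small objects are hard to detect,
so a high AP in the small category indicates that the model handles the hardest cases well.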
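The size buckets described above can be sketched as a small helper. This is only an illustration under stated assumptions: `size_category` is a hypothetical name, the area is taken as bounding-box width times height in pixels, and the boundaries follow the definitions given in the text rather than any particular evaluator's implementation.

```python
# Area thresholds in square pixels, as described in the text.
SMALL_MAX_AREA = 32 * 32    # 1024
MEDIUM_MAX_AREA = 96 * 96   # 9216


def size_category(box_width: float, box_height: float) -> str:
    """Return 'small', 'medium', or 'large' for a box given in pixels.

    Assumption: area is approximated by box width * box height.
    """
    area = box_width * box_height
    if area < SMALL_MAX_AREA:
        return "small"
    if area <= MEDIUM_MAX_AREA:
        return "medium"
    return "large"


# Example boxes (hypothetical sizes, not taken from the dataset).
for w, h in [(20, 30), (32, 32), (96, 96), (120, 200)]:
    print(f"{w}x{h} -> {size_category(w, h)}")
```

Counting the ground-truth boxes of a dataset per bucket with a helper like this is one way to see how much each size category can contribute to the overall score.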
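To compute mAP, each predicted bounding box is compared with the ground-truth bounding box,
and a detection counts as correct when the overlap exceeds a threshold (for example, an IoU threshold).
Average precision (AP) is then computed for each object category,
and the mean of these APs is the mAP. The IoU=0.50:0.95 rows in the logs additionally
average over IoU thresholds from 0.50 to 0.95 in steps of 0.05.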
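The matching step described above can be sketched as follows. This is a minimal illustration, not the evaluator's actual code: `iou_xyxy` and `is_true_positive` are hypothetical names, and boxes are assumed to be `(x1, y1, x2, y2)` tuples in pixels.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, iou_threshold=0.5):
    """A prediction counts as correct when its IoU meets the threshold."""
    return iou_xyxy(pred_box, gt_box) >= iou_threshold


# The ten thresholds behind the IoU=0.50:0.95 rows: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

gt = (0, 0, 10, 10)
pred = (1, 0, 11, 10)
print(f"IoU = {iou_xyxy(pred, gt):.3f}")
print([t for t in COCO_IOU_THRESHOLDS if is_true_positive(pred, gt, t)])
```

A loosely drawn box can pass at IoU 0.50 and fail at the stricter thresholds, which is why the 0.50:0.95 score is more sensitive to annotation quality than mAP@0.5.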
COCO mAP helps in understanding how a model behaves across object sizes.
A model can reach high precision at one size while performing poorly at others,
and this metric is important for assessing that balance.
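If the object detection model were set up to learn context such as nationality or culture, I would annotate it. I do not, because nothing in its appearance alone marks it simply as a human body.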
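What I imagine about the state of the COCO dataset, judging from the cleansing progress and the point at which the accuracy trend reverses:
1. small: mAP rises from 0%. Most of the annotations are inadequate.
2. medium: mAP rises from 80%. About 20% of the annotations are missing, or the boxes fit very loosely (poor IoU).
3. large: mAP rises from 90%. About 10% of the annotations are missing, or the boxes fit very loosely (poor IoU).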
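To line up the per-size AP values across cleansing stages, the summary lines in the logs above can be parsed into a dictionary. The sketch below is one way to do that under stated assumptions: `parse_summary` is a hypothetical helper, and the regular expression is written against the line layout shown in this post.

```python
import re

# Matches lines such as:
# Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.191
SUMMARY_RE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*(-?[\d.]+)"
)


def parse_summary(log_text):
    """Map (metric, iou, area, max_dets) to the reported value."""
    results = {}
    for match in SUMMARY_RE.finditer(log_text):
        metric, iou, area, max_dets, value = match.groups()
        results[(metric, iou, area, int(max_dets))] = float(value)
    return results


# Three lines copied from the "N - body - 100%" block above.
sample_log = """\
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.191
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.549
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.775
"""

parsed = parse_summary(sample_log)
for area in ("small", "medium", "large"):
    print(area, parsed[("AP", "0.50:0.95", area, 100)])
```

Running the same parser over the log of each cleansing stage gives one row per stage, which makes it easier to see where each size category starts to improve.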
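The real aim of this hellish annotation effort is to find out how the final mAP is affected when three classes, Body / Head / Hand, are trained at the same time, given that all three are parts of the same base class Person and that the features of Head and Hand are almost entirely contained within Body. Given that purpose, there is no success or failure.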
- Body-Head-Hand-N
# 640x640
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.443
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.689
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.467
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.303
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.654
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.830
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.135
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.389
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.515
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.381
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.739
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.872
Results saved to runs/train/gold_yolo-n
Epoch: 462 | mAP@0.5: 0.6892104619015829 | mAP@0.50:0.95: 0.4427396559181031
- Body-Head-Hand-S
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.460
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.704
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.491
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.327
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.665
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.838
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.137
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.399
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.526
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.397
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.739
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.874
Results saved to runs/train/gold_yolo-s
Epoch: 456 | mAP@0.5: 0.7040425163160517 | mAP@0.50:0.95: 0.46049785564440426
- Body-Head-Hand-M
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.500
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.738
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.540
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.359
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.722
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.864
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.143
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.427
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.562
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.430
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.788
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.892
Results saved to runs/train/gold_yolo-m
Epoch: 488 | mAP@0.5: 0.7378339081274632 | mAP@0.50:0.95: 0.5004409472223532
- Body-Head-Hand-L
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.509
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.739
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.556
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.367
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.729
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.869
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.146
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.432
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.567
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.434
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.792
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.903
Results saved to runs/train/gold_yolo-l
Epoch: 339 | mAP@0.5: 0.7393661924683652 | mAP@0.50:0.95: 0.5093183767567647
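Looking at these inference results, the correlation between Body and Head features does seem to have been learned to some extent. There is no head at that spot on the mannequin. Or rather, when only the features of a head on its own are considered, that spot does not respond. As I expected beforehand, the features of the individual parts appear to be linked.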
- N / S size
- M size
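At a rough glance:
1. Classification performance for Body drops
2. The model learns the correlation between body parts to some extent
3. Hands and bare feet are confused (the finger and toe features are too similar)
4. Classification performance of the model as a whole drops slightly
5. Because classification performance drops, over-detection decreases slightly
6. Over-detection of hands and heads outside the Body decreases sharply

Something like that. These are only casual impressions.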
○: training in progress
●: training finished
| | N | S | M | L | Note |
|---|---|---|---|---|---|
| Hand | ● | ● | ● | ● | A100 40GB 32batch |
| Head | | | | | A100 40GB 32batch |
| Body | ● | ● | ● | ● | A100 40GB 32batch |
| Hand/Head/Body | ● | ● | ● | ● | A100 80GB 32batch |
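3-class test - Nano
What I took away from this half-joking experiment: once dataset quality is raised and the detection rate for small-size regions improves substantially, the model's input resolution can be made very small and still give reasonable performance, as long as small-object detection holds up. It is interesting that improving the dataset contributes so much to inference speed.
- Test with CPU inference at an input resolution of 160x128
- 2.9 ms per inference

Put the other way around, the conventional wisdom that raising the resolution does not improve performance much may simply reflect datasets so rotten that the ability to detect Large-size regions plateaus early in training, so the model's true performance is never drawn out.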
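A per-inference latency such as the 2.9 ms figure above is typically obtained by timing repeated calls after a warm-up. The sketch below shows that pattern with the standard library only. It is not the measurement code used for this post: `benchmark_ms` is a hypothetical helper and `dummy_inference` is a stand-in workload that would be replaced by the real model call.

```python
import statistics
import time


def benchmark_ms(fn, warmup=10, runs=100):
    """Return (mean, median) latency of fn() in milliseconds."""
    # Warm-up calls are excluded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings), statistics.median(timings)


def dummy_inference():
    # Stand-in workload (assumption): replace with the actual inference call.
    return sum(i * i for i in range(10_000))


mean_ms, median_ms = benchmark_ms(dummy_inference, warmup=5, runs=50)
print(f"mean {mean_ms:.3f} ms / median {median_ms:.3f} ms per call")
```

Reporting the median alongside the mean helps when occasional slow runs on a busy CPU inflate the average.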
- A quick look at whether anything interesting happens when the three classes `Body`, `Head`, and `Hand` of the base class `Person` are merged and trained together
Gold-YOLO N
- 1 class `body`
| mAP@0.5 | mAP@0.50:0.95 |
|---|---|
| 0.655 | 0.432 |
- 1 class `head`
| mAP@0.5 | mAP@0.50:0.95 |
|---|---|
| 0.727 | 0.507 |
- 1 class `hand`
| mAP@0.5 | mAP@0.50:0.95 |
|---|---|
| 0.692 | 0.404 |
- 3 classes `Body + Head + Hand`
| Class | Labeled_images | Labels | P@.5iou | R@.5iou | F1@.5iou | mAP@.5 | mAP@.5:.95 |
|---|---|---|---|---|---|---|---|
| all | 486 | 8858 | 0.856 | 0.62 | 0.719 | 0.689 | 0.443 |
| body | 486 | 3747 | 0.857 | 0.60 | 0.706 | 0.662 | 0.440 |
| head | 475 | 3269 | 0.912 | 0.68 | 0.779 | 0.726 | 0.497 |
| hand | 483 | 1842 | 0.842 | 0.59 | 0.694 | 0.680 | 0.391 |
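The F1 column in the 3-class results above is consistent with the harmonic mean of the P and R columns. The short check below recomputes it from the values in the table. `f1_score` is a hypothetical helper written for this check, not a function from the training code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P@.5iou, R@.5iou, reported F1@.5iou) copied from the 3-class results.
rows = {
    "all": (0.856, 0.62, 0.719),
    "body": (0.857, 0.60, 0.706),
    "head": (0.912, 0.68, 0.779),
    "hand": (0.842, 0.59, 0.694),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.3f}, reported = {reported}")
```

In every row precision is well above recall, so the F1 values are held down mainly by missed detections rather than by false positives.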
- 1 class