
【Method for ML】COCO format explained

Published 2024/09/10

1. What is the COCO format for?

The COCO (Common Objects in Context) format is a popular annotation format. It is used especially for computer vision tasks such as object detection, instance segmentation, and keypoint detection.

It scales well to large datasets and is supported by libraries such as MMDetection.

Note that the format is not the dataset itself. COCO is also the name of a public dataset, but the COCO format is a way of describing a dataset's contents: which images it contains and what is annotated in each of them.

2. Example

Here is an example annotation file with two images, three annotations, and two categories:

```json
{
  "images": [
    {
      "id": 1,
      "file_name": "animals/cat/image1.jpeg",
      "width": 640,
      "height": 480
    },
    {
      "id": 2,
      "file_name": "vehicles/car/image4.jpeg",
      "width": 640,
      "height": 480
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [100, 150, 50, 80],
      "segmentation": [[100, 150, 150, 150, 150, 230, 100, 230]],
      "area": 4000,
      "iscrowd": 0
    },
    {
      "id": 2,
      "image_id": 1,
      "category_id": 2,
      "bbox": [200, 100, 75, 85],
      "segmentation": [[200, 100, 275, 100, 275, 185, 200, 185]],
      "area": 6375,
      "iscrowd": 0
    },
    {
      "id": 3,
      "image_id": 2,
      "category_id": 1,
      "bbox": [50, 80, 60, 90],
      "segmentation": [[50, 80, 110, 80, 110, 170, 50, 170]],
      "area": 5400,
      "iscrowd": 0
    }
  ],
  "categories": [
    {
      "id": 1,
      "name": "person",
      "supercategory": "human"
    },
    {
      "id": 2,
      "name": "dog",
      "supercategory": "animal"
    }
  ]
}
```
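A few fields are worth reading carefully. "bbox" is [x, y, width, height] in pixels, where (x, y) is the top-left corner. "segmentation" with "iscrowd": 0 is a list of polygons, each written as a flat list [x1, y1, x2, y2, ...]. "area" is the size of the segmented region in pixels. The Python sketch below is a minimal illustration, not part of any library: the helper names bbox_to_corners and polygon_area are hypothetical. The sketch embeds one annotation in the same shape as the example, with a polygon chosen to match its bbox, and it converts the bbox to corner coordinates and recomputes the polygon area with the shoelace formula.

```python
import json

# One annotation in the same shape as the example above (polygon chosen to match its bbox).
EXAMPLE = """
{
  "images": [{"id": 1, "file_name": "animals/cat/image1.jpeg", "width": 640, "height": 480}],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 150, 50, 80],
     "segmentation": [[100, 150, 150, 150, 150, 230, 100, 230]], "area": 4000, "iscrowd": 0}
  ],
  "categories": [{"id": 1, "name": "person", "supercategory": "human"}]
}
"""

def bbox_to_corners(bbox):
    """Convert a COCO bbox [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

def polygon_area(flat):
    """Shoelace formula for one polygon given as a flat list [x1, y1, x2, y2, ...]."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    s = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(s) / 2

data = json.loads(EXAMPLE)
names = {c["id"]: c["name"] for c in data["categories"]}  # category_id -> name
for ann in data["annotations"]:
    corners = bbox_to_corners(ann["bbox"])
    # Sum over polygons; assumes the polygons of one object do not overlap.
    poly_area = sum(polygon_area(p) for p in ann["segmentation"])
    print(names[ann["category_id"]], corners, poly_area, ann["area"])
```

For a rectangular polygon the result matches the bbox area (4000 here). In real datasets the stored "area" may come from a rasterized mask, so it will not always equal a polygon calculation exactly.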

2.1 Feature

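A COCO annotation file is a JSON file. Its main top-level lists are "images", "annotations", and "categories", and they are linked by IDs: each annotation points to an image through "image_id" and to a category through "category_id". The official COCO files also contain "info" and "licenses" entries, which the example above omits.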
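Because the file is ordinary JSON, it can be read with Python's standard json module. The sketch below uses a hypothetical helper, check_coco, which is not part of any library. It checks that the three main lists exist and that every "image_id" and "category_id" refers to an existing entry, since those ID links are what hold the format together. The file name annotations.json in the comment is also only an assumption.

```python
import json

def check_coco(data):
    """Return a list of problems found in a COCO-style dict (an empty list means none were found)."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if not isinstance(data.get(key), list):
            problems.append(f"missing or non-list top-level key: {key}")
    if problems:
        return problems
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    for ann in data["annotations"]:
        # Every annotation must point at an existing image and category.
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
    return problems

# With a real file: with open("annotations.json") as f: data = json.load(f)
sample = json.loads(
    '{"images": [{"id": 1}], "categories": [{"id": 1}], '
    '"annotations": [{"id": 1, "image_id": 1, "category_id": 1}, '
    '{"id": 2, "image_id": 9, "category_id": 1}]}'
)
print(check_coco(sample))  # ['annotation 2: unknown image_id 9']
```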

Its main features are:
・Easy to scale. In large datasets like COCO, images often contain many objects (e.g., a busy street scene with pedestrians, cars, and bicycles). Each object simply gets its own entry in "annotations", so the data structure scales naturally.
・Multiple annotations per image. One image can have any number of annotations. In the example above, the image with "id": 1 has two entries in "annotations", both with "image_id": 1.
・Task versatility. Annotations for different tasks can be added to the same image, such as bounding boxes for object detection and keypoints for pose estimation, without re-annotating the entire image.
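The one-to-many link between images and annotations can be sketched in Python as follows. Both helpers are hypothetical illustrations, not a library API. group_by_image collects every annotation that shares an "image_id". add_annotation appends a new entry with the next free "id", leaving existing entries untouched, which is why adding objects scales easily. Extra keyword fields are passed through unchanged, so a pose-estimation entry could carry fields such as "keypoints" and "num_keypoints". As a simplifying assumption, "area" is taken from the bbox because this sketch has no mask.

```python
from collections import defaultdict

def group_by_image(annotations):
    """Map each image_id to the list of annotations that reference it."""
    by_image = defaultdict(list)
    for ann in annotations:
        by_image[ann["image_id"]].append(ann)
    return dict(by_image)

def add_annotation(data, image_id, category_id, bbox, **extra):
    """Append a new annotation with the next free id; extra fields are copied as-is."""
    next_id = max((a["id"] for a in data["annotations"]), default=0) + 1
    # Assumption: no mask here, so area is approximated as bbox width * height.
    ann = {"id": next_id, "image_id": image_id, "category_id": category_id,
           "bbox": bbox, "area": bbox[2] * bbox[3], "iscrowd": 0, **extra}
    data["annotations"].append(ann)
    return ann

data = {"annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 150, 50, 80]},
    {"id": 2, "image_id": 1, "category_id": 2, "bbox": [200, 100, 75, 85]},
    {"id": 3, "image_id": 2, "category_id": 1, "bbox": [50, 80, 60, 90]},
]}
groups = group_by_image(data["annotations"])
print({img: [a["id"] for a in anns] for img, anns in groups.items()})  # {1: [1, 2], 2: [3]}

# A new object on image 2: only one entry is appended.
new = add_annotation(data, image_id=2, category_id=2, bbox=[300, 200, 40, 40])
print(new["id"])  # 4
```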

The COCO dataset itself is widely used in computer vision challenges and as a benchmark for comparing model scores. Besides being easy to use, another advantage is that it is publicly available.

3. Summary

In this article, I explained the COCO annotation format.
It is a JSON format that stores images, annotations, and categories as separate lists linked by IDs, so the information is organized efficiently and new annotations are easy to add.

Well-known libraries such as MMDetection support the COCO format, so please give it a try if you are interested.
