🙆

Running a Custom Model on the M5Stack Module LLM: 1. Model Conversion

Published 2024/12/09

M5Stack

LLM

yolov9

tech

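Introduction

In this article, we take the ONNX model of wholebody17, a custom YOLOv9 model by @PINTO03091, and convert it into an AXMODEL that can run on the NPU of the M5Stack Module LLM.

For an overview of the project, see 0. Overview.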

Environment

The conversion is done inside the Docker image provided by AXERA. I have confirmed that it works on Windows 10 and Ubuntu 24.04.

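Installing Pulsar2

Pulsar2 is a framework for converting ONNX models into the axmodel format that runs on AXERA NPUs. A Docker image of its runtime environment is distributed, so we use that.
We largely follow the procedure in the official guide.

I use an Ubuntu machine here, but the same steps can of course be carried out on Windows.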

Execution environment

OS

The Ubuntu version is 24.04.1 LTS.

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.1 LTS
Release:	24.04
Codename:	noble
$ uname -m
x86_64

docker

Installed by following this page.

$ sudo docker version
Client: Docker Engine - Community
 Version:           27.3.1
 API version:       1.47
 Go version:        go1.22.7
 Git commit:        ce12230
 Built:             Fri Sep 20 11:40:59 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.3.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.7
  Git commit:       41ca978
  Built:            Fri Sep 20 11:40:59 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.22
  GitCommit:        7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc:
  Version:          1.1.14
  GitCommit:        v1.1.14-0-g2c9f560
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

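Installing Pulsar2
　Model download
　Calibration dataset
　Conversion

Download and load the Docker image

The Docker image is available here (a link to Google Drive); download it and extract the zip.
As of 2024/12/01, the latest is ax_pulsar2_3.2_patch1_temp_vlm.tar.gz

In the folder that contains the tar.gz file, run: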

docker load -i ax_pulsar2_3.2_patch1_temp_vlm.tar.gz

Once it has loaded, check the result:

docker images
REPOSITORY   TAG             IMAGE ID       CREATED      SIZE
pulsar2      temp-58aa62e4   c6ccb211d0bc   4 days ago   2.58GB

The TAG is hard to read, so change it:

docker tag c6ccb211d0bc pulsar2:3.2.1
docker rmi pulsar2:temp-58aa62e4

Check once more to be sure:

$ sudo docker images
REPOSITORY   TAG       IMAGE ID       CREATED      SIZE
pulsar2      3.2.1     c6ccb211d0bc   4 days ago   2.58GB

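Create a working folder

Create a working folder in a suitable location and place the example data (download link) in it.

The downloaded file is quick_start-example.zip; extract it and arrange its contents in the working folder as follows.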

$ tree -L 2
.
├── config
│   ├── mobilenet_v2_build_config.json
│   └── yolov5s_config.json
├── dataset
│   ├── coco_4.tar
│   └── imagenet-32-images.tar
├── model
│   ├── mobilenetv2-sim.onnx
│   └── yolov5s.onnx
└── pulsar2-run-helper
    ├── cli_classification.py
    ├── cli_detection.py
    ├── list.txt
    ├── models
    ├── pulsar2_run_helper
    ├── requirements.txt
    ├── setup.cfg
    ├── sim_images
    ├── sim_inputs
    └── sim_outputs

10 directories, 11 files

Docker run

From inside the working folder, run sudo docker run and confirm that the files arranged in the working folder are visible inside the container.

$ docker run -it --net host --rm -v $PWD:/data pulsar2:3.2.1
root@thinkcentre-m73:/data# tree -L 2
.
|-- config
|   |-- mobilenet_v2_build_config.json
|   `-- yolov5s_config.json
|-- dataset
|   |-- coco_4.tar
|   `-- imagenet-32-images.tar
|-- model
|   |-- mobilenetv2-sim.onnx
|   `-- yolov5s.onnx
`-- pulsar2-run-helper
    |-- cli_classification.py
    |-- cli_detection.py
    |-- list.txt
    |-- models
    |-- pulsar2_run_helper
    |-- requirements.txt
    |-- setup.cfg
    |-- sim_images
    |-- sim_inputs
    `-- sim_outputs

9 directories, 11 files

Run the example

cd /data
pulsar2 build --input model/mobilenetv2-sim.onnx --output_dir output --config config/mobilenet_v2_build_config.json --target_hardware AX620E

We will look at the configuration later; for now, this produces /data/output/compiled.axmodel.

Download wholebody17

Download the ReLU models from 457_YOLOv9-Wholebody17 in PINTO_model_zoo.

This container does not seem to include curl, so start by installing it.

apt update
apt install curl

Move to the model folder in the working folder and fetch the model download script with wget.
Make it executable with chmod, then run it.

cd /data/model
wget https://raw.githubusercontent.com/PINTO0309/PINTO_model_zoo/refs/heads/main/457_YOLOv9-Wholebody17/download_t_relu.sh
chmod 755 ./download_t_relu.sh
./download_t_relu.sh

Many ONNX models are downloaded, but the models with post in their names are not used.
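As a rough aid for picking a model out of the downloaded files, the following sketch lists the ONNX files in a folder and skips the post-processing variants. It assumes those variants can be recognized by `post` appearing in the file name; `list_candidate_models` is a hypothetical helper written for this article, not something provided by PINTO_model_zoo.

```python
from pathlib import Path


def list_candidate_models(model_dir):
    """Return sorted names of .onnx files, skipping ones with 'post' in the name.

    Assumption: the post-processing variants have 'post' in their file names.
    """
    return sorted(
        p.name
        for p in Path(model_dir).glob("*.onnx")
        if "post" not in p.name
    )


if __name__ == "__main__":
    # /data/model is the model folder used inside the container in this article.
    for name in list_candidate_models("/data/model"):
        print(name)
```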

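Pruning the ONNX model

When quantizing a YOLOv9 ONNX model, cutting off the final layers beforehand reduces the loss of accuracy.
Save the following Python script for cutting the model in the /data/model folder and run it.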

# Script for cutting off the final layers of a YOLOv9 ONNX model

import onnx
import onnx.utils
import os
import sys

def main(input_path):
    # Build the output path from the input file name and extension
    base_name, ext = os.path.splitext(input_path)
    output_path = f"{base_name}_cut{ext}"

    # Names of the input node and of the output nodes to keep
    input_names = ["images"]
    output_names = [
        "/model.22/Concat_output_0",
        "/model.22/Concat_1_output_0",
        "/model.22/Concat_2_output_0"
    ]

    # Extract the sub-model between those inputs and outputs
    onnx.utils.extract_model(input_path, output_path, input_names, output_names)
    print(f"Model saved to {output_path}.")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 cut_off_yolov9.py <input_path>")
        sys.exit(1)

    input_path = sys.argv[1]

    # Check that the input file exists
    if not os.path.exists(input_path):
        print(f"Error: input file '{input_path}' not found.")
        sys.exit(1)

    main(input_path)

Save the code above as cut_off_yolov9.py and use it to prune the target model.
This time we target the model that takes 480x640 images as input.

chmod 755 cut_off_yolov9.py
python3 cut_off_yolov9.py yolov9_t_wholebody17_relu_0100_1x3x480x640.onnx

The cut model is saved in model/ as
yolov9_t_wholebody17_relu_0100_1x3x480x640_cut.onnx

If you have the time, I recommend opening the ONNX files from before and after the cut in Netron and comparing them.

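Configuring the conversion

Save a JSON file describing the conversion settings in config/ in the working folder.
This time we use the following settings and save them as yolov9_config_cut_coco4.json.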

{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "images",
        "calibration_dataset": "./dataset/coco_4.tar",
        "calibration_size": 4,
        "calibration_mean": [82.06766162413021, 101.43821192825563, 106.88892940888948],
        "calibration_std": [56.29755067811651, 61.36473475095512, 60.07529956098403]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": true,
    "precision_analysis_method":"EndToEnd"
  },
  "input_processors": [
    {
      "tensor_name": "images",
      "tensor_format": "BGR",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "output_processors": [
    {
      "tensor_name": "/model.22/Concat_output_0",
      "dst_perm": [0, 2, 3, 1]
    },
    {
      "tensor_name": "/model.22/Concat_1_output_0",
      "dst_perm": [0, 2, 3, 1]
    },
    {
      "tensor_name": "/model.22/Concat_2_output_0",
      "dst_perm": [0, 2, 3, 1]
    }
  ],
  "compiler": {
    "check": 0
  }
}

The key part of this configuration is calibration_dataset, calibration_size, calibration_mean, and calibration_std in input_configs.
These specify the dataset used to calibrate on input images during quantization, together with its parameters.
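Because the tensor names in the config have to agree with the names used when cutting the model, a small consistency check can catch typos before a long build. The sketch below is only an illustration: `check_config` and `check_config_file` are hypothetical helpers, and the rules they apply (three values per calibration statistic, one output_processors entry per cut output) are assumptions drawn from the config in this article, not requirements documented by Pulsar2.

```python
import json

# Output tensor names used when cutting the model (taken from this article).
CUT_OUTPUTS = (
    "/model.22/Concat_output_0",
    "/model.22/Concat_1_output_0",
    "/model.22/Concat_2_output_0",
)


def check_config(config, input_name="images", output_names=CUT_OUTPUTS):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for entry in config.get("quant", {}).get("input_configs", []):
        if entry.get("tensor_name") != input_name:
            problems.append(f"unexpected input tensor: {entry.get('tensor_name')}")
        for key in ("calibration_mean", "calibration_std"):
            if len(entry.get(key, [])) != 3:
                problems.append(f"{key} should have 3 values (one per channel)")
    configured = [p.get("tensor_name") for p in config.get("output_processors", [])]
    for name in output_names:
        if name not in configured:
            problems.append(f"missing output_processors entry: {name}")
    return problems


def check_config_file(path):
    """Load a JSON config file and run check_config on it."""
    with open(path, encoding="utf-8") as f:
        return check_config(json.load(f))


if __name__ == "__main__":
    # A trimmed-down config with one output entry left out, for illustration.
    sample = {
        "quant": {
            "input_configs": [
                {
                    "tensor_name": "images",
                    "calibration_mean": [82.07, 101.44, 106.89],
                    "calibration_std": [56.30, 61.36, 60.08],
                }
            ]
        },
        "output_processors": [
            {"tensor_name": "/model.22/Concat_output_0"},
            {"tensor_name": "/model.22/Concat_1_output_0"},
        ],
    }
    for problem in check_config(sample):
        print(problem)
```

To check a real file, call `check_config_file` with the path to the JSON saved in config/.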

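As a rule, extract roughly 100 to 1000 images from the dataset that was used to train the target model, then enter:
① the extracted dataset (calibration_dataset),
② the number of images it contains (calibration_size),
③ the mean of the RGB values of the pixels across all of its images (calibration_mean), and
④ the standard deviation of the same values (calibration_std).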

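That said, the conversion still works if you cut some corners, so here we use the 4-image dataset derived from MS-COCO that is included in the pulsar2 demo.
I also tried a dataset of 760 images extracted from MS COCO, but I do not feel that it improved the result much. More investigation and verification are needed.
For reference, I have shared the data from the conversions with each dataset in a Gist.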
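Extracting a subset of images and packing it for calibration could be sketched as follows. This is a minimal illustration under stated assumptions: `build_calibration_tar` is a hypothetical helper, and storing the images flat in an uncompressed tar is an assumption about the archive layout, so compare against coco_4.tar before relying on it.

```python
import random
import tarfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_calibration_tar(image_dir, tar_path, count, seed=0):
    """Pick `count` images at random from image_dir and pack them into tar_path.

    Returns the number of images written, which is the value to enter as
    calibration_size.
    """
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not images:
        raise ValueError(f"no images found in {image_dir}")
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    chosen = rng.sample(images, min(count, len(images)))
    with tarfile.open(tar_path, "w") as tar:
        for path in chosen:
            # Assumption: files are stored flat, without parent folders.
            tar.add(str(path), arcname=path.name)
    return len(chosen)
```

For example, `build_calibration_tar("./images", "./dataset/my_calib.tar", 100)` would write up to 100 images; both paths here are placeholders.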

Measuring Mean and Std

The mean and std of a dataset can be measured with the following script.

# Read every image in the given folder and compute the per-channel mean and standard deviation.

import os
import sys
import cv2
import numpy as np

def calculate_calibration_statistics(dataset_path):
    mean_sum = np.zeros(3, dtype=np.float64)
    std_sum = np.zeros(3, dtype=np.float64)
    image_count = 0  # number of images that were actually loaded

    image_files = [f for f in os.listdir(dataset_path) if f.lower().endswith(('.jpg', '.jpeg', '.png'))]

    if not image_files:
        raise ValueError("No image files found in the folder.")

    for image_file in image_files:
        image_path = os.path.join(dataset_path, image_file)
        image = cv2.imread(image_path)  # cv2 loads images in BGR channel order
        if image is None:
            print(f"Could not read image: {image_file}")
            continue

        image_count += 1
        mean_sum += image.mean(axis=(0, 1))
        std_sum += image.std(axis=(0, 1))

    if image_count == 0:
        raise ValueError("None of the image files could be read.")

    # Average the per-image statistics over the images that were loaded
    calibration_mean = mean_sum / image_count
    calibration_std = std_sum / image_count

    return calibration_mean.tolist(), calibration_std.tolist()

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 calc_parameta.py <dataset_path>")
        sys.exit(1)

    dataset_path = sys.argv[1]
    if not os.path.exists(dataset_path):
        print(f"Folder not found: {dataset_path}")
        sys.exit(1)

    mean, std = calculate_calibration_statistics(dataset_path)
    print("Calibration Mean:", mean)
    print("Calibration Std:", std)

To use it, extract the calibration dataset into a single folder, save the script above as calc_parameta.py, and run it with the path to the dataset folder as its argument.

cd /data/dataset
mkdir coco_4
tar -xvf coco_4.tar -C ./coco_4/
python3 calc_parameta.py ./coco_4/
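Note that the script above averages the mean and standard deviation of each image, which is not quite the same as the statistics over every pixel in the dataset, particularly for the standard deviation. If the per-pixel figures are wanted, they can be accumulated from running sums, as in the sketch below. `pooled_mean_std` is a hypothetical helper, and I have not verified whether the difference affects the Pulsar2 result.

```python
import numpy as np


def pooled_mean_std(images):
    """Per-channel mean and std over all pixels of all images.

    `images` is an iterable of H x W x 3 arrays; the sizes may differ.
    """
    total = np.zeros(3, dtype=np.float64)     # sum of pixel values
    total_sq = np.zeros(3, dtype=np.float64)  # sum of squared pixel values
    pixels = 0
    for image in images:
        data = np.asarray(image, dtype=np.float64).reshape(-1, 3)
        total += data.sum(axis=0)
        total_sq += (data ** 2).sum(axis=0)
        pixels += data.shape[0]
    if pixels == 0:
        raise ValueError("no pixels to measure")
    mean = total / pixels
    # Var[x] = E[x^2] - E[x]^2; clip tiny negatives caused by rounding
    std = np.sqrt(np.maximum(total_sq / pixels - mean ** 2, 0.0))
    return mean.tolist(), std.tolist()
```

The arrays can come straight from `cv2.imread`, in which case the three values are in BGR order, matching the script above.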

Model conversion

Return to the working folder and run Pulsar2.

cd /data
pulsar2 build --input model/yolov9_t_wholebody17_relu_0100_1x3x480x640_cut.onnx --output_dir output --config config/yolov9_config_cut_coco4.json --target_hardware AX620E

If the conversion succeeds, the result is saved directly under the output folder as compiled.axmodel.
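When comparing several calibration datasets, each build overwrites output/compiled.axmodel unless a different output folder is given. The sketch below assembles the same command with a configurable output folder. It uses only the flags that appear in the command above, and `pulsar2_build_command` is a hypothetical helper.

```python
import shlex


def pulsar2_build_command(onnx_path, config_path, output_dir="output",
                          target_hardware="AX620E"):
    """Return the pulsar2 build command from this article as an argument list."""
    return [
        "pulsar2", "build",
        "--input", onnx_path,
        "--output_dir", output_dir,
        "--config", config_path,
        "--target_hardware", target_hardware,
    ]


if __name__ == "__main__":
    cmd = pulsar2_build_command(
        "model/yolov9_t_wholebody17_relu_0100_1x3x480x640_cut.onnx",
        "config/yolov9_config_cut_coco4.json",
        output_dir="output_coco4",  # placeholder folder name
    )
    print(shlex.join(cmd))
    # To run it inside the container: subprocess.run(cmd, check=True)
```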