
Inference testing with sit4onnx

Published 2024/03/09

Let's try out sit4onnx, a tool for running simple inference tests on ONNX models.

https://github.com/PINTO0309/sit4onnx

The model is distilgpt2 from Hugging Face, the same one used in the previous article.

https://huggingface.co/docs/transformers/model_doc/gpt2

Installation is simple:

!pip install sit4onnx

Two thoughtful touches: it automatically generates dummy input tensors for you, and it discards the result of the first run (as a warm-up).
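
Writing the equivalent by hand with onnxruntime would look roughly like the sketch below. This is just my own minimal approximation of what sit4onnx automates (dummy-input generation, warm-up, timing loop), not the tool's actual implementation.

import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "distilgpt2_20230307.onnx",
    providers=["CPUExecutionProvider"],
)

# Dummy tensor matching the model's input signature (input_ids: [1, 6], int64)
dummy_inputs = {"input_ids": np.random.randint(0, 50257, size=(1, 6), dtype=np.int64)}

# Discard the first run as a warm-up
sess.run(None, dummy_inputs)

# Average over 10 runs, matching sit4onnx's default test_loop_count
loops = 10
start = time.perf_counter()
for _ in range(loops):
    outputs = sess.run(None, dummy_inputs)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"avg elapsed time per pred: {elapsed_ms / loops} ms")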

The model before simplification

!sit4onnx -if distilgpt2_20230307.onnx -oep cpu  # by default it loops 11 times internally (1 warm-up run + 10 measured)
INFO: file: distilgpt2_20230307.onnx
INFO: providers: ['CPUExecutionProvider']
INFO: input_name.1: input_ids shape: [1, 6] dtype: int64
INFO: test_loop_count: 10
INFO: total elapsed time:  59.000492095947266 ms
INFO: avg elapsed time per pred:  5.900049209594727 ms
INFO: output_name.1: outputs shape: [1, 6, 768] dtype: float32
INFO: output_name.2: key.3 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.3: value.3 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.4: key.11 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.5: value.11 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.6: key.19 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.7: value.19 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.8: key.27 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.9: value.27 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.10: key.35 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.11: value.35 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.12: key.43 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.13: value.43 shape: [1, 12, 6, 64] dtype: float32

The model after simplification

!sit4onnx -if distilgpt2_20230307_simplified.onnx -oep cpu  # by default it loops 11 times internally (1 warm-up run + 10 measured)
INFO: file: distilgpt2_20230307_simplified.onnx
INFO: providers: ['CPUExecutionProvider']
INFO: input_name.1: input_ids shape: [1, 6] dtype: int64
INFO: test_loop_count: 10
INFO: total elapsed time:  49.99661445617676 ms
INFO: avg elapsed time per pred:  4.999661445617676 ms
INFO: output_name.1: outputs shape: [1, 6, 768] dtype: float32
INFO: output_name.2: key.3 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.3: value.3 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.4: key.11 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.5: value.11 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.6: key.19 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.7: value.19 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.8: key.27 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.9: value.27 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.10: key.35 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.11: value.35 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.12: key.43 shape: [1, 12, 6, 64] dtype: float32
INFO: output_name.13: value.43 shape: [1, 12, 6, 64] dtype: float32

After simplification, the average time per inference drops from about 5.9 ms to about 5.0 ms on CPU, roughly a 15% improvement. The tool also has plenty of options and looks useful in cases like the following. I have a feeling I'd end up writing a throwaway test script from scratch before remembering how to use the tool, but using this instead is far easier since it's just a single command.

  • Use --test_loop_count=11 to check whether the measured time deviates significantly from the model's expected processing time
  • Check processing time at different --batch_size values
  • Use --output_numpy_file to check whether the output OPs produce the expected values (see the sketch after the help output below)

For reference, the full --help output:
optional arguments:
  -h, --help
    show this help message and exit.
 
  -if INPUT_ONNX_FILE_PATH, --input_onnx_file_path INPUT_ONNX_FILE_PATH
    Input onnx file path.
 
  -b BATCH_SIZE, --batch_size BATCH_SIZE
    Value to be substituted if input batch size is undefined.
    This is ignored if the input dimensions are all of static size.
    Also ignored if input_numpy_file_paths_for_testing or
    numpy_ndarrays_for_testing or fixed_shapes is specified.
 
  -fs FIXED_SHAPES [FIXED_SHAPES ...], --fixed_shapes FIXED_SHAPES [FIXED_SHAPES ...]
    Input OPs with undefined shapes are changed to the specified shape.
    This parameter can be specified multiple times depending on
    the number of input OPs in the model.
    Also ignored if input_numpy_file_paths_for_testing or
    numpy_ndarrays_for_testing is specified.
    e.g.
    --fixed_shapes 1 3 224 224 \
    --fixed_shapes 1 5 \
    --fixed_shapes 1 1 224 224
 
  -tlc TEST_LOOP_COUNT, --test_loop_count TEST_LOOP_COUNT
    Number of times to run the test.
    The total execution time is divided by the number of times the test is executed,
    and the average inference time per inference is displayed.
 
  -oep {tensorrt,cuda,openvino_cpu,openvino_gpu,cpu}, \
    --onnx_execution_provider {tensorrt,cuda,openvino_cpu,openvino_gpu,cpu}
 
    ONNX Execution Provider.
 
  -ifp INPUT_NUMPY_FILE_PATHS_FOR_TESTING, \
    --input_numpy_file_paths_for_testing INPUT_NUMPY_FILE_PATHS_FOR_TESTING
 
    Use an external file of numpy.ndarray saved using np.save as input data for testing.
    This parameter can be specified multiple times depending on
    the number of input OPs in the model.
    If this parameter is specified, the value specified for
    batch_size and fixed_shapes are ignored.
    e.g.
    --input_numpy_file_paths_for_testing aaa.npy \
    --input_numpy_file_paths_for_testing bbb.npy \
    --input_numpy_file_paths_for_testing ccc.npy
 
  -ofp, --output_numpy_file
    Outputs the last inference result to an .npy file.
 
  -n, --non_verbose
    Do not show all information logs. Only error logs are displayed.
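
As a concrete example of the output-value check in the third bullet above, one workflow might be: save a real input_ids array with np.save, feed it in with -ifp, dump the result with -ofp, and compare it against a reference computed directly with onnxruntime. This is a sketch under my own assumptions; in particular, the name of the .npy file written by --output_numpy_file (output_0.npy below) is hypothetical, so check the actual file name on your setup.

import numpy as np
from transformers import AutoTokenizer

# Save a real input tensor to use instead of the auto-generated dummy
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
input_ids = tokenizer("Hello, my dog is cute", return_tensors="np")["input_ids"].astype(np.int64)
np.save("input_ids.npy", input_ids)

!sit4onnx -if distilgpt2_20230307_simplified.onnx -oep cpu -ifp input_ids.npy -ofp

import numpy as np
import onnxruntime as ort

# Reference output computed directly with onnxruntime
sess = ort.InferenceSession(
    "distilgpt2_20230307_simplified.onnx",
    providers=["CPUExecutionProvider"],
)
reference = sess.run(None, {"input_ids": np.load("input_ids.npy")})[0]

# Compare against the file dumped by --output_numpy_file
# (output_0.npy is a hypothetical name -- adjust to whatever the tool actually writes)
dumped = np.load("output_0.npy")
print(np.allclose(reference, dumped, atol=1e-4))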
