Building ONNX Runtime for iOS

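I want to build ONNX Runtime for iOS.

Following the ORTObjectDetection sample in onnxruntime-inference-examples, a prebuilt ONNX Runtime package can at least be pulled in from CocoaPods.

For the iPhone simulator in Xcode, the app built and installed without problems.

When I tried to build and install for a physical device (iPhone XS), however, the app build failed with a linker error: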

ld: '/Users/dhirooka/Library/Developer/Xcode/DerivedData/ORTObjectDetection-gbsucirtveleltgzkrnnmhtkswad/Build/
  Products/Debug-iphoneos/XCFrameworkIntermediates/onnxruntime-mobile-c/onnxruntime.framework/onnxruntime(model.o)' 
  does not contain bitcode. You must rebuild it with bitcode enabled (Xcode setting ENABLE_BITCODE), obtain an updated 
  library from the vendor, or disable bitcode for this target. file '/Users/dhirooka/Library/Developer/Xcode/
  DerivedData/ORTObjectDetection-gbsucirtveleltgzkrnnmhtkswad/Build/Products/Debug-iphoneos/XCFrameworkIntermediates/
  onnxruntime-mobile-c/onnxruntime.framework/onnxruntime' for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

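The error appears to say that Bitcode is enabled in the Xcode build settings, but the ONNX Runtime package being linked does not contain Bitcode.

Looking through the Xcode settings, Bitcode does indeed appear to be enabled.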
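The same check can be done outside the Xcode UI. The following is a minimal sketch, assuming the build settings have been dumped as text with `xcodebuild -showBuildSettings` (which prints `KEY = VALUE` lines). The helper names and the sample excerpt are hypothetical, and a missing `ENABLE_BITCODE` key is treated as disabled purely for illustration.

```python
import re


def parse_build_settings(text: str) -> dict[str, str]:
    """Parse `KEY = VALUE` lines as printed by `xcodebuild -showBuildSettings`."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$", line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings


def bitcode_enabled(text: str) -> bool:
    """Return True when ENABLE_BITCODE is YES (assumption: absent means NO)."""
    return parse_build_settings(text).get("ENABLE_BITCODE", "NO") == "YES"


# Hypothetical excerpt of the dumped settings, for illustration only.
sample = """\
Build settings for action build and target ORTObjectDetection:
    ARCHS = arm64
    ENABLE_BITCODE = YES
"""
print(bitcode_enabled(sample))
```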

ONNX Runtime's build options also include an option to enable Bitcode for iOS:

# Enable bitcode for iOS
option(onnxruntime_ENABLE_BITCODE "Enable bitcode for iOS only" OFF)

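Bitcode is an intermediate representation produced when compiling a program. When an app that includes Bitcode is uploaded to the App Store, Apple can reportedly optimize the app binary on its side.

This means that even when a new Apple device is released, Apple can rebuild the app without the developer having to build it again. However, to enable Bitcode for an app, every library it depends on (here, ONNX Runtime) must also contain Bitcode.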

https://www.sambaiz.net/article/286/
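Since every dependency has to carry Bitcode, it can help to check a library before linking it. One common approach is to look for an `__LLVM` segment in the output of `otool -l`. The sketch below assumes `otool` is available on the build machine; the function names are hypothetical. A binary built with only a Bitcode marker also has this segment, so this is a necessary check rather than a sufficient one.

```python
import subprocess


def has_llvm_segment(otool_output: str) -> bool:
    """Return True if `otool -l` output mentions a `segname __LLVM` entry."""
    for line in otool_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "segname" and parts[1] == "__LLVM":
            return True
    return False


def binary_has_bitcode_segment(path: str) -> bool:
    """Run `otool -l` on a binary and look for the __LLVM segment (macOS only)."""
    result = subprocess.run(
        ["otool", "-l", path], check=True, capture_output=True, text=True
    )
    return has_llvm_segment(result.stdout)
```

For the framework in the error message above, `binary_has_bitcode_segment` would be pointed at the `onnxruntime` binary inside `onnxruntime.framework`.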

dhirooka

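First, try building the sample app with Bitcode disabled in the Xcode build options.

Bitcode disabled (Build Settings > Enable Bitcode set to No)

With my own iPhone XS connected by cable, the build from Xcode succeeded and the app ran.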


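Next, build ONNX Runtime with Bitcode enabled, then build the sample app against it.

Building ONNX Runtime

Clone ONNX Runtime and build it.

While enabling Bitcode, I also enabled CoreML. CoreML is Apple's framework for running ML inference efficiently on Apple hardware, including the ANE (Apple Neural Engine). I think of it as playing a role similar to CUDA for NVIDIA GPUs.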

./build.sh --config RelWithDebInfo \
--use_xcode --ios --ios_sysroot iphoneos --osx_arch arm64 --apple_deploy_target 11.0 \
--cmake_extra_defines onnxruntime_ENABLE_BITCODE=ON --use_coreml

The call flow is ./build.sh → tools/ci_build/build.py.


Building as an iOS Pod

This requires Python 3.9 or later and pip install flatbuffers. Options for tools/ci_build/github/apple/build_ios_framework.py, which is run internally, can be passed through the -b option, for example -b='--config=RelWithDebInfo'.

python tools/ci_build/github/apple/build_and_assemble_ios_pods.py \
--include-ops-by-config /path/to/required_operators.config \
--build-settings-file tools/ci_build/github/apple/default_mobile_ios_framework_build_settings.json \
-b='--config=RelWithDebInfo'

The flow is:

tools/ci_build/github/apple/build_and_assemble_ios_pods.py
- → tools/ci_build/github/apple/build_ios_framework.py, once for each target platform (iOS arm64, etc.)
  - → tools/ci_build/build.py
- → tools/ci_build/github/apple/c/assemble_c_pod_package.py, tools/ci_build/github/apple/objectivec/assemble_objc_pod_package.py

The iOS pods are generated as build/ios_pod_staging/onnxruntime-mobile-c (and onnxruntime-mobile-objc).

Build settings are specified in a JSON file. The default provided in the repository is shown below. Each parameter corresponds to a command-line option of tools/ci_build/build.py.

tools/ci_build/github/apple/default_mobile_ios_framework_build_settings.json

{
    "build_osx_archs": {
        "iphoneos": [
            "arm64"
        ],
        "iphonesimulator": [
            "arm64",
            "x86_64"
        ]
    },
    "build_params": [
        "--ios",
        "--parallel",
        "--use_xcode",
        "--build_apple_framework",
        "--minimal_build=extended",
        "--disable_rtti",
        "--disable_ml_ops",
        "--disable_exceptions",
        "--enable_reduced_operator_type_support",
        "--use_coreml",
        "--skip_tests",
        "--apple_deploy_target=11.0"
    ]
}
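To make the mapping between this JSON and the build.py options concrete, here is a rough sketch of how such a settings file could be expanded into one argument list per sysroot and architecture. It is an illustration under assumptions, not the actual logic of build_ios_framework.py: the function name is hypothetical, the JSON is abbreviated, and the real script may combine architectures differently. The `--ios_sysroot` and `--osx_arch` flags are the ones used in the build.sh command earlier.

```python
import json

# Abbreviated copy of the settings shown above (only some build_params kept).
SETTINGS_JSON = """
{
    "build_osx_archs": {
        "iphoneos": ["arm64"],
        "iphonesimulator": ["arm64", "x86_64"]
    },
    "build_params": ["--ios", "--use_xcode", "--use_coreml", "--apple_deploy_target=11.0"]
}
"""


def expand_build_args(settings, extra_args=None):
    """Return one build.py argument list per (sysroot, arch) pair."""
    commands = []
    for sysroot, archs in settings["build_osx_archs"].items():
        for arch in archs:
            args = list(settings["build_params"])
            args += ["--ios_sysroot", sysroot, "--osx_arch", arch]
            args += list(extra_args or [])  # e.g. what is passed via -b
            commands.append(args)
    return commands


settings = json.loads(SETTINGS_JSON)
commands = expand_build_args(settings, ["--config=RelWithDebInfo"])
for args in commands:
    print("python tools/ci_build/build.py " + " ".join(args))
```

With the default settings this yields three invocations: one for iphoneos/arm64 and two for iphonesimulator (arm64 and x86_64).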


Enabling the CoreML Execution Provider in Swift

// Create the ONNX Runtime environment with verbose logging.
let env = try ORTEnv(loggingLevel: ORTLoggingLevel.verbose)

let options = try ORTSessionOptions()
try options.setLogSeverityLevel(ORTLoggingLevel.verbose)
// threadCount and modelPath are assumed to be defined elsewhere in the app.
try options.setIntraOpNumThreads(threadCount)

let epOptions = ORTCoreMLExecutionProviderOptions()
// Uncomment to use CoreML on the CPU only.
// epOptions.useCPUOnly = true
try options.appendCoreMLExecutionProvider(with: epOptions)

// Create the ORTSession.
let session = try ORTSession(env: env, modelPath: modelPath, sessionOptions: options)

Reference

This is the log from a run based on ORTObjectDetection - onnxruntime-inference-examples (Quantized MobileNet SSD).

The log contains the line: CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 2 number of nodes in the graph: 169 number of nodes supported by CoreML: 4

The graph has 169 nodes, of which CoreML supports 4, grouped into 2 partitions. Only a small part of the network can run on CoreML, so CoreML is probably not giving much of a speedup in this case.
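The numbers in that log line can be pulled out mechanically when comparing models. A small sketch follows, assuming the message format is exactly the one quoted above; the function name and the returned dictionary keys are hypothetical.

```python
import re

LOG_PATTERN = re.compile(
    r"number of partitions supported by CoreML: (\d+)\s+"
    r"number of nodes in the graph: (\d+)\s+"
    r"number of nodes supported by CoreML: (\d+)"
)


def coreml_coverage(log_line):
    """Extract partition/node counts from a GetCapability log line, or None."""
    match = LOG_PATTERN.search(log_line)
    if match is None:
        return None
    partitions, total, supported = (int(value) for value in match.groups())
    ratio = supported / total if total else 0.0
    return {
        "partitions": partitions,
        "total_nodes": total,
        "supported_nodes": supported,
        "supported_ratio": ratio,
    }


line = (
    "CoreMLExecutionProvider::GetCapability, "
    "number of partitions supported by CoreML: 2 "
    "number of nodes in the graph: 169 "
    "number of nodes supported by CoreML: 4"
)
coverage = coreml_coverage(line)
print(f"{coverage['supported_ratio']:.1%}")  # prints 2.4%
```

For this run, only about 2.4% of the nodes (4 of 169) are assigned to CoreML.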


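Operations supported by the CoreML Execution Provider

The operations supported by ONNX Runtime's CoreML EP are listed in the CoreML Execution Provider documentation.

Operations such as Conv2D, BatchNorm, and MaxPool2D are supported. The Quantized MobileNet SSD used in the sample, however, is quantized to reduce model size and uses operators such as QLinearConv, so it apparently gains little from the CoreML EP.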
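To get a feel for why a few unsupported operators fragment the graph into small partitions, here is a deliberately simplified sketch. It treats the model as a linear sequence of operators and groups consecutive supported ones into runs. The supported set and the operator sequence are hypothetical, chosen only for illustration, and the real CoreML EP partitions by graph topology rather than list order.

```python
from itertools import groupby

# Illustrative subset only, NOT the CoreML EP's actual supported-operator list.
ASSUMED_SUPPORTED = {"Conv", "BatchNormalization", "MaxPool"}


def supported_runs(op_types, supported=ASSUMED_SUPPORTED):
    """Group consecutive supported operators into runs (simplified partitions)."""
    runs = []
    for is_supported, group in groupby(op_types, key=lambda op: op in supported):
        if is_supported:
            runs.append(list(group))
    return runs


# Hypothetical operator sequence for a partly quantized model.
ops = [
    "QLinearConv", "QLinearConv", "Conv", "MaxPool",
    "QLinearConv", "Conv", "DequantizeLinear",
]
runs = supported_runs(ops)
print(len(runs), sum(len(run) for run in runs))
```

In this toy sequence, the quantized operators split the supported ones into two short runs covering three nodes, the same kind of picture as the "2 partitions, 4 nodes" reported in the log.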

The network structure can be inspected by loading ssd_mobilenet_v1.all.ort, downloaded by following the sample's instructions, into a tool such as Netron.

https://netron.app/