
ARKit's LiDAR Depth API

Published on 2021/12/08

iOS

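Based on primary sources such as the API reference, WWDC sessions, and official samples, this article summarizes how to use ARKit's LiDAR depth APIs and what is known about their accuracy ^[1].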

Scene Depth

In ARKit 3.5 ^[2], you could get the 3D mesh reconstructed using LiDAR, but you could not access the depth data that was presumably used to compute it.

Starting with ARKit 4 / iOS 14 (/ iPadOS 14), the depth measured by LiDAR became available. To distinguish it from the earlier depth data, it is also called "scene depth".

How to get it

LiDAR-derived depth data becomes available when you use ARWorldTrackingConfiguration and set the .sceneDepth option on its frameSemantics property.

let session = ARSession()
let configuration = ARWorldTrackingConfiguration()

// Check whether the sceneDepth option is available
if type(of: configuration).supportsFrameSemantics(.sceneDepth) {
   // Activate sceneDepth
   configuration.frameSemantics = .sceneDepth
}
session.run(configuration)

The depth data measured by LiDAR is available from the sceneDepth property that ARKit 4 added to ARFrame.

func session(_ session: ARSession, didUpdate frame: ARFrame) {
   guard let depthData = frame.sceneDepth else { return }
   // Use depth data
}

Supported configurations

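frameSemantics, where sceneDepth is set, is a property defined on the ARConfiguration base class, so at the API level it can be set on any configuration. Judging from the description in the sceneDepth section of the API reference, however, it does not seem to work with configurations other than world tracking.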

If you enable this option on a world-tracking configuration's frameSemantics, ARKit includes depth information for regions of the current frame's capturedImage. The world-tracking configuration exposes depth information in the sceneDepth property that it updates every frame. (from https://developer.apple.com/documentation/arkit/arconfiguration/framesemantics/3516902-scenedepth)

Hardware constraints

Also from the sceneDepth reference page:

ARKit supports scene depth only on LiDAR-capable devices

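There does not seem to be a condition such as "A12 chip or later". (Perhaps because any device that carries LiDAR necessarily has a high-performance chip.)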

Differences from earlier depth data

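Even before this, it was possible (under specific conditions) to access depth data while using ARKit. The capturedDepthData and estimatedDepthData properties of ARFrame are the depth APIs that already existed. How does the newly added depth API differ from them?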

Kind of depth data and availability conditions

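First, of course, the kind of depth data differs, and with it the conditions under which you can get it. capturedDepthData is depth data from the TrueDepth camera and is available only during face tracking.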

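estimatedDepthData is available when the personSegmentationWithDepth option is set on frameSemantics. It is not depth from the dual camera or the TrueDepth camera; as the property name suggests, it is estimated depth data. It requires an A12 chip or later.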

sceneDepth, as described above, is available when sceneDepth is set on frameSemantics. The only supported AR configuration is world tracking, and the device must have LiDAR.

Differences in frame rate and generation algorithm

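Because capturedDepthData comes from the TrueDepth camera, it updates less often than the color data of ARFrame (the capturedImage property). When applying depth-based effects, it sometimes could not keep up when the subject (yourself, since this is face tracking) moved quickly.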

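estimatedDepthData, on the other hand, is estimated with machine learning, and since it requires a high-performance chip (A12 or later), it updates at the same frame rate as the camera frames.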

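sceneDepth is also generated with machine learning algorithms, based on the depth data from LiDAR and the color data from the wide-angle camera. It runs at 60 fps, which means it too is updated every time an ARFrame arrives.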

The colored RGB image from the wide-angle camera and the depth readings from the LiDAR scanner are fused together using advanced machine learning algorithms to create a dense depth map that is exposed through the API.

This operation runs at 60 times per second with the depth map available on every AR frame.

(From the WWDC 2020 session "Explore ARKit 4")

Differences in type

capturedDepthData is an AVDepthData, estimatedDepthData is a CVPixelBuffer, and sceneDepth is an ARDepthData.
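To make the three types concrete, here is a minimal sketch that inspects whichever depth sources a frame carries. The function name logDepthSources is a hypothetical helper of mine, not an ARKit API. Which properties are non-nil depends on the configuration and device, as described above.

```swift
import ARKit
import AVFoundation

/// Logs which of the three depth sources the given frame carries.
/// `logDepthSources` is a hypothetical helper name, not an ARKit API.
@available(iOS 14.0, *)
func logDepthSources(in frame: ARFrame) {
    // TrueDepth camera depth (face tracking): AVDepthData
    if let captured: AVDepthData = frame.capturedDepthData {
        let map: CVPixelBuffer = captured.depthDataMap
        print("capturedDepthData:", CVPixelBufferGetWidth(map), "x", CVPixelBufferGetHeight(map))
    }
    // ML-estimated depth (personSegmentationWithDepth): CVPixelBuffer
    if let estimated: CVPixelBuffer = frame.estimatedDepthData {
        print("estimatedDepthData:", CVPixelBufferGetWidth(estimated), "x", CVPixelBufferGetHeight(estimated))
    }
    // LiDAR scene depth (sceneDepth): ARDepthData
    if let scene: ARDepthData = frame.sceneDepth {
        let map: CVPixelBuffer = scene.depthMap
        print("sceneDepth:", CVPixelBufferGetWidth(map), "x", CVPixelBufferGetHeight(map))
    }
}
```

Calling this from session(_:didUpdate:) is a quick way to check which sources a given configuration actually populates on a given device.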

ARDepthData

As noted above, the LiDAR-derived depth data from the sceneDepth property is an ARDepthData, a class added in iOS 14.

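There is an official sample that makes good use of it, so I will write a detailed explanation in a separate article while reading through that sample.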
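Until then, here is a rough sketch of reading one value out of an ARDepthData on the CPU. ARDepthData exposes depthMap (a CVPixelBuffer) and an optional confidenceMap. The helper name centerSample is hypothetical, and the sketch assumes the depth map is in the Float32 depth format and that the confidence map has the same dimensions as the depth map; it bails out or skips the confidence otherwise.

```swift
import ARKit

/// Reads depth (meters) and confidence at the center pixel of an ARDepthData.
/// `centerSample` is a hypothetical helper name, not part of ARKit.
@available(iOS 14.0, *)
func centerSample(of depthData: ARDepthData) -> (meters: Float32, confidence: ARConfidenceLevel?)? {
    let depthMap = depthData.depthMap
    // Assumption: one Float32 (distance in meters) per pixel.
    guard CVPixelBufferGetPixelFormatType(depthMap) == kCVPixelFormatType_DepthFloat32 else { return nil }

    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }
    guard let depthBase = CVPixelBufferGetBaseAddress(depthMap) else { return nil }

    let x = CVPixelBufferGetWidth(depthMap) / 2
    let y = CVPixelBufferGetHeight(depthMap) / 2
    // Rows may be padded, so step by bytesPerRow rather than width * 4.
    let depthRow = depthBase.advanced(by: y * CVPixelBufferGetBytesPerRow(depthMap))
    let meters = depthRow.assumingMemoryBound(to: Float32.self)[x]

    // Each confidence pixel is a UInt8 holding an ARConfidenceLevel raw value.
    var confidence: ARConfidenceLevel?
    if let confidenceMap = depthData.confidenceMap,
       CVPixelBufferGetWidth(confidenceMap) == CVPixelBufferGetWidth(depthMap),
       CVPixelBufferGetHeight(confidenceMap) == CVPixelBufferGetHeight(depthMap) {
        CVPixelBufferLockBaseAddress(confidenceMap, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(confidenceMap, .readOnly) }
        if let base = CVPixelBufferGetBaseAddress(confidenceMap) {
            let row = base.advanced(by: y * CVPixelBufferGetBytesPerRow(confidenceMap))
            let raw = row.assumingMemoryBound(to: UInt8.self)[x]
            confidence = ARConfidenceLevel(rawValue: Int(raw))
        }
    }
    return (meters: meters, confidence: confidence)
}
```

For per-pixel work on every frame, passing the buffers to Metal as textures is likely a better fit than a CPU loop; this sketch is only meant to show what the data looks like.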

personSegmentationWithDepth and scene depth

When the personSegmentationWithDepth option is set on the frameSemantics property of ARConfiguration, scene depth is apparently provided automatically on devices that support it.

let session = ARSession()
let configuration = ARWorldTrackingConfiguration()

// Set required frame semantics
let semantics: ARConfiguration.FrameSemantics = .personSegmentationWithDepth
       
// Check if configuration and device supports the required semantics
if type(of: configuration).supportsFrameSemantics(semantics) {
   // Activate .personSegmentationWithDepth
   configuration.frameSemantics = semantics
}
session.run(configuration)

What's more, this reportedly comes at no additional power cost.

Additionally if you have an AR app that uses people occlusion feature, and then search the personSegmentationWithDepth frameSemantic, then you will automatically get sceneDepth on devices that support the sceneDepth frameSemantic with no additional power cost to your application.
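One way to check this behavior on your own device is a small session delegate that reports whether sceneDepth is populated while only personSegmentationWithDepth is requested. This is just a sketch: SceneDepthProbe is a hypothetical class name, and the result depends on the device and OS version.

```swift
import ARKit

/// Hypothetical probe (not an ARKit type) that logs whether sceneDepth is
/// populated when only .personSegmentationWithDepth was requested.
@available(iOS 14.0, *)
final class SceneDepthProbe: NSObject, ARSessionDelegate {
    private var lastState: Bool?

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let hasSceneDepth = frame.sceneDepth != nil
        // Log only when the state changes, to avoid printing on every frame.
        guard hasSceneDepth != lastState else { return }
        lastState = hasSceneDepth
        print("sceneDepth available:", hasSceneDepth,
              "| estimatedDepthData available:", frame.estimatedDepthData != nil)
    }
}
```

Assign an instance to session.delegate before calling session.run(configuration), and keep a strong reference to it yourself because the session does not retain its delegate.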

smoothedSceneDepth

In iOS 14.0 beta 5, an API called smoothedSceneDepth suddenly appeared.

At the time (August 2020) I checked the documentation right away, but it had no details.

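Now the documentation has a proper description that also spells out the difference from sceneDepth, so I summarize it here. Notes on a sample that visualizes the smoothedSceneDepth point cloud follow at the end of this section.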

smoothedSceneDepth: ARConfiguration.FrameSemantics

A type property of ARConfiguration.FrameSemantics.

An option that provides the distance from the device to real-world objects, averaged across several frames.

Declaration

static var smoothedSceneDepth: ARConfiguration.FrameSemantics { get }

Discussion

Enable this option on a world-tracking configuration (ARWorldTrackingConfiguration) to instruct ARKit to provide your app with the distance between the user’s device and the real-world objects pictured in the frame's capturedImage. ARKit samples this distance using the LiDAR scanner and provides the results through the smoothedSceneDepth property on the session’s currentFrame.

This explicitly states that the LiDAR scanner is used.

To minimize the difference in LiDAR readings across frames, ARKit processes the data as an average. The averaged readings reduce flickering to create a smoother motion effect when depicting objects with depth, as demonstrated in Creating a Fog Effect Using Scene Depth. Alternatively, to access a discrete LiDAR reading at the instant the framework creates the current frame, use sceneDepth.

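This is the most important part, especially the first two sentences. It spells out the difference from sceneDepth and describes the benefit and when to use which.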

ARKit supports scene depth only on LiDAR-capable devices, so call supportsFrameSemantics(_:) to ensure device support before attempting to enable scene depth.

smoothedSceneDepth: ARDepthData

The smoothedSceneDepth property of ARFrame.

An average of distance measurements between a device's rear camera and real-world objects that creates smoother visuals in an AR experience.

Declaration

var smoothedSceneDepth: ARDepthData? { get }

Discussion

This property describes the distance between a device's camera and objects or areas in the real world, including ARKit’s confidence in the estimated distance. This is similar to sceneDepth except that the framework smoothes the depth data over time to lessen its frame-to-frame delta.

This property is nil by default. Add the smoothedSceneDepth frame semantic to your configuration’s frameSemantics to instruct the framework to populate this value with ARDepthData captured by the LiDAR scanner.

Call supportsFrameSemantics(_:) on your app’s configuration to support smoothed scene depth on select devices and configurations.
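Putting the two reference entries together, enabling and reading smoothed scene depth could look like the sketch below. The function names makeDepthConfiguration and currentDepth are hypothetical, and falling back to sceneDepth when smoothing is unsupported is my own choice, not something the documentation prescribes.

```swift
import ARKit

/// Builds a world-tracking configuration that prefers smoothed scene depth.
/// Hypothetical helper, not an ARKit API.
@available(iOS 14.0, *)
func makeDepthConfiguration() -> ARWorldTrackingConfiguration {
    let configuration = ARWorldTrackingConfiguration()
    if ARWorldTrackingConfiguration.supportsFrameSemantics(.smoothedSceneDepth) {
        // Temporally averaged LiDAR depth
        configuration.frameSemantics.insert(.smoothedSceneDepth)
    } else if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
        // Assumed fallback: per-frame LiDAR depth
        configuration.frameSemantics.insert(.sceneDepth)
    }
    return configuration
}

/// Returns the smoothed depth if present, otherwise the per-frame depth.
/// Both properties are nil unless the matching frame semantic is enabled.
@available(iOS 14.0, *)
func currentDepth(in frame: ARFrame) -> ARDepthData? {
    frame.smoothedSceneDepth ?? frame.sceneDepth
}
```

Run the session with session.run(makeDepthConfiguration()) and call currentDepth(in:) from session(_:didUpdate:).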

Sample code that visualizes the smoothedSceneDepth point cloud

There is an official sample that uses smoothedSceneDepth.

It uses smoothedSceneDepth to apply a fog-like effect.

Rather than an applied example like that, though, I would like to simply visualize the depth data as a point cloud, as in this tweet.

It turns out this is apparently Apple's sceneDepth sample with just two lines changed.

Video is Apple's sample app with a 2 line change to enable smoothing.

The original point cloud sample is here:

In practice I changed four places. ^[3]
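As background for what "point cloud" means here, the basic idea is to unproject each depth pixel into a 3D point with the pinhole camera model. The sketch below is plain Swift with hypothetical names (Intrinsics, unproject) and is not the code from Apple's sample. It assumes you have the camera intrinsics for the full-resolution image and rescale them to the depth map size, and it returns points in image-aligned coordinates (x right, y down, z forward); converting to ARKit's camera space would additionally flip y and z.

```swift
/// Pinhole intrinsics in pixels. Hypothetical helper type, not an ARKit API.
struct Intrinsics {
    var fx: Float
    var fy: Float
    var cx: Float
    var cy: Float

    /// Rescales intrinsics given for one image size to another image size,
    /// e.g. from the captured image resolution to the depth map resolution.
    func scaled(from source: (width: Float, height: Float),
                to target: (width: Float, height: Float)) -> Intrinsics {
        let sx = target.width / source.width
        let sy = target.height / source.height
        return Intrinsics(fx: fx * sx, fy: fy * sy, cx: cx * sx, cy: cy * sy)
    }
}

/// Unprojects pixel (u, v) with the given depth in meters to a 3D point.
func unproject(u: Float, v: Float, depth: Float,
               using k: Intrinsics) -> (x: Float, y: Float, z: Float) {
    let x = (u - k.cx) * depth / k.fx
    let y = (v - k.cy) * depth / k.fy
    return (x: x, y: y, z: depth)
}

// Usage with made-up numbers (not values read from a real device):
let full = Intrinsics(fx: 1500, fy: 1500, cx: 960, cy: 720)
let depthK = full.scaled(from: (width: 1920, height: 1440),
                         to: (width: 256, height: 192))
let point = unproject(u: 200, v: 96, depth: 2.0, using: depthK)
print(point)
```

In an ARKit app the full-resolution intrinsics would come from frame.camera.intrinsics and the sizes from the captured image and the depth map.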

LiDAR accuracy

It is not from WWDC 2020, but when ARKit 3.5 was released, a Tech Talk titled "Advanced Scene Understanding in AR" was published, and it mentioned the accuracy of LiDAR.

The new iPad Pro comes equipped with a LiDAR Scanner. This is used to determine distance by measuring at nanosecond speeds how long it takes for light to reach an object in front of you and reflect back. This is effective up to five meters away and operates both indoors and outdoors.

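I have been asked several times how many meters LiDAR can measure. My impression was that this kind of figure is not officially stated, so I used to answer "try it out with a sample". Even as a rough guide, this is valuable official information, so I am noting it here.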

Scene Reconstruction

Since this article is a summary of ARKit's LiDAR "depth" features, I have left out Scene Reconstruction, which does not handle depth data directly. I would like to cover it in a separate article.

I have written a few fragmentary articles on it so far, so here are the links.

https://note.com/shu223/n/n66f1ad448b75

https://note.com/shu223/n/n3265201bc01d

https://note.com/shu223/n/n44a60ed9828d

https://note.com/shu223/n/n72280dfb0ab8

This article is the day 9 entry of iOS Advent Calendar 2021.

Footnotes

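[1] This article is a revised and expanded version of an article I wrote in June 2020 and another I wrote in December of the same year. Some of the content may be out of date.
[2] The ARKit version released alongside the first LiDAR-equipped iPad Pro.
[3] The changes are not difficult at all, but if you would like the modified sample, please consider this one.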
