
MoViNets Video Classification on TensorFlow Lite for Unity

https://www.tensorflow.org/lite/examples/video_classification/overview

https://github.com/tensorflow/models/tree/master/official/projects/movinet

This looks interesting, so I'm going to try porting it.

For an explanation in Japanese, the following is probably the easiest to follow:

https://deideeplearning.com/2021/07/09/post-549/

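The model takes a 3D input (2D image + time axis), but
the streaming model has 44 inputs and 44 outputs. It seems to gain speed by carrying the previous frame's outputs over to the next frame.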

input 0: type: Int32, dimensions: [1]
input 1: type: Float32, dimensions: [1,2,22,22,80]
input 2: type: Float32, dimensions: [1,2,22,22,80]
input 3: type: Int32, dimensions: [1]
input 4: type: Float32, dimensions: [1,1,1,1,184]
input 5: type: Int32, dimensions: [1]
input 6: type: Float32, dimensions: [1,1,1,1,384]
input 7: type: Int32, dimensions: [1]
input 8: type: Float32, dimensions: [1,4,11,11,184]
input 9: type: Float32, dimensions: [1,2,11,11,184]
input 10: type: Float32, dimensions: [1,1,1,1,112]
input 11: type: Float32, dimensions: [1,2,11,11,112]
input 12: type: Float32, dimensions: [1,2,11,11,184]
input 13: type: Int32, dimensions: [1]
input 14: type: Float32, dimensions: [1,2,11,11,184]
input 15: type: Float32, dimensions: [1,1,1,1,80]
input 16: type: Int32, dimensions: [1]
input 17: type: Float32, dimensions: [1,1,1,1,24]
input 18: type: Int32, dimensions: [1]
input 19: type: Int32, dimensions: [1]
input 20: type: Float32, dimensions: [1,1,1,1,184]
input 21: type: Int32, dimensions: [1]
input 22: type: Int32, dimensions: [1]
input 23: type: Int32, dimensions: [1]
input 24: type: Int32, dimensions: [1]
input 25: type: Float32, dimensions: [1,2,11,11,184]
input 26: type: Float32, dimensions: [1,4,6,6,384]
input 27: type: Float32, dimensions: [1,2,22,22,80]
input 28: type: Float32, dimensions: [1,1,1,1,184]
input 29: type: Float32, dimensions: [1,1,1,1,344]
input 30: type: Float32, dimensions: [1,1,1,1,80]
input 31: type: Int32, dimensions: [1]
input 32: type: Float32, dimensions: [1,4,11,11,184]
input 33: type: Float32, dimensions: [1,1,1,1,184]
input 34: type: Float32, dimensions: [1,1,1,1,480]
input 35: type: Float32, dimensions: [1,1,1,1,280]
input 36: type: Int32, dimensions: [1]
input 37: type: Float32, dimensions: [1,1,172,172,3]
input 38: type: Float32, dimensions: [1,1,1,1,80]
input 39: type: Float32, dimensions: [1,1,1,1,184]
input 40: type: Int32, dimensions: [1]
input 41: type: Float32, dimensions: [1,1,1,1,280]
input 42: type: Float32, dimensions: [1,1,1,1,184]
input 43: type: Int32, dimensions: [1]

//----------------

output 0: type: Int32, dimensions: [1]
output 1: type: Float32, dimensions: [1,2,22,22,80]
output 2: type: Float32, dimensions: [1,2,22,22,80]
output 3: type: Int32, dimensions: [1]
output 4: type: Float32, dimensions: [1,1,1,1,184]
output 5: type: Int32, dimensions: [1]
output 6: type: Float32, dimensions: [1,1,1,1,384]
output 7: type: Int32, dimensions: [1]
output 8: type: Float32, dimensions: [1,4,11,11,184]
output 9: type: Float32, dimensions: [1,2,11,11,184]
output 10: type: Float32, dimensions: [1,600]
output 11: type: Float32, dimensions: [1,1,1,1,112]
output 12: type: Float32, dimensions: [1,2,11,11,112]
output 13: type: Float32, dimensions: [1,2,11,11,184]
output 14: type: Int32, dimensions: [1]
output 15: type: Float32, dimensions: [1,2,11,11,184]
output 16: type: Float32, dimensions: [1,1,1,1,80]
output 17: type: Int32, dimensions: [1]
output 18: type: Float32, dimensions: [1,1,1,1,24]
output 19: type: Int32, dimensions: [1]
output 20: type: Int32, dimensions: [1]
output 21: type: Float32, dimensions: [1,1,1,1,184]
output 22: type: Int32, dimensions: [1]
output 23: type: Int32, dimensions: [1]
output 24: type: Int32, dimensions: [1]
output 25: type: Int32, dimensions: [1]
output 26: type: Float32, dimensions: [1,2,11,11,184]
output 27: type: Float32, dimensions: [1,4,6,6,384]
output 28: type: Float32, dimensions: [1,2,22,22,80]
output 29: type: Float32, dimensions: [1,1,1,1,184]
output 30: type: Float32, dimensions: [1,1,1,1,344]
output 31: type: Float32, dimensions: [1,1,1,1,80]
output 32: type: Int32, dimensions: [1]
output 33: type: Float32, dimensions: [1,4,11,11,184]
output 34: type: Float32, dimensions: [1,1,1,1,184]
output 35: type: Float32, dimensions: [1,1,1,1,480]
output 36: type: Float32, dimensions: [1,1,1,1,280]
output 37: type: Int32, dimensions: [1]
output 38: type: Float32, dimensions: [1,1,1,1,80]
output 39: type: Float32, dimensions: [1,1,1,1,184]
output 40: type: Int32, dimensions: [1]
output 41: type: Float32, dimensions: [1,1,1,1,280]
output 42: type: Float32, dimensions: [1,1,1,1,184]
output 43: type: Int32, dimensions: [1]
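
To make the carry-over idea concrete, here is a minimal sketch in Python. Judging by the shapes in the dump, input 37 ([1,1,172,172,3]) looks like the image and output 10 ([1,600]) looks like the class scores, with everything else being state; that reading is my assumption. `toy_step` is a hypothetical stand-in for a single model invocation, not the real MoViNets model, and the state names are made up.

```python
from typing import Callable, Dict, Iterable, List, Tuple

States = Dict[str, float]
StepFn = Callable[[float, States], Tuple[float, States]]


def toy_step(frame: float, states: States) -> Tuple[float, States]:
    """Hypothetical stand-in for one model invocation.

    It keeps a running mean of the frames seen so far, so each call only
    touches the newest frame plus a small carried state.
    """
    count = states["frame_count"] + 1
    total = states["running_sum"] + frame
    return total / count, {"frame_count": count, "running_sum": total}


def classify_stream(frames: Iterable[float], step: StepFn = toy_step) -> List[float]:
    # Assumed initial state: everything zeroed before the first frame.
    states: States = {"frame_count": 0.0, "running_sum": 0.0}
    scores: List[float] = []
    for frame in frames:
        # Feed the previous call's state outputs back in as this call's inputs.
        score, states = step(frame, states)
        scores.append(score)
    return scores


print(classify_stream([1.0, 3.0, 5.0]))  # [1.0, 2.0, 3.0]
```

Each call does a constant amount of work no matter how many frames came before, which is presumably where the speedup of the streaming variant comes from.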

Ugh, it looks like this needs a new concept called Signatures, which is going to be a pain...

https://www.tensorflow.org/lite/guide/signatures

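There is Android sample code, so I've confirmed this much: it should be possible to support this by implementing, in C#, the TfLiteSignature* family of APIs that was added to TensorFlow Lite's c_api_experimental.h.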

https://github.com/tensorflow/examples/blob/7a7cf7e04752eb2c4b96bdbd97f40d18bec7b327/lite/examples/video_classification/android/app/src/main/java/org/tensorflow/lite/examples/videoclassification/ml/VideoClassifier.kt

I'm starting to get the idea of Signatures. The key point seems to be that a single model can have multiple entry points.

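...But the provided MoViNets models only have a single entry point.
Even without Signatures, I feel the concept wouldn't be needed if the model's inputs/outputs were named properly and mapped by name, but I'm not sure. Maybe the point is to allow the same key to appear on both the input side and the output side, as the sample does?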
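
Here is a small Python sketch of how I picture it: each entry point is a key that owns a name-to-index map for inputs and another for outputs, so the same name can sit at different indices on each side. The key `serving_default`, the names `image`, `logits`, `state_a`, `state_b`, and the pairing of indices are all assumptions for illustration (the indices only mimic the pattern in the dump, where the [1,600] output at index 10 shifts the outputs after it by one). None of this is read from the actual model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Signature:
    inputs: Dict[str, int]   # tensor name -> input index
    outputs: Dict[str, int]  # tensor name -> output index


# Hypothetical signature table; a model could register several keys here.
SIGNATURES: Dict[str, Signature] = {
    "serving_default": Signature(
        inputs={"image": 37, "state_a": 9, "state_b": 10},
        outputs={"logits": 10, "state_a": 9, "state_b": 11},
    ),
}


def carry_over_plan(sig: Signature) -> Dict[int, int]:
    """Map output index -> input index for every name present on both sides."""
    return {
        sig.outputs[name]: sig.inputs[name]
        for name in sig.inputs
        if name in sig.outputs
    }


print(carry_over_plan(SIGNATURES["serving_default"]))  # {9: 9, 11: 10}
```

With shared names, the caller never has to know that a state lives at input 10 but output 11; it just copies values across by key.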

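The implementation seems to work, so to record a demo video I tried to install a library I made earlier via UPM, but it couldn't be installed because the GUIDs conflicted. When copying a package, you must not copy the .meta files, so that the GUIDs get regenerated...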
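
A minimal sketch of "copy the package but leave the .meta files behind", using only the Python standard library. The function name and the paths are hypothetical, and it assumes the copy ends up somewhere Unity is allowed to write fresh .meta files on the next import.

```python
import shutil
from pathlib import Path
from typing import List


def copy_package_without_meta(src: str, dst: str) -> List[str]:
    """Copy a package folder, skipping Unity .meta files.

    `dst` must not exist yet. Returns the copied files relative to `dst`.
    """
    # ignore_patterns filters both files and folders matching "*.meta".
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("*.meta"))
    return sorted(
        p.relative_to(dst).as_posix() for p in Path(dst).rglob("*") if p.is_file()
    )
```

Usage would be something like `copy_package_without_meta("Packages/original", "Packages/copy")`. Keep in mind that anything referencing the old assets by GUID will not point at the copy.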

https://github.com/asus4/TextureSource