Reading Through Hailo Raspberry Pi 5 Examples with the AI Kit

tomoichi hayashi @ araya
Published 2024/12/09

About this article
This article covers topics related to getting started with the Raspberry Pi 5 + AI Kit.

The AI Kit adds fast, low-power AI processing to the Raspberry Pi 5.
Part 1 explained how to get the AI Kit running with software published by the Raspberry Pi Foundation.

Part 2 showed how to run inference with the HailoRT Python package.

Part 3 installed Hailo's published software, Hailo Raspberry Pi 5 Examples, and used TAPPAS.

In this fourth part, we read through Hailo Raspberry Pi 5 Examples and modify parts of it.
This part does not require a Hailo account.

Who is writing this
I'm Hayashi from the Edge AI team at ARAYA Inc.

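I usually work on developing edge AI systems, among other things!
To achieve ARAYA's vision of "making humanity's future overwhelmingly fun!", the Edge AI team at ARAYA is pursuing the mission of "putting AI on every thing".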

Hailo Raspberry Pi 5 Examples (Basic Pipelines)
These samples install TAPPAS, run TAPPAS from Python, and receive the inference results in a callback function.

Installation
Start from the point where the setup from Part 1 has been completed.

If you already completed the installation in Part 3, skip this section and go on to "Running the sample command (pose estimation)".
The installation follows the Basic Pipelines Installation Guide.
git clone https://github.com/hailo-ai/hailo-rpi5-examples.git
cd hailo-rpi5-examples
source setup_env.sh
pip install -r requirements.txt
./download_resources.sh
./compile_postprocess.sh
Running the commands above completes the installation.

From now on, you need to run source setup_env.sh in each new terminal session before running the examples.

Running the sample command (pose estimation)
Running without arguments
An example run of the Pose Estimation Example:
python basic_pipelines/pose_estimation.py
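The confidence values and eye coordinates generated in the callback are printed to the console.
In the example above the input is a video file, but you can also specify a camera.

(RPi camera support is still in beta, though...)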
  --input INPUT, -i INPUT
                        Input source. Can be a file, USB or RPi camera (CSI camera module). For RPi camera use '-i rpi' (Still in Beta). Defaults to example
                        video resources/detection0.mp4

Brief explanation
When you run pose_estimation.py, app_callback is called every time a frame is processed. app_callback is the user-defined callback function in pose_estimation.py, quoted below.

In the callback, lines 30, 49 and 50 obtain detections, the information about the detected objects, from the info argument.

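The loop starting at line 56 processes each detected object. If the label is person, it gets the positions of the left and right eyes (lines 63, 65 and 68) and converts them to pixel coordinates in the image (lines 69 and 70).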
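Each of the following lines appends to a string:
- line 37 adds the frame number, once per frame;
- line 61 adds the confidence of each detected person;
- line 71 adds the eye coordinates.

Line 80 prints that string to the console.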
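To make the coordinate conversion on lines 69 and 70 concrete, here is a minimal, self-contained sketch that reproduces the loop's logic outside the pipeline. It assumes what the listing's arithmetic implies: the bounding box is normalized to the frame (0 to 1), and landmark points are normalized to the bounding box. The classes Point and BBox and the helpers to_pixel and describe are hypothetical stand-ins. They only mimic the method names used in the listing (x(), y(), xmin(), width(), ...) and are not the hailo Python API. The keypoint indices assume the COCO order.

```python
# Hypothetical stand-ins that mimic the method names used in app_callback.
# They are NOT the hailo Python API; they only hold normalized values.
class Point:
    def __init__(self, x, y):
        self._x, self._y = x, y  # normalized to the bounding box (0-1)

    def x(self):
        return self._x

    def y(self):
        return self._y


class BBox:
    def __init__(self, xmin, ymin, width, height):
        self._v = (xmin, ymin, width, height)  # normalized to the frame (0-1)

    def xmin(self):
        return self._v[0]

    def ymin(self):
        return self._v[1]

    def width(self):
        return self._v[2]

    def height(self):
        return self._v[3]


def to_pixel(point, bbox, width, height):
    """Same arithmetic as lines 69-70: bbox-relative -> frame-relative -> pixels."""
    x = int((point.x() * bbox.width() + bbox.xmin()) * width)
    y = int((point.y() * bbox.height() + bbox.ymin()) * height)
    return x, y


def describe(detections, keypoints, width, height):
    """Build the console text roughly the way lines 56-71 do."""
    text = ""
    for label, confidence, bbox, points in detections:
        if label != "person":
            continue
        text += f"Detection: {label} {confidence:.2f}\n"
        for eye in ["left_eye", "right_eye"]:
            x, y = to_pixel(points[keypoints[eye]], bbox, width, height)
            text += f"{eye}: x: {x:.2f} y: {y:.2f}\n"
    return text


# Usage: a 640x480 frame and one person detection.
keypoints = {"left_eye": 1, "right_eye": 2}  # COCO order (assumed)
bbox = BBox(0.5, 0.25, 0.25, 0.5)
points = [Point(0.5, 0.0), Point(0.75, 0.125), Point(0.25, 0.125)]
print(describe([("person", 0.91, bbox, points)], keypoints, 640, 480))
```

The values in the usage example are powers of two, so the floating-point arithmetic is exact and the printed pixel coordinates are easy to check by hand.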

Running with use-frame
Specifying the following option makes the callback function use the video frame.
  --use-frame, -u       Use frame from the callback function
Run the following command.
python basic_pipelines/pose_estimation.py -u
The User Frame window, whose image is generated in the callback, now appears.

Brief explanation
Specifying the use-frame option sets user_data.use_frame to True.

In the callback, line 46 gets the frame image as a NumPy array, and line 73 draws a dot at each eye position.

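Line 77 converts the frame from RGB to BGR. Line 78 then hands the frame to user_data with set_frame, and the example's framework shows it in the User Frame window.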
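Lines 73 and 77 interact through channel order. The callback handles the frame as RGB, which is why it converts with COLOR_RGB2BGR before set_frame, since OpenCV's display functions expect BGR. So the colors passed to cv2.circle are read as RGB. The sketch below shows this with NumPy only. draw_dot is a hypothetical helper standing in for cv2.circle with a negative thickness (a filled disc). For a 3-channel image, frame[..., ::-1] does the same channel swap as cv2.cvtColor(frame, cv2.COLOR_RGB2BGR).

```python
import numpy as np


def draw_dot(frame, center, radius, color):
    """Hypothetical stand-in for cv2.circle(..., thickness=-1): fill a disc in place."""
    height, width, _ = frame.shape
    ys, xs = np.ogrid[:height, :width]
    cx, cy = center
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    frame[mask] = color
    return frame


# A black 640x480 frame in RGB order, as the callback handles it before line 77.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
# (255, 0, 0) means red because the frame is still RGB at this point.
draw_dot(frame, (440, 150), 5, (255, 0, 0))

# Reversing the last axis swaps R and B, like cv2.COLOR_RGB2BGR.
bgr = frame[..., ::-1]
print(frame[150, 440], bgr[150, 440])
```

After the swap the same pixel holds (0, 0, 255), which BGR-based display code shows as red. If the frame were not converted, the dot would appear blue.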

Callback function
Below is app_callback, the user-defined callback function from pose_estimation.py.
23 # -----------------------------------------------------------------------------------------------
24 # User-defined callback function
25 # -----------------------------------------------------------------------------------------------
26
27 # This is the callback function that will be called when data is available from the pipeline
28 def app_callback(pad, info, user_data):
29     # Get the GstBuffer from the probe info
30     buffer = info.get_buffer()
31     # Check if the buffer is valid
32     if buffer is None:
33         return Gst.PadProbeReturn.OK
34
35     # Using the user_data to count the number of frames
36     user_data.increment()
37     string_to_print = f"Frame count: {user_data.get_count()}\n"
38
39     # Get the caps from the pad
40     format, width, height = get_caps_from_pad(pad)
41
42     # If the user_data.use_frame is set to True, we can get the video frame from the buffer
43     frame = None
44     if user_data.use_frame and format is not None and width is not None and height is not None:
45         # Get video frame
46         frame = get_numpy_from_buffer(buffer, format, width, height)
47
48     # Get the detections from the buffer
49     roi = hailo.get_roi_from_buffer(buffer)
50     detections = roi.get_objects_typed(hailo.HAILO_DETECTION)
51
52     # Get the keypoints
53     keypoints = get_keypoints()
54
55     # Parse the detections
56     for detection in detections:
57         label = detection.get_label()
58         bbox = detection.get_bbox()
59         confidence = detection.get_confidence()
60         if label == "person":
61             string_to_print += (f"Detection: {label} {confidence:.2f}\n")
62             # Pose estimation landmarks from detection (if available)
63             landmarks = detection.get_objects_typed(hailo.HAILO_LANDMARKS)
64             if len(landmarks) != 0:
65                 points = landmarks[0].get_points()
66                 for eye in ['left_eye', 'right_eye']:
67                     keypoint_index = keypoints[eye]
68                     point = points[keypoint_index]
69                     x = int((point.x() * bbox.width() + bbox.xmin()) * width)
70                     y = int((point.y() * bbox.height() + bbox.ymin()) * height)
71                     string_to_print += f"{eye}: x: {x:.2f} y: {y:.2f}\n"
72                     if user_data.use_frame:
73                         cv2.circle(frame, (x, y), 5, (0, 255, 0), -1)
74
75     if user_data.use_frame:
76         # Convert the frame to BGR
77         frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
78         user_data.set_frame(frame)
79
80     print(string_to_print)
81     return Gst.PadProbeReturn.OK

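Modifying the callback function
We modify the callback so that the User Frame window shows each knee connected to its ankle by a line, covering both lower legs.
Replace lines 64 to 73 of the user-defined callback function in pose_estimation.py, quoted above, with the following.

The original code looks up the keypoint IDs from strings ('left_eye', 'right_eye'). The modified code instead treats the keypoint IDs as known constants.

It also relies on the fact that each ankle's ID is its knee's ID + 2.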
            if len(landmarks) != 0:
                points = landmarks[0].get_points()
                # Keypoint IDs: left_knee=13, right_knee=14,
                #               left_ankle=15, right_ankle=16
                for knee_id in (13, 14):
                    ankle_id = knee_id + 2  # each ankle ID is its knee ID + 2
                    pixels = []  # [knee (x, y), ankle (x, y)] in frame pixels
                    for kp_id in (knee_id, ankle_id):
                        point = points[kp_id]
                        x = int((point.x() * bbox.width() + bbox.xmin()) * width)
                        y = int((point.y() * bbox.height() + bbox.ymin()) * height)
                        pixels.append((x, y))
                    if user_data.use_frame:
                        # Colors are RGB here; the frame is converted to BGR later (line 77)
                        cv2.circle(frame, pixels[0], 5, (0, 255, 0), -1)  # knee: green
                        cv2.circle(frame, pixels[1], 5, (0, 0, 255), -1)  # ankle: blue
                        cv2.line(frame, pixels[0], pixels[1], (255, 0, 0), 3)  # lower leg: red
After making the change above, run the following command.
python basic_pipelines/pose_estimation.py -u
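The modified code rests on two assumptions: the keypoint IDs are fixed, and each ankle ID is its knee ID + 2. Both can be checked with a plain-Python sketch. COCO_KEYPOINTS below is the standard 17-point COCO keypoint order, which matches the IDs in the comments (13 to 16). That the model's get_keypoints() uses this order is an assumption based on those comments. leg_segments is a hypothetical helper that returns the pixel endpoints cv2.line would receive.

```python
# Standard 17-keypoint COCO order (assumed to match the model's get_keypoints()).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]
KEYPOINT_ID = {name: i for i, name in enumerate(COCO_KEYPOINTS)}


def leg_segments(points, bbox, width, height):
    """Hypothetical helper: (knee, ankle) pixel pairs for the left and right legs.

    points: 17 (x, y) tuples normalized to the bounding box (0-1)
    bbox:   (xmin, ymin, w, h) normalized to the frame (0-1)
    """
    xmin, ymin, w, h = bbox
    segments = []
    for side in ("left", "right"):
        pair = []
        for part in ("knee", "ankle"):
            px, py = points[KEYPOINT_ID[f"{side}_{part}"]]
            pair.append((int((px * w + xmin) * width), int((py * h + ymin) * height)))
        segments.append(tuple(pair))
    return segments


# The modified code relies on ankle ID == knee ID + 2 for both legs.
for side in ("left", "right"):
    assert KEYPOINT_ID[f"{side}_ankle"] == KEYPOINT_ID[f"{side}_knee"] + 2

# Usage: a box covering the middle of a 640x480 frame.
points = [(0.5, 0.5)] * 17
points[13], points[15] = (0.75, 0.5), (0.75, 0.75)  # left knee, left ankle
points[14], points[16] = (0.25, 0.5), (0.25, 0.75)  # right knee, right ankle
print(leg_segments(points, (0.25, 0.25, 0.5, 0.5), 640, 480))
# [((400, 240), (400, 300)), ((240, 240), (240, 300))]
```

Looking the IDs up by name, as the original eye code does, avoids hard-coding 13 to 16 and makes the "+2" relation something you can check rather than assume.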

Bonus
Showing only the User Frame window with fakevideosink
Using fakevideosink in GStreamer hides the window that was originally displayed, so that only the User Frame window appears.

Change the display_pipeline definition in pose_estimation_pipeline.py as follows.
-        display_pipeline = DISPLAY_PIPELINE(video_sink=self.video_sink, sync=self.sync, show_fps=self.show_fps)
+        display_pipeline = DISPLAY_PIPELINE(video_sink='fakevideosink', sync=self.sync, show_fps=self.show_fps)
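This change works because the sink is just an element name spliced into a GStreamer pipeline description string. The function below is a simplified, hypothetical stand-in for DISPLAY_PIPELINE, not its real body. That it wraps the sink in fpsdisplaysink is an assumption suggested by the show_fps parameter, and the default sink name "autovideosink" is also an assumption. It only illustrates that passing 'fakevideosink' routes the frames to GStreamer's fakevideosink element, which consumes and discards them instead of opening a window.

```python
def display_pipeline_sketch(video_sink="autovideosink", sync=True, show_fps=False):
    """Hypothetical, simplified stand-in for DISPLAY_PIPELINE (not its real body).

    fpsdisplaysink forwards frames to the sink named in its video-sink property,
    so changing that one name changes where the frames end up.
    """
    return (
        "videoconvert ! "
        f"fpsdisplaysink video-sink={video_sink} "
        f"sync={str(sync).lower()} text-overlay={str(show_fps).lower()}"
    )


# Default: frames go to an on-screen sink and a window opens.
print(display_pipeline_sketch())
# Modified: frames are consumed by fakevideosink and no window opens.
print(display_pipeline_sketch(video_sink="fakevideosink"))
```

To try the element on its own, 'gst-launch-1.0 videotestsrc num-buffers=100 ! fakevideosink' should run to completion without opening a window, provided gst-plugins-bad (which ships fakevideosink) is installed.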

Displaying a mirrored (left-right flipped) image
With camera input, you sometimes want to see a mirrored image, as if looking in a mirror.

In GStreamer, 'videoflip video-direction=horiz' flips the displayed image horizontally.

Change the pipeline_string definition in pose_estimation_pipeline.py as follows.
         pipeline_string = (
             f'{source_pipeline} '
+            f'videoflip video-direction=horiz ! '
             f'{infer_pipeline} ! '
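Here videoflip sits right after the source and before the inference stage. The model and the callback therefore receive the mirrored frame, and the coordinates printed in the callback refer to the flipped image. For a horizontal flip the pixel mapping is x' = width - 1 - x, with y unchanged. The sketch below checks this with np.fliplr. The pipeline strings are hypothetical placeholders, not the examples' real source_pipeline or infer_pipeline; they only reproduce the ordering of the modified pipeline_string.

```python
import numpy as np

# Hypothetical placeholder stages; the real strings come from the examples' helpers.
source_pipeline = "videotestsrc ! videoconvert ! "
infer_pipeline = "identity"

# Same ordering as the modified pipeline_string: flip first, then inference.
pipeline_string = (
    f"{source_pipeline} "
    "videoflip video-direction=horiz ! "
    f"{infer_pipeline} ! "
    "fakesink"
)
print(pipeline_string)

# A 4x6 grayscale frame with one bright pixel at x=1, y=2.
width, height = 6, 4
frame = np.zeros((height, width), dtype=np.uint8)
frame[2, 1] = 255

mirrored = np.fliplr(frame)  # horizontal flip: x' = width - 1 - x, y unchanged
ys, xs = np.nonzero(mirrored)
print("mirrored position:", int(xs[0]), int(ys[0]))  # mirrored position: 4 2
```

To mirror only what you see, while inference runs on the unflipped image, the flip would have to go after the inference stage instead.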

Other notes
I used the CLI command grim to take the screenshots.

The AI HAT+ has been released. Unlike the AI Kit, it lets you choose between the Hailo-8 and the Hailo-8L.

Summary
This series covers topics related to getting started with the Raspberry Pi 5 + AI Kit.

This time, we modified a sample from Hailo Raspberry Pi 5 Examples to generate our own overlay image.
Feel free to reach out to us about anything from edge AI to building LLMs!

ARAYA Inc. Advanced Research Support

ARAYA Inc. Edge AI

SubnetX