
Selfie Segmentation with MediaPipe

Published on 2022/07/30

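I tried out the segmentation that is reportedly also used for background processing in Google Meet.
This post runs MediaPipe Selfie Segmentation.
The PC I have on hand has no camera, so I changed the samples slightly to process an uploaded image instead.
The modified versions are at the bottom of this post.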

MediaPipe Selfie Segmentation

Overview

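MediaPipe Selfie Segmentation segments the prominent humans in a scene. It can run in real time on both smartphones and laptops. The intended use cases are selfies, video conferencing, and similar situations where the person is close to the camera (less than 2 m).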

Models

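MediaPipe Selfie Segmentation provides two models: general and landscape. Both are based on MobileNetV3, with modifications to make them more efficient. The general model operates on a 256x256x3 (HWC) tensor and outputs a 256x256x1 segmentation mask tensor. The landscape model is similar to the general model, but operates on a 144x256x3 (HWC) tensor. It is lighter than the general model, so it runs faster. MediaPipe Selfie Segmentation automatically resizes the input image before feeding it to the model.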
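To make the size difference between the two models concrete, the tensor shapes quoted above can be compared with a little arithmetic. This is only a sketch: `MODEL_INPUT_SHAPES` and `input_elements` are names made up for illustration and are not part of the MediaPipe API. Only the shapes themselves come from the text.

```python
# Input tensor shapes (height, width, channels) as quoted in the text.
# MODEL_INPUT_SHAPES and input_elements are hypothetical names, not MediaPipe API.
MODEL_INPUT_SHAPES = {
    0: (256, 256, 3),  # general model
    1: (144, 256, 3),  # landscape model
}


def input_elements(model_selection: int) -> int:
    """Return the number of values in the input tensor for a model index."""
    height, width, channels = MODEL_INPUT_SHAPES[model_selection]
    return height * width * channels


general = input_elements(0)
landscape = input_elements(1)
# Number of input values for each model and the landscape/general ratio.
print(general, landscape, landscape / general)
```

The landscape input holds roughly 56% as many values as the general input. Input size is only a rough proxy for compute cost, so treat this as intuition for why the landscape model is the lighter of the two rather than as a measurement.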

The general model is also used in ML Kit, and a modified version of the landscape model is used in Google Meet. See the model card for details about the models.

ML Pipeline

The pipeline is implemented as a MediaPipe graph that uses a selfie segmentation subgraph from the selfie segmentation module.

Note: To visualize a graph, copy the graph and paste it into MediaPipe Visualizer. For more information on how to visualize its associated subgraphs, please see visualizer documentation.
(I don't fully understand this part, so it is quoted as in the original.)

Solution APIs

Cross-platform Configuration Options

Naming style and availability may differ slightly across platforms/languages.

MODEL_SELECTION

An integer index 0 or 1. Use 0 to select the general model, and 1 to select the landscape model (see details in Models). Default to 0 if not specified.

Output

Naming style may differ slightly across platforms/languages.
SEGMENTATION_MASK
The output segmentation mask, which has the same dimension as the input image.
(I don't fully understand this part either, so it is quoted as in the original.)

Python Solution API

First, install the MediaPipe Python package. Then see the Python Colab and the following example for details.
Supported configuration options:
model_selection

import cv2
import mediapipe as mp
import numpy as np
mp_drawing = mp.solutions.drawing_utils
mp_selfie_segmentation = mp.solutions.selfie_segmentation

# For static images:
IMAGE_FILES = []
BG_COLOR = (192, 192, 192) # gray
MASK_COLOR = (255, 255, 255) # white
with mp_selfie_segmentation.SelfieSegmentation(
    model_selection=0) as selfie_segmentation:
  for idx, file in enumerate(IMAGE_FILES):
    image = cv2.imread(file)
    image_height, image_width, _ = image.shape
    # Convert the BGR image to RGB before processing.
    results = selfie_segmentation.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    # Draw selfie segmentation on the background image.
    # To improve segmentation around boundaries, consider applying a joint
    # bilateral filter to "results.segmentation_mask" with "image".
    condition = np.stack((results.segmentation_mask,) * 3, axis=-1) > 0.1
    # Generate solid color images for showing the output selfie segmentation mask.
    fg_image = np.zeros(image.shape, dtype=np.uint8)
    fg_image[:] = MASK_COLOR
    bg_image = np.zeros(image.shape, dtype=np.uint8)
    bg_image[:] = BG_COLOR
    output_image = np.where(condition, fg_image, bg_image)
    cv2.imwrite('/tmp/selfie_segmentation_output' + str(idx) + '.png', output_image)

# For webcam input:
BG_COLOR = (192, 192, 192) # gray
cap = cv2.VideoCapture(0)
with mp_selfie_segmentation.SelfieSegmentation(
    model_selection=1) as selfie_segmentation:
  bg_image = None
  while cap.isOpened():
    success, image = cap.read()
    if not success:
      print("Ignoring empty camera frame.")
      # If loading a video, use 'break' instead of 'continue'.
      continue

    # Flip the image horizontally for a later selfie-view display, and convert
    # the BGR image to RGB.
    image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
    # To improve performance, optionally mark the image as not writeable to
    # pass by reference.
    image.flags.writeable = False
    results = selfie_segmentation.process(image)

    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

    # Draw selfie segmentation on the background image.
    # To improve segmentation around boundaries, consider applying a joint
    # bilateral filter to "results.segmentation_mask" with "image".
    condition = np.stack(
      (results.segmentation_mask,) * 3, axis=-1) > 0.1
    # The background can be customized.
    #   a) Load an image (with the same width and height of the input image) to
    #      be the background, e.g., bg_image = cv2.imread('/path/to/image/file')
    #   b) Blur the input image by applying image filtering, e.g.,
    #      bg_image = cv2.GaussianBlur(image,(55,55),0)
    if bg_image is None:
      bg_image = np.zeros(image.shape, dtype=np.uint8)
      bg_image[:] = BG_COLOR
    output_image = np.where(condition, image, bg_image)

    cv2.imshow('MediaPipe Selfie Segmentation', output_image)
    if cv2.waitKey(5) & 0xFF == 27:
      break
cap.release()

JavaScript Solution API

First, read through the introduction to MediaPipe in JavaScript [https://google.github.io/mediapipe/getting_started/javascript.html], then see the web demo [https://google.github.io/mediapipe/solutions/selfie_segmentation.html#resources] and the following example for details.

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/selfie_segmentation.js" crossorigin="anonymous"></script>
</head>

<body>
  <div class="container">
    <video class="input_video"></video>
    <canvas class="output_canvas" width="1280px" height="720px"></canvas>
  </div>
</body>
</html>

<script type="module">
const videoElement = document.getElementsByClassName('input_video')[0];
const canvasElement = document.getElementsByClassName('output_canvas')[0];
const canvasCtx = canvasElement.getContext('2d');

function onResults(results) {
  canvasCtx.save();
  canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
  canvasCtx.drawImage(results.segmentationMask, 0, 0,
                      canvasElement.width, canvasElement.height);

  // Only overwrite existing pixels.
  canvasCtx.globalCompositeOperation = 'source-in';
  canvasCtx.fillStyle = '#00FF00';
  canvasCtx.fillRect(0, 0, canvasElement.width, canvasElement.height);

  // Only overwrite missing pixels.
  canvasCtx.globalCompositeOperation = 'destination-atop';
  canvasCtx.drawImage(
      results.image, 0, 0, canvasElement.width, canvasElement.height);

  canvasCtx.restore();
}

const selfieSegmentation = new SelfieSegmentation({locateFile: (file) => {
  return `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`;
}});
selfieSegmentation.setOptions({
  modelSelection: 1,
});
selfieSegmentation.onResults(onResults);

const camera = new Camera(videoElement, {
  onFrame: async () => {
    await selfieSegmentation.send({image: videoElement});
  },
  width: 1280,
  height: 720
});
camera.start();
</script>

Example Apps

See the respective instructions for the MediaPipe example apps for Android, iOS, and desktop.

Note: To visualize a graph, copy the graph and paste it into MediaPipe Visualizer. For more information on how to visualize its associated subgraphs, please see visualizer documentation.

Mobile
Graph: mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt
Android target: (or download prebuilt ARM64 APK) mediapipe/examples/android/src/java/com/google/mediapipe/apps/selfiesegmentationgpu:selfiesegmentationgpu
iOS target: mediapipe/examples/ios/selfiesegmentationgpu:SelfieSegmentationGpuApp
Desktop
Please first see general instructions for desktop on how to build MediaPipe examples.

Running on CPU
Graph: mediapipe/graphs/selfie_segmentation/selfie_segmentation_cpu.pbtxt
Target: mediapipe/examples/desktop/selfie_segmentation:selfie_segmentation_cpu
Running on GPU
Graph: mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt
Target: mediapipe/examples/desktop/selfie_segmentation:selfie_segmentation_gpu
(I don't fully understand this part, so it is quoted as in the original.)

Both sample sources process camera input, so below I write versions that process a still image instead.

Python

This is almost the same as the Python Colab.
Masking a Full HD image takes about 40 ms, so it looks like it could run in close to real time.
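As a rough check of the "close to real time" estimate, a per-frame latency can be converted to a frame rate, and a small helper can average the latency over several calls instead of timing a single one. This is a minimal sketch using only the standard library. `frames_per_second` and `average_latency_ms` are hypothetical helper names, not part of MediaPipe.

```python
import time


def frames_per_second(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


def average_latency_ms(func, repeats: int = 10) -> float:
    """Call func `repeats` times and return the mean wall-clock time per call in ms."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) * 1000.0 / repeats


# About 40 ms per frame corresponds to about 25 frames per second.
print(frames_per_second(40))
```

In the notebook, something like `average_latency_ms(lambda: selfie_segmentation.process(rgb_image))` could be used, assuming `rgb_image` is an already converted RGB array. The first call may include warm-up cost, so averaging over several calls tends to give a steadier number than a single measurement.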

Install mediapipe

!pip install mediapipe

Upload image files

from google.colab import files
uploaded = files.upload()

Display the uploaded images

import cv2
from google.colab.patches import cv2_imshow
import math
import numpy as np

DESIRED_HEIGHT = 480
DESIRED_WIDTH = 480
def resize_and_show(image):
  h, w = image.shape[:2]
  if h < w:
    img = cv2.resize(image, (DESIRED_WIDTH, math.floor(h/(w/DESIRED_WIDTH))))
  else:
    img = cv2.resize(image, (math.floor(w/(h/DESIRED_HEIGHT)), DESIRED_HEIGHT))
  cv2_imshow(img)

# Read images with OpenCV.
images = {name: cv2.imread(name) for name in uploaded.keys()}
# Preview the images.
for name, image in images.items():
  print(name)   
  resize_and_show(image)
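The size calculation inside `resize_and_show` can be pulled out and checked on its own, without OpenCV. The sketch below repeats the same arithmetic in a standalone function. `fit_size` is a hypothetical name introduced here for illustration, and it uses a single `desired` value because both constants above are 480.

```python
import math
from typing import Tuple


def fit_size(width: int, height: int, desired: int = 480) -> Tuple[int, int]:
    """Return (new_width, new_height) so the longer side becomes `desired`.

    Mirrors the arithmetic in resize_and_show: the aspect ratio is kept and
    the shorter side is rounded down.
    """
    if height < width:
        return desired, math.floor(height / (width / desired))
    return math.floor(width / (height / desired)), desired


print(fit_size(1920, 1080))  # landscape Full HD image
print(fit_size(1080, 1920))  # portrait Full HD image
```

A Full HD landscape image comes out as 480x270 and the portrait version as 270x480. Note that `cv2.resize` takes its target size as (width, height), which is the order returned here, whereas `image.shape` is (height, width, channels).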

The image was obtained from APhoto.

Get selfie_segmentation

import mediapipe as mp
mp_selfie_segmentation = mp.solutions.selfie_segmentation

Mask with selfie_segmentation

import time

# Show segmentation masks.
BG_COLOR = (192, 192, 192) # gray
MASK_COLOR = (255, 255, 255) # white

with mp_selfie_segmentation.SelfieSegmentation() as selfie_segmentation:
  for name, image in images.items():
    # Convert the BGR image to RGB and process it with MediaPipe Selfie Segmentation.
    start = time.time()
    results = selfie_segmentation.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    end = time.time()
    print((end - start)*1000)
    
    # Generate solid color images for showing the output selfie segmentation mask.
    fg_image = np.zeros(image.shape, dtype=np.uint8)
    fg_image[:] = MASK_COLOR
    bg_image = np.zeros(image.shape, dtype=np.uint8)
    bg_image[:] = BG_COLOR
    condition = np.stack((results.segmentation_mask,) * 3, axis=-1) > 0.2
    output_image = np.where(condition, fg_image, bg_image)

    print(f'Segmentation mask of {name}:')
    resize_and_show(output_image)
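The samples in this post threshold the mask at either 0.1 or 0.2. The effect of that choice is easier to see on a tiny hand-made mask than on a real photo. The following is a sketch under assumptions: `toy_mask` is invented data rather than MediaPipe output, and `render_mask` is a hypothetical helper that repeats the same `np.stack` / `np.where` steps as the cell above.

```python
import numpy as np

BG_COLOR = (192, 192, 192)    # gray
MASK_COLOR = (255, 255, 255)  # white


def render_mask(mask: np.ndarray, threshold: float) -> np.ndarray:
    """Turn an HxW float mask into an HxWx3 image: white above threshold, gray below."""
    # Repeat the single-channel mask three times so it lines up with color pixels.
    condition = np.stack((mask,) * 3, axis=-1) > threshold
    fg_image = np.full(mask.shape + (3,), MASK_COLOR, dtype=np.uint8)
    bg_image = np.full(mask.shape + (3,), BG_COLOR, dtype=np.uint8)
    return np.where(condition, fg_image, bg_image)


# Invented 2x2 mask values standing in for per-pixel "person" scores.
toy_mask = np.array([[0.05, 0.15], [0.25, 0.90]], dtype=np.float32)
loose = render_mask(toy_mask, 0.1)   # the 0.15 pixel counts as foreground
strict = render_mask(toy_mask, 0.2)  # the 0.15 pixel falls back to background
print(loose[0, 1], strict[0, 1])
```

A lower threshold keeps more uncertain pixels as foreground, which tends to widen the silhouette. A higher one trims it. Which value looks better depends on the image, so it is worth trying both.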

Blur the background and composite

# Blur the image background based on the segmentation mask.
with mp_selfie_segmentation.SelfieSegmentation() as selfie_segmentation:
  for name, image in images.items():
    # Convert the BGR image to RGB and process it with MediaPipe Selfie Segmentation.
    results = selfie_segmentation.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    blurred_image = cv2.GaussianBlur(image,(55,55),0)
    condition = np.stack((results.segmentation_mask,) * 3, axis=-1) > 0.1
    output_image = np.where(condition, image, blurred_image)
    
    print(f'Blurred background of {name}:')
    resize_and_show(output_image)
    #resize_and_show(image)

JavaScript

The canvas handling is a bit questionable, but it worked.
Here too, processing takes about 60 ms to 100 ms.

<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js" crossorigin="anonymous"></script>
    <script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js"
        crossorigin="anonymous"></script>
    <script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js"
        crossorigin="anonymous"></script>
    <script src="https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/selfie_segmentation.js"
        crossorigin="anonymous"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/stackblur-canvas/2.5.0/stackblur.min.js"></script>
    <script type="module">
        const imageElement = document.getElementsByClassName('input_image')[0];
        const canvasElement = document.getElementsByClassName('output_canvas')[0];
        const canvasCtx = canvasElement.getContext('2d');

        function onResults(results) {
            canvasCtx.save();

            // resize
            const { width, height } = results.image;
            const DESIRED_HEIGHT = 480;
            const DESIRED_WIDTH = 480;
            let rh, rw;
            if (height < width) {
                rw = DESIRED_WIDTH;
                rh = Math.floor(height / (width / DESIRED_WIDTH));
            } else {
                rw = Math.floor(width / (height / DESIRED_HEIGHT));
                rh = DESIRED_HEIGHT;
            }
            canvasElement.width = rw;
            canvasElement.height = rh;

            // Blur
            canvasCtx.drawImage(results.image, 0, 0, rw, rh);
            const backData = canvasCtx.getImageData(0, 0, rw, rh);
            StackBlur.imageDataRGBA(backData, 0, 0, rw, rh, 5);

            createImageBitmap(backData).then(
                function success(backImage) {

                    // Clip to the masked region
                    canvasCtx.globalCompositeOperation = 'destination-in';
                    canvasCtx.drawImage(results.segmentationMask, 0, 0, rw, rh);
                    const frontData = canvasCtx.getImageData(0, 0, rw, rh);

                    createImageBitmap(frontData).then(
                        function success(frontImage) {

                            // Composite
                            canvasCtx.globalCompositeOperation = 'source-over';
                            canvasCtx.drawImage(backImage, 0, 0, rw, rh);
                            canvasCtx.drawImage(frontImage, 0, 0, rw, rh);

                            canvasCtx.restore();
                        }
                    )
                }
            )
        }

        const selfieSegmentation = new SelfieSegmentation({
            locateFile: (file) => {
                return `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`;
            }
        });
        selfieSegmentation.setOptions({
            modelSelection: 1,
        });
        selfieSegmentation.onResults(onResults);

        imageElement.onchange = async function (e) {
            const img = new Image;
            img.onload = async function () {
                var s_time = new Date();
                await selfieSegmentation.send({ image: img });
                var e_time = new Date();
                console.log('the image is drawn: ' + (e_time.getTime() - s_time.getTime()));
            }
            img.src = URL.createObjectURL(e.target.files[0]);
            e.target.value = '';
        }
    </script>
</head>

<body>
    <div class="container">
        <input type="file" class="input_image" />
        <canvas class="output_canvas"></canvas>
    </div>
</body>

</html>

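Incidentally, processing the same image with PointRend gives a result like this.
It picks up various extra things, but it handles areas such as around the neck and the PC(?) in the foreground well.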