Open · Comment added on 2024/05/07 · 5 comments

Running Style-Bert-VITS2 on RunPod Serverless

Python

kazuph

https://www.runpod.io/

These are my notes from running Style-Bert-VITS2 on RunPod, the cloud GPU service linked above.

Adding files

First, add the handler and the shell script that serves as the startup entry point.

runpod_handler.py

import time

import runpod
import base64
import requests
from requests.adapters import HTTPAdapter, Retry

API_URL = "http://127.0.0.1:8888"


automatic_session = requests.Session()
retries = Retry(total=10, backoff_factor=0.1, status_forcelist=[502, 503, 504])
automatic_session.mount("http://", HTTPAdapter(max_retries=retries))


def wait_for_service(url):
    while True:
        try:
            requests.get(url)
            return
        except requests.exceptions.RequestException:
            print("Service not ready yet. Retrying...")
        except Exception as err:
            print("Error: ", err)

        time.sleep(0.2)


def inference(params):
    # params={
    #     "model_id": model_id,
    #     "encoding": "utf-8",
    #     "text": text,
    # },
    response = automatic_session.get(
        url=f"{API_URL}/voice",
        params=params,
        timeout=60,
    )

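    print(response.status_code)
    if response.status_code != 200:
        # Do not base64-encode an error body as if it were audio
        return {"error": f"/voice returned HTTP {response.status_code}"}
    audio_wav = response.content

    # Convert the WAV audio to base64
    audio_base64 = base64.b64encode(audio_wav).decode("utf-8")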

    return {"voice": audio_base64}


def models_info():
    response = automatic_session.get(
        url=f"{API_URL}/models/info",
        timeout=60,
    )
    return response.json()


def handler(event):
    print("Event: ", event)
    params = event["input"]
    action = params["action"]

    if action == "/voice":
        print("Inference API is called.")
        return inference(params)
    elif action == "/models/info":
        print("Models info API is called.")
        return models_info()
    else:
        return {"error": "Unknown action"}


if __name__ == "__main__":
    wait_for_service(url=f"{API_URL}/models/info")

    print("API Service is ready. Starting RunPod...")

    runpod.serverless.start({"handler": handler})
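
For reference, the handler above expects a RunPod event whose input carries an action plus the query parameters for the chosen endpoint. The sketch below mirrors that routing with the HTTP calls replaced by stubs (fake_voice and fake_models_info are hypothetical stand-ins, not part of Style-Bert-VITS2). Unlike the handler above, it also drops action before forwarding the parameters. Treat it as an illustration of the event shape, not as the deployed code.

```python
def split_action(event):
    """Return (action, params) where params excludes the routing key."""
    params = dict(event["input"])  # copy so the caller's event is untouched
    action = params.pop("action", None)
    return action, params


def route(event, backends):
    """Dispatch to a backend callable keyed by action, like handler() above."""
    action, params = split_action(event)
    backend = backends.get(action)
    if backend is None:
        return {"error": "Unknown action"}
    return backend(params)


# Hypothetical stand-ins for the real HTTP calls to server_fastapi.py
def fake_voice(params):
    return {"voice": f"<base64 wav for {params['text']!r}>"}


def fake_models_info(params):
    return {"0": {"note": "stub"}}


BACKENDS = {"/voice": fake_voice, "/models/info": fake_models_info}

if __name__ == "__main__":
    event = {"input": {"action": "/voice", "model_id": 0, "text": "hello"}}
    print(route(event, BACKENDS))
    print(route({"input": {"action": "/nope"}}, BACKENDS))
```

Swapping the stubs for inference and models_info would give the same behaviour as the handler, minus the extra action query parameter.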

start.sh

#!/bin/bash

echo "Worker Initiated"

echo "Starting Style-Bert-VITS2 API Server For PROD 🚀"
python server_fastapi.py &

echo "Starting RunPod Handler 🏃‍♂💨"

# Use this line instead when testing locally
# python -u ./runpod_handler.py --rp_serve_api --rp_api_host='0.0.0.0'
python -u ./runpod_handler.py

With this in place, startup goes roughly like this:

Docker → CMD → start.sh → start server_fastapi in the background → start runpod_handler
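
Because server_fastapi is started in the background, the handler has to wait until it answers, and wait_for_service loops forever if the server never comes up. Below is a minimal sketch of the same polling with a deadline. The name wait_until_ready, the 300-second default, and the injectable probe, clock, and sleep arguments are all my own assumptions for illustration.

```python
import time


def wait_until_ready(probe, timeout_s=300.0, interval_s=0.2,
                     clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or timeout_s elapses.

    probe is any zero-argument callable; exceptions count as "not ready".
    Returns True when ready, False when the deadline passes.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception as err:  # e.g. connection refused while loading
            print("Service not ready yet:", err)
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_probe():
        # Pretend the server needs three polls before it answers
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ConnectionError("not listening yet")
        return True

    print(wait_until_ready(flaky_probe, timeout_s=5, interval_s=0.01))
```

In the handler, the probe could be something like lambda: requests.get(url, timeout=2).ok, and a False result could make the process exit so the worker fails visibly instead of hanging.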

Dockerfile

This is the Dockerfile for RunPod. Replace <YOUR_MODEL_NAME> with the name of your model. You can of course COPY everything under model_assets instead, but watch the container size if you do.

Dockerfile.runpod

FROM pytorch/pytorch:2.1.2-cuda11.8-cudnn8-runtime
ENV TZ=Asia/Tokyo
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt update && apt install -y build-essential libssl-dev libffi-dev cmake git wget ffmpeg nvidia-cuda-toolkit libatlas-base-dev gfortran


WORKDIR /app
RUN pip3 uninstall -y cmake

COPY requirements.txt .
RUN pip3 install -r requirements.txt
RUN pip3 install runpod

# COPY . /app
ENV LD_LIBRARY_PATH /opt/conda/lib/python3.10/site-packages/nvidia/cublas/lib:/opt/conda/lib/python3.10/site-packages/nvidia/cudnn/lib:${LD_LIBRARY_PATH}

COPY bert bert
COPY configs configs
COPY common common
COPY text text
COPY monotonic_align monotonic_align

RUN mkdir -p model_assets/<YOUR_MODEL_NAME>
COPY model_assets/<YOUR_MODEL_NAME> model_assets/<YOUR_MODEL_NAME>

COPY *.yml .
COPY *.py .
COPY server_fastapi.py .
COPY start.sh .
# ENTRYPOINT [ "python" ]
CMD [ "./start.sh" ]

Deploy (pushing the container image)

deploy.sh

#!/bin/bash
USER="your docker hub account name" # change this
APP_NAME="runpod-style-bert-vits2-api"
VERSION=$1

# Confirm VERSION by eye with a y/N prompt
echo "Use version $VERSION?"
read -p "y/N: " yn
case "$yn" in [yY]*) ;; *) echo "Aborted" ; exit ;; esac

# git tag -a $VERSION -m "$VERSION"

# build command
sudo DOCKER_BUILDKIT=1 docker build --progress=plain . -f Dockerfile.runpod -t $USER/$APP_NAME:$VERSION

# push command
sudo docker push $USER/$APP_NAME:$VERSION

Save the script above as deploy.sh and run it as follows.

chmod +x deploy.sh
./deploy.sh <VERSION> # ex) 1.0.0

This takes care of everything up to the push.

After that, create a Template from the RunPod console, pick the GPU you like, and deploy.

kazuph

Test scripts

Local

runpod_test.sh

BASE64=$(curl -s -X POST  -H "Content-Type: application/json" -d '{"input": {"action":"/voice","model_id": 0, "text": "はーい、こんにちわ"}}' http://0.0.0.0:8000/runsync | jq -r .output.voice)

echo -n "$BASE64" | base64 -d > runpod_test_voice.wav

Remote

runpod_test_remote.sh


BASE64=$(curl -s -X POST  \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_TOKEN>" \
  -d '{"input": {"action":"/voice","model_id": 0, "text": "はーい、みなさーん、こちららんぽっどのAPI経由での音声です！"}}' \
  https://api.runpod.ai/v2/<YOUR_PATH>/runsync | tee res.json | jq -r .output.voice)

echo -n "$BASE64" | base64 -d > runpod_test_voice.wav
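
The same remote call can be made from Python. This is a rough sketch using only the standard library. The function names, the 120-second timeout, and the default output path are assumptions of mine, and it assumes the response has status COMPLETED with output.voice present, as in the curl example above.

```python
import base64
import json
import urllib.request


def build_request(endpoint_url, token, text, model_id=0):
    """Build the POST request that the curl command above sends."""
    payload = {"input": {"action": "/voice", "model_id": model_id, "text": text}}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def extract_wav(response_json):
    """Decode output.voice (base64) into raw WAV bytes."""
    return base64.b64decode(response_json["output"]["voice"])


def synthesize(endpoint_url, token, text, out_path="runpod_test_voice.wav"):
    """Call runsync and write the returned audio to out_path."""
    req = build_request(endpoint_url, token, text)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    with open(out_path, "wb") as f:
        f.write(extract_wav(body))
    return body
```

Usage would look like synthesize("https://api.runpod.ai/v2/<YOUR_PATH>/runsync", "<YOUR_TOKEN>", "はーい、こんにちわ"), with the placeholders filled in the same way as in the shell script.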

kazuph

Measuring execution time

runpod_test_remote.sh

BASE64=$(curl -s -X POST  \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_TOKEN>" \
  -d '{"input": {"action":"/voice","model_id": 0, "text": "はーい、みなさーん、こちら欄ポッドのAPI経由での音声です！"}}' \
  https://api.runpod.ai/v2/<YOUR_PATH>/runsync | tee res.json | jq -r .output.voice)

cat res.json | jq "." | grep -v voice

echo -n "$BASE64" | base64 -d > runpod_test_voice.wav

This version saves the response JSON, and I also ran it with time prefixed.

# Before the container has started
$ time ./runpod_test_remote.sh 
{
  "delayTime": 17556,
  "executionTime": 4200,
  "id": "sync-63f1fc75-fae3-42d5-bdc9-58ea27c626ae-e1",
  "output": {
  },
  "status": "COMPLETED"
}
./runpod_test_remote.sh  0.12s user 0.03s system 0% cpu 23.825 total

# After the container has started
$ time ./runpod_test_remote.sh 
{
  "delayTime": 95,
  "executionTime": 386,
  "id": "sync-98022d72-9231-47aa-9fb1-c492dbc04c3f-e1",
  "output": {
  },
  "status": "COMPLETED"
}
./runpod_test_remote.sh  0.10s user 0.04s system 4% cpu 2.801 total

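So before the container is up (more precisely, before the speech synthesis model inside the container has finished loading and FastAPI is able to handle requests), the response took about 20 to 25 seconds. Once it was up, responses came back in about 2 to 3.5 seconds.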
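
To see where the time goes, the wall-clock totals from time can be compared with the delayTime and executionTime that RunPod reports. The small calculation below does this for the two runs above. Reading the remainder as network transfer and client-side overhead is my assumption, not something measured here.

```python
def breakdown(wall_s, delay_ms, execution_ms):
    """Split wall-clock time into RunPod-reported parts and the remainder (ms)."""
    wall_ms = round(wall_s * 1000)
    return {
        "delay_ms": delay_ms,
        "execution_ms": execution_ms,
        "other_ms": wall_ms - delay_ms - execution_ms,
    }


# Numbers copied from the two runs above
cold = breakdown(wall_s=23.825, delay_ms=17556, execution_ms=4200)
warm = breakdown(wall_s=2.801, delay_ms=95, execution_ms=386)

if __name__ == "__main__":
    print("cold:", cold)
    print("warm:", warm)
```

The remainder is about 2.1 s for the cold run and 2.3 s for the warm run, so the cold-start penalty shows up almost entirely in delayTime (17.6 s vs 0.1 s) and executionTime (4.2 s vs 0.4 s).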

Also, I haven't looked into what it does, but I turned the FlashBoot checkbox ON without thinking about it.
This is the GPU I used.

yuki

Hey, what GPU model are you using? Can I also fine-tune it on Dataoorts: https://dataoorts.com/gpu/ for T4 GPUs?

kazuph

Thanks for your comment. I'm currently using NVIDIA RTX A4000, A4500, and 4000 Ada GPUs, which are from the professional RTX series.

I haven't fine-tuned models on T4 GPUs before, so I can't really comment on how well it would work - sorry about that!