Building an IP-Restricted Inference Endpoint with Cloud Run

Published on 2024/01/22

Google Cloud

I built an IP-restricted inference endpoint using Cloud Run + Load Balancer + Cloud Armor.

Overview

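When you want to quickly stand up an API server or an inference endpoint on GCP, Cloud Run is a convenient choice. The simplest way to make such an endpoint callable is to allow unauthenticated invocations and set ingress control to "All". With that configuration, however, anyone can call the endpoint from anywhere.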

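In this article, I create an endpoint with IP restrictions using Cloud Run + Load Balancer + Cloud Armor and route its requests to an inference endpoint running on Cloud Run.
I tried two approaches: building everything from the console, and building it with Terraform.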

GitHub: https://github.com/hosimesi/code-for-techblogs/tree/main/inference_on_secure_cloud_run

Note: For production use, also configure CORS, Cloud IAM authentication, and related settings appropriately, and pay close attention to security.

Architecture diagram

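The system built in this article is roughly as follows: a request arrives at the external Application Load Balancer, is filtered by Cloud Armor, and is forwarded through a serverless network endpoint group to the Cloud Run service, which uses the model stored in Cloud Storage.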

Training the model

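I train a model on the Iris classification dataset locally and save the trained model object to GCS in advance.
The model is a logistic regression.
Note: For simplicity, no preprocessing such as standardization is applied.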

import pickle
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def main():
    iris = load_iris()

    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df["target"] = iris.target

    # Split the data: hold out 10% for test, then 25% of the rest for validation
    X_train_valid, X_test, y_train_valid, y_test = train_test_split(
        df.drop(["target"], axis=1), df["target"], test_size=0.1, random_state=42
    )
    X_train, X_valid, y_train, y_valid = train_test_split(
        X_train_valid, y_train_valid, test_size=0.25, random_state=42
    )

    # Train the model
    model = LogisticRegression(random_state=42)
    model.fit(X_train, y_train)

    # Evaluate the model (mean accuracy on each split)
    print("Train score: ", model.score(X_train, y_train))
    print("Valid score: ", model.score(X_valid, y_valid))
    print("Test score: ", model.score(X_test, y_test))

    # Save the model
    with open("model.pkl", "wb") as f:
        pickle.dump(model, f)


if __name__ == '__main__':
    main()

To run all of this with a single command, I wrapped it in a Makefile.

.PHONY: run/train
run/train:
	docker run -it --rm $(ARTIFACT_REPOSITORY)/$(PROJECT_ID)/$(REPOSITORY_NAME)/$(TRAIN_IMAGE_NAME):$(SHORT_SHA)

Building the services

Console

Artifact Registry

Create a repository named inference-on-secure-cloud-run in Artifact Registry.

Then build the Dockerfile and push the image.

.PHONY: push/inference
push/inference:
	gcloud auth configure-docker asia-northeast1-docker.pkg.dev
	docker push $(ARTIFACT_REPOSITORY)/$(PROJECT_ID)/$(REPOSITORY_NAME)/$(INFERENCE_IMAGE_NAME):$(SHORT_SHA)
	docker push $(ARTIFACT_REPOSITORY)/$(PROJECT_ID)/$(REPOSITORY_NAME)/$(INFERENCE_IMAGE_NAME):latest

Cloud Storage

Create a bucket named inference-on-secure-cloud-run in Cloud Storage.

Then upload artifacts/model.pkl.
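On the serving side, the uploaded pickle has to be read back from the bucket when the inference server starts. The following is a minimal sketch of one way to do that, not the code from the repository. The function names are hypothetical, the object name passed in the usage note is an assumption, and `download_from_gcs` assumes the google-cloud-storage package is installed and that the Cloud Run service account can read the bucket.

```python
import pickle
from typing import Any, Callable


def download_from_gcs(bucket_name: str, blob_name: str) -> bytes:
    """Download one object from Cloud Storage and return its raw bytes.

    Assumes the google-cloud-storage package; it is imported lazily so the
    rest of this sketch can run without it.
    """
    from google.cloud import storage

    client = storage.Client()
    return client.bucket(bucket_name).blob(blob_name).download_as_bytes()


def load_model(fetch: Callable[[], bytes]) -> Any:
    """Unpickle a model from the bytes returned by `fetch`.

    Only unpickle data you trust: pickle can execute arbitrary code.
    """
    return pickle.loads(fetch())
```

In the server this could be called once at startup, for example `load_model(lambda: download_from_gcs("inference-on-secure-cloud-run", "model.pkl"))`, where the object name depends on how the file was uploaded. Passing the download step in as a function keeps the loading logic testable without network access.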

Service Account

Create a service account to run Cloud Run.
Grant this service account the Cloud Storage admin role.

Cloud Run

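Create a service named inference-server in Cloud Run. For the image, select the one pushed to Artifact Registry earlier.
The important point here is to set ingress control to "Internal" and to select "Allow traffic from external Application Load Balancers". With this setting, external requests reach Cloud Run only through the load balancer. Select the service account created earlier and have the container listen on port 5000.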
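For orientation, the container deployed here accepts four Iris measurements as JSON and returns per-class probabilities plus the predicted class (the field names appear in the request examples later in this article). The following is a rough sketch of that request-to-response mapping, assuming the feature order of `load_iris()` and the class order setosa, versicolor, virginica. The helper names are hypothetical, and the actual server code in the repository may differ.

```python
from typing import Dict, List, Sequence

# Request fields, in the same order as the columns used for training.
FEATURE_KEYS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
# Class names in the order of the Iris target labels 0, 1, 2.
CLASS_NAMES = ["setosa", "versicolor", "virginica"]


def to_feature_vector(payload: Dict[str, float]) -> List[float]:
    """Turn the JSON body into a feature list ordered like the training data."""
    missing = [key for key in FEATURE_KEYS if key not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return [float(payload[key]) for key in FEATURE_KEYS]


def format_response(probabilities: Sequence[float]) -> Dict[str, object]:
    """Build the response body from one row of class probabilities."""
    if len(probabilities) != len(CLASS_NAMES):
        raise ValueError("expected one probability per class")
    body: Dict[str, object] = {
        f"{name}_probability": float(p)
        for name, p in zip(CLASS_NAMES, probabilities)
    }
    # The predicted class is the one with the highest probability.
    best = max(range(len(CLASS_NAMES)), key=lambda i: probabilities[i])
    body["target"] = CLASS_NAMES[best]
    return body
```

With a fitted scikit-learn model, the two helpers would be joined by something like `format_response(model.predict_proba([to_feature_vector(payload)])[0])`.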

Load balancer

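Since external requests only need to be routed to asia-northeast1, a regional external Application Load Balancer would be sufficient, but this time I use a global external Application Load Balancer.
GCP offers many kinds of load balancers, each suited to different use cases.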

Cloud Armor

First, create the Cloud Armor policy that will be applied to the load balancer. For now, set it to deny all IP addresses.
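Cloud Armor evaluates the rules of a policy in priority order (lowest number first) and applies the action of the first rule that matches, so a deny-all default rule combined with a higher-priority allow rule yields an IP allowlist. The following is a small sketch of that evaluation logic for source-IP rules only. It is an illustration rather than Cloud Armor's implementation, and the `Rule` class and the sample addresses are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    priority: int             # lower number = evaluated earlier
    action: str               # e.g. "allow" or "deny(403)"
    src_ip_ranges: List[str]  # CIDR strings, or "*" for any address


def matches(rule: Rule, client_ip: str) -> bool:
    """Return True if client_ip falls inside any of the rule's ranges."""
    addr = ipaddress.ip_address(client_ip)
    for cidr in rule.src_ip_ranges:
        if cidr == "*":
            return True
        if addr in ipaddress.ip_network(cidr, strict=False):
            return True
    return False


def evaluate(rules: List[Rule], client_ip: str) -> str:
    """Apply the first matching rule in ascending priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if matches(rule, client_ip):
            return rule.action
    raise LookupError("no rule matched; a real policy always has a default rule")


policy = [
    Rule(priority=2147483647, action="deny(403)", src_ip_ranges=["*"]),
    Rule(priority=1000, action="allow", src_ip_ranges=["203.0.113.0/24"]),
]
print(evaluate(policy, "203.0.113.7"))   # allow
print(evaluate(policy, "198.51.100.1"))  # deny(403)
```

With only the default deny rule in place, as at this step, every request is rejected until an allow rule is added.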

Serverless network endpoint group

The load balancer distributes requests per network endpoint group (NEG). Since Cloud Run is used here, create a serverless network endpoint group.

Frontend Service

Next, configure the frontend that accepts requests. HTTP is used here, but use HTTPS in production.

Backend Service

Create the Backend Service using the settings above.

Then configure the routing rules so that requests are sent to this Backend Service, and create the load balancer.

Terraform

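The resources above were created from the console, but they can also be created with Terraform.
The following services are not managed by Terraform here. Create them beforehand from the CLI or the console.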

Artifact Registry
Cloud Storage

# Static IP for the load balancer
resource "google_compute_global_address" "inference_lb_ip" {
  name         = "inference-lb-ip"
  description  = "Static IP for the inference load balancer"
  address_type = "EXTERNAL"
  ip_version   = "IPV4"
  project      = var.project
}

# Service account for Cloud Run
resource "google_service_account" "inference_cloud_run_service_account" {
  account_id   = "inference-cloud-run"
  display_name = "inference-cloud-run"
  description  = "Service account for the inference Cloud Run service"
}

# Cloud Run service for the inference server
resource "google_cloud_run_v2_service" "inference_cloud_run" {
  name        = "inference"
  location    = var.region
  description = "Cloud Run service for inference"
  ingress     = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"

  template {
    containers {
      name  = "inference"
      image = "asia-northeast1-docker.pkg.dev/${var.project}/inference-on-secure-cloud-run/inference:latest"
      ports {
        container_port = 5000
      }
      resources {
        cpu_idle = false
      }
    }

    scaling {
      min_instance_count = 0
      max_instance_count = 1
    }

    service_account = google_service_account.inference_cloud_run_service_account.email
  }

  traffic {
    type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
    percent = 100
  }
}

// IAM policy that allows unauthenticated invocations of Cloud Run
data "google_iam_policy" "noauth" {
  binding {
    role = "roles/run.invoker"
    members = [
      "allUsers",
    ]
  }
}

// Attach the unauthenticated-invocation policy to the Cloud Run service
resource "google_cloud_run_service_iam_policy" "noauth" {
  location = google_cloud_run_v2_service.inference_cloud_run.location
  project  = var.project
  service  = google_cloud_run_v2_service.inference_cloud_run.name

  policy_data = data.google_iam_policy.noauth.policy_data
}

# Grant the Cloud Run service account access to Cloud Storage
resource "google_project_iam_member" "cloud_storage_iam" {
  project = var.project
  role    = "roles/storage.admin"
  member  = "serviceAccount:${google_service_account.inference_cloud_run_service_account.email}"
}

# Serverless NEG for the load balancer
resource "google_compute_region_network_endpoint_group" "inference_neg" {
  name                  = "inference-neg"
  network_endpoint_type = "SERVERLESS"
  region                = "asia-northeast1"
  # Point the NEG at the Cloud Run service
  cloud_run {
    service = google_cloud_run_v2_service.inference_cloud_run.name
  }
}

# Cloud Armor policy for the load balancer
resource "google_compute_security_policy" "inference_policy" {
  name        = "inference-policy"
  description = "Cloud Armor policy for the load balancer"
  rule {
    action   = "allow"
    priority = 1000
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        # FIXME: your ip address
        src_ip_ranges = ["your ip address"]
      }
    }
    description = "my home ip address"
  }
  rule {
    action   = "deny(403)"
    priority = 2147483647
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
    description = "default rule"
  }
  adaptive_protection_config {
    layer_7_ddos_defense_config {
      enable = true
    }
  }
}

# Backend service for the load balancer
resource "google_compute_backend_service" "inference_backend_service" {
  name                  = "inference-backend-service"
  protocol              = "HTTP"
  port_name             = "http"
  timeout_sec           = 30
  load_balancing_scheme = "EXTERNAL_MANAGED"

  # Attach the Cloud Armor policy
  security_policy = google_compute_security_policy.inference_policy.id

  backend {
    group = google_compute_region_network_endpoint_group.inference_neg.self_link
  }
}

# url map
resource "google_compute_url_map" "inference_url_map" {
  name        = "inference-lb"
  description = "URL map for the inference load balancer"

  default_service = google_compute_backend_service.inference_backend_service.id

  path_matcher {
    name            = "inference-apps"
    default_service = google_compute_backend_service.inference_backend_service.id
  }
}

resource "google_compute_target_http_proxy" "inference_target_http_proxy" {
  name    = "predictor-target-http-proxy"
  url_map = google_compute_url_map.inference_url_map.id
}

# Frontend configuration (HTTP)
resource "google_compute_global_forwarding_rule" "inference_forwarding_rule_http" {
  name                  = "inference-forwarding-rule-http"
  description           = "Forwarding rule for the load balancer (HTTP)"
  load_balancing_scheme = "EXTERNAL_MANAGED"
  target                = google_compute_target_http_proxy.inference_target_http_proxy.id
  ip_address            = google_compute_global_address.inference_lb_ip.address
  ip_protocol           = "TCP"
  port_range            = "80"
}

Requests

At this point, a request sent to the load balancer's IP address is denied with a 403.

curl -X POST http://<your-lb-ip>/inference/ -H 'Content-Type: application/json' -d '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'
<!doctype html><meta charset="utf-8"><meta name=viewport content="width=device-width, initial-scale=1"><title>403</title>403 Forbidden%

Next, add your own home IP address to the Cloud Armor allow rule and send the same request again.
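Before pasting an address into the allow rule, it can help to check that it is a well-formed IP address or CIDR range. The following is a small sketch using Python's standard `ipaddress` module. The helper name is hypothetical, and the sample addresses come from documentation ranges rather than from a real setup.

```python
import ipaddress


def to_src_ip_range(value: str) -> str:
    """Validate an IP address or CIDR range and return it in CIDR form.

    A bare address becomes a single-host range (/32 for IPv4). Invalid
    input, including a range with host bits set, raises ValueError.
    """
    network = ipaddress.ip_network(value.strip(), strict=True)
    return str(network)


print(to_src_ip_range("203.0.113.7"))     # 203.0.113.7/32
print(to_src_ip_range("203.0.113.0/24"))  # 203.0.113.0/24
```

A placeholder left in by mistake, such as the string "your ip address", fails here with a ValueError instead of surfacing later when the policy is applied.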

curl -X POST http://<your-lb-ip>/inference/ -H 'Content-Type: application/json' -d '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'
{"setosa_probability":0.9768459539213016,"versicolor_probability":0.023153810153606156,"virginica_probability":2.3592509219498453e-07,"target":"setosa"}

The result came back successfully.
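The same request can also be sent from Python. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the IP address in the usage note is a placeholder for your load balancer's address.

```python
import json
import urllib.request
from typing import Dict


def build_inference_request(lb_ip: str, features: Dict[str, float]) -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    return urllib.request.Request(
        url=f"http://{lb_ip}/inference/",
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_inference(lb_ip: str, features: Dict[str, float], timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response.

    A request blocked by Cloud Armor raises urllib.error.HTTPError with
    code 403 instead of returning a body.
    """
    request = build_inference_request(lb_ip, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `call_inference("203.0.113.10", {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2})` would return a dictionary with the same keys as the JSON shown above, provided the caller's IP address is on the allow list.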

Summary

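I set up an IP-restricted inference endpoint using Cloud Run.
Although this article builds a public inference endpoint, for production use I recommend creating a private endpoint instead.
If you want to start MLOps on a small scale, running model training on something like Cloud Run Jobs and combining it with the inference server looks like a good approach.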
