
GKE alert

Published 2025/02/22

The following setup uses Google Cloud's built-in monitoring and alerting:

  1. Collect metrics with Cloud Monitoring
  2. Define alert policies
  3. Receive the alerts through Pub/Sub
  4. Process them in Cloud Functions and notify via Google Workspace

An example Terraform implementation follows:

monitoring.tf
# Alert Policy for CPU usage
resource "google_monitoring_alert_policy" "cpu_usage" {
  display_name = "GKE CPU Usage Alert"
  combiner     = "OR"
  conditions {
    display_name = "CPU usage above 80% of limit"
    condition_threshold {
      # limit_utilization is the 0-1 ratio of CPU usage to the container's CPU limit,
      # so threshold_value = 0.8 means 80%. (core_usage_time is a cumulative
      # CPU-seconds counter and cannot be compared with 0.8 directly.)
      filter          = "resource.type = \"k8s_container\" AND metric.type = \"kubernetes.io/container/cpu/limit_utilization\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.8
      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.pubsub.name]
}

# Alert Policy for Memory usage
resource "google_monitoring_alert_policy" "memory_usage" {
  display_name = "GKE Memory Usage Alert"
  combiner     = "OR"
  conditions {
    display_name = "Memory usage above 80% of limit"
    condition_threshold {
      # used_bytes is an absolute byte count; limit_utilization is the 0-1 ratio of
      # usage to the memory limit. Non-evictable memory is what leads to OOM kills.
      filter          = "resource.type = \"k8s_container\" AND metric.type = \"kubernetes.io/container/memory/limit_utilization\" AND metric.label.memory_type = \"non-evictable\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.8
      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.pubsub.name]
}
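A threshold of 0.8 only makes sense for a ratio. `kubernetes.io/container/cpu/core_usage_time` is a cumulative counter of CPU-seconds, so turning it into "percent of limit" means taking its rate and dividing by the CPU limit; the `limit_utilization` metrics report that ratio directly. The sketch below shows the arithmetic with made-up sample values; `cpu_limit_utilization` is a hypothetical helper, not part of any Google Cloud library.

```python
def cpu_limit_utilization(prev_cpu_seconds: float, curr_cpu_seconds: float,
                          interval_seconds: float, cpu_limit_cores: float) -> float:
    """Fraction of the CPU limit used between two samples of a cumulative
    CPU-seconds counter (such as kubernetes.io/container/cpu/core_usage_time)."""
    if interval_seconds <= 0 or cpu_limit_cores <= 0:
        raise ValueError("interval and limit must be positive")
    # Rate of the counter = average number of cores in use over the interval.
    cores_used = (curr_cpu_seconds - prev_cpu_seconds) / interval_seconds
    # Dividing by the limit gives a 0..1 ratio comparable to threshold_value = 0.8.
    return cores_used / cpu_limit_cores


# A container with a 0.5-core limit that consumed 27 CPU-seconds in 60 s
# used 0.45 cores on average, i.e. 90% of its limit.
ratio = cpu_limit_utilization(1000.0, 1027.0, 60.0, 0.5)
print(f"{ratio:.2f}")  # 0.90
```

Memory works the same way without the rate step: bytes in use divided by the memory limit in bytes.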
pubsub.tf
# Pub/Sub Topic
resource "google_pubsub_topic" "alerts" {
  name = "gke-alerts"
}

# Notification Channel
resource "google_monitoring_notification_channel" "pubsub" {
  display_name = "Pub/Sub Notification Channel"
  type         = "pubsub"
  labels = {
    topic = google_pubsub_topic.alerts.id
  }
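}

# Allow Cloud Monitoring's notification service agent to publish to the topic
data "google_project" "project" {}

resource "google_pubsub_topic_iam_member" "monitoring_publisher" {
  topic  = google_pubsub_topic.alerts.id
  role   = "roles/pubsub.publisher"
  member = "serviceAccount:service-${data.google_project.project.number}@gcp-sa-monitoring-notification.iam.gserviceaccount.com"
}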
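The Pub/Sub notification channel publishes each alert as a JSON document in the message body, and a 1st-gen background function triggered by `google.pubsub.topic.publish` receives that body base64-encoded under `event['data']`. This sketch builds such an event for local testing and decodes it the same way `handle_alert` does. `make_test_event` and `decode_event` are hypothetical helper names, and the sample payload is abridged and illustrative, not the full Cloud Monitoring schema.

```python
import base64
import json


def make_test_event(payload: dict) -> dict:
    """Wrap a JSON payload the way a background function receives a
    Pub/Sub message: the message body is base64-encoded under 'data'."""
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {"data": data}


def decode_event(event: dict) -> dict:
    """Reverse of make_test_event: base64-decode and parse the JSON body."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))


# Abridged, illustrative payload (assumed fields, not the complete schema).
sample = {"incident": {"policy_name": "GKE CPU Usage Alert", "state": "open"}}
event = make_test_event(sample)
assert decode_event(event) == sample
```

Passing `make_test_event(...)` to the handler lets you exercise the decoding path without waiting for a real alert.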
function.tf
# Cloud Function to handle alerts
resource "google_storage_bucket" "function_bucket" {
  name     = "alert-function-bucket"  # bucket names are globally unique; change as needed
  location = "ASIA-NORTHEAST1"        # required argument; example region, adjust to yours
}

resource "google_storage_bucket_object" "function_archive" {
  name   = "function-source.zip"
  bucket = google_storage_bucket.function_bucket.name
  source = "path/to/function/source.zip"  # Cloud Function source code
}

resource "google_cloudfunctions_function" "alert_handler" {
  name        = "alert-handler"
  runtime     = "python39"
  
  available_memory_mb   = 256
  source_archive_bucket = google_storage_bucket.function_bucket.name
  source_archive_object = google_storage_bucket_object.function_archive.name
  
  event_trigger {
    event_type = "google.pubsub.topic.publish"
    resource   = google_pubsub_topic.alerts.name
  }

  entry_point = "handle_alert"
}
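The `source.zip` referenced above must have `main.py` (and `requirements.txt`, if the code has dependencies) at the root of the archive, not inside a subdirectory. A minimal sketch of packaging it with the standard library; `build_source_zip` is a hypothetical helper, and the two package names are the PyPI distributions that provide `googleapiclient` and `google.oauth2`.

```python
import io
import zipfile


def build_source_zip(files: dict[str, str]) -> bytes:
    """Pack {archive_path: file_contents} into an in-memory ZIP archive.

    Entries are written at the paths given, so "main.py" ends up at the root.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path, contents in files.items():
            zf.writestr(path, contents)
    return buffer.getvalue()


archive = build_source_zip({
    "main.py": "def handle_alert(event, context):\n    pass\n",
    "requirements.txt": "google-api-python-client\ngoogle-auth\n",
})
print(zipfile.ZipFile(io.BytesIO(archive)).namelist())  # ['main.py', 'requirements.txt']
```

Writing `archive` to `path/to/function/source.zip` gives Terraform a file it can upload.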

Example Python code for the Cloud Function:

main.py
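import base64
import json
from email.mime.text import MIMEText

from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/gmail.send']
SENDER = 'sender@your-domain.com'        # Workspace user the function sends as
RECIPIENT = 'recipient@your-domain.com'


def handle_alert(event, context):
    # Decode the Pub/Sub message (base64-encoded JSON from Cloud Monitoring)
    pubsub_message = base64.b64decode(event['data']).decode('utf-8')
    alert_data = json.loads(pubsub_message)

    # Google Workspace (Gmail) credentials. A service account has no mailbox of
    # its own, so it must impersonate a Workspace user via domain-wide delegation.
    credentials = service_account.Credentials.from_service_account_file(
        'service-account.json',  # key file bundled with the function source
        scopes=SCOPES,
    ).with_subject(SENDER)

    # Send the e-mail
    service = build('gmail', 'v1', credentials=credentials)
    message = {
        'raw': create_message(
            SENDER,
            RECIPIENT,
            'GKE Alert: Resource Usage High',
            f'Alert Details: {json.dumps(alert_data, indent=2, ensure_ascii=False)}',
        )
    }
    service.users().messages().send(userId='me', body=message).execute()


def create_message(sender, to, subject, message_text):
    # Build an RFC 2822 message and base64url-encode it, as the Gmail API expects in 'raw'
    message = MIMEText(message_text)
    message['to'] = to
    message['from'] = sender
    message['subject'] = subject
    return base64.urlsafe_b64encode(message.as_bytes()).decode('utf-8')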
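The handler uses a fixed subject and dumps the whole payload into the body. A more informative subject can be derived from the incident details in the payload. This is a sketch under the assumption that the payload carries an `incident` object with `policy_name`, `state`, `summary` and `url` fields; `summarize_incident` is a hypothetical helper, and missing fields fall back to placeholders rather than raising `KeyError`.

```python
import json


def summarize_incident(alert_data: dict) -> tuple[str, str]:
    """Build an e-mail subject and body from a Cloud Monitoring alert payload."""
    incident = alert_data.get("incident", {})
    policy = incident.get("policy_name", "unknown policy")
    state = incident.get("state", "unknown")
    subject = f"GKE Alert [{state}]: {policy}"
    body = "\n".join([
        incident.get("summary", "(no summary)"),
        f"Console: {incident.get('url', '(no URL)')}",
        "",
        "Raw payload:",
        json.dumps(alert_data, indent=2, ensure_ascii=False),
    ])
    return subject, body


subject, body = summarize_incident(
    {"incident": {"policy_name": "GKE Memory Usage Alert", "state": "open"}}
)
print(subject)  # GKE Alert [open]: GKE Memory Usage Alert
```

The returned pair can be passed straight to `create_message` in place of the fixed subject and body.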

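In this setup:

  1. Cloud Monitoring periodically checks the CPU and memory utilization of GKE containers
  2. When utilization stays above the threshold (80%) for the configured duration (300 s), the alert policy fires
  3. The alert is published to the Pub/Sub topic
  4. The Cloud Function receives the Pub/Sub message
  5. The Cloud Function sends an e-mail through the Gmail API (Google Workspace)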
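Step 2 is why a short spike does not page anyone: with `alignment_period = "60s"` and `duration = "300s"`, the per-minute averages must stay above the threshold for five minutes. The sketch below is a simplified model of that rule for illustration only; Cloud Monitoring's real evaluation has more nuance (missing data, retest windows), and `condition_met` is a hypothetical name.

```python
def condition_met(aligned_values: list[float], threshold: float = 0.8,
                  alignment_period_s: int = 60, duration_s: int = 300) -> bool:
    """Simplified model of a COMPARISON_GT threshold condition.

    aligned_values holds one mean value per alignment period, oldest first.
    The condition counts as met only if every point inside the trailing
    duration window is strictly above the threshold.
    """
    points_needed = duration_s // alignment_period_s
    if len(aligned_values) < points_needed:
        return False
    return all(v > threshold for v in aligned_values[-points_needed:])


# A single high minute does not trigger; five high minutes in a row do.
print(condition_met([0.5, 0.95, 0.6, 0.7, 0.5]))          # False
print(condition_met([0.5, 0.85, 0.9, 0.82, 0.88, 0.91]))  # True
```

Lowering `duration` makes the alert react faster at the cost of more noise from brief bursts.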

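Notes:

  • The Gmail API must be enabled in the project, and the service account needs appropriate permissions to call it
  • A service account can only send mail as a Workspace user through domain-wide delegation: authorize its client ID for the `https://www.googleapis.com/auth/gmail.send` scope in the Google Workspace Admin console
  • Cloud Monitoring's notification service agent needs `roles/pubsub.publisher` on the topic, otherwise alerts cannot be delivered to Pub/Sub
  • Thresholds and check intervals (`threshold_value`, `duration`, `alignment_period`) can be tuned to your requirements
  • Besides e-mail, other Workspace services such as Google Chat or Calendar can also be integrated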
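For Google Chat, an incoming webhook is often the simplest route: the webhook URL carries its own key and token, so no domain-wide delegation is needed. A minimal sketch using only the standard library; `build_chat_request` is a hypothetical helper and the URL below is a placeholder (a real one is issued when you add a webhook to a Chat space).

```python
import json
import urllib.request


def build_chat_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Prepare a POST to a Google Chat incoming webhook with a plain-text message."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


req = build_chat_request(
    "https://chat.googleapis.com/v1/spaces/SPACE_ID/messages?key=KEY&token=TOKEN",
    "GKE Alert: CPU usage above 80%",
)
# Inside handle_alert, urllib.request.urlopen(req, timeout=10) would send it.
```

Keeping request construction separate from sending makes the message format testable without network access.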

With this setup, resource monitoring and alert notification run automatically.
