GKE alert
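Using Google Cloud's monitoring and alerting features, I propose the following architecture:
- Collect metrics with Cloud Monitoring
- Configure alert policies
- Deliver alerts through Pub/Sub
- Process them in Cloud Functions and notify Google Workspace
A Terraform implementation example follows: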
monitoring.tf
# Alert Policy for CPU usage
# Note: cpu/core_usage_time is a cumulative counter of CPU seconds, so comparing
# it against 0.8 does not express "80%". cpu/limit_utilization is a 0-1 fraction
# of the container's CPU limit (it is only reported when a limit is set).
resource "google_monitoring_alert_policy" "cpu_usage" {
  display_name = "GKE CPU Usage Alert"
  combiner     = "OR"

  conditions {
    display_name = "CPU usage above 80%"

    condition_threshold {
      filter          = "resource.type = \"k8s_container\" AND metric.type = \"kubernetes.io/container/cpu/limit_utilization\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.8

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.pubsub.name]
}
# Alert Policy for Memory usage
# Note: memory/used_bytes is measured in bytes, so a threshold of 0.8 would
# always be exceeded. memory/limit_utilization is a 0-1 fraction of the
# container's memory limit (it is only reported when a limit is set).
resource "google_monitoring_alert_policy" "memory_usage" {
  display_name = "GKE Memory Usage Alert"
  combiner     = "OR"

  conditions {
    display_name = "Memory usage above 80%"

    condition_threshold {
      filter          = "resource.type = \"k8s_container\" AND metric.type = \"kubernetes.io/container/memory/limit_utilization\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.8

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.pubsub.name]
}
pubsub.tf
# Pub/Sub Topic
resource "google_pubsub_topic" "alerts" {
  name = "gke-alerts"
}

# Notification Channel
resource "google_monitoring_notification_channel" "pubsub" {
  display_name = "Pub/Sub Notification Channel"
  type         = "pubsub"

  labels = {
    topic = google_pubsub_topic.alerts.id
  }
}
function.tf
# Cloud Function to handle alerts
resource "google_storage_bucket" "function_bucket" {
  name     = "alert-function-bucket" # bucket names are globally unique; change as needed
  location = "US"                    # required argument; assumed value, pick your own location
}

resource "google_storage_bucket_object" "function_archive" {
  name   = "function-source.zip"
  bucket = google_storage_bucket.function_bucket.name
  source = "path/to/function/source.zip" # Cloud Function source code
}

resource "google_cloudfunctions_function" "alert_handler" {
  name                  = "alert-handler"
  runtime               = "python39" # older runtime; use a currently supported Python runtime if this one is unavailable
  available_memory_mb   = 256
  source_archive_bucket = google_storage_bucket.function_bucket.name
  source_archive_object = google_storage_bucket_object.function_archive.name
  entry_point           = "handle_alert"

  event_trigger {
    event_type = "google.pubsub.topic.publish"
    resource   = google_pubsub_topic.alerts.id # full path: projects/<project>/topics/gke-alerts
  }
}
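Example Python code for the Cloud Function:
main.py
import base64
import json
from email.mime.text import MIMEText

from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/gmail.send']
SENDER = 'sender@your-domain.com'
RECIPIENT = 'recipient@your-domain.com'


def create_message(sender, to, subject, message_text):
    # Build the mail and return it base64url-encoded, as the Gmail API 'raw' field expects
    message = MIMEText(message_text)
    message['to'] = to
    message['from'] = sender
    message['subject'] = subject
    return base64.urlsafe_b64encode(message.as_bytes()).decode('utf-8')


def handle_alert(event, context):
    # Decode the Pub/Sub message
    pubsub_message = base64.b64decode(event['data']).decode('utf-8')
    alert_data = json.loads(pubsub_message)

    # Google Workspace (Gmail) credentials. A service account has no mailbox of
    # its own, so it impersonates a Workspace user via domain-wide delegation.
    credentials = service_account.Credentials.from_service_account_file(
        'service-account.json',
        scopes=SCOPES,
    ).with_subject(SENDER)

    # Send the mail
    service = build('gmail', 'v1', credentials=credentials)
    message = {
        'raw': create_message(
            SENDER,
            RECIPIENT,
            'GKE Alert: Resource Usage High',
            f'Alert Details: {alert_data}',
        )
    }
    service.users().messages().send(userId='me', body=message).execute()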
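The handler above puts the entire decoded payload into the mail body. As a minimal sketch of a more readable alternative, the helper below pulls a few fields out of the payload. The function name `summarize_incident` is hypothetical, and the field names (`incident`, `policy_name`, `condition_name`, `state`, `summary`, `started_at`, `url`) are assumptions about the Cloud Monitoring Pub/Sub notification format; check them against a real message before relying on them.

```python
import base64
import json
from datetime import datetime, timezone


def summarize_incident(event):
    """Turn a Pub/Sub-triggered alert event into a (subject, body) pair."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    # Assumption: alert details live under the "incident" key.
    incident = payload.get("incident", {})

    state = incident.get("state", "unknown")
    policy = incident.get("policy_name", "unknown policy")
    subject = f"GKE Alert [{state}]: {policy}"

    # Assumption: started_at is a Unix timestamp in seconds.
    started = incident.get("started_at")
    started_text = (
        datetime.fromtimestamp(started, tz=timezone.utc).isoformat()
        if isinstance(started, (int, float))
        else "n/a"
    )

    body = "\n".join([
        f"Condition: {incident.get('condition_name', 'n/a')}",
        f"Summary: {incident.get('summary', 'n/a')}",
        f"Started: {started_text}",
        f"Console: {incident.get('url', 'n/a')}",
    ])
    return subject, body
```

The returned subject and body could be passed to `create_message` in place of the fixed subject line and the raw dictionary dump. Missing fields fall back to placeholders, so an unexpected payload still produces a mail instead of an exception.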
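In this architecture:
- Cloud Monitoring periodically checks GKE CPU and memory utilization
- When a value exceeds the threshold (80%), the alert policy triggers
- The alert is sent to the Pub/Sub topic
- The Cloud Function receives the Pub/Sub message
- The Cloud Function sends an email through the Google Workspace API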
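To make the interaction between `alignment_period = "60s"`, `duration = "300s"`, and `threshold_value = 0.8` concrete, here is a deliberately simplified model of the evaluation: samples are averaged per 60-second window, and the alert fires only once the mean has stayed above the threshold for five consecutive windows. This is a sketch for building intuition, not Cloud Monitoring's actual evaluation logic; the function names are hypothetical, and gaps in the data are ignored.

```python
def align_mean(samples, period_s):
    """Group (timestamp_s, value) samples into fixed windows and average each."""
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(ts // period_s, []).append(value)
    return [(k * period_s, sum(v) / len(v)) for k, v in sorted(buckets.items())]


def first_trigger(samples, threshold=0.8, period_s=60, duration_s=300):
    """Return the start time of the window that completes the streak, or None."""
    needed = duration_s // period_s  # consecutive windows required
    streak = 0
    for window_start, mean in align_mean(samples, period_s):
        # Any window at or below the threshold resets the streak.
        streak = streak + 1 if mean > threshold else 0
        if streak >= needed:
            return window_start
    return None


# Sustained load: every window is above 0.8, so the fifth window triggers.
sustained = [(t, 0.9) for t in range(0, 600, 30)]
print(first_trigger(sustained))

# A one-minute dip in the middle resets the streak, so nothing triggers.
dipped = [(t, 0.5 if 240 <= t < 300 else 0.9) for t in range(0, 540, 30)]
print(first_trigger(dipped))
```

The point of the model is that a short spike does not page anyone: shortening `duration` makes the alert more sensitive, while lengthening it suppresses transient peaks.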
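Notes:
- Using the Google Workspace API requires the appropriate permissions
- Both the service account and the Google Workspace side need to be configured (for Gmail, domain-wide delegation for the service account)
- Cloud Monitoring's notification service account (service-PROJECT_NUMBER@gcp-sa-monitoring-notification.iam.gserviceaccount.com) needs roles/pubsub.publisher on the topic, otherwise alerts never reach Pub/Sub
- Alert thresholds and check intervals can be adjusted to fit your requirements
- Besides email, integration with other Workspace services such as Google Chat or Calendar is also possible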
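As one illustration of the Google Chat option, the sketch below posts a plain-text message to a Chat space through an incoming webhook using only the standard library. It assumes a webhook has already been created for the space; the function names and the example URL are hypothetical placeholders, not values from this setup.

```python
import json
import urllib.request


def build_chat_request(webhook_url, text):
    """Build a POST request carrying a simple text message for a Chat webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


def send_chat_message(webhook_url, text, timeout_s=10):
    """Send the message and return the HTTP status code."""
    request = build_chat_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status


# Example (placeholder URL; replace with the webhook URL of your space):
# send_chat_message("https://chat.googleapis.com/v1/spaces/.../messages?key=...&token=...",
#                   "GKE Alert: Resource Usage High")
```

Compared with the Gmail path, a webhook needs no service account delegation, but the webhook URL itself is a secret and is better kept out of source code, for example in an environment variable or Secret Manager.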
With this architecture, monitoring and alert notification run automatically.