
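Triggering Alerts with CloudWatch (Windows)

Published 2023/03/23

Introduction

This post is a memo of the steps for sending alerts with CloudWatch. I create an alarm template and configure it so that an email is sent whenever a metric goes outside its threshold.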

SNS setup

Creating a topic

Choose the Standard type, enter any name, and select Create. The other fields can be left at their defaults.
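For reference, the same standard topic can also be declared in CloudFormation JSON, the format used for the alarm template later in this post. This is a minimal sketch, and the topic name `cloudwatch-alarm-topic` is a hypothetical placeholder. Leaving out `FifoTopic` gives a standard topic. `Ref` on the topic returns its ARN, which is the value later passed to the `ActionAlarm` parameter.

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Standard SNS topic for CloudWatch alarm notifications (sketch)",
  "Parameters": {
    "TopicName": { "Type": "String", "Default": "cloudwatch-alarm-topic" }
  },
  "Resources": {
    "AlarmTopic": {
      "Type": "AWS::SNS::Topic",
      "Properties": {
        "TopicName": { "Ref": "TopicName" }
      }
    }
  },
  "Outputs": {
    "AlarmTopicArn": { "Value": { "Ref": "AlarmTopic" } }
  }
}
```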

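Creating a subscription

Create a subscription on the topic you just created. Select [Email] as the protocol, and enter the address that should receive the alerts as the endpoint.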
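The email subscription can be sketched the same way as an `AWS::SNS::Subscription` resource. `TopicArn` and `NotificationEmail` are hypothetical parameter names. Even when the subscription is created from a template, it stays in the pending-confirmation state until the recipient confirms it from the email.

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Email subscription to an existing SNS topic (sketch)",
  "Parameters": {
    "TopicArn": { "Type": "String" },
    "NotificationEmail": { "Type": "String" }
  },
  "Resources": {
    "EmailSubscription": {
      "Type": "AWS::SNS::Subscription",
      "Properties": {
        "TopicArn": { "Ref": "TopicArn" },
        "Protocol": "email",
        "Endpoint": { "Ref": "NotificationEmail" }
      }
    }
  }
}
```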

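Confirming the subscription

A confirmation email is sent to the address you entered. Use the link in it to confirm the subscription.

On the subscriptions screen of the Management Console, check that the status has changed to [Confirmed].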

Alerts on Windows logs

Create metric filters so that an alarm also fires when an error appears in the Windows event logs set up in the article below.
https://zenn.dev/ktr200803/articles/ec70c51233bee3
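The agent setup itself is covered in the linked article. As a rough reference, the `windows_events` section of a CloudWatch agent configuration that ships the three event logs to CloudWatch Logs might look like the sketch below. Only `/aws/ec2/Windows/Application` appears in this post. The System and Security log group names, the chosen `event_levels`, and the stream name are assumptions, so match them to your actual configuration. Note that Security audit events are usually logged at the Information level, so that log may need different levels.

```json
{
  "logs": {
    "logs_collected": {
      "windows_events": {
        "collect_list": [
          {
            "event_name": "Application",
            "event_levels": ["WARNING", "ERROR", "CRITICAL"],
            "log_group_name": "/aws/ec2/Windows/Application",
            "log_stream_name": "{instance_id}"
          },
          {
            "event_name": "System",
            "event_levels": ["WARNING", "ERROR", "CRITICAL"],
            "log_group_name": "/aws/ec2/Windows/System",
            "log_stream_name": "{instance_id}"
          },
          {
            "event_name": "Security",
            "event_levels": ["WARNING", "ERROR", "CRITICAL"],
            "log_group_name": "/aws/ec2/Windows/Security",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}
```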

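Creating metric filters

Create a metric filter under [CloudWatch] → [Log groups] → [/aws/ec2/Windows/Application].
Configure it so that the metric value is 1 whenever error, Error, or ERROR appears in a log event.

Create metric filters for [Security] and [System] in the same way.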
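As a sketch, the Application filter could also be declared as an `AWS::Logs::MetricFilter`.

- **Filter pattern:** `?error ?Error ?ERROR` is the OR syntax for terms in unstructured log events. Term matching is case-sensitive, which is why all three spellings are listed.
- **Namespace and metric name:** `CWAgent` and `ApplicationLog` are assumptions chosen to line up with the alarm template below.
- **Other logs:** deploy the template again with the System and Security log groups and with `SystemLog` / `SecurityLog` as the metric name. Those log group names are also assumed.
- **Dimensions:** a filter with a plain term pattern publishes its metric without dimensions. The log alarms in the template below, however, select on a `host` dimension, and an alarm only matches a metric whose dimensions are identical. Check in the console that the two actually line up.

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Metric filter counting error lines in a Windows event log group (sketch)",
  "Parameters": {
    "LogGroupName": { "Type": "String", "Default": "/aws/ec2/Windows/Application" },
    "MetricName": { "Type": "String", "Default": "ApplicationLog" }
  },
  "Resources": {
    "ErrorMetricFilter": {
      "Type": "AWS::Logs::MetricFilter",
      "Properties": {
        "LogGroupName": { "Ref": "LogGroupName" },
        "FilterPattern": "?error ?Error ?ERROR",
        "MetricTransformations": [
          {
            "MetricNamespace": "CWAgent",
            "MetricName": { "Ref": "MetricName" },
            "MetricValue": "1"
          }
        ]
      }
    }
  }
}
```

No `DefaultValue` is set, so nothing is published while no errors occur. This fits the `"TreatMissingData": "missing"` setting on the log alarms.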

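Creating alarms with CloudFormation

Write a JSON template for the CloudWatch alarms and apply it with CloudFormation. The template below creates alarms for:

- CPU utilization
- the instance and system status checks
- the three event-log metrics
- memory usage
- the EventLog, Schedule, TermService, and W32Time services
- free space on the C: drive, plus the D: drive when enabled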

alart.json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "CloudWatch alarm for EC2 instance",

  "Parameters": {
    "InstanceId": {
      "Type": "AWS::EC2::Instance::Id"
    },
    "HostName": {
      "Type": "String",
      "AllowedPattern": "^.+$"
    },
    "AlarmNamePrefix": {
      "Type": "String",
      "AllowedPattern": "^.+$"
    },
    "EnableDriveD": {
      "Type": "String",
      "Default": "NO",
      "AllowedValues": [ "YES", "NO" ]
    },
    "EnableAlarmAction": {
      "Type": "String",
      "Default": "true",
      "AllowedValues": [ "true", "false" ]
    },
    "ActionOK": {
      "Type": "String",
      "AllowedPattern": "^(arn:aws:.*)?$"
    },
    "ActionAlarm": {
      "Type": "String",
      "AllowedPattern": "^(arn:aws:.*)?$"
    },
    "ActionInsufficient": {
      "Type": "String",
      "AllowedPattern": "^(arn:aws:.*)?$"
    }
  },

  "Conditions": {
    "CreateDriveD": {
      "Fn::Equals": [ { "Ref": "EnableDriveD" }, "YES" ]
    },
    "IsEmptyActionOK": {
      "Fn::Equals": [ { "Ref": "ActionOK" }, "" ]
    },
    "IsEmptyActionAlarm": {
      "Fn::Equals": [ { "Ref": "ActionAlarm" }, "" ]
    },
    "IsEmptyActionInsufficient": {
      "Fn::Equals": [ { "Ref": "ActionInsufficient" }, "" ]
    }
  },

  "Resources": {
    "AlarmCPU": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}CPU" },
        "Namespace": "AWS/EC2",
        "Dimensions": [
          { "Name": "InstanceId", "Value": { "Ref": "InstanceId" } }
        ],
        "MetricName": "CPUUtilization",
        "Period": 900,
        "EvaluationPeriods": 1,
        "Statistic": "Average",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": 90,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "breaching" ,
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmInstance": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Instance" },
        "Namespace": "AWS/EC2",
        "Dimensions": [
          { "Name": "InstanceId", "Value": { "Ref": "InstanceId" } }
        ],
        "MetricName": "StatusCheckFailed_Instance",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Statistic": "Maximum",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": 0,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "breaching",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmSystem": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}System" },
        "Namespace": "AWS/EC2",
        "Dimensions": [
          { "Name": "InstanceId", "Value": { "Ref": "InstanceId" } }
        ],
        "MetricName": "StatusCheckFailed_System",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Statistic": "Maximum",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": 0,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "breaching",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmApplicationLog": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}ApplicationLog" },
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "host", "Value": { "Ref": "HostName" } }
        ],
        "MetricName": "ApplicationLog",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Statistic": "Sum",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": 0,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "missing",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmSystemLog": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}SystemLog" },
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "host", "Value": { "Ref": "HostName" } }
        ],
        "MetricName": "SystemLog",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Statistic": "Sum",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": 0,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "missing",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmSecurityLog": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}SecurityLog" },
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "host", "Value": { "Ref": "HostName" } }
        ],
        "MetricName": "SecurityLog",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Statistic": "Sum",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": 0,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "missing",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmMemory": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Memory" },
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "host", "Value": { "Ref": "HostName" } },
          { "Name": "objectname", "Value": "Memory" }
        ],
        "MetricName": "Memory % Committed Bytes In Use",
        "Period": 900,
        "EvaluationPeriods": 1,
        "Statistic": "Average",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": 90,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "breaching",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmEventLog": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}EventLog" },
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "host", "Value": { "Ref": "HostName" } },
          { "Name": "pid_finder", "Value": "native" },
          { "Name": "pattern", "Value": "C:\\Windows\\system32\\svchost.exe -k LocalServiceNetworkRestricted -p -s EventLog" }
        ],
        "MetricName": "procstat_lookup pid_count",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Statistic": "Minimum",
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": 1,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "breaching",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmSchedule": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Schedule" },
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "host", "Value": { "Ref": "HostName" } },
          { "Name": "pid_finder", "Value": "native" },
          { "Name": "pattern", "Value": "C:\\Windows\\system32\\svchost.exe -k netsvcs -p -s Schedule" }
        ],
        "MetricName": "procstat_lookup pid_count",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Statistic": "Minimum",
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": 1,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "breaching",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmTermService": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}TermService" },
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "host", "Value": { "Ref": "HostName" } },
          { "Name": "pid_finder", "Value": "native" },
          { "Name": "pattern", "Value": "C:\\Windows\\system32\\svchost.exe -k termsvcs -s TermService" }
        ],
        "MetricName": "procstat_lookup pid_count",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Statistic": "Minimum",
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": 1,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "breaching",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmW32Time": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}W32Time" },
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "host", "Value": { "Ref": "HostName" } },
          { "Name": "pid_finder", "Value": "native" },
          { "Name": "pattern", "Value": "C:\\Windows\\system32\\svchost.exe -k LocalService -s W32Time" }
        ],
        "MetricName": "procstat_lookup pid_count",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Statistic": "Minimum",
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": 1,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "breaching",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmC80": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Disk_c_80" },
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "host", "Value": { "Ref": "HostName" } },
          { "Name": "objectname", "Value": "LogicalDisk" },
          { "Name": "instance", "Value": "C:" }
        ],
        "MetricName": "LogicalDisk % Free Space",
        "Period": 3600,
        "EvaluationPeriods": 1,
        "Statistic": "Minimum",
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": 20,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "breaching",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmC90": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Disk_c_90" },
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "host", "Value": { "Ref": "HostName" } },
          { "Name": "objectname", "Value": "LogicalDisk" },
          { "Name": "instance", "Value": "C:" }
        ],
        "MetricName": "LogicalDisk % Free Space",
        "Period": 3600,
        "EvaluationPeriods": 1,
        "Statistic": "Minimum",
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": 10,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "breaching",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmD80": {
      "Type": "AWS::CloudWatch::Alarm",
      "Condition": "CreateDriveD",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Disk_d_80" },
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "host", "Value": { "Ref": "HostName" } },
          { "Name": "objectname", "Value": "LogicalDisk" },
          { "Name": "instance", "Value": "D:" }
        ],
        "MetricName": "LogicalDisk % Free Space",
        "Period": 3600,
        "EvaluationPeriods": 1,
        "Statistic": "Minimum",
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": 20,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "breaching",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    },
    "AlarmD90": {
      "Type": "AWS::CloudWatch::Alarm",
      "Condition": "CreateDriveD",
      "Properties": {
        "AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Disk_d_90" },
        "Namespace": "CWAgent",
        "Dimensions": [
          { "Name": "host", "Value": { "Ref": "HostName" } },
          { "Name": "objectname", "Value": "LogicalDisk" },
          { "Name": "instance", "Value": "D:" }
        ],
        "MetricName": "LogicalDisk % Free Space",
        "Period": 3600,
        "EvaluationPeriods": 1,
        "Statistic": "Minimum",
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": 10,
        "ActionsEnabled": { "Ref": "EnableAlarmAction" },
        "TreatMissingData": "breaching",
        "OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
        "AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
        "InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
      }
    }
  },

  "Outputs": {
  }
}

Create a stack from the JSON file above.
*Fill in the fields shown in the screenshot below as needed.

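*Parameter descriptions

| Parameter | Value to enter | Description |
| --- | --- | --- |
| ActionAlarm | ARN of the SNS topic | Action taken when the alarm enters the ALARM state |
| ActionInsufficient | (leave empty) | Action taken when there is insufficient data |
| ActionOK | (leave empty) | Action taken when the alarm returns to OK |
| AlarmNamePrefix | Any non-empty string | Prefix added to every alarm name (e.g. `<prefix>CPU`) |
| EnableAlarmAction | true / false (default: true) | Whether the alarm actions are enabled |
| EnableDriveD | YES / NO (default: NO) | Whether to also create the D: drive free-space alarms |
| HostName | Host name of the target | Can be checked in CloudWatch metrics (the `host` dimension) |
| InstanceId | ID of the target instance | Can be checked in the EC2 console |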
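You can also create the stack with the AWS CLI instead of the console. In that case, the same parameters can be supplied as a JSON file in the `ParameterKey` / `ParameterValue` format that `aws cloudformation create-stack` accepts through `--parameters file://...`. Every value below is a hypothetical placeholder, including the instance ID, host name, prefix, region, and account ID in the ARN. The empty strings for `ActionOK` and `ActionInsufficient` pass the template's `^(arn:aws:.*)?$` pattern, and the `IsEmpty...` conditions then drop those actions.

```json
[
  { "ParameterKey": "InstanceId", "ParameterValue": "i-0123456789abcdef0" },
  { "ParameterKey": "HostName", "ParameterValue": "EC2AMAZ-EXAMPLE" },
  { "ParameterKey": "AlarmNamePrefix", "ParameterValue": "example-server-" },
  { "ParameterKey": "EnableDriveD", "ParameterValue": "NO" },
  { "ParameterKey": "EnableAlarmAction", "ParameterValue": "true" },
  { "ParameterKey": "ActionAlarm", "ParameterValue": "arn:aws:sns:ap-northeast-1:123456789012:cloudwatch-alarm-topic" },
  { "ParameterKey": "ActionOK", "ParameterValue": "" },
  { "ParameterKey": "ActionInsufficient", "ParameterValue": "" }
]
```

A possible invocation is `aws cloudformation create-stack --stack-name <stack-name> --template-body file://alart.json --parameters file://params.json`. Here `params.json` is a hypothetical file name for the JSON above.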

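Confirm that the stack has been created.
Then confirm that the alarms have been created, as shown below.

Triggering an alert

Trigger an alert, for example by stopping the instance or temporarily changing a threshold. Stopping the instance works because most of the alarms in the template treat missing data as breaching. An email like the one below then arrives at the registered address.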

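Conclusion

Tuning thresholds and deciding how to treat missing data are tricky, and alarm suppression also has to be considered. It pays to proceed step by step, checking the requirements against what CloudWatch can actually do.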
