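Firing Alerts with CloudWatch (Windows)
Introduction
This is a memo of the procedure for firing alerts with CloudWatch. We create a template for the alarms and configure them so that an email is sent when a metric goes outside its threshold.
Configuring SNS
Creating the topic
Select the Standard type, enter any name, and choose Create (the other fields can stay at their defaults).
Creating the subscription
Create a subscription on the topic you just created. Select [Email] as the protocol.
Confirming the subscription
Confirm the subscription from the confirmation email sent to the address you entered.
On the Subscriptions screen of the Management Console, check that the status is [Confirmed].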
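The console steps above can also be scripted. The following is a minimal sketch, not the procedure used in this memo: it assumes the caller passes in an SNS client such as `boto3.client("sns")`, and the function names `create_email_topic` and `is_confirmed` are hypothetical helpers introduced only for illustration.

```python
from typing import Any


def create_email_topic(sns: Any, topic_name: str, email: str) -> str:
    """Create a standard SNS topic with one email subscription; return the topic ARN."""
    # Without FIFO attributes, create_topic creates a standard topic.
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    # SNS sends a confirmation email to the endpoint; nothing is delivered until it is confirmed.
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn


def is_confirmed(sns: Any, topic_arn: str, email: str) -> bool:
    """Return True once the email endpoint has confirmed its subscription."""
    # Only the first page of results is inspected (enough for a topic with few subscriptions).
    response = sns.list_subscriptions_by_topic(TopicArn=topic_arn)
    for sub in response["Subscriptions"]:
        if sub["Endpoint"] == email:
            # An unconfirmed subscription reports "PendingConfirmation" instead of an ARN.
            return sub["SubscriptionArn"].startswith("arn:")
    return False
```

The ARN returned by `create_email_topic` is the value that would later be supplied as the alarm action.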
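Alert settings for Windows logs
Create metric filters so that an alert also fires when an error entry appears in the Windows event logs configured below.
Creating the metric filters
Create a metric filter under [CloudWatch] → [Log groups] → [/aws/ec2/Windows/Application].
Configure it so that the metric value is 1 whenever any of error, Error, or ERROR appears in the log.
Create metric filters for [Security] and [System] in the same way.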
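As a rough sketch of the same three filters outside the console, the code below builds keyword arguments for the CloudWatch Logs `put_metric_filter` call. Several things here are assumptions: only the Application log group name appears in this memo, so the Security and System names are guessed from the same prefix; the metric names are taken from the alarm template; the filter names are made up; and the namespace is left as a parameter because the memo does not say which one was entered.

```python
from typing import Any, Dict, List

# "?term" entries are OR-ed: a log event matches if it contains any one of them.
ERROR_PATTERN = "?error ?Error ?ERROR"

# Log group -> metric name. Only the Application group is named in the memo;
# the other two group names are assumptions following the same prefix.
TARGETS = {
    "/aws/ec2/Windows/Application": "ApplicationLog",
    "/aws/ec2/Windows/Security": "SecurityLog",
    "/aws/ec2/Windows/System": "SystemLog",
}


def metric_filter_args(log_group: str, metric_name: str, namespace: str) -> Dict[str, Any]:
    """Build keyword arguments for one put_metric_filter call."""
    return {
        "logGroupName": log_group,
        "filterName": f"{metric_name}-errors",  # hypothetical naming scheme
        "filterPattern": ERROR_PATTERN,
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                "metricValue": "1",  # every matching log event publishes the value 1
            }
        ],
    }


def all_filter_args(namespace: str) -> List[Dict[str, Any]]:
    """Return the arguments for the Application, Security and System filters."""
    return [metric_filter_args(group, name, namespace) for group, name in TARGETS.items()]
```

With a client such as `boto3.client("logs")`, each entry could be applied with `logs.put_metric_filter(**args)`. The namespace chosen here has to be the one the alarm reads from, otherwise the alarm never sees the metric.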
Creating the alarms with CloudFormation
Create a JSON template for the CloudWatch alarms and deploy it with CloudFormation.
{
"AWSTemplateFormatVersion": "2010-09-09",
"Description": "CloudWatch alarm for EC2 instance",
"Parameters": {
"InstanceId": {
"Type": "AWS::EC2::Instance::Id"
},
"HostName": {
"Type": "String",
"AllowedPattern": "^.+$"
},
"AlarmNamePrefix": {
"Type": "String",
"AllowedPattern": "^.+$"
},
"EnableDriveD": {
"Type": "String",
"Default": "NO",
"AllowedValues": [ "YES", "NO" ]
},
"EnableAlarmAction": {
"Type": "String",
"Default": "true",
"AllowedValues": [ "true", "false" ]
},
"ActionOK": {
"Type": "String",
"AllowedPattern": "^(arn:aws:.*)?$"
},
"ActionAlarm": {
"Type": "String",
"AllowedPattern": "^(arn:aws:.*)?$"
},
"ActionInsufficient": {
"Type": "String",
"AllowedPattern": "^(arn:aws:.*)?$"
}
},
"Conditions": {
"CreateDriveD": {
"Fn::Equals": [ { "Ref": "EnableDriveD" }, "YES" ]
},
"IsEmptyActionOK": {
"Fn::Equals": [ { "Ref": "ActionOK" }, "" ]
},
"IsEmptyActionAlarm": {
"Fn::Equals": [ { "Ref": "ActionAlarm" }, "" ]
},
"IsEmptyActionInsufficient": {
"Fn::Equals": [ { "Ref": "ActionInsufficient" }, "" ]
}
},
"Resources": {
"AlarmCPU": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}CPU" },
"Namespace": "AWS/EC2",
"Dimensions": [
{ "Name": "InstanceId", "Value": { "Ref": "InstanceId" } }
],
"MetricName": "CPUUtilization",
"Period": 900,
"EvaluationPeriods": 1,
"Statistic": "Average",
"ComparisonOperator": "GreaterThanThreshold",
"Threshold": 90,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "breaching" ,
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmInstance": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Instance" },
"Namespace": "AWS/EC2",
"Dimensions": [
{ "Name": "InstanceId", "Value": { "Ref": "InstanceId" } }
],
"MetricName": "StatusCheckFailed_Instance",
"Period": 60,
"EvaluationPeriods": 1,
"Statistic": "Maximum",
"ComparisonOperator": "GreaterThanThreshold",
"Threshold": 0,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "breaching",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmSystem": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}System" },
"Namespace": "AWS/EC2",
"Dimensions": [
{ "Name": "InstanceId", "Value": { "Ref": "InstanceId" } }
],
"MetricName": "StatusCheckFailed_System",
"Period": 60,
"EvaluationPeriods": 1,
"Statistic": "Maximum",
"ComparisonOperator": "GreaterThanThreshold",
"Threshold": 0,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "breaching",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmApplicationLog": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}ApplicationLog" },
"Namespace": "CWAgent",
"Dimensions": [
{ "Name": "host", "Value": { "Ref": "HostName" } }
],
"MetricName": "ApplicationLog",
"Period": 60,
"EvaluationPeriods": 1,
"Statistic": "Sum",
"ComparisonOperator": "GreaterThanThreshold",
"Threshold": 0,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "missing",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmSystemLog": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}SystemLog" },
"Namespace": "CWAgent",
"Dimensions": [
{ "Name": "host", "Value": { "Ref": "HostName" } }
],
"MetricName": "SystemLog",
"Period": 60,
"EvaluationPeriods": 1,
"Statistic": "Sum",
"ComparisonOperator": "GreaterThanThreshold",
"Threshold": 0,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "missing",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmSecurityLog": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}SecurityLog" },
"Namespace": "CWAgent",
"Dimensions": [
{ "Name": "host", "Value": { "Ref": "HostName" } }
],
"MetricName": "SecurityLog",
"Period": 60,
"EvaluationPeriods": 1,
"Statistic": "Sum",
"ComparisonOperator": "GreaterThanThreshold",
"Threshold": 0,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "missing",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmMemory": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Memory" },
"Namespace": "CWAgent",
"Dimensions": [
{ "Name": "host", "Value": { "Ref": "HostName" } },
{ "Name": "objectname", "Value": "Memory" }
],
"MetricName": "Memory % Committed Bytes In Use",
"Period": 900,
"EvaluationPeriods": 1,
"Statistic": "Average",
"ComparisonOperator": "GreaterThanThreshold",
"Threshold": 90,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "breaching",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmEventLog": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}EventLog" },
"Namespace": "CWAgent",
"Dimensions": [
{ "Name": "host", "Value": { "Ref": "HostName" } },
{ "Name": "pid_finder", "Value": "native" },
{ "Name": "pattern", "Value": "C:\\Windows\\system32\\svchost.exe -k LocalServiceNetworkRestricted -p -s EventLog" }
],
"MetricName": "procstat_lookup pid_count",
"Period": 60,
"EvaluationPeriods": 1,
"Statistic": "Minimum",
"ComparisonOperator": "LessThanThreshold",
"Threshold": 1,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "breaching",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmSchedule": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Schedule" },
"Namespace": "CWAgent",
"Dimensions": [
{ "Name": "host", "Value": { "Ref": "HostName" } },
{ "Name": "pid_finder", "Value": "native" },
{ "Name": "pattern", "Value": "C:\\Windows\\system32\\svchost.exe -k netsvcs -p -s Schedule" }
],
"MetricName": "procstat_lookup pid_count",
"Period": 60,
"EvaluationPeriods": 1,
"Statistic": "Minimum",
"ComparisonOperator": "LessThanThreshold",
"Threshold": 1,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "breaching",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmTermService": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}TermService" },
"Namespace": "CWAgent",
"Dimensions": [
{ "Name": "host", "Value": { "Ref": "HostName" } },
{ "Name": "pid_finder", "Value": "native" },
{ "Name": "pattern", "Value": "C:\\Windows\\system32\\svchost.exe -k termsvcs -s TermService" }
],
"MetricName": "procstat_lookup pid_count",
"Period": 60,
"EvaluationPeriods": 1,
"Statistic": "Minimum",
"ComparisonOperator": "LessThanThreshold",
"Threshold": 1,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "breaching",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmW32Time": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}W32Time" },
"Namespace": "CWAgent",
"Dimensions": [
{ "Name": "host", "Value": { "Ref": "HostName" } },
{ "Name": "pid_finder", "Value": "native" },
{ "Name": "pattern", "Value": "C:\\Windows\\system32\\svchost.exe -k LocalService -s W32Time" }
],
"MetricName": "procstat_lookup pid_count",
"Period": 60,
"EvaluationPeriods": 1,
"Statistic": "Minimum",
"ComparisonOperator": "LessThanThreshold",
"Threshold": 1,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "breaching",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmC80": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Disk_c_80" },
"Namespace": "CWAgent",
"Dimensions": [
{ "Name": "host", "Value": { "Ref": "HostName" } },
{ "Name": "objectname", "Value": "LogicalDisk" },
{ "Name": "instance", "Value": "C:" }
],
"MetricName": "LogicalDisk % Free Space",
"Period": 3600,
"EvaluationPeriods": 1,
"Statistic": "Minimum",
"ComparisonOperator": "LessThanThreshold",
"Threshold": 20,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "breaching",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmC90": {
"Type": "AWS::CloudWatch::Alarm",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Disk_c_90" },
"Namespace": "CWAgent",
"Dimensions": [
{ "Name": "host", "Value": { "Ref": "HostName" } },
{ "Name": "objectname", "Value": "LogicalDisk" },
{ "Name": "instance", "Value": "C:" }
],
"MetricName": "LogicalDisk % Free Space",
"Period": 3600,
"EvaluationPeriods": 1,
"Statistic": "Minimum",
"ComparisonOperator": "LessThanThreshold",
"Threshold": 10,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "breaching",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmD80": {
"Type": "AWS::CloudWatch::Alarm",
"Condition": "CreateDriveD",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Disk_d_80" },
"Namespace": "CWAgent",
"Dimensions": [
{ "Name": "host", "Value": { "Ref": "HostName" } },
{ "Name": "objectname", "Value": "LogicalDisk" },
{ "Name": "instance", "Value": "D:" }
],
"MetricName": "LogicalDisk % Free Space",
"Period": 3600,
"EvaluationPeriods": 1,
"Statistic": "Minimum",
"ComparisonOperator": "LessThanThreshold",
"Threshold": 20,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "breaching",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
},
"AlarmD90": {
"Type": "AWS::CloudWatch::Alarm",
"Condition": "CreateDriveD",
"Properties": {
"AlarmName": { "Fn::Sub": "${AlarmNamePrefix}Disk_d_90" },
"Namespace": "CWAgent",
"Dimensions": [
{ "Name": "host", "Value": { "Ref": "HostName" } },
{ "Name": "objectname", "Value": "LogicalDisk" },
{ "Name": "instance", "Value": "D:" }
],
"MetricName": "LogicalDisk % Free Space",
"Period": 3600,
"EvaluationPeriods": 1,
"Statistic": "Minimum",
"ComparisonOperator": "LessThanThreshold",
"Threshold": 10,
"ActionsEnabled": { "Ref": "EnableAlarmAction" },
"TreatMissingData": "breaching",
"OKActions": { "Fn::If": [ "IsEmptyActionOK", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionOK" } ] ] },
"AlarmActions": { "Fn::If": [ "IsEmptyActionAlarm", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionAlarm" } ] ] },
"InsufficientDataActions": { "Fn::If": [ "IsEmptyActionInsufficient", { "Ref": "AWS::NoValue" }, [ { "Ref": "ActionInsufficient" } ] ] }
}
}
},
"Outputs": {
}
}
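Create a stack using the JSON file above.
Note: the fields shown in the image below are optional.
Note: parameter descriptions
Parameter | Value to enter | Description |
---|---|---|
ActionAlarm | ARN of the SNS topic | Action taken when the alarm fires |
ActionInsufficient | (none) | Action taken when data is insufficient |
ActionOK | (none) | Action taken when the alarm returns to OK |
HostName | Host name of the target | Can be checked in CloudWatch metrics |
InstanceId | Instance ID of the target | Can be checked in the EC2 console |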
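The parameter values can be checked locally before the stack is created. The sketch below copies the `AllowedPattern` values from the template and builds the `Parameters` list in the shape CloudFormation's `create_stack` expects. `build_parameters` is a hypothetical helper, and leaving `ActionOK` and `ActionInsufficient` as empty strings mirrors the "(none)" entries in the table. `EnableDriveD` and `EnableAlarmAction` are omitted so that their template defaults apply.

```python
import re
from typing import Dict, List

# AllowedPattern values copied from the template's Parameters section.
PATTERNS = {
    "HostName": r"^.+$",
    "AlarmNamePrefix": r"^.+$",
    "ActionOK": r"^(arn:aws:.*)?$",
    "ActionAlarm": r"^(arn:aws:.*)?$",
    "ActionInsufficient": r"^(arn:aws:.*)?$",
}


def build_parameters(instance_id: str, host_name: str, alarm_name_prefix: str,
                     alarm_topic_arn: str) -> List[Dict[str, str]]:
    """Validate the values against the template patterns and return the Parameters list."""
    values = {
        "InstanceId": instance_id,
        "HostName": host_name,
        "AlarmNamePrefix": alarm_name_prefix,
        "ActionAlarm": alarm_topic_arn,
        # No Default exists for these in the template, so an empty string is passed
        # explicitly; the template's conditions then drop the action entirely.
        "ActionOK": "",
        "ActionInsufficient": "",
    }
    for key, pattern in PATTERNS.items():
        if re.match(pattern, values[key]) is None:
            raise ValueError(f"{key}={values[key]!r} does not match {pattern}")
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]
```

The result could then be passed, together with the template text, to a client such as `boto3.client("cloudformation")` via `create_stack(StackName=..., TemplateBody=..., Parameters=...)`.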
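Confirm that the stack has been created.
Confirm that the alarms have been created, as shown below.
Firing an alert
Trigger an alert by, for example, stopping the instance or changing a threshold. An email like the one below then arrives at the registered address.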
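Another way to check the email path without touching the instance is to set the alarm state by hand. This is a sketch of that alternative, not what was done in this memo: it assumes a CloudWatch client such as `boto3.client("cloudwatch")` is passed in, and `force_alarm` is a hypothetical helper. The alarm name is built the same way the template does it, `AlarmNamePrefix` followed by a suffix such as `CPU`.

```python
from typing import Any


def force_alarm(cloudwatch: Any, alarm_name_prefix: str, suffix: str) -> str:
    """Temporarily put one of the template's alarms into the ALARM state."""
    # The template names alarms "${AlarmNamePrefix}<suffix>", e.g. "<prefix>CPU".
    alarm_name = f"{alarm_name_prefix}{suffix}"
    cloudwatch.set_alarm_state(
        AlarmName=alarm_name,
        StateValue="ALARM",
        StateReason="Manual notification test",
    )
    return alarm_name
```

The forced state is temporary: the alarm goes back to its real state at the next evaluation. Actions only run if `EnableAlarmAction` was left at `true` when the stack was created.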
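Conclusion
Tuning thresholds and deciding how to treat missing data is difficult, and alert suppression also has to be considered, so the work needs to proceed while checking the requirements against what CloudWatch can do.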
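To make the missing-data point concrete, here is a deliberately simplified model of how a single evaluation period is judged with the `TreatMissingData` settings used in the template. It is an illustration only: real CloudWatch evaluation looks at a range of data points and supports further options, and the function `evaluate` is hypothetical.

```python
from typing import Optional


def evaluate(value: Optional[float], threshold: float, comparison: str,
             treat_missing: str) -> str:
    """Simplified alarm state for one period (EvaluationPeriods = 1).

    value is None when no data point arrived in the period.
    """
    if value is None:
        if treat_missing == "breaching":
            return "ALARM"  # missing data counts as a threshold breach
        if treat_missing == "notBreaching":
            return "OK"  # missing data counts as within the threshold
        if treat_missing == "missing":
            return "INSUFFICIENT_DATA"  # nothing to judge in this period
        raise ValueError(f"unsupported TreatMissingData: {treat_missing}")
    if comparison == "GreaterThanThreshold":
        breached = value > threshold
    elif comparison == "LessThanThreshold":
        breached = value < threshold
    else:
        raise ValueError(f"unsupported ComparisonOperator: {comparison}")
    return "ALARM" if breached else "OK"
```

Under this model, a stopped instance that sends no CPU data would put the CPU alarm (`breaching`) into ALARM, while the log alarms (`missing`) would only become INSUFFICIENT_DATA when no metric value is published.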