How to Send Repeat Email Notifications for Ongoing Firing Alerts in SORACOM Lagoon 3

TL;DR

  • In SORACOM Lagoon 3 / Grafana Alerting, when the same condition keeps being met, the second and later evaluations are treated as a continuation of the same firing alert rather than as new alerts.
  • By default, notifications are sent only on state changes: a firing notification on normal -> firing and a recovery notification on firing -> normal. This suppresses duplicate emails for the same alert.
  • To re-notify via email at every evaluation while firing continues, setting the alert rule's evaluation interval is not enough; also align the Group interval and Repeat interval of the Notification policy with that evaluation interval.
  • This article handles settings for metrics sent at 5-minute intervals, aligning Evaluate every = 5m, Group interval = 5m, and Repeat interval = 5m.
  • Since the time until notification is not strictly guaranteed in Lagoon 3, it is safer to think of these settings as "approaching a state where re-notification aligns with the evaluation timing" rather than "notifying at the exact second of evaluation."
  • When sending metrics at 5-minute intervals, setting the query range slightly wider, such as last 10m or last 15m instead of exactly last 5m, makes it easier to absorb offsets in ingestion timing.

Introduction

Notifications in SORACOM Lagoon 3 / Grafana Alerting are typically triggered by changes in alert state.
For example, you might use it to notify about an anomaly occurrence with normal -> firing and send a recovery notification with firing -> normal.

This behavior is convenient for avoiding redundant emails while the same alert remains in the firing state. However, there are cases, such as facility or battery monitoring, where you "want to be notified every time a determination is made as long as the condition persists."

In this article, using an environment where metrics are sent once every 5 minutes as an example, I will walk through the settings for re-sending an email every time the alert is evaluated while it continues firing.

The expected behavior is as follows:

12:00 Evaluation -> firing -> Email notification
12:05 Evaluation -> firing continues -> Email re-notification
12:10 Evaluation -> firing continues -> Email re-notification
12:15 Evaluation -> normal -> No firing notification sent

It is important to note here that the re-notification interval is not determined solely by the alert rule's evaluation interval. Even if the alert rule is evaluated every 5 minutes, if the re-notification interval on the Notification policy side remains long, emails will not be sent every time even if firing continues.

Overall Configuration

The settings for this configuration primarily involve the following two locations:

Target               Role
Alert rule           Determines how often to evaluate metrics and under what conditions to trigger firing.
Notification policy  Determines which Contact point to send firing alerts to and at what interval to re-notify.

The flow involves evaluating conditions with the Alert rule, using labels set on the Alert rule to route to a Notification policy, and then notifying the Email contact point that matches the policy.

Metrics every 5 minutes
  -> Evaluated every 5 minutes by Alert rule
  -> Firing if condition is met
  -> Routed to Notification policy via Label
  -> Initial or re-notification via Email

Official SORACOM Lagoon 3 documentation also describes the flow of controlling alerts by combining Alert rules, Contact points, and Notification policies.

Prerequisites

This article assumes the following are already prepared:

  • Able to log in to SORACOM Lagoon 3
  • A Lagoon user with permissions to edit Contact points and Alert rules
  • Metrics that are sent at 5-minute intervals
  • An email address to use as a notification destination

GUI Configuration Steps

From here, we will configure the settings using the Lagoon 3 console.
This article uses the following values for explanation.
Please change the actual Alert name, Folder, email address, and thresholds to suit your environment.

Item                       Example
Contact point name         Mail-repeat-5m
Alert rule name            Battery alert repeat 5m
Folder                     takao2
Group                      Five minute evaluation
Label                      notify_every_eval=true
Destination email address  member@example.com

Step 1: Create an Email Contact point

First, create the Email Contact point that will serve as the alert notification destination.
Contact points are configured by Lagoon users with the Editor role.
If [New contact point] does not appear on your screen, please check your Lagoon user's role.

  1. Log in to the Lagoon console.
  2. Open [Alerting] from the left menu.
  3. Click [Contact points].
  4. Click [New contact point].
  5. Enter Mail-repeat-5m in [Name].
  6. Select [Email] for [Contact point type].
  7. Enter the destination email address in [Addresses].
  8. Set [Subject] and [Message] as needed. You can leave them blank or use Grafana's default template.
  9. If you do not need resolved notifications, enable [Disable resolved message]. Leave this option off if you wish to receive recovery notifications.
  10. Click [Save contact point].

Screen to create an Email Contact point

As per the official documentation, specify the destination email address in [Addresses] for the Email contact point.
If specifying multiple addresses, separate them with a ;.
Additionally, up to 25 email addresses can be registered, and registering a large number of destination addresses may delay alert email delivery.
In configurations like this where the re-notification interval is shortened, the number of emails tends to increase, so in actual operation, it is safer to keep the number of recipients to a minimum or aggregate them into a team mailing list.
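To get a feel for how quickly the volume grows, here is a rough upper-bound calculation. It is only a sketch: the function name emails_per_day is made up for illustration, and it assumes the alert stays firing for a full day and that every address receives every notification.

```python
def emails_per_day(repeat_interval_minutes: int, recipients: int) -> int:
    """Upper bound on delivered emails per day for one continuously firing alert.

    Assumption: one notification per repeat interval, for 24 hours,
    delivered to every recipient.
    """
    notifications_per_day = (24 * 60) // repeat_interval_minutes
    return notifications_per_day * recipients


if __name__ == "__main__":
    for recipients in (1, 5, 25):
        print(recipients, "recipient(s):", emails_per_day(5, recipients), "emails/day")
```

Even a single recipient would receive a few hundred emails per day from one alert that never recovers, which is why a short recipient list or a mailing list is worth considering.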

Step 2: Set Alert rule conditions and evaluation intervals

Next, create an Alert rule that evaluates every 5 minutes.

  1. Open [Alerting] from the left menu.
  2. Click [Alert rules].
  3. Click [New alert rule].
  4. Confirm that [Lagoon managed alert] is selected.
  5. Select the monitored resource and series in the Query.
  6. For metrics with a 5-minute cycle, do not set the evaluation range exactly to last 5m; instead, provide some buffer such as now-10m to now or now-15m to now.
  7. Set Reduce or Threshold in the Expression to create the condition for firing.
  8. Click [Preview] to confirm that the condition evaluation result is as expected.

Screen to set Query and Expression for Alert rule

The contents of the Query and Expression will vary depending on the metrics you wish to monitor.
Since the subject of this article is the notification interval, if you already have existing Alert conditions, please use those and proceed to check the evaluation interval and Label settings.

Next, set the evaluation interval for the alert condition.

  1. Enter 5m in [Evaluate every].
  2. Enter 0 in [for].

Screen to set Alert rule evaluation interval to 5m and for to 0

Evaluate every is the evaluation interval for the Alert rule.
for is the waiting time from when the alert condition is met until the Alert rule state changes to Firing.
During this time, even if the condition is met, it is treated as pending and usually no notification is sent.
Setting this to 0 causes the rule to enter Firing immediately upon an evaluation where the condition is met.

Step 3: Configure Alert rule details

Set the Alert rule name, destination folder, and evaluation group.

  1. Enter Battery alert repeat 5m in [Rule name].
  2. Select the destination Folder in [Folder]. In the screenshot example, takao2 is selected.
  3. Enter Five minute evaluation in [Group].

Screen to set Rule name, Folder, and Group for Alert rule

The Group set here is a group for sharing the evaluation interval of Alert rules.
Note that this is a different feature from the Alert group used in Notification policies, so be careful not to confuse them.

Step 4: Set Labels for the Alert rule

Next, set the Labels to be used in the Notification policy. In the current Lagoon 3 / Grafana Alerting screen, Labels are set in the [Custom Labels] section under [Notifications].

  1. Scroll down to the [Notifications] section.
  2. Enter notify_every_eval in [Labels] under [Custom Labels].
  3. Enter true as the value.
  4. Review the contents of the Alert rule and click [Save].

Screen to set notify_every_eval=true in Alert rule Custom Labels

notify_every_eval=true serves as a marker to route alerts that you want to re-notify upon every evaluation to the Notification policy. The Label itself does not send notifications.

Step 5: Create a Notification policy for re-notification upon every evaluation

Finally, create a Notification policy that links the Alert rule Label with the Email Contact point. Here, add a policy under Specific routing rather than using the Root policy. If [New policy] or editing operations are not displayed on the screen, please check the roles of your Lagoon user as you did with the Contact point.

  1. Open [Alerting] from the left menu.
  2. Click [Notification policies].
  3. Click [New policy].
  4. Click [Add matcher] under [Matching labels].
  5. Enter notify_every_eval in [Label].
  6. Select = for [Operator].
  7. Enter true in [Value].
  8. Select Mail-repeat-5m (created in Step 1) in [Contact point].
  9. Unless there is a clear reason to evaluate other sibling policies, keep [Continue matching subsequent sibling nodes] turned off.

In the following screenshot, the existing dummy_email_will_not_send is selected for screen reproduction. In actual operation, please select the Email Contact point you created in Step 1.

Screen to set Matching labels and Contact point for Notification policy

Next, override the Timing options.

  1. Turn on [Override general timings].
  2. Set [Group wait] to 1s.
  3. Set [Group interval] to 5m.
  4. Set [Repeat interval] to 5m.
  5. Click [Save policy] or [Save].

Screen to set Override general timings for Notification policy

By using Specific routing, you can apply re-notification settings tailored to the evaluation interval only for alerts with notify_every_eval=true. Shortening the timing of the Root policy might cause other alerts to be re-notified at short intervals as well, so it is easier to manage by separating them into Specific routing when the target alerts are limited.

Step 6: Verify operation

Evaluation begins once the Alert rule is saved. When the condition is met, the firing Alert is displayed in the Alert group, and notification is sent to the Contact point that matches the Notification policy condition.

Check the following points:

  • The Alert rule state becomes Firing.
  • The Alert rule has the notify_every_eval=true Label.
  • The Notification policy Matching labels exactly match notify_every_eval=true.
  • The initial notification is received at the Email Contact point.
  • While firing persists, re-notification occurs approximately 5 minutes after the previous notification.

In Lagoon 3, it may take anywhere from several tens of seconds to over a minute from the moment the Alert rule state becomes Firing until the notification is actually sent to the Contact point. Therefore, even if the email arrival time does not match the evaluation time in seconds, do not immediately assume a configuration error; check the Notification policy Label and Timing options instead.

Meaning of each parameter

First, the diagram below illustrates which parts the settings in this configuration affect.

The Alert rule on the left is the part that evaluates metrics and creates the Alert state.
The Notification policy on the right controls where firing alerts are sent and determines the intervals for the initial notification and re-notifications.

Evaluate every = 5m

Evaluate every is the setting that determines how often to evaluate the Alert rule.
Since we assume metrics are sent once every 5 minutes, we also evaluate the Alert rule every 5 minutes.

However, this alone does not result in email re-notification upon every evaluation. The Alert rule evaluation is for updating the Alert state, while the re-notification interval is controlled on the Notification policy side.

For / Pending period = 0s

For or Pending period is the waiting time from when a condition is met until the transition to firing occurs.
During this time, the alert is treated as pending even if the condition is met, and no notification is sent under normal circumstances.
Setting this to 0s ensures the state immediately becomes firing upon the evaluation where the condition is met.

Condition met -> immediate firing

For example, setting this to 5m means the first time the condition is met, it will be pending, and it will only become firing once the condition has persisted for 5 minutes or more. This is effective for preventing notifications due to temporary spikes or momentary data fluctuations. However, the first notification will be delayed by that amount. If you want to notify from the very first evaluation as in this case, 0s is appropriate.
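The effect of the pending period can be sketched with a small state simulation. This is a simplified model written for this article, not Grafana's implementation; the function name alert_states and its parameters are assumptions.

```python
from datetime import timedelta


def alert_states(condition_met, evaluate_every, pending_period):
    """Return the alert state after each evaluation.

    condition_met  : list of booleans, one per evaluation
    evaluate_every : timedelta between evaluations
    pending_period : timedelta the condition must persist before firing
    """
    states = []
    met_for = None  # how long the condition has been continuously met
    for met in condition_met:
        if not met:
            met_for = None
            states.append("normal")
            continue
        met_for = timedelta(0) if met_for is None else met_for + evaluate_every
        states.append("firing" if met_for >= pending_period else "pending")
    return states


if __name__ == "__main__":
    evaluations = [True, True, True, False]
    five_min = timedelta(minutes=5)
    print(alert_states(evaluations, five_min, timedelta(0)))
    print(alert_states(evaluations, five_min, five_min))
```

With a pending period of 0 the first matching evaluation is already firing, while a 5-minute pending period turns the first one into pending and delays firing by one evaluation cycle.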

Label: notify_every_eval = true

This Label is not a setting to directly trigger a notification. Its role is as a marker to route this Alert to a dedicated Notification policy.

Assign notify_every_eval=true to Alert rule
  -> Matches Notification policy's Matching labels
  -> Timing options for per-evaluation re-notification are applied

By using Labels to separate policies, you can ensure only the target alerts are subject to per-evaluation re-notification without affecting the notification intervals of other alerts.

Group wait = 1s

Group wait is the time to wait before sending the first notification for a new Alert group. Grafana Alerting has a mechanism to wait slightly before the first notification in order to group multiple alerts.

In this configuration, it is set to 1s with the intention of sending the initial notification as soon as possible after entering a firing state.

Newly firing
  -> Wait 1s
  -> Send initial email notification

However, the official Lagoon 3 documentation states that the time it takes until notification is actually sent to the Contact point is not guaranteed and may take anywhere from tens of seconds to several minutes. Therefore, Group wait = 1s is a setting to "shorten the wait time before handing it off to the Lagoon 3/Grafana notification process" and does not guarantee the email will arrive exactly 1 second later.

Group interval = 5m

Group interval is the interval until the next notification can be sent for the same Alert group. If a new firing alert is added to an Alert group or an existing alert transitions to resolved, the next notification will be sent according to this interval.

If you want to re-notify every 5 minutes with Repeat interval = 5m as we have here, it is important to align the Group interval to 5 minutes as well.

Repeat interval = 5m

Repeat interval is the interval for reminder re-notifications when the same Alert group remains unchanged and continues to fire. This setting is central to our requirement.

The official Grafana Alerting documentation explains that the Repeat interval is evaluated at the moment the Group interval is reset. Also, Repeat interval must be greater than or equal to Group interval and must be a multiple of the Group interval.

Therefore, if you want to re-notify every 5 minutes, align them as follows:

Group interval: 5m
Repeat interval: 5m

If you only set Repeat interval = 5m and leave Group interval = 10m, the evaluation timing for the re-notification itself will be restricted to 10-minute intervals, meaning notifications may not be sent every 5 minutes as expected.
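The relationship between the three intervals can be checked mechanically before saving the policy. The following is a minimal sketch; parse_duration and check_timings are hypothetical helpers for illustration and only understand simple values such as 1s, 5m, or 4h.

```python
import re
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text):
    """Parse a simple duration such as '5m' or '4h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def check_timings(evaluate_every, group_interval, repeat_interval):
    """Return a list of problems for per-evaluation re-notification."""
    ev = parse_duration(evaluate_every)
    gi = parse_duration(group_interval)
    ri = parse_duration(repeat_interval)
    if gi <= timedelta(0):
        raise ValueError("group interval must be positive")
    problems = []
    if ri < gi:
        problems.append("repeat interval is shorter than group interval")
    elif ri % gi != timedelta(0):
        problems.append("repeat interval is not a multiple of group interval")
    if ri != ev:
        problems.append("repeat interval differs from the evaluation interval")
    return problems


if __name__ == "__main__":
    print(check_timings("5m", "5m", "5m"))
    print(check_timings("5m", "10m", "5m"))
```

An empty list means the three values are aligned in the way this article recommends; it does not say anything about delivery delays on the Lagoon 3 side.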

Why re-notification is possible upon every evaluation

In Grafana Alerting, if the same alert meets the condition twice in a row, the second time is not treated as a "new Alert." Internally, it is the same Alert instance or the same Alert group continuing in a firing state.

1st evaluation: Condition met -> transition to firing
2nd evaluation: Same Alert continues firing
3rd evaluation: Same Alert continues firing

Therefore, a new notification is not usually sent with every evaluation. Re-notification during firing is controlled by the Repeat interval of the Notification policy.

In this setup, we align the following two cycles to 5 minutes:

Alert rule evaluation interval: 5 minutes
Notification policy re-notification interval: 5 minutes

As a result, while firing continues, re-notifications are sent at roughly the same 5-minute cadence as the evaluations.

Timeline example

Below is an image of the operation. The actual notification time will fluctuate depending on Lagoon 3 data acquisition timing, Alert evaluation timing, notification processing, and email delivery.

The point of the diagram is that the evaluations at 12:05 and 12:10 do not create a new alert, but confirm that the same Alert group is continuing to fire. The evaluation passes the firing state to the notification process, and an email is sent or re-sent according to the Group wait or Repeat interval on the notification side.

Whether to send resolved notifications depends on the Contact point and Notification policy settings. If resolved notifications are also necessary for operation, ensure you check the handling of resolved notifications in addition to firing notifications.

Query range for 5-minute cycle metrics

When metrics are sent only once every 5 minutes, if you set the evaluation range exactly to last 5m, data points may fall outside the evaluation range due to differences in transmission timing or Lagoon 3 acquisition timing. In such environments, widen the query range independently of the notification interval.

Query range: last 10m or last 15m

The thinking remains the same when using SORACOM Harvest Data or Lagoon 3 data sources. Instead of looking only at a width exactly matching the metric transmission interval, specify a range that can absorb deviations in ingestion or evaluation timing.
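A small example makes the timing problem concrete. The timestamps below are invented for illustration, and points_in_window is a hypothetical helper, not a Lagoon 3 API.

```python
from datetime import datetime, timedelta


def points_in_window(points, evaluated_at, window):
    """Return the data points that fall inside [evaluated_at - window, evaluated_at]."""
    start = evaluated_at - window
    return [p for p in points if start <= p <= evaluated_at]


if __name__ == "__main__":
    # One point arrived 2 seconds early, the next one 4 seconds late.
    points = [
        datetime(2024, 1, 1, 11, 59, 58),
        datetime(2024, 1, 1, 12, 5, 4),
    ]
    evaluated_at = datetime(2024, 1, 1, 12, 5, 0)

    print(len(points_in_window(points, evaluated_at, timedelta(minutes=5))))
    print(len(points_in_window(points, evaluated_at, timedelta(minutes=10))))
```

With a window of exactly 5 minutes the evaluation at 12:05:00 sees no data at all, while a 10-minute window still contains the earlier point.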

Examples of unsuccessful configurations

Repeat interval is too long

If the Repeat interval remains at the default long value, re-notifications will not be sent every 5 minutes even if firing persists.

12:00 firing -> Email notification
12:05 firing continues -> No email notification
12:10 firing continues -> No email notification
...

If you want to shorten reminder notifications during firing, explicitly set the Repeat interval on the Notification policy side.

Group interval is longer than Repeat interval

Even if Repeat interval = 5m, if Group interval = 10m, notification evaluation for the same Alert group may be limited to 10-minute intervals.

12:00 firing -> Email notification
12:05 firing continues -> Potential for no re-notification due to Group interval
12:10 firing continues -> Email re-sent

If you are aiming for re-notifications every 5 minutes, align the Group interval to 5m as well.
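The two cases above (Repeat interval too long, and Group interval longer than Repeat interval) can be reproduced with a toy model. This is a deliberately simplified sketch, not Grafana's actual scheduler: it assumes the notification side wakes up once per Group interval and sends only if at least one Repeat interval has passed since the last send, and it ignores resolved notifications and delivery delays. The name notification_minutes is hypothetical.

```python
def notification_minutes(firing_minutes, group_wait, group_interval, repeat_interval):
    """Minutes after the first firing at which a notification is sent.

    All arguments are in minutes. The alert is assumed to fire
    continuously for firing_minutes and then return to normal.
    """
    sent = []
    tick = group_wait  # first chance to notify
    while tick < firing_minutes:
        if not sent or tick - sent[-1] >= repeat_interval:
            sent.append(tick)
        tick += group_interval  # next chance to notify
    return sent


if __name__ == "__main__":
    # Firing at 12:00, 12:05, 12:10; normal at 12:15. Group wait treated as 0.
    print("5m / 5m  :", notification_minutes(15, 0, 5, 5))
    print("10m / 5m :", notification_minutes(15, 0, 10, 5))
    print("5m / 4h  :", notification_minutes(15, 0, 5, 240))
```

In this model only the aligned 5m / 5m setting produces a send at every evaluation; a 10m Group interval skips the 12:05 slot, and a long Repeat interval leaves only the initial notification.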

For / Pending period is long

In the case of For / Pending period = 5m, firing does not occur at the first condition match.

12:00 Condition met -> pending, no notification
12:05 Condition continues to be met -> firing, notification

This is effective for suppressing temporary anomalies but delays the initial notification by one evaluation cycle. If you want to notify from the first condition match as we do here, set it to 0s.

Label does not match Notification policy

If the Label on the Alert rule side does not match the Matching labels on the Notification policy side, it will not route to the intended policy.

Alert rule:
  notify_every_eval=true

Notification policy:
  notify_every_eval=True

If the case or value is mismatched like this, a different policy or the Root policy is used. Always ensure the Label key and value match perfectly.
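The fallback behavior can be illustrated with a minimal routing sketch. It models only the = operator and a first-match rule with [Continue matching subsequent sibling nodes] turned off; the function route and the contact point name root-default are assumptions for this example.

```python
def route(alert_labels, policies, root_contact_point):
    """Pick a contact point for an alert.

    policies is a list of (matcher, contact_point) pairs, where matcher is a
    dict of label name -> required value. Comparison is exact and
    case-sensitive. The first matching policy wins; otherwise the root
    policy's contact point is used.
    """
    for matcher, contact_point in policies:
        if all(alert_labels.get(name) == value for name, value in matcher.items()):
            return contact_point
    return root_contact_point


if __name__ == "__main__":
    policies = [({"notify_every_eval": "true"}, "Mail-repeat-5m")]
    print(route({"notify_every_eval": "true"}, policies, "root-default"))
    print(route({"notify_every_eval": "True"}, policies, "root-default"))
```

The second call falls through to the root contact point because "True" and "true" are different strings, which is the mismatch described above.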

Multiple alerts are bundled into an Alert group

Notification policy Timing options are applied to the Alert group. When multiple alerts enter the same Alert group, notifications are bundled by group rather than sent one by one for each alert.

If you want to handle the target alert separately, check the design of Group by as needed, in addition to Labels and Matching labels.
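How Group by changes the bundling can be sketched as follows. This is an illustrative model only; the label names alertname and device and the function group_alerts are assumptions, so substitute the labels your alerts actually carry.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bundle alerts (dicts of labels) by the values of the group_by labels."""
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


if __name__ == "__main__":
    alerts = [
        {"alertname": "Battery alert repeat 5m", "device": "sensor-a"},
        {"alertname": "Battery alert repeat 5m", "device": "sensor-b"},
    ]
    print(len(group_alerts(alerts, ["alertname"])))            # one shared group
    print(len(group_alerts(alerts, ["alertname", "device"])))  # one group per device
```

Adding a label that differs per alert to Group by splits the alerts into separate groups, so each one gets its own timing and its own notification.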

Summary

If you want to re-notify via email upon every evaluation while firing continues in SORACOM Lagoon 3 / Grafana Alerting, you need to align both the Alert rule and the Notification policy.

Alert rule:
  Evaluate every: 5m
  For / Pending period: 0s
  Labels:
    notify_every_eval: true

Notification policy:
  Matching labels:
    notify_every_eval: true
  Contact point:
    Email
  Override general timings: ON
  Group wait: 1s
  Group interval: 5m
  Repeat interval: 5m

The important point is that even if the condition is met twice in a row, the second time is not treated as a new Alert, but as a continuation of the firing state of the same alert. Therefore, set not only the evaluation interval but also the Repeat interval which controls re-notification while firing continues.

Additionally, because the Repeat interval is evaluated when the Group interval is reset, if you are aiming for re-notification upon every evaluation, align the Group interval with the evaluation interval as well.
