Measuring SLOs on AWS with Amazon CloudWatch Application Signals (Preview)

Hello everyone.
How are you doing?
I'm Ryo Yoshii, and I love the phrase "No human labor is no human error."

Amazon CloudWatch Application Signals (Preview) was announced during AWS re:Invent 2023.

About Application Signals

What kind of features does the newly released Application Signals have?
I will summarize the User Guide for Application Signals.

  • Collects metrics and traces from applications (via auto-instrumentation agent)
  • Displays key metrics such as call volume, availability, latency, faults, and errors
  • Visualizes application performance goals without needing to build dashboards
  • Enables creation and monitoring of SLOs
  • Automatically creates a service map
  • Allows tracking of SLIs within the service map
  • Integrates with CloudWatch RUM, CloudWatch Synthetics Canaries, and AWS Service Catalog AppRegistry

Supported Languages and Architectures in Preview

As of 2023/12/06, the only supported language is Java, on JVM 8, 11, and 17. For a preview, starting with Java seems like a safe choice. I expect other major languages to be supported by the time it reaches GA.

It is supported and tested on EKS, ECS, and EC2.
While it seems like it might work on other services if the CloudWatch Agent and AWS Distro for OpenTelemetry are running, for now, it appears to be limited to the three services mentioned above.

Supported Regions in Preview

As of 2023/12/06, the supported regions are as follows:

  • US East (N. Virginia)
  • US East (Ohio)
  • Asia Pacific (Sydney)
  • Asia Pacific (Tokyo)
  • Europe (Ireland)

Considerations for Introducing Application Signals

If you have already introduced observability tools to your application, you will need to remove them. If you have manual instrumentation in place, remove it from your code.

Application Signals uses the AWS Distro for OpenTelemetry Java auto-instrumentation agent. If you are currently using OpenTelemetry, you might be hopeful about compatibility, but since compatibility is not guaranteed, it is necessary to remove it as well.

It seems like a good idea to send feedback while trying it out during the preview.

What Makes It Valuable

Measuring SLOs in CloudWatch has not been easy.
Displaying a metric's value at a given moment is easy, but evaluating it over a rolling window takes significant effort.
For example, expressing an SLO such as "over the past 30 days, the 99th-percentile latency must be 100 ms or less at least 99.9% of the time" in CloudWatch requires finding a CloudWatch specialist.

With Application Signals, the rolling windows and error budgets necessary for SLO measurement are now displayed by default.
Personally, I think this is a major update. We can now realize modern operations without using commercial tools.
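
To make the rolling window and error budget concrete, here is a minimal Python sketch of the arithmetic behind a period-based latency SLO. It does not show how Application Signals is implemented. The function name evaluate_slo, the one-minute period, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    attainment: float        # fraction of good periods in the window
    budget_total: float      # number of bad periods the goal allows
    budget_remaining: float  # allowed bad periods not yet spent


def evaluate_slo(p99_latencies_ms, threshold_ms=100.0, goal=0.999):
    """Evaluate a period-based latency SLO over one rolling window.

    p99_latencies_ms holds one p99 latency value per period (assumed here
    to be one minute) for the whole window. A period counts as "good"
    when its value is at or below threshold_ms.
    """
    total = len(p99_latencies_ms)
    if total == 0:
        raise ValueError("window contains no periods")
    bad = sum(1 for value in p99_latencies_ms if value > threshold_ms)
    budget_total = (1.0 - goal) * total
    return SloReport(
        attainment=(total - bad) / total,
        budget_total=budget_total,
        budget_remaining=budget_total - bad,
    )


# Hypothetical data: 30 days of one-minute periods (43,200 in total),
# of which 20 periods exceeded the 100 ms threshold.
window = [80.0] * 43_180 + [250.0] * 20
report = evaluate_slo(window)
print(report)
```

With a 99.9% goal, a 30-day window of one-minute periods allows roughly 43 bad periods, so 20 slow minutes still leave about half of the error budget.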

Image source: https://aws.amazon.com/blogs/aws/amazon-cloudwatch-application-signals-for-automatic-instrumentation-of-your-applications-preview/

(Side Note)
If you want to learn more about SLOs in depth, I recommend the following book:

Trying It Out

I tried it out on a real system.
The instrumentation methods are described in the User Guide.

  • EKS
  • ECS
  • EC2

Dashboard

This is after deploying the application and running access tests for a while.
"Application Signals" has been added to the left pane of the CloudWatch screen.

Once you register the services you manage into Application Signals, they can be listed here.
Although the screenshot only shows one service, it is common to monitor multiple services nowadays. I think this is quite useful.

Services

Start the application by providing environment variables. There are several environment variables, and the value specified for $SVC_NAME in OTEL_RESOURCE_ATTRIBUTES becomes the service name on the Management Console.

    {
      "name": "OTEL_RESOURCE_ATTRIBUTES",
      "value": "aws.hostedin.environment=$HOST_ENV,service.name=$SVC_NAME"
    }
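
As a rough sketch of how that value is put together, the Python helper below builds the comma-separated key=value string from a service name and a hosting environment. The helper name build_resource_attributes and the sample values "user-api" and "my-eks-cluster" are hypothetical. Rejecting commas and equals signs is a simplification chosen for this sketch.

```python
def build_resource_attributes(service_name, hosted_in):
    """Build an OTEL_RESOURCE_ATTRIBUTES value as comma-separated key=value pairs."""
    attrs = {
        "aws.hostedin.environment": hosted_in,
        "service.name": service_name,
    }
    for key, value in attrs.items():
        # Simplification for this sketch: refuse values that would break
        # the key=value,key=value layout instead of escaping them.
        if not value or "," in value or "=" in value:
            raise ValueError(f"unsupported value for {key}: {value!r}")
    return ",".join(f"{key}={value}" for key, value in attrs.items())


# Hypothetical names for illustration only.
value = build_resource_attributes("user-api", "my-eks-cluster")
print(value)
```

The service.name part is what appears as the service name in the console, so it is worth choosing a name that stays stable across deployments.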

Service Operations

SLIs for each API are automatically displayed.
Latency (P99/P90/P50), volume (call count), fault rate, error rate, and availability are captured.
Run your operations, or conduct operation-equivalent testing, to determine realistic values for each SLI. Once realistic values are determined, use them to set your SLOs.
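
One way to turn test results into candidate SLO thresholds is sketched below in Python. It summarizes a batch of latency samples the same way the console groups them (P50/P90/P99) and derives a fault rate. The function name summarize and the sample numbers are made up, and availability is assumed here to be the share of calls that did not fault.

```python
import statistics


def summarize(latencies_ms, fault_count):
    """Summarize one test run into SLI-style numbers."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100)
    total = len(latencies_ms)
    fault_rate = fault_count / total
    return {
        "p50_ms": cuts[49],
        "p90_ms": cuts[89],
        "p99_ms": cuts[98],
        "fault_rate": fault_rate,
        "availability": 1.0 - fault_rate,  # assumption: calls without a fault
    }


# Hypothetical load-test result: 100 calls with latencies of 1..100 ms, 2 faults.
samples = [float(ms) for ms in range(1, 101)]
summary = summarize(samples, fault_count=2)
print(summary)
```

A threshold set slightly above the observed P99 gives the SLO some headroom. A threshold set below it would burn error budget from the first day.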

Dependencies

This shows the dependencies of the application.
Since it uses multiple DynamoDB tables and SNS, it is displayed accordingly.

Synthetic Canary

If you configure a Synthetics canary with X-Ray tracing enabled, its trace ID is associated with the service and displayed here.

Client Pages

If you are using CloudWatch RUM, information about client pages is also displayed.
At least, that is what I expect...
Unfortunately, I could not see this screen because RUM did not work correctly for me.

Service Map

A service map is displayed, just as in X-Ray.
Resources experiencing anomalies are shown in red.

Service Level Objectives (SLOs)

This is the screen I am most interested in: SLOs.
It displays the attainment rate, error budget, and SLI. Finally! They have done it. This is what I wanted.

The widgets displayed here can be added to a CloudWatch dashboard.

Reducing Collected Signals

(Added on 2023/12/20)

As of the preview release, the pricing for Application Signals is based on the number of signals.
If you do not properly filter the signals you receive, you may incur significant costs.

You can limit the signals received by writing rules in the CloudWatch Agent configuration file.
The points to note for the configuration file are as follows:

  • Write the rules in the logs block. The traces block refers to the rules in logs automatically.
  • For traces, only the replace action is supported.

Enable CloudWatch Application Signals
AWS AppSignals Processor for Amazon Cloudwatch Agent

For example, to receive only POST /api/users, write it as follows:

{
  "traces": {
    "traces_collected": {
      "app_signals": {}
    }
  },
  "logs": {
    "metrics_collected": {
      "app_signals": {
        "rules": [
          {
            "selectors": [
              {
                "dimension": "Operation",
                "match": "POST /api/users"
              }
            ],
            "action": "keep",
            "rule_name": "keep01"
          }
        ]
      }
    }
  }
}

If you do not want to receive GET requests, write it as follows:

{
  "traces": {
    "traces_collected": {
      "app_signals": {}
    }
  },
  "logs": {
    "metrics_collected": {
      "app_signals": {
        "rules": [
          {
            "selectors": [
              {
                "dimension": "Operation",
                "match": "GET *"
              }
            ],
            "action": "drop",
            "rule_name": "drop01"
          }
        ]
      }
    }
  }
}
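
To check what a rule set would let through before deploying it, a small local simulation can help. The Python sketch below is not the CloudWatch Agent's actual implementation. It assumes that selectors use glob-style matching, that a signal must match at least one keep rule when any keep rule exists, and that any matching drop rule discards the signal. The function names and sample operations are hypothetical.

```python
from fnmatch import fnmatchcase


def rule_matches(rule, signal):
    """A rule matches when every selector matches the signal's dimension value."""
    return all(
        fnmatchcase(signal.get(selector["dimension"], ""), selector["match"])
        for selector in rule["selectors"]
    )


def apply_rules(rules, signals):
    """Return the signals that survive the keep/drop rules (assumed semantics)."""
    keep_rules = [r for r in rules if r["action"] == "keep"]
    drop_rules = [r for r in rules if r["action"] == "drop"]
    survivors = []
    for signal in signals:
        if keep_rules and not any(rule_matches(r, signal) for r in keep_rules):
            continue  # keep rules exist, but none matched
        if any(rule_matches(r, signal) for r in drop_rules):
            continue  # explicitly dropped
        survivors.append(signal)
    return survivors


# The two rules from the configuration examples above, as Python dicts.
keep01 = {
    "selectors": [{"dimension": "Operation", "match": "POST /api/users"}],
    "action": "keep",
    "rule_name": "keep01",
}
drop01 = {
    "selectors": [{"dimension": "Operation", "match": "GET *"}],
    "action": "drop",
    "rule_name": "drop01",
}

# Hypothetical operations seen by the application.
signals = [
    {"Operation": "POST /api/users"},
    {"Operation": "GET /api/users"},
    {"Operation": "GET /health"},
]

kept_by_keep = [s["Operation"] for s in apply_rules([keep01], signals)]
kept_by_drop = [s["Operation"] for s in apply_rules([drop01], signals)]
print(kept_by_keep, kept_by_drop)
```

With these three sample operations both rule sets leave only POST /api/users. They differ once a new non-GET operation appears: drop01 lets it through, while keep01 filters it out.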

For Those Who Want to Try It

The One Observability Workshop, well known as a textbook for learning Ops on AWS, already includes an Application Signals module.
If you want an easy way to try Application Signals, why not give it a go?

Components

Simplifying the components, it looks like this:

[Figure: simplified component diagram]

As of 2023/12/06, the latest release of aws-otel-java-instrumentation is v1.31.1, and Application Signals appears to be supported from that version onward.
Looking at the PR, you can see various related additions.

The receiving end for telemetry data from the ADOT auto-instrumentation agent is the CloudWatch Agent.
A new configuration item called app_signals has been added to the CloudWatch Agent configuration file.
Note that it is app_signals, not the existing xray or otlp.
Personally, I think it would be great if an app_signals receiver were also added to the ADOT Collector to broaden the options. It is tough that nothing other than the CloudWatch Agent accepts these metrics from the ADOT auto-instrumentation agent.

{
  "traces": {
    "traces_collected": {
      "app_signals": {}
    }
  },
  "logs": {
    "metrics_collected": {
      "app_signals": {}
    }
  }
}

A log group named /aws/appsignals/generic has been created in CloudWatch Logs, and logs that appear to be in Embedded Metric Format (EMF) are being sent to it. This shows that the settings above are taking effect.
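
For readers unfamiliar with EMF, the Python sketch below pulls metric values out of a single EMF-style log line. The record is a hand-written example that follows the general EMF layout (an _aws object with CloudWatchMetrics directives, plus the values as top-level members). The namespace, dimension names, and values are hypothetical and are not copied from the actual /aws/appsignals/generic log group.

```python
import json


def extract_metrics(emf_line):
    """Return one entry per metric declared in an EMF-formatted log line."""
    record = json.loads(emf_line)
    results = []
    for directive in record["_aws"]["CloudWatchMetrics"]:
        for metric in directive["Metrics"]:
            name = metric["Name"]
            results.append({
                "namespace": directive["Namespace"],
                "name": name,
                "unit": metric.get("Unit", "None"),
                "value": record[name],  # metric values live at the top level
                "dimensions": [
                    {dim: record[dim] for dim in dimension_set}
                    for dimension_set in directive["Dimensions"]
                ],
            })
    return results


# Hypothetical record for illustration only.
sample_line = json.dumps({
    "_aws": {
        "Timestamp": 1701820800000,
        "CloudWatchMetrics": [{
            "Namespace": "ExampleNamespace",
            "Dimensions": [["Service", "Operation"]],
            "Metrics": [{"Name": "Latency", "Unit": "Milliseconds"}],
        }],
    },
    "Service": "user-api",
    "Operation": "POST /api/users",
    "Latency": 42.0,
})

metrics = extract_metrics(sample_line)
print(metrics)
```

Running something like this against a few real log events is a quick way to see which dimensions and metrics the agent actually emits.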

Summary

This might not be an update that resonates with everyone immediately, but for an Ops-loving person like me, it is a major update.
Implementing SLOs themselves is difficult, and making full use of CloudWatch is not exactly fun work.
I believe Application Signals takes us one or two steps closer to modern design. I look forward to further features being added when it reaches GA.

References

Observe your applications with Amazon CloudWatch Application Signals (Preview)
Amazon CloudWatch Application Signals for automatic instrumentation of your applications (preview)
Four APM features to elevate your observability experience
User Guide Application Signals
aws-otel-java-instrumentation

Announcement

On Monday, December 18, 2023, OpsJAWS, which I run, will hold a study session themed around AWS re:Invent 2023 operational updates.
I will also talk about Application Signals at this event. Please feel free to join us.
