Cost-Effective Long-Term Log Archiving with CloudWatch Logs, Data Firehose, and S3 Lifecycle

Introduction

I store logs for my personal applications in CloudWatch Logs, with the log group retention period set to 180 days. As a result, logs were silently disappearing once they passed that age. (Arguably, 180 days is already a fairly long retention period.)

I wanted to keep these logs for longer, so I looked into creating a mechanism to archive them in S3. After some research, I decided to use Amazon Data Firehose to stream them into S3.

Broadly speaking, Amazon Data Firehose is a "service that takes incoming data and writes it nicely to S3 in batches." Since AWS handles the troublesome parts—like temporarily buffering data, compressing it, and retrying if a write fails—all I have to do is specify "where and in what format" I want it written.

In this post, I explain why I chose Amazon Data Firehose over the alternatives, how I use different S3 storage classes as logs age, and how I implemented everything in Terraform.

Options Considered

I considered the following three plans.

Plan A: Extend the CloudWatch Logs retention period

This is the simplest approach: you just change the retention_in_days setting on the log group. In Terraform, setting it to 0 keeps the logs indefinitely.
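As a rough illustration of Plan A, a log group that never expires might be declared like this. The resource label and the log group name are placeholders made up for this example, not my actual configuration.

```hcl
# Hypothetical log group; "/myapp/main" is a placeholder name.
resource "aws_cloudwatch_log_group" "app" {
  name = "/myapp/main"

  # 0 means "never expire": events stay in CloudWatch Logs indefinitely.
  retention_in_days = 0
}
```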

However, CloudWatch Logs storage costs $0.033/GB/month (Tokyo region, billed on stored size), which is relatively expensive for storage. That is not a big concern for short-term retention, but S3 can go as low as $0.002/GB/month once objects are moved to Glacier Deep Archive via a lifecycle policy, so archiving to S3 becomes cheaper the longer you want to keep the data.
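To get a feel for the gap, here is a small sketch that compares the yearly storage cost of the two extremes using the prices quoted above. The 100 GB volume is an assumption chosen only for illustration, and the calculation ignores request, retrieval, and delivery charges.

```hcl
locals {
  # Assumed volume for illustration only; replace with your own figure.
  archived_gb = 100

  # Per-GB monthly prices quoted in this article (Tokyo region).
  cloudwatch_logs_price = 0.033
  deep_archive_price    = 0.002

  # Yearly storage cost = volume * monthly price * 12 months.
  cloudwatch_yearly_usd   = local.archived_gb * local.cloudwatch_logs_price * 12
  deep_archive_yearly_usd = local.archived_gb * local.deep_archive_price * 12
}

output "yearly_storage_cost_usd" {
  value = {
    cloudwatch_logs = local.cloudwatch_yearly_usd
    deep_archive    = local.deep_archive_yearly_usd
  }
}
```

With these assumed numbers, keeping 100 GB for a year comes to about $39.60 in CloudWatch Logs versus about $2.40 in Glacier Deep Archive.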

Plan B: Periodically export to S3 using Export Tasks

CloudWatch Logs has a feature that allows you to export logs to S3. You can run it manually from the console, or use the provided API to run it periodically as a batch job.

For example, you could set up a configuration where EventBridge periodically triggers a Lambda function to call create-export-task.

However, while researching this, I found the following:

We recommend that you don't use this method to archive logs continuously to Amazon S3. For that use case, we recommend that you use subscriptions instead.

Exporting Log Data to Amazon S3 - Amazon CloudWatch Logs

Because of this, I decided against this plan.

Plan C: Streaming to S3 via Firehose (Adopted)

As cited in Plan B, the method recommended by AWS for continuous archiving was using subscriptions.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Subscriptions.html

The beginning of the documentation states:

You can use subscriptions to get access to a real-time feed of log events from CloudWatch Logs and have them delivered to other services such as Amazon Kinesis Data Streams, Amazon Data Firehose, or AWS Lambda for custom processing, analysis, or loading to other systems.

In short, if you configure a subscription filter on a log group, it will forward logs to another service as they arrive. The available destinations include:

  • Amazon Kinesis Data Streams
  • Amazon Data Firehose
  • AWS Lambda
  • Amazon OpenSearch Service

Since my use case is simple—"I just want to stream it to S3 for archiving"—I chose Firehose, which easily connects to S3, and implemented the configuration: "CloudWatch Logs → Firehose → S3".

Overall Architecture

The final configuration looks like this: CloudWatch Logs (subscription filter) → Amazon Data Firehose → S3 (lifecycle transitions).

CloudWatch Logs stays as it is, and the subscription filter copies each log event to Firehose. I still use CloudWatch Logs Insights to investigate recent logs, and I only go to S3 when I need data older than six months.

S3 Storage Classes and Lifecycle

By utilizing different S3 storage classes, you can reduce costs for data that is accessed infrequently or where you can sacrifice instant retrieval. Since old logs are rarely accessed and a slight delay in retrieval is not a problem, choosing a cheaper storage class is ideal.

Pricing and Use Cases by Class (Tokyo Region)

Class                       Price/GB/month  Use case
Standard                    $0.025          Frequently accessed data (default)
Standard-IA                 $0.0138         Less frequently accessed data, instant retrieval
Glacier Instant Retrieval   $0.005          Accessed a few times a year, instant retrieval
Glacier Flexible Retrieval  $0.0045         Rarely accessed backups, retrieval in minutes to hours
Glacier Deep Archive        $0.002          Long-term archive accessed extremely rarely, retrieval on the order of hours

Storage costs decrease as you go down the table, but retrieval either takes longer or incurs additional retrieval fees.

Details are provided here:

https://aws.amazon.com/s3/storage-classes/

Storage Classes are Not Bucket-Level

I hadn't paid much attention to storage classes until now, but they are set per object, not per bucket.

You can have a mix of Standard, Standard-IA, and Glacier objects in the same bucket. Even when a lifecycle policy transitions an object to another class, the bucket and key remain the same; only the storage class attribute attached to that object changes.

s3://my-logs-bucket/2026/01/01/log-001.gz   ← Class: Standard
                ↓ 30 days after creation
s3://my-logs-bucket/2026/01/01/log-001.gz   ← Class: Standard-IA (path unchanged)
                ↓ 180 days after creation
s3://my-logs-bucket/2026/01/01/log-001.gz   ← Class: Glacier Flexible Retrieval
                ↓ 365 days after creation
s3://my-logs-bucket/2026/01/01/log-001.gz   ← Class: Glacier Deep Archive

I had imagined that lifecycle transitions moved objects between buckets, but the path stays the same and only the storage class attribute changes.
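One way to see that the class belongs to the object is that Terraform lets you set it per object. The following is only a sketch: the two objects, their keys, and their contents are made up for illustration, and the bucket reference assumes the aws_s3_bucket.main resource shown later in this post.

```hcl
# Two hypothetical objects in the same bucket with different storage classes.
resource "aws_s3_object" "standard_example" {
  bucket  = aws_s3_bucket.main.id
  key     = "examples/standard.txt"
  content = "stored in the default class (STANDARD)"
}

resource "aws_s3_object" "ia_example" {
  bucket  = aws_s3_bucket.main.id
  key     = "examples/infrequent.txt"
  content = "stored in STANDARD_IA from the start"

  # The storage class is an attribute of this object, not of the bucket.
  storage_class = "STANDARD_IA"
}
```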

Lifecycle Adopted

Elapsed days  Class                       Intended usage
0-30 days     Standard                    Checked often for recent troubleshooting
30-180 days   Standard-IA                 Occasional past investigation, want instant retrieval
180-365 days  Glacier Flexible Retrieval  Checked less than once a month, wait a few hours if needed
365+ days     Glacier Deep Archive        For audit/compliance, not normally viewed

Honestly, there's a high probability I won't look at anything after six months, so it might have been fine to move directly to Glacier Deep Archive once placed in S3.
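For reference, that simpler alternative could be sketched roughly as below. This is not what I deployed: the rule id is made up, and days = 1 is a conservative value I picked for the sketch. Because a bucket has a single lifecycle configuration, this would replace the multi-step rule rather than sit alongside it.

```hcl
# Alternative sketch: send everything to Deep Archive shortly after creation.
resource "aws_s3_bucket_lifecycle_configuration" "main" {
  bucket = aws_s3_bucket.main.id

  rule {
    id     = "${var.common_name}-direct-deep-archive" # hypothetical rule id
    status = "Enabled"

    # Target all objects
    filter {}

    transition {
      days          = 1 # assumed value: one day after creation
      storage_class = "DEEP_ARCHIVE"
    }
  }
}
```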

How I wrote it in Terraform

Now for the implementation details. I grouped S3, Firehose, the CloudWatch Logs Subscription Filter, and IAM roles into a single module. Since pasting everything would be long, I will extract only the key points.
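To make the snippets below easier to follow, here is a sketch of how the module's inputs and its call site might look. The variable names common_name and log_group_names come from the snippets in this post; the types, descriptions, module path, and example values are assumptions for illustration.

```hcl
# Inside the module (hypothetical file: modules/logs-archive/variables.tf)
variable "common_name" {
  type        = string
  description = "Prefix used when naming the bucket, delivery stream, and lifecycle rule."
}

variable "log_group_names" {
  type        = list(string)
  description = "Names of the CloudWatch Logs log groups to archive."
}

# In the root module: calling the module with placeholder values.
module "logs_archive" {
  source = "./modules/logs-archive" # hypothetical path

  common_name     = "myapp"
  log_group_names = ["/myapp/main", "/myapp/worker"]
}
```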

S3 Bucket and Lifecycle

resource "aws_s3_bucket" "main" {
  bucket = "${var.common_name}-logs-archive"
}

resource "aws_s3_bucket_public_access_block" "main" {
  bucket = aws_s3_bucket.main.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_lifecycle_configuration" "main" {
  bucket = aws_s3_bucket.main.id

  rule {
    id     = "${var.common_name}-cloudwatch-logs-archive"
    status = "Enabled"

    # Target all objects
    filter {}

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }
    transition {
      days          = 180
      storage_class = "GLACIER"
    }
    transition {
      days          = 365
      storage_class = "DEEP_ARCHIVE"
    }
  }
}

I intentionally left out an expiration block. Without one, objects are never deleted automatically, so they are kept indefinitely.
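For contrast, if you did want old logs to be deleted eventually, the rule could include an expiration block like the sketch below. The 2555-day figure (roughly seven years) and the rule id are made-up values, not something I use.

```hcl
# Variant sketch: same kind of rule, but objects are deleted after a fixed period.
resource "aws_s3_bucket_lifecycle_configuration" "main" {
  bucket = aws_s3_bucket.main.id

  rule {
    id     = "${var.common_name}-archive-then-expire" # hypothetical rule id
    status = "Enabled"

    # Target all objects
    filter {}

    transition {
      days          = 365
      storage_class = "DEEP_ARCHIVE"
    }

    # Assumed retention: delete objects about seven years after creation.
    expiration {
      days = 2555
    }
  }
}
```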

Firehose Delivery Stream

resource "aws_kinesis_firehose_delivery_stream" "main" {
  name        = "${var.common_name}-logs-archive"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn   = aws_iam_role.firehose.arn
    bucket_arn = aws_s3_bucket.main.arn

    buffering_size     = 128 # MB
    buffering_interval = 900 # seconds
    compression_format = "GZIP"

    # omitted
  }
}

I paid some attention to buffering_size and buffering_interval. The defaults are 5 MB and 300 seconds: whenever the buffer reaches 5 MB or 300 seconds elapse, whichever comes first, the accumulated data is written to S3 as a single object.

I don't need delivery that frequent, so I raised both values.

A smaller buffer delivers logs to S3 sooner, but it also increases the number of S3 PUT requests and objects. For a low-volume stream, the interval is usually what triggers a write, so 300 seconds produces up to about 288 objects per day, while 900 seconds produces about 96. In my case:

  • Real-time lookups are handled by CloudWatch Logs,
  • S3 is strictly for long-term archiving,

so I set buffering_size to its maximum of 128 MB and buffering_interval to its maximum of 900 seconds, accepting a delivery delay of up to about 15 minutes.

By the way, these limits are documented in the official AWS API Reference.

https://docs.aws.amazon.com/firehose/latest/APIReference/API_BufferingHints.html
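The delivery stream above references aws_iam_role.firehose, which I did not paste. A minimal sketch of such a role might look like the following; the role and policy names are hypothetical, and the S3 action list follows the permissions AWS documents for S3 delivery rather than my exact policy.

```hcl
# Trust policy: lets the Firehose service assume this role.
data "aws_iam_policy_document" "firehose_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["firehose.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "firehose" {
  name               = "${var.common_name}-logs-archive-firehose" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.firehose_assume.json
}

# Permissions Firehose needs to write batches into the archive bucket.
data "aws_iam_policy_document" "firehose_s3" {
  statement {
    actions = [
      "s3:AbortMultipartUpload",
      "s3:GetBucketLocation",
      "s3:GetObject",
      "s3:ListBucket",
      "s3:ListBucketMultipartUploads",
      "s3:PutObject",
    ]

    # Bucket-level actions use the bucket ARN; object-level actions use "/*".
    resources = [
      aws_s3_bucket.main.arn,
      "${aws_s3_bucket.main.arn}/*",
    ]
  }
}

resource "aws_iam_role_policy" "firehose_s3" {
  name   = "s3-delivery" # hypothetical name
  role   = aws_iam_role.firehose.id
  policy = data.aws_iam_policy_document.firehose_s3.json
}
```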

CloudWatch Logs Subscription Filter

resource "aws_cloudwatch_log_subscription_filter" "main" {
  for_each = toset(var.log_group_names)

  name            = "${each.value}-to-firehose"
  log_group_name  = each.value
  destination_arn = aws_kinesis_firehose_delivery_stream.main.arn
  role_arn        = aws_iam_role.subscription.arn
  filter_pattern  = ""
  distribution    = "ByLogStream"
}

Setting filter_pattern = "" forwards every log event. Since I have multiple log groups (the main app, a worker, and so on), for_each creates one subscription filter per log group name.
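The filter also references aws_iam_role.subscription, the role CloudWatch Logs assumes to put records into Firehose. A minimal sketch, with hypothetical role and policy names, could look like this; a stricter setup might add a condition limiting which log groups may assume the role.

```hcl
# Trust policy: lets CloudWatch Logs assume this role.
data "aws_iam_policy_document" "subscription_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["logs.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "subscription" {
  name               = "${var.common_name}-logs-archive-subscription" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.subscription_assume.json
}

# Allow writing records only to the archive delivery stream.
data "aws_iam_policy_document" "subscription_firehose" {
  statement {
    actions = [
      "firehose:PutRecord",
      "firehose:PutRecordBatch",
    ]

    resources = [aws_kinesis_firehose_delivery_stream.main.arn]
  }
}

resource "aws_iam_role_policy" "subscription_firehose" {
  name   = "put-to-firehose" # hypothetical name
  role   = aws_iam_role.subscription.id
  policy = data.aws_iam_policy_document.subscription_firehose.json
}
```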

Conclusion

I had been keeping application logs only in CloudWatch Logs, so I looked into where to keep them long term and built a solution. I considered extending the CloudWatch Logs retention period, periodic exports, and Firehose, and chose Firehose for its simplicity and cost. So far it has been running well.

It was also good to learn about S3 storage classes and put them to use.
