Translated by AI
Achieving Reliable Auto Scaling by Combining Scheduled Scaling and Target Tracking Scaling
Hello.
How are you doing?
I'm Ryo Yoshii, and I love the saying, "No human labor is no human error."
Today I'd like to share some thoughts on Application Auto Scaling.
Let's say there is an application with a fixed peak time, for example around lunch.
We run 8 EC2 instances from 10:00 to 14:00 and 2 instances the rest of the day. Even during the 2-instance period, however, we want to scale out to 4 instances if there is an unexpected surge in access.

I want to solve this case using Application Auto Scaling.
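As a rough sketch of the requirement, the capacity plan above can be written as a small lookup from the hour of day to the minimum and maximum capacity. The function name `capacity_bounds` is hypothetical and exists only to make the plan concrete; it is not part of the AWS CLI.

```shell
# Hypothetical helper: print "MIN MAX" capacity bounds for a given hour (0-23).
# The peak window is 10:00-13:59; everything else is off-peak.
capacity_bounds() {
  hour=$1
  if [ "$hour" -ge 10 ] && [ "$hour" -lt 14 ]; then
    echo "8 8"   # peak: fixed at 8 instances
  else
    echo "2 4"   # off-peak: 2 by default, up to 4 on a surge
  fi
}

capacity_bounds 12   # prints "8 8"
capacity_bounds 15   # prints "2 4"
```

The two `echo` lines correspond to the two scheduled actions configured later in this article.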
What is Application Auto Scaling?
Application Auto Scaling is a service that provides automatic scaling for AWS resources. It supports the following services:
- AppStream 2.0 fleets
- Aurora replicas
- Amazon Comprehend document classification and entity recognizer endpoints
- DynamoDB tables and global secondary indexes
- Amazon Elastic Container Service (ECS) services
- ElastiCache for Redis clusters (replication groups)
- Amazon EMR clusters
- Amazon Keyspaces (for Apache Cassandra) tables
- Lambda function provisioned concurrency
- Amazon Managed Streaming for Apache Kafka (MSK) broker storage
- Amazon Neptune clusters
- SageMaker endpoint variants
- SageMaker Serverless provisioned concurrency
- Spot Fleet requests
- Custom resources provided by your own applications or services. For more information, see the GitHub repository.
It supports three types of scaling:
- Target tracking scaling
- Step scaling
- Scheduled scaling
In this article, we will combine scheduled scaling and target tracking scaling.
Trying it out
Application Auto Scaling supports a variety of services. For this demonstration, I will use EC2 Spot Fleet, considering the ease of validation and cost-effectiveness.
While there are minor differences, the basic configuration should be the same for other services (at least, that was my assumption during verification).
Application Auto Scaling does not have its own management console (as of the date of writing).
Therefore, we will configure it with the AWS CLI, run either locally or from CloudShell.
Registering as a Scalable Target
I assume you have already created a Spot Fleet.
To scale with Application Auto Scaling, registration as a scalable target is required.
To reiterate, this is an example using EC2 Spot Fleet.
I will use the fleet ID multiple times. Since the ID is a long string, I will store the Spot Fleet request ID in a variable to reduce the chance of errors.
ASRESOURCE=spot-fleet-request/your_request_id
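If you don't have the request ID at hand, something like the following can look it up. This is only a minimal sketch: both function names are hypothetical, and it assumes the AWS CLI is already configured for the region where the fleet runs.

```shell
# Hypothetical helper: list the IDs of Spot Fleet requests that are currently active.
list_active_spot_fleets() {
  aws ec2 describe-spot-fleet-requests \
    --query "SpotFleetRequestConfigs[?SpotFleetRequestState=='active'].SpotFleetRequestId" \
    --output text
}

# Hypothetical helper: turn a request ID (sfr-...) into the resource ID
# format that Application Auto Scaling expects.
spot_fleet_resource_id() {
  printf 'spot-fleet-request/%s\n' "$1"
}

# Example (commented out because it needs AWS credentials):
# list_active_spot_fleets
# ASRESOURCE=$(spot_fleet_resource_id your_request_id)
```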
Now register the Spot Fleet request.
At this point, it is fine to set both the minimum and maximum capacity to 0, since the scheduled actions added next will set the real values.
aws application-autoscaling register-scalable-target \
  --service-namespace ec2 \
  --scalable-dimension ec2:spot-fleet-request:TargetCapacity \
  --resource-id "${ASRESOURCE}" \
  --min-capacity 0 \
  --max-capacity 0
If you see output like the following, it is successful.
{
  "ScalableTargetARN": "arn:aws:application-autoscaling:us-west-2:AWS_ACCOUNT_ID:scalable-target/0ecc0a813d295aa94475ae736666f9bfdbdc"
}
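To double-check the registration, the scalable target can be read back with `describe-scalable-targets`. The wrapper below is a sketch; the function name `show_scalable_target` is hypothetical, while the command and its options are from the AWS CLI.

```shell
# Hypothetical helper: show resource ID, min and max capacity of a scalable target.
#   $1 = resource ID, e.g. spot-fleet-request/sfr-...
show_scalable_target() {
  aws application-autoscaling describe-scalable-targets \
    --service-namespace ec2 \
    --resource-ids "$1" \
    --query "ScalableTargets[].[ResourceId,MinCapacity,MaxCapacity]" \
    --output text
}

# Example (commented out because it needs AWS credentials):
# show_scalable_target "${ASRESOURCE}"
```

Right after the registration above, both capacity columns should read 0.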
Adding a Scheduled Policy
Next, add the scheduled action for scale-out.
This is the peak-period scaling shown in the previous diagram.
Setting both the minimum and maximum capacity to 8 fixes the fleet at 8 instances, starting at 10:00 on weekdays (Asia/Tokyo).
aws application-autoscaling put-scheduled-action \
  --service-namespace ec2 \
  --schedule "cron(0 10 ? * MON-FRI *)" \
  --timezone "Asia/Tokyo" \
  --scheduled-action-name "Scale-Out by cron" \
  --resource-id "${ASRESOURCE}" \
  --scalable-dimension ec2:spot-fleet-request:TargetCapacity \
  --scalable-target-action MinCapacity=8,MaxCapacity=8
This is the scale-in part. At 14:00 the bounds change to a minimum of 2 and a maximum of 4. The intention is to run 2 instances normally and grow to 4 if access surges unexpectedly.
aws application-autoscaling put-scheduled-action \
  --service-namespace ec2 \
  --schedule "cron(0 14 ? * MON-FRI *)" \
  --timezone "Asia/Tokyo" \
  --scheduled-action-name "Scale-In by cron" \
  --resource-id "${ASRESOURCE}" \
  --scalable-dimension ec2:spot-fleet-request:TargetCapacity \
  --scalable-target-action MinCapacity=2,MaxCapacity=4
You can check the settings with describe-scheduled-actions.
$ aws application-autoscaling describe-scheduled-actions --service-namespace ec2
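The raw JSON lists every scheduled action in the namespace. As a sketch, the output can be narrowed to one resource and reduced to the fields that matter here. The function name `show_schedule` is hypothetical; the options are from the AWS CLI.

```shell
# Hypothetical helper: show name, schedule, time zone and capacity bounds
# for every scheduled action attached to one resource.
#   $1 = resource ID, e.g. spot-fleet-request/sfr-...
show_schedule() {
  aws application-autoscaling describe-scheduled-actions \
    --service-namespace ec2 \
    --resource-id "$1" \
    --query "ScheduledActions[].[ScheduledActionName,Schedule,Timezone,ScalableTargetAction.MinCapacity,ScalableTargetAction.MaxCapacity]" \
    --output table
}

# Example (commented out because it needs AWS credentials):
# show_schedule "${ASRESOURCE}"
```

With the two actions above in place, the table should have one row for "Scale-Out by cron" (8 / 8) and one for "Scale-In by cron" (2 / 4).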
Creating a Target Tracking Scaling Policy
Create a JSON file for the policy.
You can easily create policies using predefined metrics.
Refer to PredefinedMetricSpecification for the metrics that are available.
I believe they cover most use cases.
Create a file named cpu50-target-tracking-scaling-policy.json and paste the following.
This configuration sets the target value to an average CPU utilization of 50% across the Spot Fleet.
{
  "TargetValue": 50.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "EC2SpotFleetRequestAverageCPUUtilization"
  }
}
Register the scaling policy with put-scaling-policy.
aws application-autoscaling put-scaling-policy \
  --service-namespace ec2 \
  --scalable-dimension ec2:spot-fleet-request:TargetCapacity \
  --resource-id "${ASRESOURCE}" \
  --policy-name cpu50-target-tracking-scaling-policy \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration file://cpu50-target-tracking-scaling-policy.json
Check for output similar to the following. The key point is that two alarms are created: AlarmHigh for scale-out and AlarmLow for scale-in.
You can also see both alarms on the CloudWatch alarms screen (do not delete them; the policy depends on them).
{
  "PolicyARN": "arn:aws:autoscaling:us-west-2:AWS_ACCOUNT_ID:scalingPolicy:6eb96339-8f71-40f1-9c02-e42fd6c1238f:resource/ec2/spot-fleet-request/sfr-535b6fd5-638b-4d34-a330-bc4df1fcf09b:policyName/cpu50-target-tracking-scaling-policy",
  "Alarms": [
    {
      "AlarmName": "TargetTracking-spot-fleet-request/sfr-535b6fd5-638b-4d34-a330-bc4df1fcf09b-AlarmHigh-20fb2620-b86a-4cae-b0bc-12f4db6aae5e",
      "AlarmARN": "arn:aws:cloudwatch:us-west-2:AWS_ACCOUNT_ID:alarm:TargetTracking-spot-fleet-request/sfr-535b6fd5-638b-4d34-a330-bc4df1fcf09b-AlarmHigh-20fb2620-b86a-4cae-b0bc-12f4db6aae5e"
    },
    {
      "AlarmName": "TargetTracking-spot-fleet-request/sfr-535b6fd5-638b-4d34-a330-bc4df1fcf09b-AlarmLow-74611c07-f278-4c1b-85bd-5f8916f395fa",
      "AlarmARN": "arn:aws:cloudwatch:us-west-2:AWS_ACCOUNT_ID:alarm:TargetTracking-spot-fleet-request/sfr-535b6fd5-638b-4d34-a330-bc4df1fcf09b-AlarmLow-74611c07-f278-4c1b-85bd-5f8916f395fa"
    }
  ]
}
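The alarm settings can also be inspected from the CLI instead of the console. The sketch below relies on the alarm names starting with `TargetTracking-` followed by the resource ID, as in the output above. The function name `show_target_tracking_alarms` is hypothetical.

```shell
# Hypothetical helper: list the CloudWatch alarms that target tracking created
# for one resource, with their thresholds and evaluation periods.
#   $1 = resource ID, e.g. spot-fleet-request/sfr-...
show_target_tracking_alarms() {
  aws cloudwatch describe-alarms \
    --alarm-name-prefix "TargetTracking-$1" \
    --query "MetricAlarms[].[AlarmName,ComparisonOperator,Threshold,EvaluationPeriods,StateValue]" \
    --output table
}

# Example (commented out because it needs AWS credentials):
# show_target_tracking_alarms "${ASRESOURCE}"
```

The Threshold and EvaluationPeriods columns are useful later when reasoning about when scale-in actually fires.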
Bonus
You can also specify a customized metric instead of a predefined one. The following example tracks the average CPU utilization of a Spot Fleet.
Refer to CustomizedMetricSpecification for values that can be specified.
{
  "TargetValue": 50.0,
  "CustomizedMetricSpecification": {
    "MetricName": "CPUUtilization",
    "Namespace": "AWS/EC2Spot",
    "Dimensions": [
      {
        "Name": "FleetRequestId",
        "Value": "your_fleet_id"
      }
    ],
    "Statistic": "Average",
    "Unit": "Percent"
  }
}
Putting Load on EC2 Instances
To confirm that target tracking scaling is set correctly, I will put load on the instances to increase CPU utilization.
I will use everyone's favorite yes command.
$ yes >/dev/null &
$ yes >/dev/null &
$ yes >/dev/null &
$ yes >/dev/null &
Each yes process keeps roughly one vCPU busy, so running four of them should push an instance with up to four vCPUs to 100%.
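If the instance type has a different number of vCPUs, a small wrapper can start one worker per vCPU and clean up afterwards. This is a sketch under a few assumptions: `start_load` and `stop_load` are hypothetical names, and `nproc` (GNU coreutils) is assumed to be available on the instance.

```shell
# Hypothetical helper: start one `yes` worker per vCPU (or the number given as $1).
start_load() {
  workers=${1:-$(nproc)}
  LOAD_PIDS=""
  i=0
  while [ "$i" -lt "$workers" ]; do
    yes > /dev/null &
    LOAD_PIDS="$LOAD_PIDS $!"   # remember each worker's PID
    i=$((i + 1))
  done
}

# Hypothetical helper: stop every worker started by start_load.
stop_load() {
  [ -n "$LOAD_PIDS" ] || return 0
  # Word splitting is intentional: LOAD_PIDS is a space-separated PID list.
  kill $LOAD_PIDS 2>/dev/null
  wait $LOAD_PIDS 2>/dev/null || true
  LOAD_PIDS=""
}

# Example:
# start_load      # one worker per vCPU
# stop_load       # once the scale-out has been observed
```

Stopping the load explicitly matters here, because the scale-in alarm can only fire once CPU utilization drops again.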
After waiting a while, I checked the scaling activities and confirmed that it had scaled out correctly.
The description reads "Setting target capacity to 4."
$ aws application-autoscaling describe-scaling-activities --service-namespace ec2
{
  "ScalingActivities": [
    {
      "ActivityId": "843924cd-fb6f-4d4a-bebf-e86741c65138",
      "ServiceNamespace": "ec2",
      "ResourceId": "spot-fleet-request/sfr-343d8e85-38d0-4681-9f1c-b4a4adff52d4",
      "ScalableDimension": "ec2:spot-fleet-request:TargetCapacity",
      "Description": "Setting target capacity to 4.",
      "Cause": "monitor alarm TargetTracking-spot-fleet-request/sfr-343d8e85-38d0-4681-9f1c-b4a4adff52d4-AlarmHigh-ce43e93b-2df3-4cc7-aca6-8d6c28fc16b9 in state ALARM triggered policy cpu50-target-tracking-scaling-policy",
      "StartTime": "2023-10-31T15:44:34.353000+09:00",
      "EndTime": "2023-10-31T15:46:42.658000+09:00",
      "StatusCode": "Successful",
      "StatusMessage": "Successfully set target capacity to 4. Change successfully fulfilled by ec2."
    },
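The full JSON gets long quickly. As a sketch, the activity list can be limited to one resource and trimmed to the start time, status, and description. The function name `recent_scaling_activities` is hypothetical; the options are from the AWS CLI.

```shell
# Hypothetical helper: show up to five scaling activities for one resource.
#   $1 = resource ID, e.g. spot-fleet-request/sfr-...
recent_scaling_activities() {
  aws application-autoscaling describe-scaling-activities \
    --service-namespace ec2 \
    --resource-id "$1" \
    --max-items 5 \
    --query "ScalingActivities[].[StartTime,StatusCode,Description]" \
    --output table
}

# Example (commented out because it needs AWS credentials):
# recent_scaling_activities "${ASRESOURCE}"
```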
Points of Interest
I was curious how the fleet would behave when scaling in from 8 instances to 2 at 14:00, so I watched the process.
Starting with 8 instances running, I triggered the scale-in scheduled action (Min 2, Max 4) at 10:45.
Here is an excerpt from the Spot Fleet request history. (The timestamp is not 14:00, but please disregard that.)
| No. | Timestamp | Event Type | Status | Description |
|---|---|---|---|---|
| (1) | 11/01/2023 10:45:27 AM | fleetRequestChange | modify_in_progress | Modify request received. Requested targetCapacity: 4 |
| (2) | 11/01/2023 11:09:10 AM | fleetRequestChange | modify_in_progress | Modify request received. Requested targetCapacity: 3 |
| (3) | 11/01/2023 11:15:10 AM | fleetRequestChange | modify_in_progress | Modify request received. Requested targetCapacity: 2 |
(1) is the scale-in to 4 instances triggered by the scheduled action: the new maximum is 4, so the 8 running instances exceed it and the target capacity is lowered to 4.
(2) was most likely triggered by the scale-in alarm (AlarmLow), one of the two CloudWatch alarms created together with the target tracking scaling policy.
The alarm appears to evaluate the metric every minute and to fire when the average CPU utilization stays below 45% for 15 consecutive data points.
The threshold is 45% because a target tracking scaling policy scales in when the metric stays at least 10% below the target value: 50% × 0.9 = 45%.
Some might point out that (2) comes about 24 minutes after (1), not exactly 15. However, (1) only records that the request to change the capacity to 4 was received, and terminating the EC2 instances takes additional time.
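The numbers in this analysis can be reproduced with a little shell arithmetic. This is a sketch: both function names are hypothetical, the 0.9 factor is the behavior observed in this article rather than a documented constant, and `seconds_between` assumes GNU `date` (for the `-d` option).

```shell
# Scale-in threshold as observed in this article: 90% of the target value.
scale_in_threshold() {
  awk -v target="$1" 'BEGIN { printf "%.1f\n", target * 0.9 }'
}

# Seconds between two timestamps (requires GNU date for the -d option).
seconds_between() {
  start=$(date -u -d "$1" +%s)
  end=$(date -u -d "$2" +%s)
  echo $((end - start))
}

scale_in_threshold 50                                        # prints 45.0
seconds_between "2023-11-01 10:45:27" "2023-11-01 11:09:10"  # prints 1423 (about 24 minutes)
seconds_between "2023-11-01 11:09:10" "2023-11-01 11:15:10"  # prints 360 (6 minutes)
```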
The User Guide states that Application Auto Scaling acts conservatively when scaling in and out, which might explain why scale-in is delayed in favor of availability.
I also suspect that CloudWatch alarms are not evaluated with to-the-second precision.
At (3), the fleet scales in to the minimum of 2 instances. I believe this step-by-step reduction, rather than scaling in all at once, is also deliberate, taking availability and a possible re-scale-out into account.
As for the 6-minute interval between (2) and (3), I can only speculate, but I assume it includes the 300-second scale-in cooldown (the default varies by service) plus EC2 shutdown time and the conservative scaling logic.
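If the cooldown turns out to matter, the target tracking configuration also accepts `ScaleOutCooldown` and `ScaleInCooldown` (in seconds). The sketch below writes such a policy file. The function name, the file name, and the 120-second value are illustrative assumptions, not settings verified in this article.

```shell
# Hypothetical helper: write a target tracking policy file with explicit cooldowns.
#   $1 = output file, $2 = target CPU %, $3 = scale-out cooldown (s), $4 = scale-in cooldown (s)
write_policy_json() {
  cat > "$1" <<EOF
{
  "TargetValue": $2,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "EC2SpotFleetRequestAverageCPUUtilization"
  },
  "ScaleOutCooldown": $3,
  "ScaleInCooldown": $4
}
EOF
}

# Example: same 50% target, but an illustrative 120-second scale-in cooldown.
write_policy_json cpu50-short-cooldown.json 50.0 300 120
```

The resulting file can be passed to put-scaling-policy in the same way as cpu50-target-tracking-scaling-policy.json. A shorter scale-in cooldown trades some stability for faster cost reduction, so it is worth testing before relying on it.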
I find this conservative design, which weighs availability and re-scaling, to be rational.
There may be cases where the budget suffers if scaling does not happen at exactly the intended time. In such cases, you might need to build a custom solution with Lambda or similar tools. Still, I think you will be happier adapting your approach to fit Application Auto Scaling than building a custom implementation.
References
Application Auto Scaling User Guide
AWS CLI Command Reference application-autoscaling