8 Key Quality Attributes for System Design
Introduction
In design reviews and incident response, terms like "improving availability" or "low maintainability" come up frequently. I had been using them somewhat loosely, without fully grasping their precise meanings or how they differ from related terms. To organize my own understanding, I have summarized eight quality attributes ending in "-ity" that are commonly used in system design.
Glossary
① Availability
The percentage of time that a system is up and available for use.
It is expressed by the following formula:
Availability = MTBF / (MTBF + MTTR)
MTBF (Mean Time Between Failures): The average time between failures
MTTR (Mean Time To Repair): The average time from a failure to recovery
In SLAs, this is often expressed as "99.9%" (three nines) or "99.99%" (four nines).
| Availability | Annual Downtime |
|---|---|
| 99.9% | Approx. 8.8 hours |
| 99.99% | Approx. 53 minutes |
| 99.999% | Approx. 5.3 minutes |
Examples:
- Eliminating single points of failure with a multi-AZ configuration
- Removing failed nodes using a load balancer + health checks
- Shortening MTTR through failure detection and automated recovery
Often confused with: Reliability
Availability is the "percentage of time operational," while Reliability is the "ability to continue operating correctly." Even if failures occur frequently, if recovery is fast, Availability can be kept high, though Reliability would be low.
② Reliability
The ability of a system to continue operating as expected. The resistance to failure.
The longer the MTBF (Mean Time Between Failures), the higher the reliability.
Examples:
- API retry logic and idempotency (ensuring results remain consistent even if the same request is sent multiple times)
- Preventing cascading failures with the circuit breaker pattern
- Data integrity checks during writes
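To make the first example concrete, here is a minimal sketch of why retries need idempotency. `PaymentService`, its `failNext` switch, and the `Retry` helper are all hypothetical names invented for this illustration; a real client would also add backoff between attempts, which is omitted here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrTimeout simulates a response that was lost after the work was done.
var ErrTimeout = errors.New("response lost (simulated timeout)")

// PaymentService is a hypothetical service that deduplicates requests
// by an idempotency key supplied by the caller.
type PaymentService struct {
	mu        sync.Mutex
	processed map[string]int // idempotency key -> charged amount
	total     int            // sum of all charges actually applied
	failNext  int            // number of upcoming calls whose response is "lost"
}

func NewPaymentService() *PaymentService {
	return &PaymentService{processed: make(map[string]int)}
}

// Charge applies the charge at most once per key, however often it is called.
func (s *PaymentService) Charge(key string, amount int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, done := s.processed[key]; !done {
		s.processed[key] = amount
		s.total += amount
	}
	if s.failNext > 0 {
		s.failNext--
		return ErrTimeout // the charge happened, but the caller cannot tell
	}
	return nil
}

// Retry calls fn up to attempts times and returns the last error, if any.
func Retry(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	svc := NewPaymentService()
	svc.failNext = 2 // the first two responses are lost
	err := Retry(3, func() error { return svc.Charge("order-42", 100) })
	fmt.Println("err:", err, "total charged:", svc.total)
}
```

Without the key check in `Charge`, the same three attempts would charge the customer three times, so the retry logic would trade one failure mode for a worse one.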
Often confused with: Availability
| | Availability | Reliability |
|---|---|---|
| Question | Is the system available for use? | Is the system operating correctly? |
| Metric | Uptime (%) | MTBF |
| Improvement | Redundancy/Auto-recovery | Reducing bugs/Fault-tolerant design |
③ Durability
The ability to ensure data is not lost. Data persistence.
While Availability refers to whether a system is "usable," Durability refers to whether data "remains intact."
Examples:
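- Amazon S3 is designed for "99.999999999%" (eleven nines) of Durability. In the standard storage class, objects are stored redundantly across multiple AZs.
- Database WAL (Write-Ahead Logging) records changes to a log before applying them, so committed data is not lost in the event of a crash.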
Often confused with: Availability
Even if an S3 bucket is temporarily inaccessible (reduced Availability), the data itself is not lost (Durability is maintained). Availability and Durability are independent characteristics.
④ Scalability
The ability of a system to expand in response to increased load.
There are two main methods for expansion:
| Method | Description | Example |
|---|---|---|
| Scale-out (Horizontal) | Increasing the number of servers | Adding EC2 instances |
| Scale-up (Vertical) | Increasing server specifications | Changing instance types |
To build a system that scales out easily, stateless design is crucial. If session information is held within the server, issues arise during scale-out, so it is kept in external caches (like Redis).
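The idea can be sketched as follows. The `SessionStore` interface and its method names are assumptions for this example, and `memoryStore` is an in-process stand-in for an external cache such as Redis so that the code runs without any infrastructure.

```go
package main

import (
	"fmt"
	"sync"
)

// SessionStore abstracts where session data lives. In production it would
// typically be backed by an external cache; this interface is hypothetical.
type SessionStore interface {
	Get(sessionID string) (userID string, ok bool)
	Set(sessionID, userID string)
}

// memoryStore stands in for the external cache so the example runs anywhere.
type memoryStore struct {
	mu   sync.RWMutex
	data map[string]string
}

func newMemoryStore() *memoryStore {
	return &memoryStore{data: make(map[string]string)}
}

func (m *memoryStore) Get(id string) (string, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	u, ok := m.data[id]
	return u, ok
}

func (m *memoryStore) Set(id, userID string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[id] = userID
}

// Server keeps no session state of its own, only a reference to the shared store.
type Server struct {
	name  string
	store SessionStore
}

func (s *Server) Login(sessionID, userID string) { s.store.Set(sessionID, userID) }

func (s *Server) WhoAmI(sessionID string) string {
	if u, ok := s.store.Get(sessionID); ok {
		return u
	}
	return "anonymous"
}

func main() {
	shared := newMemoryStore()
	a := &Server{name: "a", store: shared}
	b := &Server{name: "b", store: shared} // instance added during scale-out

	a.Login("sess-1", "alice")
	// The request lands on a different instance, yet the session is still found.
	fmt.Println(b.WhoAmI("sess-1"))
}
```

If each `Server` kept its own map instead, the load balancer would have to pin every user to one instance, and removing that instance would log those users out.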
Examples:
- Auto Scaling to automatically adjust instance counts based on traffic
- Distributing read load with database read replicas
Often confused with: Elasticity
Scalability refers to "having the capacity to expand," while Elasticity refers to "automatically expanding and contracting based on load." Auto Scaling is an implementation example of Elasticity.
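As a rough illustration of Elasticity, the sketch below computes how many instances are needed to bring a per-instance metric (for example average CPU usage) back to a target value. It is a simplified proportional rule written for this article, not the exact algorithm of any particular Auto Scaling service; `DesiredInstances` and its parameters are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// DesiredInstances returns the instance count that would bring the
// per-instance metric back to target, clamped to the range [lo, hi].
func DesiredInstances(current int, metric, target float64, lo, hi int) int {
	desired := int(math.Ceil(float64(current) * metric / target))
	if desired < lo {
		return lo
	}
	if desired > hi {
		return hi
	}
	return desired
}

func main() {
	// Four instances at 90% CPU with a 60% target: scale out.
	fmt.Println(DesiredInstances(4, 90, 60, 2, 10))
	// Four instances at 15% CPU: scale in, but never below the minimum.
	fmt.Println(DesiredInstances(4, 15, 60, 2, 10))
}
```

The upper bound `hi` is where Scalability comes in: Elasticity can only adjust capacity automatically within the range the architecture is able to scale to.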
⑤ Maintainability
The ability to easily modify, update, and operate a system.
This also overlaps with the perspective of shortening MTTR (recovery time from failure). If code is easy to read and the impact of changes is limited, bug fixes can be performed quickly.
Examples:
- Keeping functions and classes small and clarifying responsibilities
- Communicating the intent of the code through documentation and naming
- Organizing dependencies to localize the impact of changes
Often confused with: Extensibility
Maintainability is "how easy it is to fix existing code," while Extensibility is "how easy it is to add new features." They are similar but look at the system from different perspectives.
⑥ Observability
The ability to observe the internal state of a system from the outside.
There are three pillars of Observability:
| Pillar | Description | Example Tools |
|---|---|---|
| Logs | Recording events | CloudWatch Logs / Datadog |
| Metrics | Numerical time-series data | CloudWatch Metrics / Prometheus |
| Traces | Processing paths of requests | AWS X-Ray / Jaeger |
Often confused with: Monitoring
Monitoring is a mechanism to "detect known problems." An example would be issuing an alert when CPU usage exceeds 80%.
Observability refers to the state where "unknown problems can also be investigated." When a failure occurs, a system with high Observability allows you to combine logs, metrics, and traces to track "why it happened."
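One small building block for this kind of investigation is a structured log that carries a trace ID, so the events of a single request can be pulled together across services. The `Event` type, its field names, and the sample data below are assumptions for this sketch and do not reflect the format of any specific tool.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a structured log record. The field names are hypothetical.
type Event struct {
	TraceID   string `json:"trace_id"`
	Service   string `json:"service"`
	Message   string `json:"message"`
	LatencyMS int64  `json:"latency_ms"`
}

// ByTrace returns the events that belong to one request,
// in the order they were recorded.
func ByTrace(events []Event, traceID string) []Event {
	var out []Event
	for _, e := range events {
		if e.TraceID == traceID {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// Interleaved events from two requests passing through two services.
	events := []Event{
		{TraceID: "t-1", Service: "api", Message: "request received", LatencyMS: 0},
		{TraceID: "t-2", Service: "api", Message: "request received", LatencyMS: 0},
		{TraceID: "t-1", Service: "db", Message: "query ok", LatencyMS: 12},
		{TraceID: "t-2", Service: "db", Message: "query slow", LatencyMS: 2400},
	}

	// Follow the path of the slow request only.
	for _, e := range ByTrace(events, "t-2") {
		line, err := json.Marshal(e)
		if err != nil {
			fmt.Println("marshal error:", err)
			continue
		}
		fmt.Println(string(line))
	}
}
```

A CPU alert alone would say that something is slow; filtering by trace ID is what lets you see which step of which request was slow.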
⑦ Testability
The ability to easily test a system.
A design with high testability naturally improves Maintainability.
Examples of designs that increase testability:
- Dependency Injection (DI): By passing external dependencies (DB/external APIs) via interfaces, they can be swapped for mocks during testing.
- Separation of side effects: By separating business logic from I/O processes, the logic portion can be unit-tested.
- Small functions: The more a single function focuses on a single task, the simpler the test cases become.
⑧ Extensibility
The ability to add new features without changing existing code.
This corresponds to the OCP (Open/Closed Principle) of the SOLID principles: "Software entities should be open for extension, but closed for modification."
Example:
```go
// Low extensibility: this function must be modified every time a new notification method is added.
func Notify(method string, message string) {
	if method == "email" {
		sendEmail(message)
	} else if method == "slack" {
		sendSlack(message)
	}
	// Another branch has to be added here for each new method.
}

// High extensibility: a new method is added by writing a new type that implements the interface.
type Notifier interface {
	Notify(message string) error
}

func SendNotification(n Notifier, message string) error {
	return n.Notify(message)
}
```
Often confused with: Maintainability
| | Maintainability | Extensibility |
|---|---|---|
| Question | Is it easy to fix existing code? | Is it easy to add new features? |
| Focus | Bug fixes/Refactoring | Feature additions/Specification changes |
Trade-offs
Quality attributes often conflict with one another. Since you cannot maximize all of them at once, the essence of design is deciding which to prioritize based on the system's requirements.
| Trade-off | Description |
|---|---|
| Availability ↑ vs Consistency ↓ | When data is replicated across multiple nodes and a network partition occurs, the system must either keep serving requests with possibly stale data or reject requests to stay consistent (CAP Theorem) |
| Scalability ↑ vs Maintainability ↓ | Distributed systems scale horizontally easily but increase complexity and become harder to maintain |
| Extensibility ↑ vs Maintainability ↓ | Increasing abstraction for extensibility can make the code more complex and harder to read |
| Observability ↑ vs Performance ↓ | Detailed logging and tracing increase I/O costs and can impact performance |
Conclusion
| Term | Japanese Translation | Definition |
|---|---|---|
| Availability | 可用性 | Percentage of time the system is operational |
| Reliability | 信頼性 | Ability to continue operating as expected |
| Durability | 耐久性 | Ability to ensure data is not lost |
| Scalability | スケーラビリティ | Ability to expand in response to load increase |
| Maintainability | 保守性 | Ability to easily modify and update |
| Observability | オブザーバビリティ | Ability to observe internal state from outside |
| Testability | テスト容易性 | Ability to test easily |
| Extensibility | 拡張性 | Ability to add features without changing existing code |
When these terms come up in design discussions, asking "what are we prioritizing?" and "what are the trade-offs?" will make the conversation sharper and more precise.