An Anatomy of Google's Zanzibar Authorization System: Resistance is Futile!
Introduction
Notes on the authorization infrastructure supporting various Google services.
This system, named Zanzibar, centrally handles authorization for diverse services such as Calendar, Cloud, Drive, Maps, Photos, and YouTube. Operating on a global scale, it manages trillions of access control lists (ACLs) while handling millions of requests per second. It reportedly achieves response times under 10ms and availability of over 99.999%. Quite impressive.
The following is a summary based on the paper "Zanzibar: Google’s Consistent, Global Authorization System".
Design Principles and Goals
The main goals set when building Zanzibar are as follows:
| Design Goal | Description |
|---|---|
| Correctness | Ensure consistency so that access control works as intended by the user |
| Flexibility | Support various access control policies for both general and business users |
| Low Latency | Authorization checks sit on the critical path of most user operations, so responses must be fast everywhere in the world |
| High Availability | Since access is denied if not explicitly permitted, system downtime can be disastrous |
| Large-scale Scalability | Protect billions of objects shared by billions of users and make them available worldwide |
There are many benefits to centralizing authorization. It provides a consistent user experience and makes integration between apps easier. Access control can be considered when searching across multiple apps, and once difficult consistency problems are solved, the solution can be reused across all apps.
Data Model
The core of Zanzibar is a simple data format called "relation tuples".
<tuple> ::= <object>'#'<relation>'@'<user>
<object> ::= <namespace>':'<object id>
<user> ::= <user id> | <userset>
<userset> ::= <object>'#'<relation>
Using this notation, permissions can be expressed like this:
| Tuple Example | Meaning |
|---|---|
| doc:readme#owner@10 | User 10 is the owner of doc:readme |
| group:eng#member@11 | User 11 is a member of group:eng |
| doc:readme#viewer@group:eng#member | Members of group:eng can view doc:readme |
| doc:readme#parent@folder:A#... | doc:readme is contained in folder:A |
It looks simple, but it allows ACLs and groups to be handled uniformly. Reading, writing, and incremental updates can all be done efficiently.
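To make the grammar concrete, here is a minimal Python sketch (not from the paper; the names are illustrative) that splits a relation tuple string into its parts according to the BNF above:

```python
from typing import NamedTuple

class RelationTuple(NamedTuple):
    namespace: str
    object_id: str
    relation: str
    user: str  # either a plain user id or a "namespace:id#relation" userset

def parse_tuple(text: str) -> RelationTuple:
    """Parse '<namespace>:<object id>#<relation>@<user>' per the grammar above."""
    obj, _, rest = text.partition("#")          # split object from relation@user
    namespace, _, object_id = obj.partition(":")
    relation, _, user = rest.partition("@")     # user may itself contain '#'
    return RelationTuple(namespace, object_id, relation, user)
```

Note that for a userset user such as `group:eng#member`, the user field simply keeps the full `object#relation` string, which is what lets ACLs and groups be handled uniformly.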
Another important point here is the definition of <relation>. This is also just a string, but it must be defined in the namespace configuration (described later) before use.
For example, strings like "viewer", "editor", "owner", "member", and "parent" are declared in advance within the namespace settings. You can set not just the names, but also how those relations interact with others (e.g., an "editor" automatically has "viewer" permissions).
A relation is, so to speak, a predicate representing "what kind of relationship exists between this person and this object." With this simple mechanism, it becomes possible to express a wide variety of access controls, such as "User A can edit this document" or "Members of Group B can view this folder." It is well-designed.
Namespace and Relation Configuration
Before using Zanzibar, clients must configure a namespace. In the namespace configuration, relations and storage parameters are specified.
A particularly interesting feature is "userset rewrite rules." These allow for defining connections between relations. For example, you can create a rule where a "document editor automatically becomes a viewer."
Let's look at a simple example of a namespace configuration.
name: "doc"  # Define a namespace named "doc"
relation { name: "owner" }  # Define an "owner" relation. No special rules.
relation {
  name: "editor"  # Definition of the "editor" relation
  userset_rewrite {  # Define special rules for this relation
    union {  # Meaning "OR". If any condition is met, the user becomes an "editor"
      child { _this {} }  # Users directly specified as an "editor"
      child {
        computed_userset {
          relation: "owner"  # "owner" users automatically become "editor" as well
        }
      }
    }
  }
}
relation {
  name: "viewer"  # Definition of the "viewer" relation
  userset_rewrite {  # Define special rules for this relation
    union {  # Meaning "OR". If any condition is met, the user becomes a "viewer"
      child { _this {} }  # Users directly specified as a "viewer"
      child {
        computed_userset {
          relation: "editor"  # "editor" users automatically become "viewer" as well
        }
      }
      child {
        tuple_to_userset {  # Mechanism for inheriting permissions from the parent folder
          tupleset {
            relation: "parent"  # Look for the "parent" relation (meaning the parent folder)
          }
          computed_userset {  # For the found parent folder
            object: $TUPLE_USERSET_OBJECT  # Of this parent folder
            relation: "viewer"  # People with "viewer" permissions also become a "viewer" of this document
          }
        }
      }
    }
  }
}
With this configuration, owners automatically become editors, and editors automatically become viewers. Additionally, viewers of the parent folder can also see the document. Hierarchical access control is expressed simply.
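As a rough illustration (this is not the paper's implementation, just the rewrite rules of the "doc" namespace hard-coded in Python), a Check under this configuration could be evaluated like this over an in-memory set of tuples:

```python
# Minimal in-memory sketch of Check evaluation for the "doc" namespace above.
# Illustrative only: the real system fans out across servers, caches results,
# and evaluates at a consistent snapshot.
TUPLES = {
    ("doc:readme", "owner", "alice"),
    ("doc:readme", "parent", "folder:A"),
    ("folder:A", "viewer", "bob"),
}

def check(obj: str, relation: str, user: str) -> bool:
    if (obj, relation, user) in TUPLES:           # _this: direct membership
        return True
    if relation == "editor":                      # editor = this ∪ owner
        return check(obj, "owner", user)
    if relation == "viewer":                      # viewer = this ∪ editor ∪ parent's viewers
        if check(obj, "editor", user):
            return True
        # tuple_to_userset: follow "parent" tuples, then check "viewer" on each parent
        for o, r, parent in TUPLES:
            if o == obj and r == "parent" and check(parent, "viewer", user):
                return True
    return False
```

With the sample tuples, `check("doc:readme", "viewer", "alice")` succeeds through the owner → editor → viewer chain, and `check("doc:readme", "viewer", "bob")` succeeds through inheritance from `folder:A`.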
Consistency Model and Zookie Protocol
An important feature of Zanzibar is its consistency model. In particular, it emphasizes countermeasures against the "new enemy" problem. This problem occurs when ACL update ordering is not maintained or when an old ACL is applied to new content.
| Example Problem | Possible Scenario |
|---|---|
| Ignoring ACL update order | 1. Alice removes Bob from a folder. 2. Alice asks Charlie to move a new document into that folder. 3. If the update order is ignored, Bob might be able to see the new document. |
| Misapplication of old ACLs | 1. Alice removes Bob from a document. 2. Alice asks Charlie to add new content. 3. If judged by the old ACL, Bob might be able to see even the new content. |
To solve this problem, Zanzibar provides two properties: "external consistency" and "bounded staleness." Additionally, it introduces a token called a "zookie."
The flow of the zookie protocol is as follows:
- Before a content change, the client sends a "content-change ACL check" request. No special zookie is required at this stage.
- Zanzibar encodes the current global timestamp into a zookie and returns it. This timestamp is guaranteed to be newer than all preceding ACL writes.
- The client atomically stores the content change along with the zookie in its own storage. This storage operation and the ACL check do not need to be in the same transaction.
- Later, when someone attempts to access that content, the client sends an ACL check request accompanied by that zookie.
- Zanzibar performs the check using a snapshot that is "at least as new as" the timestamp extracted from the zookie.
This ensures that the ordering relationship between ACL and content updates is maintained while providing the flexibility to achieve latency and availability goals.
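The steps above can be sketched as a toy protocol. This is purely illustrative: real zookies encode Spanner timestamps opaquely, while here a zookie is just an integer tick, and the ACL is replayed from an append-only changelog.

```python
import itertools

_clock = itertools.count(1)   # stand-in for a global timestamp source
ACL_LOG = []                  # (timestamp, tuple, added?) — append-only ACL changelog

def write_acl(tpl, added=True):
    """Record an ACL change at the current timestamp."""
    ts = next(_clock)
    ACL_LOG.append((ts, tpl, added))
    return ts

def content_change_check():
    """Return a zookie (here: a bare timestamp) newer than all prior ACL writes."""
    return next(_clock)

def acl_at(snapshot_ts):
    """Replay the changelog up to snapshot_ts to get the effective ACL set."""
    acl = set()
    for ts, tpl, added in ACL_LOG:
        if ts <= snapshot_ts:
            (acl.add if added else acl.discard)(tpl)
    return acl

def check_with_zookie(tpl, zookie):
    # Evaluate at a snapshot no older than the zookie (here: exactly at it).
    return tpl in acl_at(zookie)
```

Because the zookie handed out at content-change time is newer than every preceding ACL write, a later check at that snapshot is guaranteed to see Alice's removal of Bob, which is exactly the "new enemy" scenario being prevented.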
System Architecture

(Architecture diagram omitted; see the original paper: https://storage.googleapis.com/gweb-research2023-media/pubtools/5068.pdf)
Zanzibar's architecture is primarily composed of the following components:
| Component | Role |
|---|---|
| aclservers | The main server type, which handles Check, Read, Expand, and Write requests |
| watchservers | A specialized server type that handles Watch requests |
| Spanner | A global database system that stores ACLs and metadata |
| Offline pipelines | Execute background processes such as creating namespace snapshots |
| Leopard | An indexing system optimized for operations on large and deeply nested sets |
aclservers bear the main burden of processing, receiving requests (Check, Read, Expand, Write) from clients. When a request arrives, the server distributes the necessary processing to other aclservers. For example, when checking a certain group membership, if that group contains other groups, the check processing spreads across multiple servers (described as "fan out" in the paper).
The data itself is stored in a global database called Spanner, where each relation tuple is identified by a primary key: (shard ID, object ID, relation, user, commit timestamp). In other words, it records "who" had "what relationship" to "which object" and "when."
Interestingly, Zanzibar stores multiple versions of tuples in different rows. This mechanism allows ACL checks to be performed at any arbitrary snapshot in the past.
As a countermeasure against hotspots (data where many requests are concentrated), distributed caches are also used between servers. This allows repeated checks of the same data to be processed quickly. Furthermore, there is an indexing system called Leopard, which efficiently processes complex structures such as deeply nested groups.
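One of the cache-efficiency tricks mentioned later in the paper is timestamp quantization: rounding evaluation timestamps to coarse boundaries so that concurrent checks of the same hot object share a cache entry instead of each missing on a unique timestamp. A minimal sketch (the quantum value and names are assumptions, not from the paper):

```python
# Sketch of timestamp quantization for a distributed check cache (illustrative).
QUANTUM_US = 1_000_000  # quantize to 1-second boundaries (assumed value)

def quantize(ts_us: int) -> int:
    """Round a microsecond timestamp down to the nearest quantum boundary."""
    return ts_us - (ts_us % QUANTUM_US)

def cache_key(obj: str, relation: str, user: str, eval_ts_us: int) -> tuple:
    # Two checks arriving within the same quantum produce the same key,
    # so the second one can be served from cache.
    return (obj, relation, user, quantize(eval_ts_us))
```

The trade-off is that cached results may be up to one quantum stale, which is acceptable because Zanzibar already operates under bounded staleness for zookie-less requests.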
Data is stored like this:
| Storage | Content |
|---|---|
| Namespace database | Stores relation tuples for each client namespace |
| Namespace configuration database | Holds all namespace configurations |
| Changelog database | Records changes for all namespaces |
This data is fully replicated across dozens of regions worldwide and distributed among thousands of servers. It's quite a scale.
Performance Optimization
Zanzibar utilizes various technologies to achieve low latency and high availability.
| Optimization Technique | Description |
|---|---|
| Evaluation Timestamp | If a client does not provide a zookie, the system chooses the latest snapshot within a range that does not impact latency. |
| Configuration Consistency | A single snapshot timestamp is chosen for namespace configurations, and all servers in the cluster use that same timestamp. |
| Check Evaluation | ACL checks are converted into boolean expressions for evaluation, and pointer chasing is used to recursively examine indirect ACLs and groups. |
| Leopard Indexing System | A specialized index for efficiently processing deeply nested group memberships. |
| Hotspot Countermeasures | • Distributed caches and lock tables • Increasing cache efficiency with timestamp quantization • Preventing cache stampede issues through parallel requests • Batch reading relation tuples for popular objects • Delaying cancellations when there are waiters in the lock table |
| Performance Isolation | • Per-client CPU usage limits • Per-server limits on outstanding RPCs • Concurrent read limits per object and per client • Different lock table keys for each client |
| Tail Latency Mitigation | • Request hedging to Spanner and Leopard • Dynamically calculated hedging delay thresholds • Multiple replicas deployed in each region |
Thanks to these optimizations, high performance is maintained even in situations where hotspots or delays are likely to occur.
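Request hedging, listed under tail latency mitigation, is worth a concrete sketch: if the first replica has not answered within a delay threshold, a duplicate request goes to a second replica and whichever answers first wins. This toy version uses a fixed delay and a thread pool; the real system derives the threshold dynamically from observed latencies.

```python
import concurrent.futures
import time

def hedged_request(replicas, hedge_delay_s=0.05):
    """Call replicas[0]; if it hasn't answered within hedge_delay_s,
    also call replicas[1] and return whichever result arrives first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(replicas[0])]
        done, _ = concurrent.futures.wait(futures, timeout=hedge_delay_s)
        if not done and len(replicas) > 1:   # primary is slow: hedge to a backup
            futures.append(pool.submit(replicas[1]))
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()
```

Hedging trades a small amount of extra load (the duplicate request) for a large reduction in tail latency, which is why the threshold matters: set too low, it doubles traffic; set too high, it stops helping the tail.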
Operational Experience
Zanzibar has been used in production for over five years (as of the writing of the referenced paper), and the number of clients and load have been steadily increasing.
| Scale Metric | Value |
|---|---|
| Number of managed namespaces | Over 1,500 |
| Number of relation tuples | Over 2 trillion |
| Data capacity | Approx. 100 terabytes |
| Replication | 30+ locations worldwide |
| Number of queries processed | Over 10 million per second |
| Number of servers | Over 10,000 |
The characteristics differ depending on the type of request.
| Request Type | Characteristics |
|---|---|
| Safe Request | Carries a zookie older than 10 seconds and can be processed within the region in most cases. |
| Recent Request | Carries a zookie less than 10 seconds old and often requires round trips between regions. |
Looking at the latency for Check Safe requests, 50% are processed in approximately 3ms, 95% in about 11ms, 99% in about 20ms, and 99.9% in about 93ms.
In terms of availability, it has maintained a value exceeding 99.999% over the past three years. It can be said to be very stable.
Lessons Learned
The lessons learned from the development and operation of Zanzibar are summarized as follows:
| Lesson | Details |
|---|---|
| Importance of Flexibility | • Access control patterns differ entirely depending on the client. • Features like computed_userset and tuple_to_userset were added to address specific needs. • Freshness requirements are usually relaxed, but there are cases where strictness is required. |
| Necessity of Performance Optimization | ・Request hedging is effective in reducing tail latency. ・Hotspot mitigation is the key to high availability. ・Performance isolation prevents interference between clients. |
These lessons have universal value for building large-scale distributed systems.
Conclusion
As a unified authorization system supporting Google's suite of services, Zanzibar meets the rigorous requirements of correctness, flexibility, low latency, high availability, and large-scale scalability. By combining a simple data model, a powerful configuration language, external consistency, and efficient global distribution, it has realized the capacity to handle trillions of ACLs and millions of requests per second.
We are Google. Resistance is Futile!