
An Anatomy of Google's Zanzibar Authorization System: Resistance is Futile!


Introduction

Notes on the authorization infrastructure supporting various Google services.

This system, named Zanzibar, centrally handles authorization for diverse services such as Calendar, Cloud, Drive, Maps, Photos, and YouTube. Operating on a global scale, it manages trillions of access control lists (ACLs) while handling millions of requests per second. It reportedly achieves response times under 10ms and availability of over 99.999%. Quite impressive.

The following is a summary based on the paper "Zanzibar: Google’s Consistent, Global Authorization System".

Design Principles and Goals

The main goals set when building Zanzibar are as follows:

| Design Goal | Content |
| --- | --- |
| Correctness | Ensure consistency so that access control works as the user intends |
| Flexibility | Support a wide range of access control policies for both consumer and enterprise users |
| Low latency | Authorization checks sit on the critical path of user operations, so they must respond quickly |
| High availability | Because access is denied unless explicitly permitted, downtime would be disastrous |
| Large-scale scalability | Protect billions of objects shared by billions of users, served worldwide |

There are many benefits to centralizing authorization. It provides a consistent user experience and makes integration between apps easier. Access control can be considered when searching across multiple apps, and once difficult consistency problems are solved, the solution can be reused across all apps.

Data Model

The core of Zanzibar is a simple data format called "relation tuples".

<tuple> ::= <object>'#'<relation>'@'<user>
<object> ::= <namespace>':'<object id>
<user> ::= <user id> | <userset>
<userset> ::= <object>'#'<relation>

Using this notation, permissions can be expressed like this:

| Tuple Example | Meaning |
| --- | --- |
| doc:readme#owner@10 | User 10 is the owner of doc:readme |
| group:eng#member@11 | User 11 is a member of group:eng |
| doc:readme#viewer@group:eng#member | Members of group:eng can view doc:readme |
| doc:readme#parent@folder:A#... | doc:readme is contained in folder:A |

It looks simple, but it allows ACLs and groups to be handled uniformly. Reading, writing, and incremental updates can all be done efficiently.
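To make that uniform handling concrete, here is a minimal Python sketch of the tuple format; the `RelationTuple`, `Userset`, and `check` names are my own, not from the paper, and this ignores cycles and scale entirely:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Userset:
    """A set of users named by object#relation, e.g. group:eng#member."""
    object: str     # "namespace:object_id"
    relation: str

@dataclass(frozen=True)
class RelationTuple:
    """object#relation@user, where user is a plain id or a userset."""
    object: str
    relation: str
    user_id: Optional[str] = None
    userset: Optional[Userset] = None

# Three of the example tuples from the table above:
tuples = [
    RelationTuple("doc:readme", "owner", user_id="10"),
    RelationTuple("group:eng", "member", user_id="11"),
    RelationTuple("doc:readme", "viewer",
                  userset=Userset("group:eng", "member")),
]

def check(tuples: List[RelationTuple], obj: str, relation: str,
          user_id: str) -> bool:
    """Naive membership check: direct tuples, plus recursing into
    usersets. (No cycle detection; a sketch, not an implementation.)"""
    for t in tuples:
        if (t.object, t.relation) != (obj, relation):
            continue
        if t.user_id == user_id:
            return True
        if t.userset and check(tuples, t.userset.object,
                               t.userset.relation, user_id):
            return True
    return False

print(check(tuples, "doc:readme", "viewer", "11"))  # True, via group:eng#member
print(check(tuples, "doc:readme", "viewer", "99"))  # False
```

Note that user 10 is an owner but not a viewer here: without the rewrite rules described below, each relation stands alone.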

Another important point here is the definition of <relation>. This is also just a string, but it must be defined in the namespace configuration (described later) before use.

For example, strings like "viewer", "editor", "owner", "member", and "parent" are declared in advance within the namespace settings. You can set not just the names, but also how those relations interact with others (e.g., an "editor" automatically has "viewer" permissions).

A relation is, so to speak, a predicate representing "what kind of relationship exists between this person and this object." With this simple mechanism, it becomes possible to express a wide variety of access controls, such as "User A can edit this document" or "Members of Group B can view this folder." It is well-designed.

Namespace and Relation Configuration

Before using Zanzibar, clients must configure a namespace. In the namespace configuration, relations and storage parameters are specified.

A particularly interesting feature is "userset rewrite rules." These allow for defining connections between relations. For example, you can create a rule where a "document editor automatically becomes a viewer."

Let's look at a simple example of a namespace configuration.

name: "doc"                     # Define a namespace named "doc"

relation { name: "owner" }      # Define an "owner" relation. No special rules.

relation {
  name: "editor"                # Definition of the "editor" relation
  userset_rewrite {             # Define special rules for this relation
    union {                     # Meaning "OR". If any condition is met, the user becomes an "editor"
      child { _this {} }        # Users directly specified as an "editor"
      child {
        computed_userset {
          relation: "owner"     # "owner" users automatically become "editor" as well
        }
      }
    }
  }
}

relation {
  name: "viewer"                # Definition of the "viewer" relation
  userset_rewrite {             # Define special rules for this relation
    union {                     # Meaning "OR". If any condition is met, the user becomes a "viewer"
      child { _this {} }        # Users directly specified as a "viewer"
      child {
        computed_userset {
          relation: "editor"    # "editor" users automatically become "viewer" as well
        }
      }
      child {
        tuple_to_userset {      # Mechanism for inheriting permissions from the parent folder
          tupleset {
            relation: "parent"  # Look for the "parent" relation (meaning the parent folder)
          }
          computed_userset {    # For the found parent folder
            object: $TUPLE_USERSET_OBJECT  # Of this parent folder
            relation: "viewer"  # People with "viewer" permissions also become a "viewer" of this document
          }
        }
      }
    }
  }
}

With this configuration, owners automatically become editors, and editors automatically become viewers. Additionally, viewers of the parent folder can also see the document. Hierarchical access control is expressed simply.
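To make the rewrite semantics concrete, here is a minimal Python sketch of how a checker might evaluate these rules. The encoding of `RULES` and `TUPLES` and the `check` function are my own simplification, not Zanzibar's actual evaluator:

```python
# Rewrite rules for the "doc" namespace above, as plain Python data.
# "_this" = direct tuples; ("computed_userset", r) = also grant via
# relation r on the same object; ("tuple_to_userset", ts, r) = follow
# "ts" tuples (e.g. parent), then relation r on the found object.
RULES = {
    ("doc", "owner"): ["_this"],
    ("doc", "editor"): ["_this", ("computed_userset", "owner")],
    ("doc", "viewer"): ["_this", ("computed_userset", "editor"),
                        ("tuple_to_userset", "parent", "viewer")],
    ("folder", "viewer"): ["_this"],
}

# Tuples as (object, relation, user); user may be "obj#rel".
TUPLES = {
    ("doc:readme", "owner", "10"),
    ("doc:readme", "parent", "folder:A#..."),
    ("folder:A", "viewer", "42"),
}

def check(obj: str, relation: str, user: str) -> bool:
    namespace = obj.split(":", 1)[0]
    for rule in RULES.get((namespace, relation), ["_this"]):
        if rule == "_this":
            if (obj, relation, user) in TUPLES:
                return True
        elif rule[0] == "computed_userset":
            if check(obj, rule[1], user):
                return True
        elif rule[0] == "tuple_to_userset":
            tupleset_rel, target_rel = rule[1], rule[2]
            for (o, r, u) in TUPLES:
                if o == obj and r == tupleset_rel:
                    parent = u.split("#", 1)[0]   # e.g. "folder:A"
                    if check(parent, target_rel, user):
                        return True
    return False

print(check("doc:readme", "viewer", "10"))  # True: owner -> editor -> viewer
print(check("doc:readme", "viewer", "42"))  # True: viewer of parent folder:A
```

The union simply tries each child in turn, which is exactly how the config reads: a user is a viewer if any branch grants it.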

Consistency Model and Zookie Protocol

An important feature of Zanzibar is its consistency model. In particular, it emphasizes countermeasures against the "new enemy" problem. This problem occurs when ACL update ordering is not maintained or when an old ACL is applied to new content.

| Example Problem | Possible Scenario |
| --- | --- |
| Ignoring ACL update order | 1. Alice removes Bob from a folder. 2. Alice asks Charlie to move a new document into that folder. 3. If the update order is ignored, Bob might be able to see the new document. |
| Misapplication of old ACLs | 1. Alice removes Bob from a document. 2. Alice asks Charlie to add new content. 3. If the check uses the old ACL, Bob might be able to see even the new content. |

To solve this problem, Zanzibar provides two properties: "external consistency" and "bounded staleness." Additionally, it introduces a token called a "zookie."

The flow of the zookie protocol is as follows:

  1. Before a content change, the client sends a "content-change ACL check" request. No special zookie is required at this stage.
  2. Zanzibar encodes the current global timestamp into a zookie and returns it. This timestamp is guaranteed to be newer than all preceding ACL writes.
  3. The client atomically stores the content change along with the zookie in its own storage. This storage operation and the ACL check do not need to be in the same transaction.
  4. Later, when someone attempts to access that content, the client sends an ACL check request accompanied by that zookie.
  5. Zanzibar performs the check using a snapshot that is "at least as new as" the timestamp extracted from the zookie.

This ensures that the ordering relationship between ACL and content updates is maintained while providing the flexibility to achieve latency and availability goals.
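The five steps can be sketched with an in-memory toy model. The `ZanzibarStub` class and its integer timestamps are stand-ins I invented; real zookies encode Spanner commit timestamps and are opaque to clients:

```python
import itertools

class ZanzibarStub:
    """Toy model of the zookie handshake. A monotonic counter stands in
    for TrueTime commit timestamps; ACL writes are kept as a log."""

    def __init__(self):
        self._clock = itertools.count(1)
        self._acl_log = []  # (timestamp, object, relation, user, added?)

    def write_acl(self, obj, rel, user, added):
        self._acl_log.append((next(self._clock), obj, rel, user, added))

    def content_change_check(self):
        # Steps 1-2: hand back a zookie whose timestamp is newer than
        # every ACL write committed so far.
        return {"ts": next(self._clock)}

    def check(self, obj, rel, user, zookie):
        # Step 5: evaluate at a snapshot at least as fresh as the
        # zookie, so every older ACL write is visible.
        snapshot = [e for e in self._acl_log if e[0] <= zookie["ts"]]
        state = False
        for (_, o, r, u, added) in snapshot:
            if (o, r, u) == (obj, rel, user):
                state = added
        return state

z = ZanzibarStub()
z.write_acl("doc:readme", "viewer", "bob", added=True)
z.write_acl("doc:readme", "viewer", "bob", added=False)  # Bob removed
zookie = z.content_change_check()  # steps 1-2, before the content change
# Steps 3-4: the client stores (content, zookie) and sends the zookie
# back with later checks for that content.
print(z.check("doc:readme", "viewer", "bob", zookie))  # False: removal seen
```

Checking the same content with a stale timestamp (e.g. `{"ts": 1}`) would still grant Bob access, which is precisely the "new enemy" hazard the zookie rules out.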

System Architecture


(Architecture diagram omitted; see the original paper: https://storage.googleapis.com/gweb-research2023-media/pubtools/5068.pdf)

Zanzibar's architecture is primarily composed of the following components:

| Component | Role |
| --- | --- |
| aclservers | The main server type; handles Check, Read, Expand, and Write requests |
| watchservers | A specialized server type that handles Watch requests |
| Spanner | A global database system that stores ACLs and metadata |
| Offline pipelines | Run background jobs such as producing namespace snapshots |
| Leopard | An indexing system optimized for operations on large, deeply nested sets |

aclservers bear the main burden of processing, receiving requests (Check, Read, Expand, Write) from clients. When a request arrives, the server distributes the necessary processing to other aclservers. For example, when checking a certain group membership, if that group contains other groups, the check processing spreads across multiple servers (described as "fan out" in the paper).

The data itself is stored in a global database called Spanner, where each relation tuple is identified by a primary key: (shard ID, object ID, relation, user, commit timestamp). In other words, it records "who" had "what relationship" to "which object" and "when."
Interestingly, Zanzibar stores multiple versions of tuples in different rows. This mechanism allows ACL checks to be performed at any arbitrary snapshot in the past.

As a countermeasure against hotspots (data where many requests are concentrated), distributed caches are also used between servers. This allows repeated checks of the same data to be processed quickly. Furthermore, there is an indexing system called Leopard, which efficiently processes complex structures such as deeply nested groups.
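One ingredient of this caching, timestamp quantization, can be illustrated with a minimal single-process sketch; the key layout and the 10-second quantum are my assumptions, not Zanzibar's actual cache design:

```python
QUANTUM = 10.0  # seconds; requests within one quantum share cache entries

cache = {}

def quantize(ts, quantum=QUANTUM):
    """Round an evaluation timestamp down to its quantum so that many
    slightly different timestamps map onto one cache key."""
    return int(ts // quantum) * quantum

def cached_check(obj, rel, user, ts, evaluate):
    """Answer a check from the cache when possible; otherwise call the
    (expensive) evaluate function and remember the result."""
    key = (obj, rel, user, quantize(ts))
    if key not in cache:
        cache[key] = evaluate(obj, rel, user, ts)
    return cache[key]

# Two checks 3 seconds apart land in the same quantum:
print(quantize(100.0) == quantize(103.0))  # True
```

A hot object checked thousands of times per second then costs roughly one backend evaluation per quantum instead of one per request, at the price of results being up to one quantum stale.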


Data is stored like this:

| Storage | Content |
| --- | --- |
| Namespace database | Stores the relation tuples for each client namespace |
| Namespace configuration database | Holds all namespace configurations |
| Changelog database | Records changes across all namespaces |

This data is fully replicated across dozens of regions worldwide and distributed among thousands of servers. It's quite a scale.

Performance Optimization

Zanzibar utilizes various technologies to achieve low latency and high availability.

| Optimization Technique | Description |
| --- | --- |
| Evaluation timestamp | If a client provides no zookie, the system chooses the freshest snapshot that does not hurt latency. |
| Configuration consistency | A single snapshot timestamp is chosen for namespace configurations, and every server in the cluster uses that same timestamp. |
| Check evaluation | ACL checks are converted into boolean expressions for evaluation, and pointer chasing recursively resolves indirect ACLs and groups. |
| Leopard indexing system | A specialized index for efficiently evaluating deeply nested group memberships. |
| Hotspot countermeasures | Distributed caches and lock tables; timestamp quantization to raise cache hit rates; deduplicating concurrent identical requests to prevent cache stampedes; batch-reading the relation tuples of popular objects; delaying cancellations while the lock table has waiters. |
| Performance isolation | Per-client CPU usage limits; per-server limits on outstanding RPCs; concurrent-read limits per object and per client; separate lock-table keys for each client. |
| Tail latency mitigation | Request hedging to Spanner and Leopard; dynamically computed hedging-delay thresholds; multiple replicas deployed in each region. |

Thanks to these optimizations, high performance is maintained even in situations where hotspots or delays are likely to occur.
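Request hedging, one of the tail-latency mitigations above, follows a well-known pattern that can be sketched as follows; this is a generic illustration, not Zanzibar's implementation, and the fixed `hedge_delay` stands in for the dynamically computed threshold:

```python
import concurrent.futures
import time

def hedged_request(make_request, replicas, hedge_delay):
    """Send to the first replica; if no reply arrives within
    hedge_delay, send a duplicate to a second replica and return
    whichever answer comes back first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(make_request, replicas[0])
        done, _ = concurrent.futures.wait([first], timeout=hedge_delay)
        if done:
            return first.result()  # primary answered in time
        backup = pool.submit(make_request, replicas[1])  # hedge
        done, _ = concurrent.futures.wait(
            [first, backup],
            return_when=concurrent.futures.FIRST_COMPLETED)
        return done.pop().result()

def fake_replica(name):
    # Simulated replica: "slow" stalls, "fast" answers quickly.
    time.sleep(1.0 if name == "slow" else 0.01)
    return name

print(hedged_request(fake_replica, ["slow", "fast"], hedge_delay=0.05))
# -> "fast": the hedged duplicate wins while "slow" is still stalled
```

The trade-off is extra load from duplicated requests, which is why the delay threshold matters: hedging only after, say, the observed p95 latency keeps the duplicate traffic to a few percent.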

Operational Experience

Zanzibar has been used in production for over five years (as of the writing of the referenced paper), and the number of clients and load have been steadily increasing.

| Scale Metric | Value |
| --- | --- |
| Managed namespaces | Over 1,500 |
| Relation tuples | Over 2 trillion |
| Data size | Approx. 100 terabytes |
| Replication | 30+ locations worldwide |
| Queries processed | Over 10 million per second |
| Servers | Over 10,000 |

The characteristics differ depending on the type of request.

| Request Type | Characteristics |
| --- | --- |
| Safe request | Carries a zookie more than 10 seconds old; can usually be served within the region. |
| Recent request | Carries a zookie less than 10 seconds old; often requires round trips between regions. |

Looking at the latency for Check Safe requests, 50% are processed in approximately 3ms, 95% in about 11ms, 99% in about 20ms, and 99.9% in about 93ms.

In terms of availability, it has maintained a value exceeding 99.999% over the past three years. It can be said to be very stable.

Lessons Learned

The lessons learned from the development and operation of Zanzibar are summarized as follows:

| Lesson | Details |
| --- | --- |
| Importance of flexibility | Access control patterns differ entirely from client to client; features like computed_userset and tuple_to_userset were added to address specific needs; freshness requirements are usually relaxed, but some cases demand strictness. |
| Necessity of performance optimization | Request hedging is effective at reducing tail latency; hotspot mitigation is the key to high availability; performance isolation prevents interference between clients. |

These lessons have universal value for building large-scale distributed systems.

Conclusion

As a unified authorization system supporting Google's suite of services, Zanzibar meets the rigorous requirements of correctness, flexibility, low latency, high availability, and large-scale scalability. By combining a simple data model, a powerful configuration language, external consistency, and efficient global distribution, it has realized the capacity to handle trillions of ACLs and millions of requests per second.

We are Google. Resistance is Futile!
https://youtu.be/rtEaR1JU-ps
