iTranslated by AI

The content below is an AI-generated translation. This is an experimental feature, and may contain errors. View original article
🎉

AUTO_INCREMENT vs. UUID: Which Should You Choose for Primary Keys?

に公開

Introduction

In database primary key design, operations vary significantly depending on whether you use AUTO_INCREMENT or UUID.
Since both have clear strengths and weaknesses, I will organize their advantages and disadvantages.

In this article, I will categorize the differences from the perspectives of performance, operation, and development, and summarize the criteria for deciding which one to choose.

What is Required of a Primary Key

A primary key doesn't just need to be unique; it also affects operations and development workflows.
The choice changes depending on which of the following you prioritize:

  • Write performance and index efficiency
  • Ease of integration in distributed environments or across multiple systems
  • Security when exposed externally
  • Ease of investigation and human readability
  • Ease of handling in testing and development

Characteristics of AUTO_INCREMENT

Pros

  • Short values and good index efficiency
  • Since they are sequential, page splits are less likely to occur during insertion, making write performance more stable
  • Easy for humans to handle, facilitating log investigation and manual verification
  • Because the primary key has an order, it is easy to track the creation sequence

Cons

  • IDs are easy to guess, making it easy to reach other data by typing URLs directly
  • Requires design for collision avoidance in multiple databases or distributed environments
  • Numbering strategies must be aligned during migration or merging
  • Tests tend to depend on the state of the numbering, making it difficult to write assertions based on fixed IDs
  • Missing numbers can occur due to ingestion order or retries; specifications that assume continuous sequences are prone to failure

Characteristics of UUID

Pros

  • High uniqueness makes it easy to assign IDs even in distributed environments
  • Since they are not sequential, it is difficult to guess the total count or order when exposed externally
  • Collisions are unlikely during data integration between different systems
  • IDs can be pre-generated without worrying about where they are issued, making them compatible with asynchronous processing

Cons

  • Long values tend to increase index size
  • Highly random UUIDs are prone to B-Tree index fragmentation
  • Low readability makes them difficult to handle during investigations
  • Random IDs make it hard to fix values in tests, making test data preparation tedious
  • Comparisons and searches tend to be slower when handled as strings

Notes on Performance

When using UUIDs, it is more efficient to store them in binary format rather than as strings.
Additionally, time-ordered UUIDv7 is more likely to improve index efficiency compared to completely random types like UUIDv4.

More Specifically on Performance Differences

In current general database implementations, AUTO_INCREMENT tends to be advantageous for both writing and searching.
Reasons are mainly "index continuity due to sequential numbers" and "shorter values."

In the case of AUTO_INCREMENT

  • Since they are inserted in order at the end of the B-Tree, page splits are less likely to occur
  • Because the index stays small, it fits more easily into the memory cache
  • In databases using clustered indexes, the physical placement of data also tends to be continuous

In the case of UUID

  • Random values like UUIDv4 cause insertion points to be scattered, leading to more frequent page splits
  • The index size increases, which can lead to higher I/O during searches
  • Storing as strings increases comparison costs and the burden on secondary indexes

However, there are ways to improve performance even with UUIDs.

  • Use time-ordered formats like UUIDv7
  • Store in binary format instead of strings
  • Use UUIDs only for external exposure while using AUTO_INCREMENT for the internal primary key

In practice, it's easier to decide if you understand that tables with frequent reads and writes favor AUTO_INCREMENT, while UUID is a design that sacrifices some performance in exchange for distributivity.

Practical Examples of Common UUID Performance Issues

Even without first-hand experience, we can explain patterns that are structurally prone to occurring. Here, I will clarify cases often encountered in practice.

Index Bloat from Storing UUIDs as VARCHAR(36)

In InnoDB, the primary key is included in secondary indexes. If the primary key is a UUID string, all secondary indexes will contain that long primary key, leading to the bloat of the entire index structure.

For example, the differences are as follows:

  • A BIGINT primary key is 8 bytes.
  • A UUID string is 36 bytes, and in practice, there is even more overhead.

If you have five secondary indexes, a difference of (36-8) × number of records × 5 is created. As a result, it may not fit within the buffer pool, potentially leading to increased I/O and worsening read latency.

Increased Page Splits with UUIDv4

Because UUIDv4 is highly random, it is more likely to be inserted into middle pages of a B-Tree. This results in more frequent page splits and increased write I/O.

The following is an illustration of insertion positions:

Method Insertion Position
AUTO_INCREMENT Always at the end
UUIDv4 Somewhere in the tree

In high-frequency write tables, write efficiency is more likely to drop compared to AUTO_INCREMENT.

Increased JOIN Costs with String UUIDs

String comparisons have higher CPU costs than integer comparisons, and cache efficiency tends to drop. In queries that make extensive use of large JOINs or GROUP BYs, using UUID strings as keys can suddenly make them heavy.

Cases Where UUIDs "Somehow" Slow Down the System

It is not uncommon to see cases where UUID strings are used as primary keys for no particular reason, even without a need for distributed numbering or external exposure. As a result, the index size grows, and read/write performance may gradually slow down.

Example Improvement Patterns

Even when adopting UUIDs, there are options that minimize performance degradation.

  • Store UUIDs as BINARY(16).
  • Use time-ordered formats like UUIDv7.
  • Use BIGINT for the internal primary key and UUID only for external exposure.

Handling When the Database Supports a UUID Type

In databases that have a dedicated UUID type, it is naturally the first choice. However, since the situation varies by database, you need to consider how to use them differently.

In the case of PostgreSQL

  • A UUID type exists and is handled internally as 16 bytes.
  • Since it is more efficient than strings, the UUID type is the primary candidate if you use UUIDs.

In the case of MySQL

  • There is no native UUID type, so you must choose between CHAR(36) or BINARY(16).
  • While strings are easy to handle, they are prone to index bloat.
  • BINARY(16) is efficient, but requires extra effort for readability and operational handling.

In either database, using a UUID type or BINARY format offers room for improvement. However, it is easier to make a decision if you keep in mind that the performance advantage of sequential AUTO_INCREMENT doesn't completely disappear.

If Exposing Externally, It Is Safer to Have Two IDs

Exposing AUTO_INCREMENT directly in APIs or URLs makes it easy to guess other data based on the sequence. For this reason, in practice, a design that separates the internal primary key from the external identifier is frequently used.

  • Internal ID: Use AUTO_INCREMENT as the primary key.
  • External ID: Have a unique ID that is hard to guess, such as a UUID, in a separate column.

With this configuration, you can maintain the performance benefits of sequential numbering within the database while ensuring security when exposing IDs externally.

Development and Testing Perspectives

Because AUTO_INCREMENT basically lets the database decide the value, it can often be problematic when IDs are not determined during testing. It is especially incompatible with assertions or snapshot tests that assume fixed IDs.

UUIDs also share similar issues because their high randomness makes it difficult to fix values in tests. However, if the generation can be managed on the application side, it is easier to set up a mechanism that generates fixed values only during testing.

Regardless of which one you use, the following measures are effective for testing:

  • Wrap the generation function to return fixed values during tests.
  • Explicitly insert test data for verification.
  • Adopt test designs that do not depend on specific IDs.

Small Operational Differences

One advantage of AUTO_INCREMENT is that it is easy for humans to read and track during troubleshooting or support inquiries. On the other hand, UUIDs become long when displayed in logs or on screens, which increases the likelihood of oversight or typing errors.

Conversely, UUIDs are suitable for data synchronization across multiple environments or for workflows where IDs are issued in advance. The benefits are significant in architectures where APIs operate asynchronously or where multiple services share the same ID.

How to Choose

  • If the system is centered on a single database with high-frequency writes, use AUTO_INCREMENT.
  • If distributed system integration or external public IDs are required, use UUID.
  • If there is external exposure, design with the premise of separating the internal primary key and the public ID.
  • If you prioritize ease of testing and investigation, AUTO_INCREMENT is the safe choice.
  • If you want to issue IDs in advance or share IDs across multiple services, UUID is suitable.

Summary

Choose AUTO_INCREMENT if you prioritize speed and operational simplicity.
Choose UUID if you prioritize distributivity and security during external exposure.

For services with public APIs or URLs, maintaining two systems—an internal primary key and an external public ID—will lead to more stable operations.

Discussion