iTranslated by AI
Building a Database from Scratch in Rust Part 8: Graduating from Full Scans — Implementing a Minimal Primary Key Index
Last time, we introduced a file-backed TableStorage and progressed to the point where we can:
- insert rows
- get rows by
RowId - read all rows with
scan() - read data even after reopening
This was a major step forward. However, once you get to this point, you quickly feel a limitation.
It is painful to have to scan every time to find a row
For example, if you want to get the "user with id = 42," you currently have no choice but to check all pages and all slots in order.
So, this time, we will introduce a minimal primary key index.
Our goal this time is simple:
- Extract the primary key from a row
- Map it to a
RowId - Enable retrieving rows via
get_by_key(key)
In short, the theme for this time is:
Implement a DB-like read path of
key -> RowId -> Rowwith minimal configuration
The Structure Until Last Time
As of last time, the structure looked roughly like this:
Row
|
v
row codec
|
v
Page
|
v
Pager
|
v
File
TableStorage
├─ insert(row)
├─ get(RowId)
└─ scan()
In other words, you could save a row and read it if you knew the RowId, or find it by traversing everything with scan(). However, the layer to "search by key" did not exist yet.
The Big Picture for This Time
We are now adding a new PrimaryIndex:
+----------------------+
| PrimaryIndex |
| key -> RowId |
+----------------------+
|
v
+-----------+ +----------------------+ +----------------------+
| Row | --> | IndexedTableStorage | --> | TableStorage |
+-----------+ | - insert(row) | | - insert/get/scan |
| - get_by_key(key) | +----------------------+
| - rebuild_index() | |
+----------------------+ v
Page / Pager / File
From now on, the structure will be:
-
TableStoragefor persistence -
PrimaryIndexfor search routing - A thin facade to connect them
What is important here is that the index does not know about files or pages. The index focuses strictly on key -> RowId.
Why an In-Memory Index at First?
It is natural to think, "Since I'm at it, I want to make the index persistent too." But we won't go there this time.
The reason is simple: what we need right now is:
The initial contract for key lookup
In other words, what we want first is a minimal read path that can:
- Extract a primary key
- Detect duplicates
- Resolve a
RowIdfrom a key - Read a row from a
RowId
If we tried to go straight to index persistence, we would have to tackle all of these at once:
- Index page layout
- Index file format
- Index rebuild
- Consistency during a crash
- Synchronization with the table file
That is too heavy to do right now. So, keeping it in-memory at first is the natural choice.
How to Represent PrimaryKey
This is a subtle but important design point.
One could use db-core::Value directly as a key, but it is cleaner to avoid that initially. The reason is that the types we want to allow as primary keys are not necessarily identical to the types that can be held as general values.
For example, in this minimal configuration, these restrictions are natural:
-
Int64is allowed -
UInt64is allowed -
Boolis allowed -
Stringis allowed -
Bytesis allowed -
Float64is not allowed -
Nullis not allowed
In short, it is easier to understand if we have an explicit type for the primary key.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum PrimaryKey {
Int64(i64),
UInt64(u64),
Bool(bool),
String(String),
Bytes(Vec<u8>),
}
By using this form, we gain the following advantages:
- Since it has
Ord, it can be used in aBTreeMap -
Float64can be excluded from the start -
Nullcan be excluded from the start
Why Exclude Float64?
There are technically situations where you might want to use Float64 as a key, but it is safer to exclude it for the first primary key implementation. There are several reasons for this:
- Handling NaN is difficult
- It does not mesh well with total ordering
- The meaning of comparisons can sometimes deviate from intuition
In the stage of building the first "easy-to-understand primary key index," it is cleaner design not to force it here.
Why BTreeMap?
HashMap is also a candidate for the first index implementation, but BTreeMap is quite natural for this purpose. The reasons are:
- Easy to handle based on
Ord - Easily expandable to range queries in the future
- Stable and easy to explain
- Fast enough for a first index
In other words, this time, the decision is:
6 Choose a minimal structure that is easier to expand upon, rather than aiming for top speed
extract_primary_key Is the Entry Point This Time
Before putting a row into the index, this is what is needed first:
Row + TableSchema
|
v
extract_primary_key(...)
|
v
PrimaryKey
This is the "entry point for converting a logical row into an index key" for this iteration.
The sample code looks something like this:
pub fn extract_primary_key(schema: &TableSchema, row: &Row) -> Result<PrimaryKey, DbError> {
row.validate(schema)?;
let pk_index = schema.primary_key_index;
let value = &row.values[pk_index];
match value {
Value::Int64(v) => Ok(PrimaryKey::Int64(*v)),
Value::UInt64(v) => Ok(PrimaryKey::UInt64(*v)),
Value::Bool(v) => Ok(PrimaryKey::Bool(*v)),
Value::String(v) => Ok(PrimaryKey::String(v.clone())),
Value::Bytes(v) => Ok(PrimaryKey::Bytes(v.clone())),
Value::Null => Err(DbError::InvalidRow("primary key cannot be null".into())),
Value::Float64(_) => Err(DbError::InvalidRow("float64 is not supported as primary key".into())),
}
}
Although it looks simple, this clearly defines:
- Which types are allowed
- Which types are excluded
- How to follow the schema
PrimaryIndex Itself Can Be Surprisingly Thin
The PrimaryIndex for this iteration can be extremely thin.
pub struct PrimaryIndex {
map: BTreeMap<PrimaryKey, RowId>,
}
A minimal API is sufficient:
insert(key, row_id)get(&key)len()clear()
What is more important is "what it does not hold." The PrimaryIndex here does not contain:
- File I/O
- Page information
- Entire schema
- Row codec
- Scan
- Durability
In other words, it is a conscious decision that the index is nothing more than a reference table.
key -> RowId -> Row in Diagrams
What is most important this time is this single path:
PrimaryKey
|
v
PrimaryIndex
|
v
RowId(page_id, slot_id)
|
v
TableStorage::get(RowId)
|
v
Row
Once this form is established, it becomes much more like a database.
For example, the flow to search for id = 42 looks like this:
42
|
v
PrimaryKey::UInt64(42)
|
v
PrimaryIndex -> RowId { page_id: 3, slot_id: 1 }
|
v
TableStorage::get(...)
|
v
Row { id: 42, name: "alice", ... }
At this point, "where the row is saved" and "how to find it" are separated. This is quite significant.
What Happens During an insert?
The flow during an insert becomes:
Row
|
v
extract_primary_key
|
v
duplicate check on PrimaryIndex
|
v
TableStorage::insert(row)
|
v
RowId
|
v
PrimaryIndex::insert(key, row_id)
Visualized as a diagram, it looks like this:
+--------+
| Row |
+--------+
|
v
+-------------------+
| extract key |
+-------------------+
|
v
+-------------------+
| duplicate check |
+-------------------+
|
v
+-------------------+
| storage insert |
+-------------------+
|
v
+-------------------+
| RowId |
+-------------------+
|
v
+-------------------+
| index insert |
+-------------------+
This order is quite important:
- Create the key first
- Verify duplicates
- Write to storage
- Finally, update the index
By doing this, at least "obvious duplicates" can be blocked before they reach the storage.
Sample Code: insert
Conceptually, it looks like this:
pub fn insert(&mut self, row: &Row) -> Result<RowId, DbError> {
let key = extract_primary_key(&self.schema, row)?;
if self.index.get(&key).is_some() {
return Err(DbError::KeyAlreadyExists("duplicate primary key".into()));
}
let row_id = self.storage.insert(row)?;
self.index.insert(key, row_id)?;
Ok(row_id)
}
The good thing about this code is that its responsibilities are very clear:
- Key extraction is the job of the key extractor
- Duplicate checking is the job of the index
- Persistence is the job of the storage
- Linking is the job of the index
Each layer does only its own job.
Sample Code: get_by_key
get_by_key is quite straightforward:
pub fn get_by_key(&mut self, key: &PrimaryKey) -0 Result0Option0Row0, DbError0 {
let row_id = match self.index.get(key) {
Some(row_id) =0 row_id,
None =0 return Ok(None),
};
self.storage.get(row_id)
}
With this in place, we can graduate from the state of "having to scan everything to find a row."
Of course, since this is still an in-memory index, a rebuild is required after restarting. However, it still holds significant meaning as a read path.
How to Handle Reopening
Here, the important point is that the index has not yet been persisted.
In other words, only the table storage side remains in the file.
persisted:
- table file
- page contents
- rows
not persisted yet:
- primary index
So how do we handle this after reopening?
Initially, this is sufficient:
scan()to rebuild the index
Represented in a diagram:
reopen table
|
v
scan all rows
|
v
extract primary key from each row
|
v
rebuild BTreeMap<PrimaryKey, RowId>
This might be slow. However, what we need right now is the "initial correct contract." Fast index rebuilding comes later.
Sample Code: rebuild_index
pub fn rebuild_index(&mut self) -> Result<(), DbError> {
self.index.clear();
for (row_id, row) in self.storage.scan()? {
let key = extract_primary_key(&self.schema, &row)?;
self.index.insert(key, row_id)?;
}
Ok(())
}
If any duplicate keys are mixed in, they can be detected here. This is also important.
In other words, a rebuild is not just a reconstruction; it also serves as a primary key integrity check for the storage side.
An End-to-End Sample Looks Quite Impressive
let schema = user_schema();
let mut table = IndexedTableStorage::open(&path, schema.clone(), 8192)?;
// insert
table.insert(&Row {
values: vec![
Value::UInt64(1),
Value::String("alice".into()),
Value::Bool(true),
Value::Null,
],
})?;
table.insert(&Row {
values: vec![
Value::UInt64(2),
Value::String("bob".into()),
Value::Bool(false),
Value::String("memo".into()),
],
})?;
// key lookup
let row = table.get_by_key(&PrimaryKey::UInt64(2))?.unwrap();
assert_eq!(row.values[1], Value::String("bob".into()));
// reopen + rebuild
drop(table);
let mut reopened = IndexedTableStorage::open(&path, schema, 8192)?;
reopened.rebuild_index()?;
let row_again = reopened.get_by_key(&PrimaryKey::UInt64(1))?.unwrap();
assert_eq!(row_again.values[1], Value::String("alice".into()));
The good part about this sample is that you can see everything:
- insert
- key lookup
- reopen
- rebuild
- re-lookup
It really gives you the feeling of "building the read path of a small DB yourself."
What We Are Not Doing Yet
Emphasizing this actually makes the article tighter.
Things we are not doing this time include:
- Index persistence
- Secondary indexes
- Full-fledged range query API
- Planner integration
- Design for full synchronization between delete and index
- Compaction and index updates
- WAL / recovery and index consistency
- Concurrent index updates
Why not now?
Because what we need right now is:
The minimal primary key read path
We need the initial contract of being able to:
- Create keys
- Find by keys
- Rebuild after reopening
Responsibility Separation This Time
The responsibility separation that becomes quite clear in this installment is as follows:
db-core::Row / TableSchema
└─ Logical row and schema
PrimaryKey extractor
└─ Create a key from a row
PrimaryIndex
└─ key -> RowId
TableStorage
└─ RowId <-> Row
(insert / get / scan)
Indexed facade
└─ Connects storage and index
What is particularly important is:
- The index does not know about pages
- The storage does not know about keys
- The facade remains a thin bridge
With this separation, the structure is less likely to collapse when adding index persistence or secondary indexes later.
Impressions
I think this installment introduces quite a bit of "DB-like searching."
Up until the last time, we were at the stage of:
- Being able to save
- Being able to read
- Being able to scan
This time, we took a step forward, and a more database-like read path appeared:
Find by key and read the row
And the good thing is that we didn't make it too complicated from the start:
- in-memory index
BTreeMap- rebuild on reopen
PrimaryKey -> RowId -> Row
It's quite simple, but for that very reason, the core of the design is easy to see.
Personally, I really like this feeling of "graduating from a scan-only world." It's small, but there is a clear sense of moving forward.
Summary
This time, we introduced a minimal primary key index to perform the following tasks:
- Extract a primary key from a row
- Maintain
key -> RowId - Establish
get_by_key(key) - Rebuild via
scan()after reopening
In short, this is the installment where we add the first "searching mechanism" on top of our storage.
The natural next steps look something like this:
- Index persistence
- Delete/update and index synchronization
- Simple CRUD from the API layer
- Write ordering with an eye toward WAL
Little by little, our table storage has evolved from a "place to save" to a "place where you can find and read."
Next, it would be natural to prepare an easy-to-use CRUD entry point from the API layer to make this primary key read path even more practical.
For example:
- create/open table
- insert
- get by key
- scan
- simple delete
Now that the storage and index are in place, it might be the time to move toward a surface API regarding "how to use it from the outside."
This was a long one, but please stay tuned for the next installment!
Summary Diagram for the End of the Article
Before T-008
------------
scan() -> find RowId -> read Row
After T-008
-----------
PrimaryKey -> RowId -> Row
Discussion