Translated by AI
Building a Database from Scratch in Rust Part 20: Safely Observing DB State with a Read-Only System Metadata API
Last time, we implemented a minimal RENAME TABLE. Specifically, we:
- Added rename_table(old_name, new_name)
- Updated the catalog entry name and prefix
- Reassigned the artifacts belonging to the table prefix
- Made open_table(old_name) fail after a rename
- Preserved the rename result across a reopen
At this point, the table lifecycle of our multi-table database is nearly complete:
- create
- open
- list
- drop
- rename
Moreover, for each table, we have:
- snapshot
- WAL generation
- index snapshot
- cleanup
- recovery
Given this, the next natural requirement is a way to safely observe the current state of this DB.
For this purpose, we will add a read-only system metadata API.
The theme of this installment is:
Enable safe observation of the DB's internal state without any mutations
Observing metadata was a bit difficult until now
Even with the structure up to last time, we already hold a lot of metadata internally:
- The catalog holds table names and schemas
- The manifest holds the current generation
- WAL, snapshot, and index snapshot files each either exist on disk or do not
- There is a dirty marker
- The cleanup process manages obsolete generations
However, the public API for safely viewing this metadata is still weak.
For example, there are many things we want to know:
- How many tables are in this DB right now?
- What is the schema of users?
- Has orders been checkpointed?
- Does sessions still have a dirty marker?
- Does the index snapshot exist?
- Do there seem to be obsolete generations remaining?
Until now, answering these questions required knowing the internal helpers and the on-disk file layout.
Therefore, this time, we will add a dedicated API for observation.
We will add a read-only API, not a system table
You might ask, "Why not just build a system table, like information_schema.tables?"
However, we will not go down that path this time.
The reason is clear.
If we move toward system tables, several new concerns immediately arise:
- Should we store the system metadata itself in page storage?
- How should we handle WAL / checkpoint / recovery for it?
- How can we avoid double management between the catalog and the system table?
- How should we define the transaction boundary for system table updates?
These are a bit too heavy to tackle right now.
Therefore, this time, we will follow this approach:
First, implement a read-only observation API
This is quite a natural progression.
Overall picture of this implementation
It is clean to divide this metadata API into three layers:
- database-level
- table-level
- artifact-level
Represented in a diagram, it looks like this:
MultiTableDb
├─ inspect_database()
├─ describe_table("users")
└─ inspect_table_artifacts("users")
The role of each is as follows:
inspect_database()
-> Summary of the entire DB
describe_table(name)
-> Logical info like table schema / catalog / manifest
inspect_table_artifacts(name)
-> Physical file status such as snapshot / WAL / index / dirty marker
Splitting the API into these three layers lets callers ask for exactly the level of detail they need.
What to see at the database-level view
First, we provide inspect_database() as an entry point to view the entire DB.
At a minimum, we want the following information:
- DB root path
- Table count
- List of tables
- Simple summary for each table
- Basic status of the catalog
Represented in a diagram, it looks like this:
DatabaseMetadataView
--------------------
root_path = "./mydb"
table_count = 3
tables:
- users
- orders
- sessions
Of course, you can also add more information for each table:
- Number of columns
- Primary key column name
- Current generation
- Presence of a dirty marker
Having this information makes the overall status of the DB much clearer.
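As a rough sketch of what such a database-level view could look like as data, the block below defines a view struct and builds it from catalog entries. The names CatalogEntry, TableSummary, and summarize_database, as well as the exact fields, are assumptions made for illustration; the series' real catalog entry carries a full schema.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, simplified catalog entry (the real one holds a full schema).
#[derive(Debug, Clone)]
pub struct CatalogEntry {
    pub name: String,
    pub column_names: Vec<String>,
    pub primary_key: String,
}

/// One summary line per table. Field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct TableSummary {
    pub name: String,
    pub column_count: usize,
    pub primary_key: String,
}

/// Database-level view: root path, table count, and per-table summaries.
#[derive(Debug, Clone, PartialEq)]
pub struct DatabaseMetadataView {
    pub root_path: PathBuf,
    pub table_count: usize,
    pub tables: Vec<TableSummary>,
}

/// Builds the overview from shared references only, so it cannot mutate the catalog.
pub fn summarize_database(root_path: &Path, entries: &[CatalogEntry]) -> DatabaseMetadataView {
    let tables: Vec<TableSummary> = entries
        .iter()
        .map(|entry| TableSummary {
            name: entry.name.clone(),
            column_count: entry.column_names.len(),
            primary_key: entry.primary_key.clone(),
        })
        .collect();

    DatabaseMetadataView {
        root_path: root_path.to_path_buf(),
        table_count: tables.len(),
        tables,
    }
}
```

Because the function only takes `&Path` and `&[CatalogEntry]`, the compiler itself rules out accidental mutation of the inputs. The current generation and dirty flag could be added to TableSummary in the same way.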
What to see at the table-level view
Next is describe_table(name).
This combines information from the catalog and the manifest and returns a logical description of that table.
For example, if it is the users table, it looks like this:
TableMetadataView
-----------------
name = "users"
prefix = "users.table"
schema.columns = [id, name, active]
primary_key = "id"
current_gen = 4
has_snapshot = true
has_index_snap = true
dirty = false
The important thing here is that you do not need to run open_table("users") and spin up the runtime. When you only want to see the metadata, you go through a dedicated path.
What to see at the artifact-level view
One level deeper is inspect_table_artifacts(name).
This is an observation of the physical state.
- Does the base table file exist?
- Does the manifest exist?
- Does the current snapshot exist?
- Does the current WAL exist?
- Does the current index snapshot exist?
- Does the dirty marker exist?
- Are there any orphan temporary files?
- Do any obsolete generations seem to remain?
If you visualize it, it looks like this:
TableArtifactView
-----------------
table_file = exists
manifest = exists
snapshot(current) = exists
wal(current) = exists
index_snapshot = exists
dirty_marker = missing
obsolete_generations = 1
orphan_temps = 0
This makes it possible to tell at a glance which files a table actually has on disk.
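One way to answer these questions without touching anything is to rely only on existence checks and a directory listing. The sketch below assumes a flat directory and hypothetical file suffixes (.manifest, .snapshot, .wal, .dirty, .tmp) modeled on names like users.table.dirty; the function name scan_artifacts and the reduced field set are illustrative, and generation-aware checks are left out.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Physical state of one table's files (a reduced, hypothetical field set).
#[derive(Debug, Clone, PartialEq)]
pub struct TableArtifactView {
    pub table_file_exists: bool,
    pub manifest_exists: bool,
    pub snapshot_exists: bool,
    pub wal_exists: bool,
    pub dirty_marker_exists: bool,
    pub orphan_temp_count: usize,
}

/// Scans `dir` for artifacts of `prefix` (e.g. "users.table") using only
/// read-only filesystem calls: `Path::exists` and `fs::read_dir`.
pub fn scan_artifacts(dir: &Path, prefix: &str) -> io::Result<TableArtifactView> {
    let exists = |suffix: &str| dir.join(format!("{prefix}{suffix}")).exists();

    // Count leftover temp files such as "users.table.snapshot.tmp" (assumed naming).
    let temp_prefix = format!("{prefix}.");
    let mut orphan_temp_count = 0;
    for entry in fs::read_dir(dir)? {
        let file_name = entry?.file_name();
        let file_name = file_name.to_string_lossy();
        if file_name.starts_with(&temp_prefix) && file_name.ends_with(".tmp") {
            orphan_temp_count += 1;
        }
    }

    Ok(TableArtifactView {
        table_file_exists: exists(""),
        manifest_exists: exists(".manifest"),
        snapshot_exists: exists(".snapshot"),
        wal_exists: exists(".wal"),
        dirty_marker_exists: exists(".dirty"),
        orphan_temp_count,
    })
}
```

Note that a missing snapshot or a present dirty marker is returned as a plain field value rather than an error, so the caller decides what to do about it.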
The key is that "inspect does not mutate"
The most important constraint this time is this:
The inspection API must not cause any mutation.
In other words:
- It does not clean up.
- It does not perform recovery.
- It does not perform checkpoints.
- It does not append to WAL.
This is crucial.
If the inspection were to start recovery or cleanup on its own, the system's state would change just because you wanted to look at it. That is quite dangerous.
Therefore, in this implementation, inspection focuses strictly on observation.
Even if there is a dirty marker, inspection does not recover
This point should be made particularly clear.
For example, suppose a users.table.dirty file remains. If you call describe_table("users") at that moment, it must not automatically run a recovery.
In this implementation, inspection means:
"See the current state as is"
It does not mean:
"See it after fixing it to the correct state"
Therefore, if a dirty marker exists, it is simply reported in the view.
dirty_marker = true
This decision is important.
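The "no mutation" contract can also be checked mechanically. The following sketch, which is not part of the series' code, records a coarse fingerprint of a directory (file name and length) before and after an inspection call and compares the two. The helper names dir_state, dirty_marker_exists, and leaves_dir_unchanged and the ".dirty" suffix are assumptions for illustration.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Records file name -> length for every entry in `dir`.
/// Comparing two results is a coarse way to detect created, deleted, or resized files.
pub fn dir_state(dir: &Path) -> io::Result<BTreeMap<String, u64>> {
    let mut state = BTreeMap::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let len = entry.metadata()?.len();
        state.insert(entry.file_name().to_string_lossy().into_owned(), len);
    }
    Ok(state)
}

/// Hypothetical inspection: reports the dirty marker and nothing else.
/// It neither deletes the marker nor replays the WAL.
pub fn dirty_marker_exists(dir: &Path, prefix: &str) -> bool {
    dir.join(format!("{prefix}.dirty")).exists()
}

/// Runs `inspect` and reports whether the directory looked identical before and after.
pub fn leaves_dir_unchanged<T>(
    dir: &Path,
    inspect: impl FnOnce() -> T,
) -> io::Result<(T, bool)> {
    let before = dir_state(dir)?;
    let result = inspect();
    let after = dir_state(dir)?;
    Ok((result, before == after))
}
```

A test for the real inspection functions could wrap each call in leaves_dir_unchanged and assert that the flag is true even when a dirty marker is present. The fingerprint ignores file contents and timestamps, so it only catches the most visible side effects.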
Reusing existing helpers is essential
The most important thing when implementing this inspection API is not to re-implement catalog / manifest / artifact inventory management.
The existing sources of truth are as follows:
- catalog: table name / schema / prefix
- manifest: generation information
- artifact inventory: file existence and cleanup perspective
Therefore, the inspection API will simply bundle these together.
The high-level design is as follows:
catalog --------\
manifest --------> metadata inspection view
artifact scan ---/
This makes the responsibilities very clean.
Difference from list_tables()
It is better to clearly separate these as well:
- list_tables() is just a list of names.
- describe_table() is logical metadata.
- inspect_table_artifacts() is physical artifact state.
- inspect_database() is a summary of the entire DB.
In other words, list_tables() will remain. Use the new API only when you need metadata.
This separation is important. If you crowd everything into a heavy metadata view, even the simple case of wanting a list of names becomes unnecessarily complex.
Sample Code: The API surface
From the outside, you can make it look quite straightforward.
let db = MultiTableDb::open("./mydb")?;
let overview = db.inspect_database()?;
println!("tables = {}", overview.table_count);
let users = db.describe_table("users")?.unwrap();
println!("users pk = {}", users.primary_key_name);
let users_artifacts = db.inspect_table_artifacts("users")?.unwrap();
println!("users dirty = {}", users_artifacts.dirty_marker_exists);
The benefit of this API is that you can view information without opening a runtime handle.
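The calls above use unwrap() for brevity. Since the per-table functions return a Result wrapping an Option, a caller can also treat "no such table" as an ordinary outcome. The sketch below illustrates that pattern with stand-ins: FakeDb, the two-field view, the String error type, and describe_line are all assumptions made so the example is self-contained.

```rust
use std::collections::BTreeMap;

/// Stand-in for the article's view type, reduced to two fields for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct TableMetadataView {
    pub name: String,
    pub primary_key_name: String,
}

/// Stand-in for `MultiTableDb`: an in-memory catalog only (an assumption of this sketch).
pub struct FakeDb {
    pub tables: BTreeMap<String, TableMetadataView>,
}

impl FakeDb {
    /// Same shape as the article's signature: `Ok(None)` means "no such table".
    pub fn describe_table(&self, name: &str) -> Result<Option<TableMetadataView>, String> {
        Ok(self.tables.get(name).cloned())
    }
}

/// Formats a one-line description, treating an unknown table as a normal outcome.
pub fn describe_line(db: &FakeDb, name: &str) -> Result<String, String> {
    Ok(match db.describe_table(name)? {
        Some(view) => format!("{} (pk = {})", view.name, view.primary_key_name),
        None => format!("{name}: no such table"),
    })
}
```

Keeping the two cases apart means an Err is reserved for real failures, such as an unreadable manifest, while a missing table stays a plain None.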
Sketch of inspect_database()
Conceptually, the code looks like this:
pub fn inspect_database(&self) -> Result<DatabaseMetadataView, DbError> {
    // Summarize every catalog entry; no table handle is opened along the way.
    let tables = self.catalog
        .entries()
        .iter()
        .map(|entry| self.summarize_table(entry))
        .collect::<Result<Vec<_>, _>>()?;

    Ok(DatabaseMetadataView {
        table_count: tables.len(),
        tables,
    })
}
What is important here is that it only returns a metadata summary of the entire DB, and does not open table handles or trigger recovery.
Sketch of describe_table(name)
This one works on a single table.
pub fn describe_table(&self, name: &str) -> Result<Option<TableMetadataView>, DbError> {
    // An unknown table is not an error: report it as None.
    let entry = match self.catalog.get_table(name) {
        Some(entry) => entry,
        None => return Ok(None),
    };

    // A manifest that fails to load propagates as an error via `?`.
    let manifest = try_load_manifest_for_prefix(&entry.prefix)?;
    let artifact_view = self.inspect_table_artifacts(name)?;

    Ok(Some(TableMetadataView {
        name: entry.name.clone(),
        schema: entry.schema.clone(),
        artifact_prefix: entry.prefix.clone(),
        manifest_generation: manifest.as_ref().map(|m| m.current_generation),
        artifact_view,
    }))
}
The role of this API is to summarize the logical view of the catalog + manifest.
Sketch of inspect_table_artifacts(name)
This returns the existence state of physical files.
pub fn inspect_table_artifacts(&self, name: &str) -> Result<Option<TableArtifactView>, DbError> {
    // An unknown table is not an error: report it as None.
    let entry = match self.catalog.get_table(name) {
        Some(entry) => entry,
        None => return Ok(None),
    };

    // Reuse the existing artifact inventory instead of re-deriving file names here.
    let inventory = scan_artifacts_for_prefix(&entry.prefix)?;

    Ok(Some(TableArtifactView {
        table_file_exists: inventory.table_file_exists,
        manifest_exists: inventory.manifest_exists,
        snapshot_exists: inventory.current_snapshot_exists,
        wal_exists: inventory.current_wal_exists,
        index_snapshot_exists: inventory.current_index_exists,
        dirty_marker_exists: inventory.dirty_exists,
        orphan_temp_count: inventory.orphan_temp_count,
    }))
}
This also performs no mutations whatsoever. It is strictly for "looking and returning."
How to handle a malformed manifest
This requires a design decision.
For example, if the catalog is fine but the manifest is corrupted, what should describe_table("users") do?
As a minimal design, a manifest parse failure should surface as an inspection failure.
The reason is simple: the manifest is the source of truth for the generation. If that is broken, returning "some metadata" would actually make things more ambiguous.
On the other hand, if only some of the artifacts are missing, it is fine to report that in the view.
- Snapshot missing
- Index snapshot missing
- Dirty marker present
States like these are exactly what inspection is meant to observe.
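The distinction between "missing" and "malformed" can be sketched as follows. The manifest format here (a single current_generation=<number> line), the InspectError type, and the function name try_load_manifest are assumptions made for illustration; the series' actual manifest format and error type may differ.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical manifest: a single `current_generation=<u64>` line.
#[derive(Debug, PartialEq)]
pub struct Manifest {
    pub current_generation: u64,
}

#[derive(Debug)]
pub enum InspectError {
    Io(io::Error),
    MalformedManifest(String),
}

/// Missing manifest -> Ok(None): an observable state, reported as is.
/// Unparseable manifest -> Err: the generation cannot be trusted.
pub fn try_load_manifest(path: &Path) -> Result<Option<Manifest>, InspectError> {
    let text = match fs::read_to_string(path) {
        Ok(text) => text,
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(err) => return Err(InspectError::Io(err)),
    };

    let generation = text
        .trim()
        .strip_prefix("current_generation=")
        .and_then(|value| value.parse::<u64>().ok())
        .ok_or_else(|| InspectError::MalformedManifest(text.trim().to_string()))?;

    Ok(Some(Manifest { current_generation: generation }))
}
```

With this shape, a caller using `?` fails the whole inspection on a corrupted manifest, while a table that simply has no manifest yet still produces a view with no generation.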
Inspection returns "the state as it is"
This is easy to understand with a diagram.
dirty table state
-----------------
users.table.manifest -> exists
users.table.snapshot -> exists
users.table.wal -> exists
users.table.dirty -> exists
inspect result
--------------
dirty_marker_exists = true
In other words, inspection returns:
- "what is here right now"
Not:
- "what it is after fixing it"
This stance is important.
Why is a metadata API valuable?
Now the value of the metadata API is becoming quite clear.
For example, it will be very useful for future needs:
- Listing tables via a CLI
- Displaying a schema
- Visualizing dirty state
- Comparing state before and after cleanup
- Running operational health checks
However, we won't preemptively support all of those use cases this time. First, we establish safe observation as an API.
An end-to-end sample
A good example to show here is one that inspects the metadata of multiple tables at once.
let mut db = MultiTableDb::open("./mydb")?;
db.create_table(users_schema())?;
db.create_table(orders_schema())?;

{
    // Write one row and checkpoint; the handle is dropped at the end of this scope.
    let mut users = db.open_table("users")?;
    users.insert(user_row(1, "alice"))?;
    users.checkpoint()?;
}

// From here on, only the read-only inspection API is used.
let overview = db.inspect_database()?;
assert_eq!(overview.table_count, 2);

let users = db.describe_table("users")?.unwrap();
assert_eq!(users.name, "users");
assert_eq!(users.schema.columns.len(), 3);

let users_artifacts = db.inspect_table_artifacts("users")?.unwrap();
assert!(users_artifacts.manifest_exists);
assert!(users_artifacts.snapshot_exists);
assert!(users_artifacts.index_snapshot_exists);
The good parts of this code are:
- You can view metadata without touching runtime data.
- You can observe the artifact state after a checkpoint.
- You can see the big picture in a multi-table world.
Inspection after rename / drop is also natural
With this API, state verification after rename or drop is also quite natural.
db.rename_table("users", "customers")?;
assert!(db.describe_table("users")?.is_none());
let customers = db.describe_table("customers")?.unwrap();
assert_eq!(customers.name, "customers");
db.drop_table("orders")?;
assert!(db.describe_table("orders")?.is_none());
In other words, the inspection API is also a natural entry point for confirming results after mutations.
Things we won't do yet
To close out the article, let's state the non-goals explicitly.
Things we are not doing this time include:
- A system table backed by page storage
- SQL SHOW TABLES
- SQL DESCRIBE
- Statistics collector
- Dependency graph
- Online validation
- ALTER TABLE
- Catalog transactions
Why not now?
Because what is needed this time is:
A minimal and safe surface API for observation
In other words, what we want now is a minimal contract that allows:
- Viewing the whole DB
- Viewing per table
- Viewing artifact states
- And crucially, no mutation
Separation of concerns this time
The separation of concerns revealed by this system metadata API is quite clean.
catalog
└─ source of truth for table existence / schema / prefix
manifest
└─ source of truth for generation
artifact inventory
└─ source of truth for file state
metadata API
└─ bundles the 3 above and returns a read-only view
What is especially important is that:
- Inspection does not clean up
- Inspection does not recover
- Inspection does not checkpoint
By separating observation from mutation, it becomes much easier to expand in the future.
Impressions
This task wasn't about adding flashy mutations. But I think it is a very important step.
The internal structure of the DB has become quite complex:
- catalog
- manifest
- snapshot
- WAL
- index snapshot
- cleanup
- dirty marker
Once in this state, development and operation become difficult without an API to observe it safely.
It is great that we could organize this by keeping it read-only, reusing sources of truth, and separating it from mutations.
In particular, exposing this as an API before building a system table is a sensible order of operations.
Conclusion
This time, we introduced a read-only system metadata API by adding:
- inspect_database
- describe_table
- inspect_table_artifacts
With these, it is now possible to safely observe the state of a multi-table DB.
In short, it was an episode for safely "visualizing" the state of the components we have built so far:
- catalog
- manifest
- WAL
- snapshot
- index snapshot
- cleanup
With this, our custom DB doesn't just have features, but is becoming a DB that can explain its current state from the outside.
Next, the natural theme will be to add a CLI or management command layer that returns system metadata on top of this foundation. Now that we can safely observe it as an API, we can move on to the stage of making it accessible to humans.
See you next time!