iTranslated by AI
I created 'hopper', a tool for injecting dummy data into Cloud Spanner
I often find myself needing to quickly insert dummy data into Cloud Spanner, and since I kept writing and throwing away similar Go scripts every time, I decided to build a dedicated CLI tool for it.
While I put a lot of effort into the design and specifications, I had AI write almost all of the code. I repeatedly provided specific instructions on the command structure, desired results, and how to define structures, and then reviewed the output and refined the approach. I hardly had to do any manual work during the implementation phase.
hopper
It reads the schema directly from Spanner, assembles Mutations according to the table definitions, and handles the entire commitment process. It connects using the same DSN format as hammer, a Spanner schema management tool I created in the past.
spanner://projects/{project}/instances/{instance}/databases/{database}
Sample Schema
For explanation purposes, I will use a configuration with users and posts interleaved within it.
CREATE TABLE users (
user_id STRING(36) NOT NULL,
name STRING(MAX) NOT NULL,
email STRING(MAX) NOT NULL,
status STRING(MAX) NOT NULL,
shard_key INT64 NOT NULL,
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL,
) PRIMARY KEY (user_id);
CREATE TABLE posts (
user_id STRING(36) NOT NULL,
post_id STRING(36) NOT NULL,
title STRING(MAX) NOT NULL,
body STRING(MAX) NOT NULL,
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL,
) PRIMARY KEY (user_id, post_id),
INTERLEAVE IN PARENT users ON DELETE CASCADE;
Registering users
Pass the number of rows to insert into a table using --table TABLE=N.
$ hopper run spanner://projects/p/instances/i/databases/d --table 'users=10'
users 10 rows
This inserts 10 rows into users. hopper fills in all columns by inspecting their types and names.
-
user_id: A unique value that doesn't collide because it is the primary key. Since it isSTRING(36), it fills with a UUID. -
name,email: Since gofakeit has functions that match the column names, it uses them to insert realistic-looking names and email addresses. -
created_at,updated_at: Random values for TIMESTAMP. -
status,shard_key: Random values based on the data type, as they cannot be guessed.
It automatically fills columns by matching the column name with the gofakeit function name. Values like first_name, email, phone, and country will automatically be natural even without any configuration. If you don't like it, you can make everything random with --no-infer.
Registering users + posts together
If you specify both, it creates the parent (users) first and then the child (posts). The parent key for interleaving is automatically inherited.
$ hopper run --table 'users=10' --table 'posts=100' spanner://...
users 10 rows
posts 100 rows
posts=100 is the total count, and 100 rows are distributed to the 10 users in a round-robin fashion. If you specify users=10 + posts=1000, it will be 100 items per person, and users=10 + posts=300 will be 30 items per person, adjusting according to the ratio.
Creating the parent automatically by specifying only the child
If you specify only posts, the parent users table is detected from the schema and 1 row is automatically created. This is useful when you want to create just one user and attach 100 posts under it.
$ hopper run --table 'posts=100' spanner://...
users 1 rows
posts 100 rows
This works the same way even if you go down to grandchildren; if you specify the lowest table in a 3-level interleave, the upper layers are automatically filled in with 1 row each.
This is not limited to INTERLEAVE; it works the same way for tables referenced by FOREIGN KEY. For example, if you have a schema where the comments table references posts.post_id via a FOREIGN KEY and you pass only --table 'comments=N', posts and users will be automatically filled with 1 row each, and comments.post_id will be automatically linked to the parent posts's post_id. The FK column is picked randomly from the already generated rows of the parent, so referential integrity is maintained. In many cases, it is sufficient to consider only the number of rows at the entry point.
Specifying values for logical shards or enum-like columns with templates
For cases where the column name does not match gofakeit, or for columns like logical shards or enums, you can inject templates using --set TABLE.COLUMN=TEMPLATE. Since the template is the gofakeit template mechanism ({{ }}), you can use gofakeit functions directly.
For example, if you want shard_key to be one of 0-3, you write it like this:
$ hopper run spanner://... \
--table 'users=100' \
--set 'users.shard_key={{ Number 0 3 }}'
Number 0 3 is a function that returns an integer between 0 and 3.
If you want to pick status from the three options: active, banned, or deleted, you can write it like this:
$ hopper run spanner://... \
--table 'users=100' \
--set 'users.status={{ RandomString (SliceString "active" "banned" "deleted") }}'
SliceString creates a string slice, and RandomString returns one randomly from it. Most enum-like columns can be handled with this syntax.
You can chain multiple --set flags, or combine them like {{ FirstName }}-{{ Index }}. Index is the row number (0-indexed), and you can perform arithmetic using add, sub, mul, div, or mod. Other columns in the same row can be accessed via {{ Col "FirstName" }}.
Specifying everything in a configuration file
As the number of --set flags increases, the CLI becomes difficult to manage, so you can consolidate them into a YAML file. You can generate a template with hopper scaffold, so you don't need to write it from scratch.
$ hopper scaffold --table users spanner://... > hopper.yaml
This outputs templates for the specified tables (or all tables if not specified). Columns that match gofakeit function names are pre-filled, and others are listed as # Column: comments, so you just need to uncomment and rewrite the necessary parts.
tables:
users:
rows: 100
columns:
name: "{{ FirstName }} {{ LastName }}"
email: "{{ Email }}"
shard_key: "{{ Number 0 3 }}"
status: '{{ RandomString (SliceString "active" "banned" "deleted") }}'
# created_at:
# updated_at:
posts:
rows: 1000
columns:
title: "{{ Sentence 5 }}"
body: "{{ Paragraph 3 5 10 \" \" }}"
# created_at:
# updated_at:
If you pass this with --config, it will be applied.
$ hopper run --config hopper.yaml spanner://...
By combining it with --table or --set, you can adjust the count based on the YAML, or overwrite only specific columns.
Clearing existing data before insertion
If you add --clear, it will empty the target tables before insertion.
Initially, I implemented a straightforward PartitionedUpdate that issues DELETE FROM ... WHERE TRUE, but for small Spanner instances for development, even the first partition's commit couldn't finish within the PDML ExecuteSql timeout, leading to DeadlineExceeded errors every time. Since the remaining row count wasn't decreasing at all, I changed the strategy.
Ultimately, I switched to a simple implementation that uses chunked deletes, repeatedly performing SELECT pk FROM table LIMIT 100 to get the primary keys and deleting them in 1 commit using the spanner.Delete Mutation until nothing is left. Since each commit is independently finalized, it always makes progress regardless of the data volume.
$ hopper run --table 'users=100' --clear spanner://...
Cleared posts (12000 rows)
Cleared users (1200 rows)
Loading users [████████████████████] 100% (100/100)
users 100 rows
The batch size can be changed with --clear-batch-size. The default is 100. If an error like 'too many mutations' occurs for a parent with many CASCADE children, it automatically halves or quarters the batch size and retries internally, so you don't usually need to adjust it explicitly. However, if it's a simple table without children, increasing it from the start is faster.
$ hopper run --table 'logs=1000000' --clear --clear-batch-size 1000 spanner://...
If you don't add --clear, it inserts using spanner.Insert, so if the PK collides with existing data, it will fail with AlreadyExists.
Checking operation with --dry-run
If you add --dry-run, it will run the generation process without committing anything to Spanner. It displays the first few rows as a sample so you can check the results of your --set settings immediately. Adding -v (verbose) displays the initial seed, so you can reproduce the same data using --seed N.
$ hopper run --table 'users=3' --dry-run -v spanner://...
Seed: 1764285293471232000
users 3 rows
map[created_at:... email:alice@example.com name:Alice shard_key:2 status:active user_id:9e8a...]
map[created_at:... email:bob@example.org name:Bob shard_key:0 status:banned user_id:f1d3...]
map[created_at:... email:carol@example.net name:Carol shard_key:1 status:active user_id:721c...]
Conclusion
It works reliably even with the schemas of real projects. The response to this project is that as long as the design and intent are clearly communicated, things won't break even when having AI write the code, even for scenes like entirely rewriting the --clear strategy halfway through. Part of the goal was to experiment with the development method itself rather than the perfection of the tool. From conception to the current form, it only took about 4 hours of actual work, and I once again realized the power of AI as a tool.
If you find yourself writing disposable code for similar purposes every time, please give it a try.
Bonus
The name is borrowed from Minecraft, and I had ChatGPT create a picture that looks like it for the README.

Discussion