
Cloudflare D1 is Incredible


I haven't tested it enough yet, but if it's truly what I imagine it to be, my soul might just tremble...

Announcing D1: our first SQL database

Cloudflare D1 = Edge SQLite

Cloudflare D1 runs on Cloudflare Workers, which means SQLite runs on the CDN network. If that were all, it would just be ordinary SQLite hosting, but because this is Cloudflare, it goes further: it's SQLite with read replicas distributed across the CDN edge. Isn't that insane? I certainly thought it was.
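From a Worker's point of view, you'd talk to D1 through an environment binding with a prepared-statement style API (`prepare`/`bind`/`all`, as shown in the announcement). The mock below is a hand-rolled stand-in for that binding so the query shape can run outside of Workers; the real API and binding name (`env.DB`) are assumptions here, not verified details.

```typescript
// Sketch of the D1 query shape from a Worker's point of view.
// `D1Database` below is a hand-rolled mock standing in for the real
// binding so this file runs under plain Node; the method names
// (prepare/bind/all) follow the announcement's examples.

type Row = Record<string, unknown>;

interface D1Database {
  prepare(sql: string): {
    bind(...params: unknown[]): { all(): Promise<{ results: Row[] }> };
  };
}

// Tiny in-memory stand-in: one hard-coded table, naive `?` handling.
const mockDB: D1Database = {
  prepare(_sql: string) {
    const users: Row[] = [
      { user_id: 1, name: "alice" },
      { user_id: 2, name: "bob" },
    ];
    return {
      bind(...params: unknown[]) {
        return {
          // Pretend the SQL was "SELECT * FROM users WHERE user_id = ?"
          async all() {
            return { results: users.filter((u) => u.user_id === params[0]) };
          },
        };
      },
    };
  },
};

// Roughly what a Worker handler would do with a real D1 binding:
export async function getUser(
  env: { DB: D1Database },
  userId: number
): Promise<Row[]> {
  const { results } = await env.DB
    .prepare("SELECT * FROM users WHERE user_id = ?")
    .bind(userId)
    .all();
  return results;
}

getUser({ DB: mockDB }, 1).then((rows) => console.log(rows));
```

Swapping the mock for the real binding should leave `getUser` unchanged, which is the appeal of the prepared-statement shape.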

To understand this insanity, you need to know a few things about the infrastructure Cloudflare has developed.

Durable Objects let you build an Actor model on the CDN. These Actors are strongly consistent, and are described as storage whose actual instance migrates toward the regions where access is heaviest.

About Cloudflare Workers' Durable Objects

When I translated the section about Durable Objects in the article above, this part caught my eye:

We think of Durable Objects as a low-level primitive for building distributed systems. Some applications might use an object directly to implement a coordination layer, or even as their sole storage layer.
However, Durable Objects today are not a complete database solution: each object can only see its own data, so to perform queries or transactions across multiple objects, an application needs to do some extra work.
That said, every big distributed database (relational, document, graph, etc.) is, at some low level, composed of "chunks" or "shards" that each store one piece of the overall data. The job of a distributed database is to coordinate between those chunks.
We see a future of edge databases where each "chunk" is stored as a Durable Object. By doing so, it becomes possible to build databases that have no home region and run entirely at the distributed edge. We don't have to be the ones to build such a database; anyone can build one on top of Durable Objects. Durable Objects are just the first step on the journey of edge storage.

This seems to be exactly what D1 is.

Presumably, it uses the same mechanism as Durable Objects to detect hotspots on the CDN edge and dynamically migrate the primary node toward them.
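The read-replica idea described above can be sketched as plain routing logic: writes must chase the (movable) primary, while reads can be served by whichever edge replica is nearest. Everything below, including the region names and latency table, is invented for illustration; D1's actual placement logic is not public.

```typescript
// Toy model of edge read-replica routing: reads hit the nearest
// replica, writes are forwarded to the (movable) primary.
// All regions and latencies are invented for illustration.

type Region = "nrt" | "sfo" | "fra";

interface Cluster {
  primary: Region;    // migrates toward hotspots, Durable Object-style
  replicas: Region[]; // read-only copies at the edge
}

// Fake inter-region latency table (ms), symmetric.
const latency: Record<Region, Record<Region, number>> = {
  nrt: { nrt: 1, sfo: 100, fra: 230 },
  sfo: { nrt: 100, sfo: 1, fra: 150 },
  fra: { nrt: 230, sfo: 150, fra: 1 },
};

// Reads pick the lowest-latency candidate (primary included);
// writes always go to the primary, wherever it currently lives.
export function route(cluster: Cluster, from: Region, isWrite: boolean): Region {
  if (isWrite) return cluster.primary;
  const candidates = [cluster.primary, ...cluster.replicas];
  return candidates.reduce((best, r) =>
    latency[from][r] < latency[from][best] ? r : best
  );
}

const cluster: Cluster = { primary: "sfo", replicas: ["nrt", "fra"] };
console.log(route(cluster, "nrt", false)); // read from Tokyo  -> "nrt" replica
console.log(route(cluster, "nrt", true));  // write from Tokyo -> "sfo" primary
```

The interesting part in the real system is that `cluster.primary` itself would shift over time as access patterns change, which is exactly what the Durable Objects machinery is built for.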

With D1, even individuals can (potentially) beat large-scale clouds at a low cost

The biggest feature of D1 is likely the cost. By replicating between CDN edges, you can affordably obtain the geographic distribution that was previously achieved with things like Spanner or CockroachDB, optimized on top of Cloudflare's existing infrastructure.

By the way, an article like this was buzzing recently:

The cost of personal development depends on the DB - laiso

Depending on the target audience and scale, web businesses often adopt cloud services that scale massively but come loaded with expensive features.

With Spanner, Aurora Serverless, Kubernetes, and the like, nodes stay awake for hot standby, so pricing often starts at 6,000 yen/month at minimum. And once you actually use them, the bill balloons quickly, to the point where an individual can't sustain it.

I don't think this situation is very healthy. For static sites, things like Netlify, Cloudflare Pages, and Firebase Hosting have made hosting cheap, but for databases, hobby-scale projects have always struggled with price and scaling.

In the midst of all this, Litestream, which replicates SQLite, became a hot topic. People around me had started experimenting with whether they could keep costs down by using it to replicate SQLite databases to S3-compatible storage.

https://zenn.dev/voluntas/scraps/f4939cbe92525c

I was also thinking about whether I could try this configuration with Cloudflare R2 + Litestream, but Cloudflare D1 is a further optimization of that. I'm glad it came out before I started...
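For reference, the R2 + Litestream setup considered above would look roughly like the `litestream.yml` below: Litestream speaks the S3 API, so it can point at an S3-compatible endpoint. The bucket name, path, and account ID are placeholders, and I haven't verified this exact configuration against R2, so treat it as a sketch.

```yaml
# litestream.yml — replicate a local SQLite file to S3-compatible storage.
# Bucket, path, and account ID below are placeholders for illustration.
dbs:
  - path: /var/lib/app/db.sqlite
    replicas:
      - type: s3
        bucket: my-replica-bucket
        path: db
        # R2 exposes an S3-compatible endpoint per account:
        endpoint: https://<account-id>.r2.cloudflarestorage.com
        region: auto
```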

Anticipated Bottlenecks

Presumably, as the volume of SQLite data grows, this architecture will hit bottlenecks in replication speed and in table scans during query execution.
It is therefore probably best suited to scenarios where a relatively small amount of dynamic data needs to be deployed quickly across the CDN.

For large-scale use, you might need strategies like predicting hotspots and offloading low-access data to other storage while keeping specific portions on D1.
However, this might be resolved if D1 becomes capable of distributing data at the SQLite table level.

Let's do this

So, I've written this so far just by looking at the specs without actually touching it. I'm going to go try it out now.
Everyone, try it out for yourselves, run some benchmarks, and see what it's like!
