Tracing the Origins of ReadableStream's Single-Read Limitation from the Kernel Up
Introduction — The Day the Response Disappeared
While implementing idempotency middleware for a POS API, I encountered a strange bug. When sending the same request twice, the second response would return {"note": "[missing response body]"}. Upon investigating, I found that the Response.body ReadableStream had become empty after being read once.
const res = await fetch("https://api.example.com/data");
const first = await res.json(); // OK
const second = await res.json(); // TypeError: the body was already consumed (exact message varies by runtime)
Why can the Response body only be read once?
In this article, I will delve into everything from kernel socket buffers to the design philosophy of Web standards to answer this question. By the time you finish reading, you should understand why "only being able to read once" is a rational design choice.
Target Audience: Web engineers who have used fetch or Response but have never wondered why streams can only be read once. This article assumes basic knowledge of JavaScript/TypeScript and the concepts of HTTP requests/responses.
What is a ReadableStream?
In short, a ReadableStream is an API for reading data sequentially by dividing it into small chunks. It is used in scenarios where "data arrives bit by bit," such as file downloads, video streaming, and HTTP responses.
The Three States of a Stream
ReadableStream has three main states:
- readable: The state where data is available for reading. The stream is in this state immediately after creation.
- closed: The state after all data has been read. No data remains.
- errored: The state where an error has occurred. Further reading is impossible.
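The standard exposes no `state` property, so these states can only be observed indirectly. Here is a minimal sketch that reads a stream to its end and reports how it finished. The helper names `makeStream` and `observeOutcome` are made up for illustration.

```javascript
// Hypothetical helper: build a stream that either closes or errors.
function makeStream(outcome) {
  return new ReadableStream({
    start(controller) {
      if (outcome === "close") {
        controller.enqueue("chunk"); // readable: data is queued
        controller.close();          // becomes closed once the queue drains
      } else {
        controller.error(new Error("boom")); // errored: reads reject
      }
    },
  });
}

// Read until the stream reports done (closed) or rejects (errored).
async function observeOutcome(stream) {
  const reader = stream.getReader();
  try {
    while (true) {
      const { done } = await reader.read();
      if (done) return "closed";
    }
  } catch {
    return "errored";
  }
}
```

Calling `observeOutcome(makeStream("close"))` should resolve once `read()` returns `done: true`, while the erroring variant ends up in the `catch` branch.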
Locking via getReader()
To read data from a ReadableStream, you obtain a reader using getReader(). Once a reader is acquired, the stream's locked property becomes true, blocking access from other consumers. Note that the "lock" is a concept distinct from the stream's internal state (readable/closed/errored); it is strictly a property for mutual exclusion.
const stream = response.body;
const reader = stream.getReader(); // The stream is locked
// Attempting to acquire another reader results in an error
const reader2 = stream.getReader(); // TypeError: ReadableStream is locked
The reason for the lock is clear. Domenic Denicola, one of the architects of the WHATWG Streams Standard, explained it in GitHub Issue #241:
"once a stream is being read by C++, JS cannot touch it, both for performance and implementation-complexity reasons."
It is a design to prevent conflicts between C++ native code and JavaScript, and to avoid "foot-gun" scenarios where unexpected .read() calls occur while piping.
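To make the lock tangible, the following sketch tracks the `locked` property across the lifetime of a reader. It uses a small in-memory stream instead of a real response, and `lockLifecycle` is a name I chose for this example.

```javascript
// Returns the value of `locked` before, during, and after holding a reader.
function lockLifecycle() {
  const stream = new ReadableStream({
    start(controller) {
      controller.enqueue("data");
      controller.close();
    },
  });
  const before = stream.locked;   // false: no reader yet
  const reader = stream.getReader();
  const during = stream.locked;   // true: this reader owns the stream
  let secondReaderError = null;
  try {
    stream.getReader();           // a second reader is refused
  } catch (err) {
    secondReaderError = err;
  }
  reader.releaseLock();           // give up the lock without reading
  const after = stream.locked;    // false again; the data is still queued
  return { before, during, after, secondReaderError };
}
```

Note that releasing the lock does not rewind anything. It only lets another consumer take over from wherever the stream currently stands.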
The Reality of response.json()
Methods we commonly use like response.json() and response.text() are convenience methods that read the stream to the end internally. Behind the scenes, the sequence "acquire reader → read all chunks → stream closes" runs in one go.
In other words, by the time the promise returned by response.json() resolves, the stream has been drained and closed, and response.bodyUsed is true. The data now lives in memory as a JavaScript object, and nothing remains in the stream itself.
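The "acquire reader, read all chunks, reach the end" sequence can be approximated by hand. This is a rough sketch of the idea, not the actual implementation inside any runtime; `readBodyAsJson` and `demoReadTwice` are illustrative names.

```javascript
// A rough approximation of response.json(); not the real implementation.
async function readBodyAsJson(response) {
  const reader = response.body.getReader(); // 1. acquire the reader (locks the stream)
  const decoder = new TextDecoder();
  let text = "";
  while (true) {
    const { done, value } = await reader.read(); // 2. read every chunk
    if (done) break;                              // 3. the stream has ended
    text += decoder.decode(value, { stream: true });
  }
  text += decoder.decode(); // flush any buffered partial character
  return JSON.parse(text);
}

async function demoReadTwice() {
  const response = new Response(JSON.stringify({ id: 1 }));
  const first = await readBodyAsJson(response);
  let secondFailed = false;
  try {
    await response.json(); // the body was already drained above
  } catch {
    secondFailed = true;
  }
  return { first, secondFailed };
}
```

The second attempt fails for the same reason as in the opening example: the chunks were handed over once and are no longer in the stream.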
Figure: ReadableStream state transitions. "Locked" is a property, not a state; the internal states are only readable, closed, and errored.
Using an analogy makes this easier to understand. A stream is like water in a plastic bottle. Once you pour it into a glass, you cannot put it back into the bottle. response.json() is equivalent to the action of "pouring all the water from the bottle into a glass."
By now, the fact that "it disappears once read" is clear. But why is it designed this way? The answer lies in the OS kernel.
Kernel Socket Buffers — The Origin of "Read Once"
The origin of the design where a ReadableStream can only be read once lies in the TCP socket buffer of the OS kernel. By looking at how Web browsers and servers receive data from the network, we can see that "disappearing once read" is a physical necessity.
TCP Socket Receive Buffer
When data arrives from the network, the kernel stores it in a memory area called the socket receive buffer. This buffer resides in kernel space and cannot be accessed directly by applications.
Crucially, the size of this buffer is finite. Let's look at the default values for Linux's net.ipv4.tcp_rmem:
| Parameter | Default Value | Description |
|---|---|---|
| min | 4,096 bytes (4KB) | Guaranteed minimum even under memory pressure |
| default | 87,380 bytes (~85KB) | Initial receive buffer size for new sockets |
| max | 6,291,456 bytes (~6MB) | Maximum size during auto-tuning |
In other words, a single TCP connection realistically gets a receive buffer of tens to hundreds of KB by default, and a few MB at most.
The read() System Call — "Copy and Delete"
To receive data, an application uses the read() system call. Understanding exactly what happens at this moment is the core of this question.
- The application calls read(fd, buf, size).
- The kernel copies data from the receive buffer (sk_buff chain) into user space.
- After the copy is complete, the kernel deletes the data from the buffer.
- The receive window expands by the amount of freed buffer space, and this is notified to the sender via ACK.
The description of the MSG_PEEK flag in the Linux recv(2) manual indirectly confirms this "deletion" behavior:
"This flag causes the receive operation to return data from the beginning of the receive queue without removing that data from the queue."
In other words, the default behavior of a regular read() / recv() call—which does not specify the MSG_PEEK flag—is to remove the data from the receive queue.
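JavaScript cannot call recv() with MSG_PEEK directly, so the difference is easiest to show with a toy model. The class below is purely illustrative (real receive queues live in the kernel), and `ToyReceiveQueue` is a made-up name.

```javascript
// Toy model of a receive queue; real sockets live in the kernel.
class ToyReceiveQueue {
  constructor() {
    this.bytes = [];
  }
  deliver(data) {   // data arriving from the network
    this.bytes.push(...data);
  }
  peek(n) {         // like recv(..., MSG_PEEK): copy, keep queued
    return this.bytes.slice(0, n);
  }
  read(n) {         // like read()/recv(): copy, then remove
    return this.bytes.splice(0, n);
  }
}

const queue = new ToyReceiveQueue();
queue.deliver([1, 2, 3, 4]);
const peeked = queue.peek(2);  // the queue still holds all 4 bytes
const firstRead = queue.read(2);
const secondRead = queue.read(2);
const thirdRead = queue.read(2); // nothing is left to read again
```

The only difference between `peek` and `read` is whether the bytes stay queued, which is exactly the distinction the manual page draws.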

Figure: Kernel socket buffer data flow. read() copies and deletes from the buffer simultaneously.
Why delete it?
The answer is simple: because the buffer is finite.
Imagine a situation where new data is arriving one after another into an 85KB buffer. If you kept the read data in the buffer, there would be no space left to store the incoming data.
When the buffer fills up, TCP flow control kicks in. The receiver advertises a window size of zero, and on receiving this zero-window notification the sender stops sending data, apart from small periodic probes that check whether the window has reopened. This is how backpressure works.
At the kernel level, "delete once read" is not just an option; it is the only rational design for maintaining communication within finite resources.
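The window arithmetic can be sketched with a few lines of bookkeeping. This is a simplified model that only counts bytes; it assumes the advertised window equals the free space in the buffer and ignores everything else a real TCP stack does.

```javascript
// Toy model: the advertised window is the free space in a fixed-size buffer.
class ToyReceiveBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.used = 0;
  }
  get window() {
    return this.capacity - this.used;
  }
  // Sender side: only as many bytes as the window allows are accepted.
  receive(byteCount) {
    const accepted = Math.min(byteCount, this.window);
    this.used += accepted;
    return accepted;
  }
  // Application side: reading removes bytes and reopens the window.
  read(byteCount) {
    const taken = Math.min(byteCount, this.used);
    this.used -= taken;
    return taken;
  }
}
```

With a capacity of 87,380 bytes, a burst of 100,000 bytes fills the buffer and drives the window to zero. Only after the application reads, say, 4,096 bytes does the window reopen by that same amount.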
Why Is This Design Necessary — Memory Efficiency and Backpressure
There are three main benefits provided by the "disappears once read" design.
Memory Efficiency — Processing 1GB with 64KB
The biggest advantage of stream processing is that you do not need to load the entire data into memory.
Consider the case of downloading a 1GB file. An approach that loads the entire file into memory requires 1GB of memory. On the other hand, stream processing handles it chunk by chunk, so you only need memory equal to the buffer size (e.g., 64KB).
// Full read: the entire file is loaded into memory
const fullResponse = await fetch("https://example.com/large-file.bin");
const allData = await fullResponse.arrayBuffer(); // 1GB is loaded into memory

// Stream processing: handled chunk by chunk
const streamResponse = await fetch("https://example.com/large-file.bin");
const reader = streamResponse.body.getReader();
while (true) {
  const { done, value } = await reader.read(); // Chunk of a few KB to tens of KB
  if (done) break;
  await processChunk(value); // processChunk is your own handler; afterwards the chunk can be garbage-collected
}
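To try this without a network, here is a sketch that generates a large body locally and consumes it chunk by chunk. `makeLargeBody` and `measure` are hypothetical helpers, and the 64 KiB chunk size is simply the figure used in the text.

```javascript
const CHUNK_SIZE = 64 * 1024; // 64 KiB, matching the example in the text

// Hypothetical source that produces `totalBytes` of zeros, one chunk per pull.
function makeLargeBody(totalBytes) {
  let sent = 0;
  return new ReadableStream({
    pull(controller) {
      if (sent >= totalBytes) {
        controller.close();
        return;
      }
      const size = Math.min(CHUNK_SIZE, totalBytes - sent);
      controller.enqueue(new Uint8Array(size));
      sent += size;
    },
  });
}

// Consume the stream while keeping only the current chunk in hand.
async function measure(stream) {
  const reader = stream.getReader();
  let total = 0;
  let largestChunk = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    largestChunk = Math.max(largestChunk, value.byteLength);
  }
  return { total, largestChunk };
}
```

However large `totalBytes` gets, `largestChunk` never exceeds `CHUNK_SIZE`. For a 1 GiB body that works out to 1 GiB / 64 KiB = 16,384 chunks, each of which can be dropped before the next one is read.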

Figure: Comparison of memory usage between stream processing and full reading. The difference is about 16,000 times for the same 1GB file.
Backpressure — Consumers Control Producers
Another important property of streams is the ability to automatically control the producer's transmission speed to match the consumer's processing speed.
The WHATWG Streams Standard requirements document states:
"Without backpressure, slow writable streams in the chain will either cause memory usage to balloon as queued data grows without limit, or will cause data loss if the queue is capped."
If there were no backpressure, the sender would continue sending data beyond the consumer's processing speed, leading to runaway memory expansion and eventual OOM Kill. Flow control via TCP window size is the mechanism that prevents this automatically at the kernel level.
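The same idea is visible inside ReadableStream itself: the underlying source is only asked for more data while the internal queue is below its high-water mark. The sketch below counts how often `pull()` is invoked; `observeBackpressure` and the choice of `highWaterMark: 2` are assumptions made for this example.

```javascript
const tick = () => new Promise((resolve) => setTimeout(resolve, 0));

async function observeBackpressure() {
  let pulls = 0;
  const stream = new ReadableStream(
    {
      pull(controller) {
        pulls += 1;
        controller.enqueue(`chunk ${pulls}`);
      },
    },
    { highWaterMark: 2 } // queue at most two chunks ahead of the consumer
  );

  await tick();
  const pullsWhileIdle = pulls; // the producer stopped once the queue was full

  const reader = stream.getReader();
  await reader.read();          // the consumer frees one slot
  await tick();
  const pullsAfterOneRead = pulls;

  return { pullsWhileIdle, pullsAfterOneRead };
}
```

With nobody reading, the producer is called only until the queue is full. Each read then triggers exactly as many further pulls as are needed to refill it.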
TTFB (Time to First Byte) — Start Processing with the First Chunk
The third benefit is the ability to start processing without waiting for all data to arrive.
For example, consider the case of streaming HTML delivery in Server-Side Rendering (SSR).
// Streaming SSR: the header portion reaches the browser first
const encoder = new TextEncoder();
app.get("/page", (c) => {
  return c.body(
    new ReadableStream({
      async start(controller) {
        controller.enqueue(encoder.encode("<html><head>...</head><body>"));
        // ↑ The browser can start loading CSS here
        const data = await fetchSlowDatabase(); // 2-second DB query
        controller.enqueue(encoder.encode(`<main>${renderContent(data)}</main>`));
        controller.enqueue(encoder.encode("</body></html>"));
        controller.close();
      },
    })
  );
});
In the approach that buffers everything, the user sees nothing during the 2 seconds required for the DB query. With streaming, the browser can begin loading CSS or fonts as soon as the <head> tag arrives. The perceived waiting time is significantly reduced because processing can start as soon as the first chunk arrives. This is the improvement in TTFB.
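The ordering can be checked without a framework. The following sketch replaces the database with a 50 ms timer and records when each chunk reaches the consumer; `renderPageStream` and `recordTimeline` are hypothetical names, and the markup is a placeholder.

```javascript
const encoder = new TextEncoder();
const decoder = new TextDecoder();
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Emits the head immediately, then the main content once the slow query returns.
function renderPageStream(slowQuery) {
  return new ReadableStream({
    async start(controller) {
      controller.enqueue(encoder.encode("<html><head></head><body>"));
      const data = await slowQuery();
      controller.enqueue(encoder.encode(`<main>${data}</main></body></html>`));
      controller.close();
    },
  });
}

// Records the order in which chunks arrive relative to the slow query.
async function recordTimeline() {
  const events = [];
  const stream = renderPageStream(async () => {
    await sleep(50); // stands in for the slow database call
    events.push("query finished");
    return "content";
  });
  const reader = stream.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) return events;
    events.push(`received ${decoder.decode(value)}`);
  }
}
```

The first "received" entry is logged before "query finished", which is the whole point: the consumer already has the head in hand while the slow part is still running.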
Why Did Web Standards Adopt Streams — The Design Philosophy of the Fetch API
How did the kernel-level "read-and-delete" design connect to Web standards? Here, let's look at the design philosophy of the WHATWG Streams Standard.
Mapping to Low-Level I/O Primitives
The goal of the specification is clearly stated at the beginning of the WHATWG Streams Standard:
"This specification provides APIs for creating, composing, and consuming streams of data that map efficiently to low-level I/O primitives."
In other words, ReadableStream intentionally brings the same "read-and-delete" semantics found in the kernel's read() system call into the Web API.
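As an illustration of that mapping, here is a sketch that wraps a read-and-remove source in a ReadableStream, issuing one `read()` per `pull()`. The "socket" is a hypothetical in-memory stand-in, not a real I/O handle.

```javascript
// Hypothetical stand-in for a socket: read(n) returns bytes and removes them.
function makeToySocket(bytes) {
  const queue = [...bytes];
  return { read: (n) => queue.splice(0, n) };
}

// Wrap the read-and-remove source in a ReadableStream, one read() per pull.
function streamFromSocket(socket, chunkSize) {
  return new ReadableStream({
    pull(controller) {
      const chunk = socket.read(chunkSize);
      if (chunk.length === 0) {
        controller.close(); // nothing left: end of stream
      } else {
        controller.enqueue(Uint8Array.from(chunk));
      }
    },
  });
}

async function collectChunkSizes(stream) {
  const sizes = [];
  const reader = stream.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) return sizes;
    sizes.push(value.byteLength);
  }
}
```

Because the underlying source forgets data as soon as it hands it over, the stream built on top of it has nothing to replay either.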
The specification continues:
"Large swathes of the web platform are built on streaming data: that is, data that is created, processed, and consumed in an incremental fashion, without ever reading all of it into memory."
Trade-offs as a General-Purpose API
The Fetch API is designed to be general-purpose. It handles everything from a few KB of JSON responses to multi-GB file downloads with the same Response object.
This raises a natural question: "Isn't this over-engineered for standard JSON APIs?"
Frankly, yes. For a few KB of JSON response, the constraint that a stream can only be read once feels like overkill. However, for a general-purpose API, designing for the most constrained case (streaming massive files) is the correct choice.
The official FAQ for the WHATWG Streams Standard explains this design decision while acknowledging the trade-off:
"Your usual use case is a single consumer, with allowances for multi-consumer via teeing."
The design principle is that a single consumer is the standard use case, and if you need to read it multiple times, you should use tee().
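A minimal sketch of teeing, using an in-memory stream of three string chunks (`readAll` and `teeDemo` are names invented for this example):

```javascript
// Drain a stream and return its chunks as an array.
async function readAll(stream) {
  const chunks = [];
  const reader = stream.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) return chunks;
    chunks.push(value);
  }
}

async function teeDemo() {
  const source = new ReadableStream({
    start(controller) {
      ["a", "b", "c"].forEach((chunk) => controller.enqueue(chunk));
      controller.close();
    },
  });
  const [left, right] = source.tee(); // the source is now locked by the tee
  const [leftChunks, rightChunks] = await Promise.all([
    readAll(left),
    readAll(right),
  ]);
  return { leftChunks, rightChunks, sourceLocked: source.locked };
}
```

Both branches see every chunk, but the original stream stays locked: after teeing, the two branches are the only way to reach the data.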

Figure: Four-layer structure of the Fetch API. The constraint permeates from the kernel all the way to the application.
response.clone() — Paying the Cost Explicitly
The official workaround for the "only read once" constraint is response.clone().
const response = await fetch("https://api.example.com/data");
const clone = response.clone(); // Split (tee) the stream
const json = await response.json(); // Consume the original response
const text = await clone.text(); // Consume the clone
clone() internally uses ReadableStream.tee() to branch the stream into two. However, as MDN explicitly warns, this comes with a memory cost:
"If only one cloned branch is consumed, then the entire body will be buffered in memory."
If you only consume one branch, the entire body will be buffered for the other branch. Furthermore, tee() has a backpressure problem at the specification level (WHATWG Issue #1235), where data can potentially accumulate indefinitely on the slower branch.
In other words, clone() is a way to circumvent the "only read once" constraint, but it is a design that requires you to explicitly pay the memory cost. This is consistent with the stream's "disappears once read" principle. You can replicate it if necessary, but it's not free.
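One practical consequence is that the order of operations matters: clone() has to happen before anything reads the body. The sketch below uses a locally constructed Response, and `cloneOrderingDemo` is an illustrative name.

```javascript
async function cloneOrderingDemo() {
  const response = new Response(JSON.stringify({ total: 42 }));

  const copy = response.clone();      // must happen before any read
  const json = await response.json(); // drains the original branch
  const text = await copy.text();     // drains the cloned branch

  let lateCloneError = null;
  try {
    response.clone();                 // too late: the body is already used
  } catch (err) {
    lateCloneError = err;
  }
  return { json, text, bodyUsed: response.bodyUsed, lateCloneError };
}
```

Cloning after the fact throws a TypeError, so a middleware that wants a copy has to ask for it while the body is still untouched.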
Importance in Edge Runtime
This design becomes especially critical in Edge Runtime environments like Cloudflare Workers or Vercel Edge Functions.
| Runtime | Memory Limit |
|---|---|
| Cloudflare Workers | 128MB |
| Vercel Edge Functions | 128MB |
| Deno Deploy | 512MB |
If you buffer a 100MB response within a 128MB memory limit, you only have 28MB left to run the entire runtime. Cloudflare's official documentation explicitly recommends "using TransformStream or node:stream to stream rather than buffering the entire payload into memory" precisely because of this constraint.
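As a sketch of what "stream rather than buffer" can look like, here is a pass-through TransformStream that tallies bytes while forwarding each chunk immediately. `byteCounter` and `countWhileForwarding` are hypothetical helpers, and the final `text()` call only stands in for the client reading the body.

```javascript
// Pass-through transform that tallies bytes without retaining any chunk.
function byteCounter() {
  const stats = { bytes: 0 };
  const transform = new TransformStream({
    transform(chunk, controller) {
      stats.bytes += chunk.byteLength;
      controller.enqueue(chunk); // forward immediately; nothing is kept here
    },
  });
  return { transform, stats };
}

async function countWhileForwarding(response) {
  const { transform, stats } = byteCounter();
  const forwarded = new Response(response.body.pipeThrough(transform), {
    status: response.status,
    headers: response.headers,
  });
  const text = await forwarded.text(); // stands in for the client reading the body
  return { text, bytes: stats.bytes };
}
```

In a real worker you would return `forwarded` directly instead of reading it, so the runtime only ever holds the chunk currently in flight.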
Middleware and Streams — "Not Touching Is Virtue"
Once you understand the characteristics of streams, you can see why web framework middleware is designed not to touch the response body.
Most Middleware Does Not Read the Body
Validating authentication headers, adding CORS headers, checking rate limits—these middleware handle headers and metadata, not the response body.
Passing the stream through as-is = zero memory copies. If you don't touch the body, the cost is zero. This is the design philosophy of "pay-as-you-go."
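A header-only middleware along these lines might look like the sketch below. It is framework-agnostic and uses only the Web-standard Response and Headers; `withCorsHeader` is a made-up name.

```javascript
// Hypothetical header-only middleware: the body stream is handed over unread.
function withCorsHeader(response) {
  const headers = new Headers(response.headers);
  headers.set("Access-Control-Allow-Origin", "*");
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```

The function never calls `json()`, `text()`, or `getReader()`, so whatever the handler produced reaches the client intact regardless of its size.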
Contrast with Express's body-parser
Reflecting on the approach from the Express era makes the difference in design philosophy clear.
// Express: Read the whole thing into memory when the request arrives
app.use(express.json()); // body-parser buffers everything
app.post("/api", (req, res) => {
  console.log(req.body); // Access the parsed object
  res.json({ received: req.body });
});

// Hono: Body remains unconsumed until explicitly read
app.post("/api", async (c) => {
  const body = await c.req.json(); // The stream is consumed at this point
  return c.json({ received: body });
});
In the Express era (2010s), servers had ample memory, and JSON payloads were the norm. A design that buffered everything beforehand with body-parser was rational.
In modern Edge Runtime environments, memory constraints are tight, and support for streaming responses and large payloads is required. Hono (2022) and Next.js App Router (2023) adopted the Web standard ReadableStream precisely to address these environmental changes.
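The practical effect of "unconsumed until explicitly read" shows up on early-exit paths. In this sketch of a Web-standard handler, a rejected request never has its body read at all; the handler name, the URL, and the hard-coded token are assumptions for illustration only.

```javascript
// Hypothetical Web-standard handler: the body is parsed only on the success path.
async function handleOrder(request) {
  if (request.headers.get("Authorization") !== "Bearer secret") {
    return new Response("unauthorized", { status: 401 }); // body never read
  }
  const order = await request.json(); // the stream is consumed here
  return new Response(JSON.stringify({ received: order }), {
    headers: { "Content-Type": "application/json" },
  });
}
```

For a rejected request, `request.bodyUsed` stays false after the handler returns, so no memory was spent parsing a payload that was going to be thrown away.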
Revisiting the Opening Problem
Returning to the POS API problem from the beginning: if middleware reads the response stream, the body becomes empty for subsequent processing. This is not a bug; it is the stream working exactly as specified.
There are two correct ways to handle this.
Method 1: Use clone()
// Bad: Consumes the stream
const consumingMiddleware = async (c, next) => {
  await next();
  const body = await c.res.json(); // Consumes the stream, so the body is gone for subsequent steps
  cache.set(key, body);
};

// Good: clone() before reading
const cloningMiddleware = async (c, next) => {
  await next();
  const cloned = c.res.clone(); // ← Branches the stream
  const body = await cloned.json();
  cache.set(key, body);
};
Method 2: Save the data separately
// Good: Save data to Context inside the handler
app.post("/api/order", async (c) => {
const result = { orderId: "12345", status: "completed" };
c.set("responseBody", result); // ← Keep data separate from the stream
return c.json(result);
});
Method 2 is often more memory-efficient. clone() tees the stream, so chunks read from one branch are held in memory until the other branch has read them too, whereas saving to a Context variable avoids re-reading the stream and lets you use the original object directly.
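Putting the pieces together for the idempotency case, one possible shape is to cache plain data (status plus serialized body) and build a fresh Response for every replay. This is a framework-agnostic sketch under my own assumptions: the `idempotent` wrapper, the `Idempotency-Key` header name, and the in-memory Map are illustrative, not the code from the original middleware.

```javascript
// Hypothetical idempotency wrapper: cache plain data, never a Response object.
function idempotent(handler, cache = new Map()) {
  return async (request) => {
    const key = request.headers.get("Idempotency-Key");
    if (key !== null && cache.has(key)) {
      const { status, body } = cache.get(key);
      return new Response(body, { status }); // fresh stream for every replay
    }
    const response = await handler(request);
    if (key !== null) {
      // clone() so the response returned to the client stays unread
      const body = await response.clone().text();
      cache.set(key, { status: response.status, body });
    }
    return response;
  };
}
```

Storing a Response object in the cache would reproduce the opening bug, because its stream is drained by the first client that receives it. A string can be turned into a new stream as many times as needed.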
Having focused on Hono, let's see how other frameworks handle the body.
Framework Comparison — Design Philosophy Seen Through Body Handling
Comparing how major frameworks handle HTTP bodies provides an overview of where stream design fits in.
| Framework | Body Handling | Characteristics |
|---|---|---|
| Hono | Web Standard ReadableStream | Unread until explicit consumption. Edge Runtime ready |
| Express | Pre-buffering via body-parser | Immediate access as an object via req.body |
| NestJS | Express-based + Pipes/Interceptors | Prioritizes type safety/structure |
| FastAPI | Pre-parse + validation via Pydantic | Built-in type safety and validation |
| Next.js App Router | Web Standard Request/Response | Explicitly consumed via request.json() |
Broadly speaking, there are two lineages: "Convenience-first (pre-buffering)" and "Efficiency-first (stream pass-through)".
[Traditional: Buffering-first]
Express (2010~) → Full load via body-parser
NestJS (2017~) → Express-based + type-safety layer
FastAPI (2018~) → Full load + Pydantic validation
[Modern: Stream-first]
Hono (2022~) → Web Standard ReadableStream
Next.js App Router (2023~) → Web Standard Request/Response
The background to Hono adopting Web standard streams is that it targets Edge Runtimes like Cloudflare Workers and Deno Deploy. Under a 128MB memory limit, stream pass-through design is not just an optimization; it's a mandatory requirement.
Neither is "correct"; it is a trade-off based on the operating environment and use case. If you are building a simple JSON API on a Node.js server with plenty of memory, the pre-buffering of Express/NestJS will offer a better developer experience. If you are handling Edge Runtimes or large payloads, a design based on streams is more appropriate.
Summary — "Only Being Able to Read Once" Is a Rational Design
ReadableStream can only be read once because it is a design faithful to kernel-level I/O primitives. Let's recap the key points of each layer:
- Kernel: The socket receive buffer is finite (dozens to hundreds of KB). read() copies data and deletes it from the buffer simultaneously. To keep communication going within finite resources, read data must be freed.
- Network: When the buffer fills, it sets the TCP window size to 0 to stop transmission (backpressure). This enables flow control matched to the consumer's processing speed.
- Web Standard: The WHATWG Streams Standard sets "efficient mapping to low-level I/O primitives" as a design principle. It is a general-purpose design tailored to the most constrained cases (streaming massive files).
- Framework: If you don't touch the stream, there is zero memory copy. The "pay-as-you-go" design is especially important in Edge Runtime environments.
Things to Keep in Mind for Practice
When handling streams, it is important to be conscious of "consumption." Once you call response.json() or response.text(), the stream is consumed. If you need to read it multiple times, either explicitly pay the cost with clone() or save the parsed data to a variable. Which cost you choose depends on the use case.
"Only being able to read once" is not a restriction; it is a consistent design decision from the kernel all the way to Web standards.