Translated by AI
Forcefully Eliminating Client N+1 with GraphQL Batching (Case Study: Howtelevision Inc.)
I analyzed and improved the performance of Mond, a question-box service run by Howtelevision Inc.
Specifically, this involved measuring the breakdown of LCP, proposing fixes, and optimizing GraphQL requests, which turned out to be the biggest issue.
Not every issue has been fixed yet, but the development team now understands each part of the breakdown, and the fixes can be released in stages once verification is complete.
Below, honorifics are omitted.
Consultation Content
Improve the Lighthouse score of mond.how
Key Technology Stack
- Next.js - Page Router
- Hasura CE - GraphQL Server (Self-hosted version of Hasura)
Measurement and Issues
Chrome now provides a dashboard where you can track how Lighthouse scores trend over time. Here are Mond's recent scores.

I was looking at https://mond.how/kumagi as a representative example. It has many answers.
cruxvis showed a graph like this; since the score fluctuated significantly depending on the time of day, server load seems to have been the main factor this time.
As far as I measured locally, the score was around 60 points during the day and about 30 points at night. Responses are slower at night, likely due to higher load.
From here, I'll look at the breakdown using DevTools.
TBT, or the blocking time due to JS evaluation, is long. There's a simple issue with the large build size. This is a front-end problem that occurs regardless of the time of day.

And there are too many GraphQL requests.

There are simply too many. Requests sit in the browser's queue for a long time before being dispatched, resulting in a de facto waterfall.
Creating an Ideal State with Mocks
Building the app from source locally and connecting it to the production environment, I identified which data was heavy.

- Mocked cover images for each user
- Disabled the comment form
- Stopped loading comments other than for LCP
In this state, the score was 82, nearly perfect except for TBT.
The remaining issues in this state are the JS bundle size and evaluation time. Therefore, I will check the bundle size first.
Bundle Size Analysis
First, I ran an analysis using @next/bundle-analyzer.
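For reference, @next/bundle-analyzer is enabled by wrapping the Next.js config. A minimal sketch (Mond's actual config is not shown here):

```javascript
// next.config.js — minimal sketch; the real project config will differ
const withBundleAnalyzer = require('@next/bundle-analyzer')({
  // Only run the analyzer when explicitly requested
  enabled: process.env.ANALYZE === 'true',
});

module.exports = withBundleAnalyzer({
  // ...existing Next.js options
});
```

Running `ANALYZE=true next build` then emits the treemap report alongside the build.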

Since it uses Next Page Router, _app is the primary entry point.
While it's clear that katex (a math renderer) is duplicated, the real problem is the high proportion of the build size taken up by the tiptap and prosemirror editors.
In terms of UI, this corresponds to the question editor.

There are two issues in this area:
- With this UI, the editor always has to be loaded.
- Even on screens that never show the editor, its assets are still being loaded.
For the former, I noted that a product decision is needed and that some form of lazy loading is required. For example, I proposed showing the editor only after the "Ask" button is clicked, or rendering a mock of the editor first as a skeleton and loading the real one on hover or click.
As for the latter, it is a purely technical issue. I confirmed that isolating the editor to be lazy-loaded using Dynamic Import or similar methods would make the problem less likely to occur.
However, even for read-only display outside of editing, raw document data was being passed around instead of pre-compiled output, and tiptap was being used to render it. Unless this is corrected, the problem of the massive tiptap bundle won't be resolved.
Since front-end fixes require too much verification work, I created an issue for the Mond development team to handle, and I decided to focus primarily on the GraphQL issues.
The Problem of Too Many GraphQL Requests
The number of requests is too high. This is the main point.

In Mond's implementation, code for react-query is generated using GraphQL Codegen, and this is used from the client code.
const query = useUserDetailsQuery(...)
const key = useUserDetailsQuery.getKey({ id: theUsersId })
User answers to questions are in an infinite scroll list view, and each list item issues about 5 to 10 additional requests for its own detail resolution. I see, that's heavy.
List retrieval and item details are in an N+1 relationship, but since it's infinite scroll, I can see it's architecturally a bit difficult to batch them.
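To illustrate the collocation idea (field names here are hypothetical, not Mond's actual schema), the per-item details could in principle be selected inside the list query itself, so each page of the infinite scroll costs one request instead of 1 + N×(5–10):

```graphql
# Hypothetical Hasura-style query; field names are illustrative only
query AnswersPage($userId: uuid!, $limit: Int!, $offset: Int!) {
  answers(where: { user_id: { _eq: $userId } }, limit: $limit, offset: $offset) {
    id
    body
    question { id body }
    likes_aggregate { aggregate { count } }
    comments_aggregate { aggregate { count } }
  }
}
```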
While there is room to optimize detail retrieval, requests are flying sporadically, and because the number of requests is too high, there are many waiting in the request queue.
Trial Implementation of GraphQL Client Batching
Architecturally, the problem is that data is being resolved sporadically on the client, so the granularity of data resolution needs to be coarsened across the board. The internal team already knows this without an outsider pointing it out, but quantifying it let us share the sense that it really is heavy.
By the way, GraphQL is originally supposed to be a technology to solve client-side N+1 problems and should be able to combine client requests.
Since Mond uses graphql-request, I checked if it has GraphQL request batching, and it did.
This allows you to send GraphQL queries together as follows:
await batchRequests('https://foo.bar/graphql', [
  {
    document: `
      query {
        users
      }`,
  },
  {
    document: `
      query {
        users
      }`,
  },
])
However, to use this, the GraphQL endpoint side must support it.
(When I hit the GitHub GraphQL Endpoint to verify GraphQL behavior, it was not supported.)
I checked if Hasura supports it and found a relevant issue, and it is supported.
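Under the hood, the de facto batching convention (which Hasura follows) is simply a JSON array of operations in one POST body, answered by a JSON array of results in the same order. A sketch of that wire format, independent of any client library:

```typescript
// Sketch of the batched GraphQL wire format (the de facto convention, not a formal spec).
type Operation = { query: string; variables?: Record<string, unknown> };
type Result = { data?: unknown; errors?: unknown };

// One POST body carrying several operations.
function buildBatchBody(ops: Operation[]): string {
  return JSON.stringify(ops);
}

// The server answers with an array of results, index-aligned with the request.
function splitBatchResponse(body: string): Result[] {
  const parsed = JSON.parse(body);
  if (!Array.isArray(parsed)) {
    throw new Error('endpoint does not support batching: got a single object');
  }
  return parsed as Result[];
}
```

An endpoint without batching support returns a single object (or an error) for the array body, which is exactly what the GitHub check below ran into.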
I wanted to try this locally. The latest version of graphql-request is trying to redesign and rebrand under the name graffle, and (currently) support for batch requests was missing there.
So, I decided to manually create a batching client based on the slightly older graphql-request that Mond was using.
The following implementation modifies the code generated by graphql-codegen to repack and send all requests occurring within 500ms in the same request.
// src/graphql/react-query.ts

// Original implementation
function fetcher<TData, TVariables extends { [key: string]: any }>(
  client: GraphQLClient, query: string, variables?: TVariables, requestHeaders?: RequestInit['headers']
): () => Promise<TData> {
  return async (): Promise<TData> => client.request({
    document: query,
    variables,
    requestHeaders
  });
}

// Batched implementation
function fetcher<TData, TVariables extends { [key: string]: any }>(
  client: GraphQLClient, query: string, variables?: TVariables, requestHeaders?: RequestInit['headers']
): () => Promise<TData> {
  return batchedFetcher(client, query, variables, requestHeaders);
}

let _queue: Array<{
  query: string;
  variables?: any;
  requestHeaders?: RequestInit['headers'];
  resolve: (value: any) => void;
  reject: (reason?: any) => void;
}> = [];
let _isFetching = false;
let _timeoutId: NodeJS.Timeout | null = null;
const DEBOUNCE_TIME = 500; // ms

function batchedFetcher<TData, TVariables extends { [key: string]: any }>(
  client: GraphQLClient,
  query: string,
  variables?: TVariables,
  requestHeaders?: RequestInit['headers'],
) {
  return (): Promise<TData> => {
    return new Promise((resolve, reject) => {
      // Queue the request and (re)start the debounce timer.
      _queue.push({ query, variables, requestHeaders, resolve, reject });
      if (_timeoutId) clearTimeout(_timeoutId);
      _timeoutId = setTimeout(() => {
        if (!_isFetching) _processBatch(client);
      }, DEBOUNCE_TIME);
    });
  };
}

async function _processBatch(client: GraphQLClient) {
  // Snapshot the queue; new requests may arrive while this batch is in flight.
  const batch = _queue.slice();
  _queue = [];
  const batchSize = batch.length;
  _isFetching = true;
  return client
    .batchRequests({
      documents: batch.map(({ query, variables }) => ({
        document: query,
        variables,
      })),
      // Note: the whole batch reuses the headers of the last queued request.
      requestHeaders: batch[batch.length - 1].requestHeaders
    })
    .then((res) => {
      batch.forEach(({ resolve }, index) => {
        resolve(res[index].data);
      });
    })
    .catch((error_) => {
      const response = error_.response;
      if (response) {
        // Settle each request individually: some entries may still carry data.
        for (let i = 0; i < batchSize; i++) {
          const res = response[i];
          if (res?.data) {
            batch[i].resolve(res.data);
          } else {
            batch[i].reject(res ?? error_);
          }
        }
      } else {
        // No per-request responses available; fail the whole batch.
        batch.forEach(({ reject }) => reject(error_));
      }
    })
    .finally(() => {
      _isFetching = false;
      // Drain anything that was queued while this batch was in flight.
      if (_queue.length > 0) {
        _processBatch(client);
      }
    });
}
(This is verification code; in reality, I am doing things like creating an allow-list to branch only some queries, among other things.)
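The allow-list branching could look something like this (a hypothetical sketch, not the actual implementation): only named queries known to be safe to delay go through the batcher, and everything else is sent immediately.

```typescript
// Hypothetical allow-list sketch; the operation names are illustrative, not Mond's.
const BATCHABLE_OPERATIONS = new Set(['UserDetails', 'AnswerDetails', 'LikeCount']);

// Extract the operation name from a document like "query UserDetails(...) { ... }".
// Only named queries are candidates; mutations and anonymous queries never match.
function operationName(query: string): string | null {
  const m = query.match(/^\s*query\s+([A-Za-z_][A-Za-z0-9_]*)/);
  return m ? m[1] : null;
}

function shouldBatch(query: string): boolean {
  const name = operationName(query);
  return name !== null && BATCHABLE_OPERATIONS.has(name);
}
```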
When I ran the batched client against Mond, I confirmed a clear improvement in speed. Specifically, a part that had been issuing 150 requests went down to 18. In exchange, each individual request is heavier, which is only natural since they are all combined.
Batching is a drastic change that alters the load and performance characteristics of the application, so it needed to be introduced with serious operational verification. For example, since the batch reuses the authentication headers of the last queued request, it could not cope with a place where session information is dynamically rewritten between requests. (I don't think there is such a place, though...)
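One way to harden against the mixed-headers concern (a sketch, not something that shipped) is to partition the queue by serialized headers and emit one batch per group, so requests carrying different session headers never share a batch:

```typescript
// Sketch: partition queued requests by header identity before batching.
// (Hypothetical hardening; the verification code above does not include this.)
type Queued = { query: string; requestHeaders?: Record<string, string> };

function groupByHeaders(queue: Queued[]): Map<string, Queued[]> {
  const groups = new Map<string, Queued[]>();
  for (const entry of queue) {
    // Sort keys so logically equal header sets serialize identically.
    const headers = entry.requestHeaders ?? {};
    const key = JSON.stringify(
      Object.keys(headers).sort().map((k) => [k, headers[k]])
    );
    const bucket = groups.get(key);
    if (bucket) bucket.push(entry);
    else groups.set(key, [entry]);
  }
  return groups;
}
```

Each group would then be sent with its own `batchRequests` call using that group's headers.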
My one-month engagement ended with a PR that makes graphql-codegen generate client code for this.
Results of the Engagement
I was able to identify heavy areas—JS TBT and request-dependent LCP/CLS—break down the Lighthouse scores, and create PoCs for their respective improvements.
However, I couldn't complete the operational verification myself, so the project ended with a handoff of the remaining work. When you join on a spot basis to improve performance, it's difficult to dig into the deep parts of the domain.
On the other hand, I feel that precisely because I lacked domain knowledge, I could look at the DevTools metrics with fresh eyes. When you read the code first, assumptions can make you chase numbers that aren't actually heavy. My measurement methodology has become better organized in my own head, so I'd like to write it up as formal knowledge eventually.
Summary
- Monitor bundle sizes so that heavyweight assets don't creep into the critical path.
- Keep collocation in mind during the design phase to avoid client-side N+1.
- Although GraphQL batch requests were introduced as a stopgap measure, they also seem likely to result in qualitative improvements.