iTranslated by AI
DB Design and Performance Tuning for Small-Scale Services: Shared DB, N+1, EXPLAIN ANALYZE, and Indexing
Introduction
In this article, I will summarize my notes on DB design and performance optimization for a small-scale service.
For this service, rather than splitting the DB like a microservices architecture from the start, we opted for a configuration where we share the DB while separating responsibilities by service.
Based on that, I addressed the following challenges that arose during implementation:
- Whether to share a DB in a service-based architecture
- The N+1 problem
- Batch retrieval using IN clauses
- Problems with queries using VIEWs and connection retention
- Checking execution plans using EXPLAIN / EXPLAIN ANALYZE
- How to interpret Seq Scan / Index Scan
- Making decisions on adding indexes
Service-based architecture and shared DB
This service has few users, and the service boundaries were still subject to change.
Therefore, we did not adopt a configuration that splits the DB from the beginning.
The approach we adopted is as follows:
Separate services
Share the database
This is a concept similar to a so-called service-based architecture.
Reasons for using a shared DB
The reasons for choosing a shared DB are as follows:
- JOINs can be used
- Transactions are easier to handle
- Data consistency is easier to maintain
- Backups and monitoring targets can be centralized
- It is easier to achieve development speed at a small scale
- You can avoid the operational costs of DB splitting
When you split a DB from the beginning in a small service, complexity can sometimes outweigh independence.
Therefore, we decided it was more realistic to start with a shared DB and consider splitting it after the service boundaries and load characteristics become clearer.
Risks of a shared DB
Of course, there are also risks to a shared DB:
- The independence between services becomes weaker
- The scope of impact of schema changes becomes wider
- Tight coupling occurs if multiple services start referencing tables directly
- There will be migration costs when splitting the DB in the future
Therefore, even when using a shared DB, we were conscious of separating responsibilities on the application side.
For example, instead of cross-referencing tables indiscriminately, we clarify the responsibilities of the data handled by each service as much as possible.
N+1 problem occurred
The first issue encountered with DB access was the N+1 problem.
For example, after fetching parent data, child data was being fetched one by one within a loop.
SELECT *
FROM parent_table;
Then, child data is fetched for the number of parent data records.
SELECT * FROM child_table WHERE parent_id = 1;
SELECT * FROM child_table WHERE parent_id = 2;
SELECT * FROM child_table WHERE parent_id = 3;
This state, where 1 SQL statement for parent data + N SQL statements for child data are issued, is the N+1 problem.
While this is not noticeable when the number of records is small, as data increases, the number of round trips to the DB grows, degrading response times.
Batch retrieval using IN clauses
In this optimization, I moved away from individual SELECTs within loops and changed the implementation to batch retrieve by passing parent IDs collectively to an IN clause.
SELECT *
FROM child_table
WHERE parent_id IN (1, 2, 3, 4, 5);
The implementation flow is as follows:
1. Fetch parent data
2. Put parent IDs into an array
3. Batch fetch child data using an IN clause
4. Map data on the application side
5. Associate child data with parent data
The concept is as follows:
const parents = await parentRepository.findAll();
const parentIds = parents.map((parent) => parent.id);
const children = await childRepository.findByParentIds(parentIds);
const childrenByParentId = new Map();
for (const child of children) {
const list = childrenByParentId.get(child.parentId) ?? [];
list.push(child);
childrenByParentId.set(child.parentId, list);
}
const result = parents.map((parent) => ({
...parent,
children: childrenByParentId.get(parent.id) ?? [],
}));
The SQL looks like this:
SELECT *
FROM child_table
WHERE parent_id IN (:parent_ids);
When passing an array as a parameter in PostgreSQL, you can also use ANY.
SELECT *
FROM child_table
WHERE parent_id = ANY($1);
However, the key point of this improvement was not the subtle syntax, but rather moving from querying N times to fetching all at once using an IN clause.
Fetching with JOIN vs. IN clause
For N+1 countermeasures, there is also the method of batch retrieval using JOIN.
SELECT
p.id AS parent_id,
p.name AS parent_name,
c.id AS child_id,
c.name AS child_name
FROM parent_table p
LEFT JOIN child_table c
ON c.parent_id = p.id
WHERE p.id IN (1, 2, 3, 4, 5);
When fetching with JOIN, it is easier to aggregate parent-child relationships with just SQL.
On the other hand, depending on the number of parent and child data, the result rows can become too numerous, or the formatting on the application side can become complex.
This time, I judged it easier to handle by first fetching the parent data, then using those parent IDs to batch fetch child data with an IN clause, and finally mapping it on the application side.
VIEW and connection holding issues
I also struggled with processing that used VIEWs during DB design.
What to note here is that a standard PostgreSQL VIEW itself does not occupy a connection.
A PostgreSQL VIEW is like a saved SELECT statement, and the defined query is executed at the time of reference.
Reference:
What appeared as a problem this time was not the VIEW itself, but the possibility that DB connections were held for a long time due to heavy queries referencing the VIEW, long-running transactions, or failures to release connections on the application side.
Therefore, I reviewed the following:
- Are the SQL statements referencing the VIEW becoming heavy?
- Is the WHERE clause working appropriately?
- Is an unnecessarily large amount of data being read?
- Are transactions left open for a long time?
- Are connections being reliably released?
It is also important to reliably return the connection after processing on the application side.
For example, when using a connection obtained from a pool, ensure it is released not only upon successful completion but also in case of an exception.
const client = await pool.connect();
try {
const result = await client.query('SELECT * FROM sample_view WHERE id = $1', [id]);
return result.rows;
} finally {
client.release();
}
Even when using an ORM or a query builder, it is advisable to check for long-running transactions or connection leaks.
Checking execution plans with EXPLAIN
When SQL is slow, do not simply add indices based on intuition; first, look at the execution plan.
In PostgreSQL, you can use EXPLAIN to check how the SQL is planned to be executed.
Reference:
EXPLAIN
SELECT *
FROM child_table
WHERE parent_id = 123;
If you want to actually execute it and see the measured values, use EXPLAIN ANALYZE.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM child_table
WHERE parent_id = 123;
Adding ANALYZE will actually execute the SQL.
Therefore, it is easy to use for SELECT, but caution is required for DELETE and UPDATE.
Be careful with EXPLAIN ANALYZE for DELETE / UPDATE
As shown below, adding EXPLAIN ANALYZE to a DELETE statement will actually execute the DELETE.
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM child_table
WHERE parent_id = 123;
In a production environment, it is basically safer to stick with EXPLAIN.
EXPLAIN
DELETE FROM child_table
WHERE parent_id = 123;
If you want to measure actual values in a staging environment, wrap it in a transaction and then ROLLBACK.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM child_table
WHERE parent_id = 123;
ROLLBACK;
Observing Seq Scan and Index Scan
What I check particularly closely in execution plans are the following:
- Seq Scan
- Index Scan
- Bitmap Index Scan
- Bitmap Heap Scan
Seq Scan
Seq Scan is a method of reading the entire table sequentially.
Seq Scan on child_table
Filter: (parent_id = 123)
This appears when there is no index on the column used for the search condition, the table is small, or when reading a majority of the rows.
Seq Scan is not always bad.
However, if you are performing a Seq Scan on a table with a large number of rows when you only want to retrieve a small number of them, consider adding an index.
Index Scan
Index Scan is a method of finding target rows using an index.
Index Scan using idx_child_table_parent_id on child_table
Index Cond: (parent_id = 123)
If there is an index on a column frequently used in WHERE clauses or JOIN conditions, an Index Scan is more likely to be used.
Bitmap Index Scan / Bitmap Heap Scan
When retrieving multiple rows in batches, a Bitmap Index Scan / Bitmap Heap Scan may appear.
Bitmap Heap Scan on child_table
Recheck Cond: (parent_id = 123)
-> Bitmap Index Scan on idx_child_table_parent_id
Index Cond: (parent_id = 123)
This is essentially gathering candidate rows using the index before reading the actual table data.
Adding an index
After checking the execution plan, I found places where there were no indices on columns frequently used in WHERE clauses or JOIN conditions.
For example, if you frequently search by child_table.parent_id, add an index as follows:
CREATE INDEX CONCURRENTLY idx_child_table_parent_id
ON child_table(parent_id);
If you often search using composite conditions, consider a composite index.
CREATE INDEX CONCURRENTLY idx_child_table_parent_id_created_at
ON child_table(parent_id, created_at DESC);
After adding an index, update the statistics.
ANALYZE child_table;
Then, check the execution plan again.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM child_table
WHERE parent_id = 123
ORDER BY created_at DESC;
The points I checked were:
- Did it change from Seq Scan to Index Scan or Bitmap Index Scan?
- Did the
actual timedecrease? - Did the amount of read buffers decrease?
- Is there a large discrepancy between estimated and actual rows?
- Did
Rows Removed by Filterdecrease?
Summary
In this DB improvement project, I chose not to over-segment the database.
For small-scale services, a configuration where responsibilities are separated by service while sharing a single database can be more practical.
In addition, I addressed the N+1 problem by stopping individual SELECTs within loops and instead fetching everything at once by passing parent IDs into an IN clause.
Furthermore, I verified execution plans with EXPLAIN / EXPLAIN ANALYZE and added indices to necessary columns while monitoring Seq Scans and Index Scans.
Regarding issues with VIEWs and connection holding, I identified and confirmed that the problem was not the VIEW itself, but rather the possibility of heavy queries, long-running transactions, and connection release leaks.
I felt that for DB optimization, it is more important to first check the execution plan and identify the bottleneck than to blindly apply indices.
Discussion