Why Batch Execution Is Prioritized Over Automated Testing, and How to Improve It


Introduction

Imagine a pre-release on-call shift where a conversation like this takes place:
"The CI is red, but it worked fine when I ran it in a production-equivalent environment."
When statements like this are repeated, trust in the tests begins to erode, until people assume the system works correctly in production even while CI or the automated tests are failing (red).

In practice, there are moments where "actually running the batch and seeing no issues" is valued more than "the tests passing."
This is often not because batch execution is inherently superior, but because trust in the test code has not been sufficiently cultivated.

Especially in projects that have been running for a long time, broken tests left as-is and tests that have diverged from specifications tend to coexist.
In such a state, it is natural for manual execution results to seem more reassuring than test results.

This article examines why this situation arises and outlines a practical, incremental approach to improving it.

Common Situations in Practice

Be careful if conversations like the following become more frequent during on-call shifts or pre-release checks:

  • Tests are passing, but let's run the batch just in case.
  • This test has been broken for a while, so let's ignore it for now.
  • CI is red, but it doesn't seem to have an impact, so let's proceed.
  • If it runs without issues with production-equivalent data, call it OK.

This way of operating works in the short term, but in the long term, quality judgment becomes dependent on individuals.

Why Batch Execution is More Trusted

The reason is simple: because the results are visible.

  • You can verify with input close to real data.
  • The execution results, including dependencies, come out as they are.
  • Success and failure are easy to share within the team.

On the other hand, this is a matter of "peace of mind" rather than a "reproducible guarantee."
Execution verification is powerful, but it can only guarantee the cases that were actually checked.

Typical Patterns of Declining Test Reliability

There are typically several patterns behind the increase in broken tests:

  • Tests are not updated after specification changes, leaving only the expected values outdated.
  • Mocks drift away from the implementation, failing to catch differences with the real environment.
  • Test data is too far from reality, making it impossible to reproduce production failures.
  • Tests are too slow and are skipped on a daily basis.
  • Flaky tests are left unaddressed, causing failures (red) to become noise.

In this state, a test failure turns from a "signal to investigate" into "routine noise."

Tests That Depend on Implicit DB State Cause Widespread Failures

In projects with many failures, tests often depend on an implicit DB state. For example:

  • Skipping insertion logic and verifying based on the assumption that data already exists.
  • A test depending on data created by a previous test.
  • Tests passing on a developer's local machine but failing in a clean CI environment.
  • Depending on execution order and failing when run in isolation.

In these cases, tests fail because the "precondition of the initial state" is broken rather than the test code itself. As a result, it becomes difficult to isolate the cause of failure, leading to a decline in overall trust in the tests.

The solution is simple: return to a structure where each test prepares the necessary data itself.

  • Explicitly create fixtures for each test.
  • Initialize the DB per test (e.g., transaction rollbacks).
  • Create precondition data every time so as not to depend on execution order.
  • Prohibit the omission of "assuming data exists" during code reviews.
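The points above can be sketched in Go. The `Store` below is a hypothetical in-memory stand-in for the database; in a real suite, the same shape is typically achieved by opening a DB transaction per test and rolling it back in `t.Cleanup`:

```go
package main

import "testing"

// Store is a hypothetical in-memory stand-in for the database.
type Store struct{ orders map[int]int }

func NewStore() *Store { return &Store{orders: map[int]int{}} }

func (s *Store) InsertOrder(id, amount int) { s.orders[id] = amount }

// TotalAmount is the batch logic under test: it totals all orders.
func (s *Store) TotalAmount() int {
	total := 0
	for _, a := range s.orders {
		total += a
	}
	return total
}

// newTestStore gives each test a fresh store and registers cleanup,
// so no test can observe data created by another test.
func newTestStore(t *testing.T) *Store {
	t.Helper()
	s := NewStore()
	t.Cleanup(func() { s.orders = nil }) // mirrors a per-test transaction rollback
	return s
}

func TestTotalAmount(t *testing.T) {
	s := newTestStore(t)
	s.InsertOrder(1, 500) // the test creates its own precondition data
	s.InsertOrder(2, 300) // instead of assuming it already exists

	if got := s.TotalAmount(); got != 800 {
		t.Fatalf("TotalAmount() = %d, want 800", got)
	}
}
```

Because every test goes through `newTestStore`, the suite passes identically in any order, in isolation, and in a clean CI environment.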

What Happens When Broken Test Code Increases

When the reliability of test code declines, development decisions falter.

  • Test failures are treated as "noise."
  • Disabling tests is chosen over investigating the cause of failure.
  • Manual verification checklists increase instead of adding tests.
  • The same verification tasks are repeated with every change.

In this state, tests become a ritual rather than quality assurance.

Separating the Roles of Automated Tests and Batch Execution

There is no need to think of them as opposites.
Dividing their roles stabilizes operations.

  • Automated tests: Continuously guarantee specification rules.
  • Batch execution verification: Perform final verification of environment differences and operational conditions.

Batch execution isn't bad; it is a valuable verification method in the field. The key is to keep it as the last line of defense: operations don't scale when manual verification becomes the primary quality gate.

When Automated Tests are Ready, QA and Production-Equivalent Verification Can Focus on Differences

When automated tests continuously guarantee behavior at the specification level, there is less need to manually re-verify everything in QA or production-equivalent environments.
Verification can then be narrowed to perspectives related to the change at hand. In the QA environment, it is realistic to check a few representative failure patterns and a few success patterns rather than every possible combination, focusing on areas such as:

  • Connectivity for external system integration.
  • Impact of environment settings or infrastructure differences.
  • Performance and operational procedures with actual data volumes.

In short, the stronger the tests, the more manual verification functions as "final difference verification" rather than a "substitute for coverage."

Overly Detailed Manual Verification Reduces Iteration and Tends to Lower Quality

Detailed verification in the execution environment is effective, but spending too much time on it reduces the number of verification cycles you can run in a day.
The fix-and-verify loop then slows down, making early detection of problems difficult.

Batches in particular often run on fixed schedules; if you wait for the next scheduled run to verify, the lead time per verification cycle grows long.
This makes it all the more important to pin down specifications early with automated tests that can run immediately, locally or in CI.

  • Each verification becomes heavy, lengthening the wait time for re-verification after a fix.
  • Even small changes have high verification costs, making it harder to release differences incrementally.
  • Feedback is delayed, and rework tends to become more significant.

To improve quality, it is important not just to increase verification but to make it fast-cycling. Moving detailed perspectives to automated tests and narrowing manual verification to differences or environment-specific perspectives stabilizes operations.

In Batch Processing, the Design of Test Data Preparation Determines Quality

Because batches are difficult to verify by direct interaction like screen operations, whether you can create input data and preconditions as intended directly links to test quality. It is important to make not just the expected results, but also the data conditions that produce those results, reproducible.

In Go development environments, rather than listing SQL directly, it is common to assemble data using factory functions or the builder pattern in combination with table-driven tests. This approach makes it easier to keep up with added cases or changed preconditions, and it makes the intent clearer to the reader.

  • It is easier to express differences per test case in code than writing SQL every time.
  • Common initial data can be factorized, reducing duplication.
  • Creating data for edge cases (error paths) is also easy by using a builder to modify only the necessary items.
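As a sketch of this approach: the record type, builder, and batch logic below (`Order`, `NewOrder`, `SumActive`) are illustrative names, not from any real codebase. Each table-driven case assembles exactly the preconditions it needs through the builder, overriding only the fields that matter for that case:

```go
package main

import "testing"

// Order is a simplified example record.
type Order struct {
	ID     int
	Amount int
	Status string
}

// OrderBuilder assembles fixture data in code instead of raw SQL.
type OrderBuilder struct{ o Order }

// NewOrder returns a builder preloaded with sensible defaults,
// so each test case only overrides what it cares about.
func NewOrder(id int) *OrderBuilder {
	return &OrderBuilder{o: Order{ID: id, Amount: 100, Status: "active"}}
}

func (b *OrderBuilder) Amount(a int) *OrderBuilder   { b.o.Amount = a; return b }
func (b *OrderBuilder) Status(s string) *OrderBuilder { b.o.Status = s; return b }
func (b *OrderBuilder) Build() Order                  { return b.o }

// SumActive is the batch logic under test: it totals active orders only.
func SumActive(orders []Order) int {
	total := 0
	for _, o := range orders {
		if o.Status == "active" {
			total += o.Amount
		}
	}
	return total
}

// A table-driven test: each case states its own preconditions via the builder.
func TestSumActive(t *testing.T) {
	cases := []struct {
		name   string
		orders []Order
		want   int
	}{
		{"all active", []Order{NewOrder(1).Build(), NewOrder(2).Amount(250).Build()}, 350},
		{"cancelled excluded", []Order{NewOrder(1).Build(), NewOrder(2).Status("cancelled").Build()}, 100},
		{"empty input", nil, 0},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := SumActive(tc.orders); got != tc.want {
				t.Errorf("SumActive() = %d, want %d", got, tc.want)
			}
		})
	}
}
```

Adding a new edge case is one line in the table plus, at most, one builder call; the intent of each case stays visible in the case name and the overridden fields.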

The First Thing to Fix Is "Leaving Broken Tests Alone"

The first thing to do is not to increase the number of tests, but to stop leaving broken tests unattended.

  • Do not leave failing tests as "known reds."
  • Delete unnecessary tests and update necessary ones to match specifications.
  • Identify the cause of flaky tests and fix the reproduction conditions.
  • Directly link CI failures (red) to release decisions.

Simply eliminating the state where "it's normal for tests to fail" will significantly restore trust in the tests.

Should You Delete or Skip Broken Tests?

"Deleting for now" or "skipping for now" can improve the CI success rate in the short term. However, leaving them as-is tends to reduce the power of quality assurance.

The following criteria are practical for making decisions:

  • Permanently unnecessary tests: Delete them.
  • Necessary but currently broken tests: Skip them temporarily and fix them with a deadline.
  • Flaky tests: Separate them from the main CI and fix the cause while monitoring them in a dedicated job.

The important thing is to treat "skip" as a "temporary shelter" and not a "final destination." Skipped tests will continue to increase unless they are turned into tickets with an assigned person and a deadline.

Where to Automate and What to Keep Manual

To avoid hesitation in judgment, it is easier to organize by using the following criteria:

  • Tasks verified using the same steps every time: Automate
  • Tasks where inputs and expected values can be defined: Automate
  • Final verifications that are heavily dependent on the production environment: Keep manual
  • Low-frequency procedures such as initial deployment or disaster recovery: Keep manual

With this categorization, manual verification becomes a state of "remaining because it is necessary."

Realistic Improvement Steps

Rather than rebuilding everything at once, it is easier to proceed with improvements in the following order:

  • Select only three important batches.
  • First, identify tests for those batches that depend on DB preconditions.
  • Stabilize the happy paths and critical error paths for those batches with automated tests.
  • Move verification items that are currently checked manually every time into tests.
  • Keep only the perspectives that require manual execution as operational checks.

An Example of Before/After

Before
Run the batch every time before a release, manually check the result logs, and only verify successful completion. Since some tests remain broken, the CI's red status is ignored in practice.

After
Stabilize only the happy paths and critical error paths of the batch with automated tests, and reflect the CI's red status in the release decision. Execution in production-equivalent environments is narrowed down to differential perspectives such as external integrations and performance, ending the practice of exhaustive manual checks every time.

Minimum Metrics to Monitor

Setting numbers makes it easier to feel the improvement.

  • Success rate of essential tests (the set of tests used for release decisions).
  • Number of skipped tests and elapsed days.
  • Rerun rate of flaky tests.
  • Number of manual verification items before release.
  • Number of bugs found only through manual verification.

The goal is not to "increase the overall success rate visually," but to "increase the success rate that can be used for decision-making."
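The "number of skipped tests" can be measured mechanically: `go test -json` emits one JSON event per line, each with an `Action` field (`run`, `pass`, `fail`, `skip`, ...). The `countSkips` helper below is a minimal sketch of counting test-level skips from that output:

```go
package main

import (
	"bufio"
	"encoding/json"
	"strings"
)

// event is the subset of a `go test -json` line that we care about.
type event struct {
	Action string `json:"Action"`
	Test   string `json:"Test"`
}

// countSkips counts test-level skip events in `go test -json` output.
// Package-level skip events (no Test name) are excluded.
func countSkips(output string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		var ev event
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue // ignore non-JSON lines such as build output
		}
		if ev.Action == "skip" && ev.Test != "" {
			n++
		}
	}
	return n
}
```

Piping `go test -json ./...` into a small CLI built around this function yields a number that can be tracked release over release, instead of eyeballing CI logs.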

Situations Where Manual Execution is Still Necessary

There is no need to reduce manual execution to zero. For example, manual verification is highly valuable in the following cases:

  • Verification in a production-equivalent environment immediately before the initial release.
  • Connectivity checks for external system integrations.
  • Verification of performance and operational procedures when processing large volumes of data.

However, if you are performing the same check every time, it is a candidate for automation.

Summary

The reason why batch execution verification is prioritized over automated tests is not because batches are special, but because the reliability of the test code has declined.

Do not leave broken tests unattended, divide roles, and gradually automate manual verifications.

By continuing these three points, you will be able to support quality through systems rather than human intuition.
