Translated by AI
Week 04: Designing Quality Gates to Prevent False Completions in Parallel AI Development

Introduction
I ran AI agents in parallel to implement 40 screens and 37 server processes in just 4 days. The build passed, the screens rendered, and the code passed four rounds of code review.
When I used the actual application, however, the basic functional flows were broken.
This article analyzes the structural reasons why I mistook the project for complete and presents a quality gate design to prevent a recurrence. It is written not as an apology, but as an operational design document for the next project.
1. Defining the Problem: "Implementation Completion" vs. "Quality Completion"
Three distinctions were blurred during this project.
Completion of Screen Count ≠ Completion of Operational Flow
All 40 screens were implemented, so progress was reported as 100%. However, the flow of "posting two photos and viewing them via swipe" did not work. The existence of a screen and the functionality of that screen are two different things.
Existence of Code ≠ Establishment of UX
Although the post-editing feature was in the specifications, neither the screen nor the backend processing was implemented. The delete feature existed only on the My Page screen and could not be reached from the Home or Detail screens. The code worked where it existed, but it was missing where it should have been.
Passing Static Review ≠ Confirmed Operational Functionality
Four rounds of code review confirmed "separation of concerns," "batch limits," and "null safety." However, "whether a user can post two photos and swipe through them" was never checked. Static analysis can confirm that the structure is correct, but it does not confirm that the experience is correct.
Confusing these three distinctions led to the misconception that the project was "complete."
2. Why Did the Quality Collapse?
The cause lies not in the performance of the AI, but in the design of the development process. There were four structural issues.
2-1. Structural Issues of Layer-by-Layer Development
Development proceeded in the following order:
Phase 1: Implement all 40 screens
Phase 2: Implement all 37 server processes
Phase 3: Review and quality check
In this configuration, the "connections" between screens remain unverified. The post creation screen saves photos as an array. The post display card references only the first element of the array. Each is correct in isolation, but from the user's perspective, "only one photo is visible."
Building by layer creates a structural disconnection between the saving side and the display side.
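The mismatch described above can be reproduced in a few lines. The following is a minimal sketch, not the project's actual code: the `Post` record, the function names, and the file names are all hypothetical, chosen only to show how each side can be correct in isolation while the flow fails.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """Hypothetical post record; the field name is an assumption."""
    photos: list[str] = field(default_factory=list)


def save_post(photo_paths: list[str]) -> Post:
    # Saving side: stores every photo as an array.
    return Post(photos=list(photo_paths))


def render_card_first_only(post: Post) -> list[str]:
    # Display side as built in isolation: reads only the first element.
    return post.photos[:1]


def render_card_all(post: Post) -> list[str]:
    # Display side that matches what the saving side stores.
    return list(post.photos)


def flow_shows_all_photos(render) -> bool:
    # Flow check: post two photos, then confirm both reach the display.
    post = save_post(["a.jpg", "b.jpg"])
    return render(post) == ["a.jpg", "b.jpg"]


print(flow_shows_all_photos(render_card_first_only))  # False
print(flow_shows_all_photos(render_card_all))         # True
```

Neither `save_post` nor `render_card_first_only` would fail a review on its own. Only a check that crosses both sides exposes the problem.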
2-2. Insufficient Verification of Cross-Feature Connectivity
I had three AI agents implement features in parallel. When one agent was in charge of the "saving side" and another was in charge of the "display side," nobody checked if their specifications matched.
Parallelization increases the number of connection points. As the number of connection points increases, so does the validation cost. I overlooked this structural reality.
2-3. Failure to Execute Test Scenarios
I created 200 test scenarios, but the execution rate was 8.4%. I failed to distinguish between a state where "tests exist" and a state where "tests have been executed." The mindset of "testing everything later" was effectively the same as "dealing with bugs only after they surface in large numbers."
2-4. Bias in AI Review Perspectives
The four review rounds were all based on static analysis (separation of concerns, scalability, security, idempotency). While these are correct perspectives, they did not include operation-based verification such as "does it work when a user touches it?"
The number of reviews is not a metric for quality. If the review perspectives do not cover runtime behavior, no number of iterations will detect what those perspectives cannot see.
3. Quality Gate Design for Preventing Recurrence
This is the core of the article. The following quality gates will be introduced in the next project.
3-1. Redesigning the Definition of Done
"Code exists" is not considered finished. A task is only considered "Done" when all of the following are met:
□ Successful flow pass: Major operations must function from start to finish
□ Inter-screen connection check: Saved data must be correctly visible in all display locations
□ Full screen check for identical operations: Operations (delete, edit, etc.) must work on all relevant screens
□ UI state transition check: Loading, error, and empty states must be displayed correctly
□ Data consistency check: Database state after an operation must be as expected
This checklist is applied per feature. Rather than after creating all screens, verification is performed every time a single feature is completed.
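The checklist can also be tracked as data, so that "Done" is computed rather than declared. The following is a minimal sketch under that assumption. The check identifiers and the `FeatureStatus` class are hypothetical names, and the status shown for Post CRUD is illustrative only.

```python
from dataclasses import dataclass

# The five Definition of Done checks, in checklist order (identifiers are assumptions).
DONE_CHECKS = (
    "flow_pass",
    "inter_screen_connection",
    "same_operation_on_all_screens",
    "ui_state_transitions",
    "data_consistency",
)


@dataclass
class FeatureStatus:
    name: str
    passed: frozenset[str] = frozenset()

    def missing(self) -> list[str]:
        # Checks not yet confirmed, in checklist order.
        return [check for check in DONE_CHECKS if check not in self.passed]

    def is_done(self) -> bool:
        # A feature is Done only when every check has passed.
        return not self.missing()


# Illustrative status: two of the five checks confirmed.
post_crud = FeatureStatus("Post CRUD", frozenset({"flow_pass", "data_consistency"}))
print(post_crud.is_done())  # False
print(post_crud.missing())
# ['inter_screen_connection', 'same_operation_on_all_screens', 'ui_state_transitions']
```

Because `is_done` depends on every check, a feature whose code merely exists can never be reported as finished.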
3-2. Transitioning to Vertical Feature Development
Instead of layer-based development (Screens → Server → Review), development is cut vertically by feature unit.
❌ Layer-based (This time):
Phase 1: Implement all screens
Phase 2: Implement all server processes
Phase 3: Overall review
→ Connection points are not verified
✅ Vertical feature slicing (Next time):
Feature 1: Post CRUD
→ Create + Edit + Delete + Display + Server process + Operation check
→ Done: The "Post -> Display -> Edit -> Delete" flow must work across all screens
Feature 2: Collection CRUD
→ Each operation flow is treated as a single unit
Feature 3: User system
→ Profile + Follow + Block + Notification + Operation check
With this approach, the "only one photo is visible" problem surfaces as soon as Feature 1 is completed. The scale of rework is fundamentally different from discovering it after all 40 screens are finished.
3-3. Rules for AI Parallel Development
Parallel development itself is effective. The problem was that the connection points went unmanaged. I will operate under the following rules.
Rule 1: Generate a connection point list before parallel implementation
Step 0: "I will implement the following features in parallel. Identify all data exchanges between features beforehand."
→ A connection point list is output
Step 1: Attach the connection point list to each agent's instructions for parallel implementation
Step 2: Verify the connection point list using a different agent after implementation
Agents execute Step 0 and Step 2. The only thing humans add is "time to design the instructions."
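The connection point list from Rule 1 can be kept as structured data rather than free text. The following is one possible sketch: the `ConnectionPoint` fields, the helper names, and the two example entries are assumptions made for illustration, not the output of an actual agent run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionPoint:
    data: str       # what is exchanged
    producer: str   # feature that writes it
    consumer: str   # feature that reads it
    contract: str   # what must hold across the boundary


def instructions_for(feature: str, task: str, points: list[ConnectionPoint]) -> str:
    """Step 1: attach the connection points that touch this feature to its instructions."""
    lines = [task, "Completion criteria:"]
    for p in points:
        if feature in (p.producer, p.consumer):
            lines.append(f"- {p.data} ({p.producer} -> {p.consumer}): {p.contract}")
    return "\n".join(lines)


def unverified(points: list[ConnectionPoint], verified: set[ConnectionPoint]) -> list[ConnectionPoint]:
    """Step 2: connection points the verification pass has not yet confirmed."""
    return [p for p in points if p not in verified]


# Example entries (hypothetical), as Step 0 might produce them.
points = [
    ConnectionPoint("photos", "post creation screen", "post display card",
                    "all saved photos are viewable via swipe"),
    ConnectionPoint("visibility", "post settings", "feed",
                    "private posts do not appear in the feed"),
]

print(instructions_for("post display card", "Implement the post display card.", points))
print(len(unverified(points, verified={points[0]})))  # 1
```

Keeping the list as data means the same entries feed both the implementation instructions and the verification pass, so the two cannot drift apart.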
Rule 2: Include connection checks in agent instructions
❌ "Implement the post creation screen"
✅ "Implement the post creation screen. Completion criteria:
- When multiple photos are saved, ensure all can be seen via swipe on the display card side
- Confirm consistency in data exchange by also reading the implementation of the display card"
Explicitly instruct them to "read the adjacent code and confirm connectivity." Humans do this implicitly, but AI will not do it unless told.
Rule 3: Run a verification agent separately from the implementation agent
❌ "Review this code" → Static analysis only
✅ "Trace the code for the following user operation flows to confirm if they hold true:
1. Post with 2 photos → Can they be swiped on the display side?
2. Delete a post → Does the menu exist on all relevant screens?
3. Set to private → Does it disappear from the feed?"
This is a desk-trace of the operation flow. Even without touching the actual device, the agent can trace the code and notice, for example, that "a callback was not passed to this screen." Most of the bugs found this time could have been detected with this method.
Rule 4: Explicitly state "connection points" in specifications
❌ "Posts can be deleted" → Deletion is only implemented on 1 screen
✅ "Posts can be deleted (in 3 places: Home / My Page / Detail screen)"
❌ "Multiple photos can be posted" → Multiple saved, only 1 displayed
✅ "Multiple photos can be posted, and all can be viewed via swipe during display"
AI does not pick up on implicit understandings. Only by writing "where" and "how it connects" will the saving and display features actually link up.
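Once the specification names the screens explicitly, comparing it against the implementation becomes mechanical. The following is a small sketch assuming both sides can be expressed as a mapping from operation to screens. The `delete_post` key and the function name are hypothetical, while the screen names come from the delete example above.

```python
def coverage_gaps(spec: dict[str, set[str]],
                  implemented: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return, per operation, the screens the spec requires but the code lacks."""
    gaps: dict[str, list[str]] = {}
    for operation, required_screens in spec.items():
        missing = required_screens - implemented.get(operation, set())
        if missing:
            gaps[operation] = sorted(missing)
    return gaps


# Spec: deletion must be reachable from three screens.
spec = {"delete_post": {"Home", "My Page", "Detail"}}
# Implementation as found: deletion exists only on My Page.
implemented = {"delete_post": {"My Page"}}

print(coverage_gaps(spec, implemented))  # {'delete_post': ['Detail', 'Home']}
```

The vague specification "Posts can be deleted" gives this check nothing to compare against, which is why the location has to be written down.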
3-4. Changing How Progress Is Measured
❌ 40 out of 40 screens implemented = 100%
✅ Operation confirmed ○ out of 15 major flows = ○%
Neither screen counts nor lines of code are progress metrics. Measure progress by "how many flows a user can complete."
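The flow-based metric is simple arithmetic, sketched below. The flow names and their confirmed/unconfirmed values are illustrative placeholders, not measured results from this project.

```python
def flow_progress(flows: dict[str, bool]) -> float:
    """Progress as the percentage of major flows confirmed to work end to end."""
    if not flows:
        raise ValueError("at least one flow is required")
    confirmed = sum(1 for ok in flows.values() if ok)
    return 100.0 * confirmed / len(flows)


# Illustrative statuses only: True means the flow was operated and confirmed.
flows = {
    "post two photos and swipe through them": False,
    "delete a post from Home / My Page / Detail": False,
    "set a post to private and confirm it leaves the feed": True,
    "edit a post": False,
}

print(flow_progress(flows))  # 25.0
```

Under this metric a project can have every screen implemented and still report low progress, which is the signal the screen count was hiding.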
4. Future Development Rules
I will adopt the following rules in the next project.
- Manually verify the operation flow after implementing each feature
- Always generate a connection point list before parallel implementation
- Always run static reviews and operation flow traces as a set
- Execute test scenarios at the same time they are created. "Later" is prohibited
- Measure progress by "Number of confirmed flows / Total number of flows"
- Set a Feature Freeze date and stop new implementation to focus on testing
- Set a UX confirmation date before TestFlight distribution and manually run all major flows
5. Summary
AI parallel development is fast. I was able to implement 40 screens in 4 days with 3 agents in parallel. I will continue to utilize this speed.
The problem was not the speed, but that the definition of completion stopped at "the existence of code." Increasing speed without quality gates only produces broken things faster.
Designing quality gates is not difficult. Generating connection point lists, desk-tracing operation flows, and vertical feature slicing can all be executed by agents. What humans need to increase is the "precision of instructions," not the "hands for implementation."
The most dangerous thing in AI development is failing to notice that a project looks finished but is not. Quality gates are a mechanism to structurally prevent that illusion.