Automation Strategies to Never 'Forget' Android App Quality
Introduction
I am in charge of Android development for the live streaming app "Avvy," which launched this year. While shipping features at breakneck speed, I ran into a problem.
"The UI of the streaming screen is slightly misaligned."
When a new feature was added, existing button positions shifted subtly. When a component was modified, the layout broke on another screen. Changes that weren't noticed during code review were coming to light.
What these problems have in common is that "humans forget and overlook things."
In the midst of rapid development, it is not realistic to perform quality checks manually every time. That is why it is crucial to create a "never forget" mechanism through automation.
In this article, I will introduce four quality automation strategies that I worked on in 2025 (and am currently working on).
Four Automation Strategies
| Strategy | Problem it solves |
|---|---|
| Screenshot Testing | Overlooking visual changes in UI |
| Auto Crash Fix | Postponing crash fixes |
| Library Update Management | Accumulation of technical debt |
| Performance Monitoring | Not noticing performance degradation |
1. Screenshot Testing - Never Missing UI Changes
Problem
UI changes are easy to overlook with the human eye.
- Padding changed by 1dp
- Button positions shifted slightly
- Unintended elements overlapping
Especially when developing with Jetpack Compose, changes to a component can affect unexpected places. In fact, a bug once occurred where the comment section hid the gift button.
Solution
I introduced Android's official screenshot testing to automate visual regression testing.
```kotlin
@Test
fun streamingScreen_displaysCorrectly() {
    // Render the screen under test
    composeTestRule.setContent {
        StreamingScreen(/* ... */)
    }
    // Capture the root node and compare it against the golden image
    composeTestRule.onRoot().captureRobolectricImage()
}
```
Narrowing down the states to be tested
In streaming screens, there are many cases where the UI is displayed or hidden dynamically. Since testing all states would cause maintenance costs to explode, I adopted a policy of creating golden images with a Preview where all UI elements are displayed.

This allows us to detect important UI changes while avoiding frequent test updates.
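As a sketch, such an all-elements-visible state can be expressed as a single Preview. The composable's parameters below are hypothetical, not Avvy's actual code:

```kotlin
import androidx.compose.runtime.Composable
import androidx.compose.ui.tooling.preview.Preview

// Hypothetical Preview: force every dynamically shown element visible
// so one golden image covers the comment list, gift button, etc.
@Preview(showBackground = true)
@Composable
fun StreamingScreenAllElementsPreview() {
    StreamingScreen(
        showComments = true,   // assumed flags; the real screen's
        showGiftButton = true, // parameters will differ
        showBanner = true,
    )
}
```

One Preview per critical screen keeps the golden-image set small while still exercising every element that could overlap.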
Generating golden images on CI
What is important in Screenshot Testing is the stability of the tests. If you generate golden images in a local environment, differences in the environment (font rendering, OS version, etc.) can cause discrepancies on CI.
Therefore, I make sure to generate golden images on GitHub Actions as well. This ensures that the CI environment and the golden image environment match, achieving stable testing.
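A minimal GitHub Actions job for regenerating golden images might look like this sketch. The Gradle task name depends on the screenshot-testing tool in use and is an assumption here (Compose Preview Screenshot Testing, for example, provides `updateDebugScreenshotTest`):

```yaml
name: Update Golden Images
on: workflow_dispatch  # regenerate on demand from the Actions tab

jobs:
  record:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: 17
      # Record golden images in the same environment that runs verification
      - run: ./gradlew updateDebugScreenshotTest
      # Hand the images back for review/commit; the path is an assumption
      - uses: actions/upload-artifact@v4
        with:
          name: golden-images
          path: '**/screenshotTest/reference/**'
```

Because the same runner image records and verifies, font rendering and OS-level differences can no longer produce false diffs.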
Key Points
- Focus on important screens: Limit tests to critical screens like Streaming and Viewer, rather than every single screen.
- Test all-UI-displayed states: For UI that changes dynamically, create golden images with everything visible.
- Generate golden images on CI: Prevent discrepancies due to environmental differences and ensure test stability.
2. Auto Crash Fix - AI Automatically Fixes Crashes
Problem
On a small team, feature development tends to take priority. Even when crash reports come in through Firebase Crashlytics, low-frequency crashes are hard to prioritize.
However, crash fixes often only require minor adjustments, such as adding null checks or exception handling.
Therefore, by delegating these routine fixing tasks to AI, we created a system where developers can focus on feature development while still making progress on crash resolution.
Solution
We use Claude Code to automate everything from crash analysis to creating a fix PR.
Mechanism:
Every Monday 09:00 JST (GitHub Actions)
↓
Get top crashes from Firebase Crashlytics
↓
Analyzed by Claude Code
↓
Automatically generate fix PR
↓
Create and track issue in Linear
Workflow Overview:
```yaml
name: Weekly Crashlytics Auto Fix
on:
  schedule:
    - cron: '0 0 * * 1' # Every Monday 00:00 UTC (09:00 JST)
  workflow_dispatch:
    inputs:
      dry_run:
        description: 'Dry run mode'
        type: boolean
        default: true
```
Prompt Design Strategies
In the prompts passed to the AI, we incorporate the following strategies:
1. Duplicate check for existing issues
To prevent duplicate issues from being created for the same crash, we search Linear by the Crashlytics Issue ID before creation.
2. Focus on minimal fixes
We explicitly state "Minimal changes only - Fix the crash, don't refactor surrounding code" in the prompt to prevent the AI from making excessive changes.
3. Mandatory build and test success
We require `./gradlew assembleDebug` and `./gradlew testDebugUnitTest` to pass before creating a PR. If the build fails, PR creation is skipped.
4. Leave uncertain fixes to humans
By including "If unsure about the fix, create the PR with a clear note asking for guidance," we ensure that the AI requests human review for fixes it is not confident about.
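Put together, the prompt skeleton looks roughly like this (a paraphrased sketch, not the production prompt):

```text
Analyze the attached Crashlytics stack trace and fix the crash.

Rules:
- Before creating a Linear issue, search Linear for the Crashlytics Issue ID
  and skip creation if a matching issue already exists.
- Minimal changes only - Fix the crash, don't refactor surrounding code.
- Run ./gradlew assembleDebug and ./gradlew testDebugUnitTest; only create
  a PR if both succeed.
- If unsure about the fix, create the PR with a clear note asking for guidance.
```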
Key Points
- Dry run mode: Don't apply to production immediately.
- Human review mandatory: Always review PRs created by AI.
- Linear integration: Ensure traceability.
- Build/Test mandatory: Do not create PRs if CI does not pass.
3. Library Update Management - Preventing the Accumulation of Technical Debt
Problem
Library updates are often postponed with an "it works now, so later" mindset. Without a periodic inventory, they fall behind before you know it.
Current Inventory
First, I visualized the current status of major libraries.
| Library | Current | Latest | Priority |
|---|---|---|---|
| Kotlin | 2.1.0 | 2.2.x | 🔴 High |
| AGP | 8.7.3 | 8.13.x | 🔴 High |
| Compose BOM | 2025.05.01 | 2025.09.01+ | 🟡 Medium |
| Dagger Hilt | 2.54 | 2.57.2 | 🟡 Medium |
| Navigation Compose | 2.9.0 | 2.9.6 | 🟢 Low |
| Lifecycle | 2.8.7 | 2.10.0 | 🟢 Low |
| Coil | 3.0.0-rc01 | 3.x stable | 🟡 Medium |
| Core KTX | 1.15.0 | 1.17.0 | 🟢 Low |
Risk-Based Step-by-Step Updates
Priority is determined by the "volume of the update" and the "possibility of breaking changes."
Phase 1 - Low Risk (Small update volume)
- Navigation Compose 2.9.0 → 2.9.6
- Core KTX 1.15.0 → 1.17.0
- Lifecycle 2.8.7 → 2.10.0
- Coil rc → stable
Phase 2 - Medium Risk
- Dagger Hilt 2.54 → 2.57.2
- Compose BOM 2025.05.01 → 2025.09.01
Phase 3 - Potential for breaking changes
- Kotlin 2.1.0 → 2.2.x
- AGP 8.7.3 → 8.13.x
At Avvy, we use KMM (Kotlin Multiplatform Mobile) to share the UseCase and Repository layers between Android and iOS. Therefore, when upgrading Kotlin or AGP versions, we need to consider the support status on the KMM side, which makes it difficult to update to the latest versions easily. This is why Phase 3 is left for last.
Future Automation
Once the old libraries are cleaned up, I plan to implement the following automation.
Implementing Renovate
I will introduce Renovate to automatically create PRs for dependency updates.
```json
{
  "extends": ["config:recommended", ":dependencyDashboard"],
  "timezone": "Asia/Tokyo",
  "schedule": ["before 9am on Monday"],
  "packageRules": [
    {
      "matchUpdateTypes": ["patch"],
      "automerge": true
    },
    {
      "groupName": "Kotlin",
      "matchPackagePatterns": ["^org.jetbrains.kotlin"]
    },
    {
      "groupName": "Compose",
      "matchPackagePatterns": ["androidx.compose"]
    },
    {
      "groupName": "Firebase",
      "matchPackagePatterns": ["^com.google.firebase"]
    },
    {
      "groupName": "Hilt",
      "matchPackagePatterns": ["^com.google.dagger", "^androidx.hilt"]
    }
  ],
  "vulnerabilityAlerts": {
    "enabled": true
  }
}
```
The key points are as follows:
- Check every Monday morning: Limit checks to once a week to prevent a flood of PRs.
- Auto-merge patch versions: Automate low-risk updates.
- Group related libraries: Combine updates for Compose, Firebase, Hilt, etc., into a single PR.
- Enable vulnerability alerts: Detect security issues immediately.
Changelog Analysis with AI
I also plan to build a system where AI automatically outputs changelogs and affected areas for PRs created by Renovate.
Key Points
- Don't do it all at once: Leave libraries with potential breaking changes for last.
- Determine priority by update volume: Start with those with small differences in minor versions.
- Ensure continuity through automation: Cleaning up once is meaningless if things become outdated again.
4. Performance Monitoring - Detecting Performance Degradation
Problem
Performance issues are often difficult to notice during routine development.
And the worst-case scenario is the pattern of "only noticing through user reports."
- Receiving a review on the App Store saying, "The app has been slow lately."
- Seeing a post on X asking, "Is Avvy feeling heavy?"
- Discovering an increase in the jank rate on Play Console Vitals, but not knowing when it started or what caused it.
It is too late to respond after users have pointed it out. We need to establish a system where developers can notice issues themselves.
Solution
We are building a comprehensive performance monitoring foundation.
Phase 1: Foundation
- Baseline Profiles (AOT optimization)
- Macrobenchmark module (for CI measurement)
- Verification of Looker Studio dashboards
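As a sketch of the Phase 1 Baseline Profiles setup (module name and dependency version are assumptions, not Avvy's actual build files):

```kotlin
// app/build.gradle.kts
plugins {
    id("com.android.application")
    id("androidx.baselineprofile") // Baseline Profile Gradle plugin
}

dependencies {
    // Installs the generated profile at app startup for AOT compilation
    implementation("androidx.profileinstaller:profileinstaller:1.4.1")
    // Assumed module that generates the profile via a Macrobenchmark test
    baselineProfile(project(":baselineprofile"))
}
```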
Phase 2: Expanded Observability
- Firebase Performance traces (critical user flows)
- JankStats (detecting frame drops in Compose)
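The Phase 2 JankStats wiring is roughly the following sketch; the activity name and logging are placeholders:

```kotlin
import android.app.Activity
import android.os.Bundle
import android.util.Log
import androidx.metrics.performance.JankStats

class StreamingActivity : Activity() {
    private lateinit var jankStats: JankStats

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // Invoked per frame; report only the frames classified as janky
        jankStats = JankStats.createAndTrack(window) { frameData ->
            if (frameData.isJank) {
                Log.w("Jank", "Janky frame: ${frameData.frameDurationUiNanos / 1_000_000} ms")
            }
        }
    }
}
```

In production, the log call would be replaced with a report to Firebase Performance or a custom metric so that jank trends show up on the dashboard.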
Phase 3: CI Integration
- Integrating Macrobenchmark into GitHub Actions
- Detecting regressions on emulators
Phase 4: Dashboard & Scoring
- Embedding Looker Studio dashboards into Notion
- Performance scoring system
Current Status
The implementation of Firebase Performance has been completed, and we are proceeding with the subsequent phases sequentially.
Key Points
- You can't improve what you can't measure: Start with visualization.
- Integrate into CI: Detect regressions before release.
- Scoring: Quantifying performance changes the awareness of the entire team.
Common Philosophy
The following mindset is common to these four strategies.
1. Design with the assumption that "humans forget and overlook things"
No matter how talented an engineer is, they may forget quality checks when they are busy. That is why it is important to create a mechanism where it is okay to forget.
2. Integrate into CI/Automation to ensure continuity
Even if things are cleaned up once, they will return to the previous state without a mechanism. By integrating these processes into GitHub Actions, you can continuously monitor quality.
3. Do not slow down development speed
It is counterproductive if development stops for the sake of quality control.
- Test only critical screens instead of all screens.
- Update step-by-step based on risk.
- Delegate routine tasks to AI.
4. Utilize AI to reduce the burden on humans
AI is not perfect, but by delegating routine tasks or the first steps of analysis, humans can focus on more important decisions.
Summary
The important thing is to create a "mechanism where quality is maintained even if humans forget."
The keyword for all four strategies introduced this time is "automation."
- Screenshot Testing → Automatically detect UI changes
- Auto Crash Fix → Automatically suggest fixes
- Library Update Management → Automate updates
- Performance Monitoring → Automatically detect degradation
Timing of Quality Checks
With these measures, quality checks are automatically executed at the following timings in Avvy.
| Timing | Execution Details |
|---|---|
| At PR creation | Unit tests, UI tests (Compose + Robolectric), Screenshot Test |
| Every Monday | Renovate (Library update PRs), Auto Crash Fix (Crash fix PRs) |
| Continuous | Firebase Performance (Performance measurement) |
Since Avvy is built entirely with Compose, UI tests run quickly with Robolectric. Because no emulator is required, CI stays lightweight, which is a major advantage.
It is a system where quality checks continue to run continuously without developers having to be conscious of it.
The Combination of GitHub Actions + Claude Code is Powerful
What I felt through these initiatives is that the combination of GitHub Actions and Claude Code is extremely powerful.
- GitHub Actions: Schedule execution, trigger management, secret management, PR creation
- Claude Code: Code analysis, fix suggestions, issue creation, making decisions based on context
GitHub Actions handles "when and what to execute," while Claude Code handles "how to analyze and fix." This division of labor has made it possible to automate part of the intellectual work that previously could only be done by humans.
While some measures are still in progress, I believe that by advancing automation step by step, we can continue to provide high-quality apps.