
Automation Strategies to Never 'Forget' Android App Quality


Introduction

I am in charge of Android development for "Avvy," a live streaming app released this year. While pushing feature development forward at breakneck speed, I ran into a recurring problem.

"The UI of the streaming screen is slightly misaligned."

Adding a new feature subtly shifted existing button positions. Modifying one component broke the layout on another screen. Regressions that had slipped past code review kept surfacing.

What these problems have in common is that "humans forget and overlook things."

In the midst of rapid development, it is not realistic to perform quality checks manually every time. That is why it is crucial to create a "never forget" mechanism through automation.

In this article, I will introduce four quality automation strategies that I worked on in 2025 (and am currently working on).

Four Automation Strategies

| Strategy | Problem it solves |
| --- | --- |
| Screenshot Testing | Overlooking visual changes in UI |
| Auto Crash Fix | Postponing crash fixes |
| Library Update Management | Accumulation of technical debt |
| Performance Monitoring | Not noticing performance degradation |

1. Screenshot Testing - Never Missing UI Changes

Problem

UI changes are easy to overlook with the human eye.

  • Padding changed by 1dp
  • Button positions shifted slightly
  • Unintended elements overlapping

Especially when developing with Jetpack Compose, changes to a component can affect unexpected places. In fact, a bug once occurred where the comment section hid the gift button.

Solution

I introduced the official Android Screenshot Testing to automate Visual Regression Testing.

class StreamingScreenScreenshotTest {

    @get:Rule
    val composeTestRule = createComposeRule()

    @Test
    fun streamingScreen_displaysCorrectly() {
        composeTestRule.setContent {
            StreamingScreen(/* ... */)
        }
        // Capture the rendered screen and compare it against the golden image
        composeTestRule.onRoot().captureRobolectricImage()
    }
}

Narrowing down the states to be tested

In streaming screens, there are many cases where the UI is displayed or hidden dynamically. Since testing all states would cause maintenance costs to explode, I adopted a policy of creating golden images with a Preview where all UI elements are displayed.

[Figure: State machine for the Screenshot Test]

This allows us to detect important UI changes while avoiding frequent test updates.
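To make the policy concrete, here is a toy sketch in Kotlin (all names, including StreamingUiState and its fields, are hypothetical illustrations, not Avvy's actual code): the golden image is generated from the single state in which every optional element is visible.

```kotlin
// Hypothetical UI state for the streaming screen. The policy above means
// the golden image is built from the state with every optional element on.
data class StreamingUiState(
    val showComments: Boolean = false,
    val showGiftButton: Boolean = false,
    val showRankingBanner: Boolean = false,
)

// The single "everything visible" state used for the golden image.
fun goldenState() = StreamingUiState(
    showComments = true,
    showGiftButton = true,
    showRankingBanner = true,
)
```

A Preview built from goldenState() then serves as the only screenshot target, so adding a new optional element means flipping on one more flag rather than writing a new test.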

Generating golden images on CI

What is important in Screenshot Testing is the stability of the tests. If you generate golden images in a local environment, differences in the environment (font rendering, OS version, etc.) can cause discrepancies on CI.

Therefore, I make sure to generate golden images on GitHub Actions as well. This ensures that the CI environment and the golden image environment match, achieving stable testing.

Key Points

  • Focus on important screens: Limit tests to critical screens like Streaming and Viewer, rather than every single screen.
  • Test all-UI-displayed states: For UI that changes dynamically, create golden images with everything visible.
  • Generate golden images on CI: Prevent discrepancies due to environmental differences and ensure test stability.

2. Auto Crash Fix - AI Automatically Fixes Crashes

Problem

With a small, focused team, we naturally prioritize feature development. Even when crash reports arrive in Firebase Crashlytics, low-frequency crashes are hard to prioritize.

However, crash fixes often only require minor adjustments, such as adding null checks or exception handling.

Therefore, by delegating these routine fixing tasks to AI, we created a system where developers can focus on feature development while still making progress on crash resolution.

Solution

We use Claude Code to automate everything from crash analysis to creating a fix PR.

Mechanism:

  1. Every Monday at 09:00 JST, a GitHub Actions workflow runs
  2. It fetches the top crashes from Firebase Crashlytics
  3. Claude Code analyzes each crash
  4. A fix PR is generated automatically
  5. An issue is created and tracked in Linear

Workflow Overview:

name: Weekly Crashlytics Auto Fix
on:
  schedule:
    - cron: '0 0 * * 1'  # Every Monday 00:00 UTC
  workflow_dispatch:
    inputs:
      dry_run:
        description: 'Dry run mode'
        type: boolean
        default: true

Prompt Design Strategies

In the prompts passed to the AI, we incorporate the following strategies:

1. Duplicate check for existing issues

To prevent duplicate issues from being created for the same crash, we search Linear by the Crashlytics Issue ID before creation.
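The check itself can be as simple as a substring match against existing issue titles. A sketch of the idea (the function name and title convention are my assumptions, not the actual implementation):

```kotlin
// Search existing Linear issue titles for the Crashlytics Issue ID.
// Returns the matching title, or null if no duplicate exists.
fun findExistingIssue(existingTitles: List<String>, crashlyticsIssueId: String): String? =
    existingTitles.firstOrNull { it.contains(crashlyticsIssueId) }
```

Only when this returns null does the workflow go on to create a new issue.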

2. Focus on minimal fixes

We explicitly state "Minimal changes only - Fix the crash, don't refactor surrounding code" in the prompt to prevent the AI from making excessive changes.
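As an illustration of what "minimal" means here (a hypothetical example, not an actual Avvy fix): the change adds a null check and nothing else.

```kotlin
data class User(val name: String)

// Before (hypothetical crash site): user!!.name throws when user is null.
// fun displayName(user: User?): String = user!!.name

// After: the minimal fix adds a safe call with a fallback; no refactoring.
fun displayName(user: User?): String = user?.name ?: "guest"
```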

3. Mandatory build and test success

We require passing ./gradlew assembleDebug and ./gradlew testDebugUnitTest before creating a PR. If the build fails, the PR creation is skipped.

4. Leave uncertain fixes to humans

By including "If unsure about the fix, create the PR with a clear note asking for guidance," we ensure that the AI requests human review for fixes it is not confident about.

Key Points

  • Dry run mode: Don't apply to production immediately.
  • Human review mandatory: Always review PRs created by AI.
  • Linear integration: Ensure traceability.
  • Build/Test mandatory: Do not create PRs if CI does not pass.

3. Library Update Management - Preventing the Accumulation of Technical Debt

Problem

Library updates tend to be postponed with the mindset of "it works now, so I'll do it later." Without a periodic inventory, dependencies fall behind before you know it.

Current Inventory

First, I visualized the current status of major libraries.

| Library | Current | Latest | Priority |
| --- | --- | --- | --- |
| Kotlin | 2.1.0 | 2.2.x | 🔴 High |
| AGP | 8.7.3 | 8.13.x | 🔴 High |
| Compose BOM | 2025.05.01 | 2025.09.01+ | 🟡 Medium |
| Dagger Hilt | 2.54 | 2.57.2 | 🟡 Medium |
| Navigation Compose | 2.9.0 | 2.9.6 | 🟢 Low |
| Lifecycle | 2.8.7 | 2.10.0 | 🟢 Low |
| Coil | 3.0.0-rc01 | 3.x stable | 🟡 Medium |
| Core KTX | 1.15.0 | 1.17.0 | 🟢 Low |

Risk-Based Step-by-Step Updates

Priority is determined by the "volume of the update" and the "possibility of breaking changes."

Phase 1 - Low Risk (Small update volume)

  • Navigation Compose 2.9.0 → 2.9.6
  • Core KTX 1.15.0 → 1.17.0
  • Lifecycle 2.8.7 → 2.10.0
  • Coil rc → stable

Phase 2 - Medium Risk

  • Dagger Hilt 2.54 → 2.57.2
  • Compose BOM 2025.05.01 → 2025.09.01

Phase 3 - Potential for breaking changes

  • Kotlin 2.1.0 → 2.2.x
  • AGP 8.7.3 → 8.13.x

At Avvy, we use KMM (Kotlin Multiplatform Mobile) to share the UseCase and Repository layers between Android and iOS. Therefore, when upgrading Kotlin or AGP versions, we need to consider the support status on the KMM side, which makes it difficult to update to the latest versions easily. This is why Phase 3 is left for last.
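As a rough sketch, a first pass over the priority could be automated from version distance alone. This heuristic is my illustration, not the article's exact rule; as described above, toolchain libraries like Kotlin and AGP are escalated to High manually because of KMM constraints, regardless of what the heuristic says.

```kotlin
enum class Risk { LOW, MEDIUM, HIGH }

// Classify an update by semver distance: major jump > minor jump > patch.
// Non-numeric parts such as the "x" in "2.2.x" are treated as 0.
fun risk(current: String, latest: String): Risk {
    fun parts(v: String) = v.split(".").map { p -> p.takeWhile { it.isDigit() }.toIntOrNull() ?: 0 }
    val c = parts(current)
    val l = parts(latest)
    return when {
        l.getOrElse(0) { 0 } > c.getOrElse(0) { 0 } -> Risk.HIGH
        l.getOrElse(1) { 0 } > c.getOrElse(1) { 0 } -> Risk.MEDIUM
        else -> Risk.LOW
    }
}
```

For example, Navigation Compose 2.9.0 → 2.9.6 is a patch-level jump (Low), while Dagger Hilt 2.54 → 2.57.2 is a minor-level jump (Medium).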

Future Automation

Once the old libraries are cleaned up, I plan to implement the following automation.

Implementing Renovate

I will introduce Renovate to automatically create PRs for dependency updates.

{
  "extends": ["config:recommended", ":dependencyDashboard"],
  "timezone": "Asia/Tokyo",
  "schedule": ["before 9am on Monday"],
  "packageRules": [
    {
      "matchUpdateTypes": ["patch"],
      "automerge": true
    },
    {
      "groupName": "Kotlin",
      "matchPackagePatterns": ["^org.jetbrains.kotlin"]
    },
    {
      "groupName": "Compose",
      "matchPackagePatterns": ["androidx.compose"]
    },
    {
      "groupName": "Firebase",
      "matchPackagePatterns": ["^com.google.firebase"]
    },
    {
      "groupName": "Hilt",
      "matchPackagePatterns": ["^com.google.dagger", "^androidx.hilt"]
    }
  ],
  "vulnerabilityAlerts": {
    "enabled": true
  }
}

The key points are as follows:

  • Check every Monday morning: Limit checks to once a week to prevent a flood of PRs.
  • Auto-merge patch versions: Automate low-risk updates.
  • Group related libraries: Combine updates for Compose, Firebase, Hilt, etc., into a single PR.
  • Enable vulnerability alerts: Detect security issues immediately.

Changelog Analysis with AI

I also plan to build a system where AI automatically outputs changelogs and affected areas for PRs created by Renovate.

Key Points

  • Don't do it all at once: Leave libraries with potential breaking changes for last.
  • Determine priority by update volume: Start with those with small differences in minor versions.
  • Ensure continuity through automation: Cleaning up once is meaningless if things become outdated again.

4. Performance Monitoring - Detecting Performance Degradation

Problem

Performance issues are often difficult to notice during routine development.

And the worst-case scenario is the pattern of "only noticing through user reports."

  • Receiving a review on the App Store saying, "The app has been slow lately."
  • Seeing a post on X asking, "Is Avvy feeling heavy?"
  • Discovering an increase in the jank rate on Play Console Vitals, but not knowing when it started or what caused it.

It is too late to respond after users have pointed it out. We need to establish a system where developers can notice issues themselves.

Solution

We are building a comprehensive performance monitoring foundation.

Phase 1: Foundation

  • Baseline Profiles (AOT optimization)
  • Macrobenchmark module (for CI measurement)
  • Verification of Looker Studio dashboards

Phase 2: Expanded Observability

  • Firebase Performance traces (critical user flows)
  • JankStats (detecting frame drops in Compose)

Phase 3: CI Integration

  • Integrating Macrobenchmark into GitHub Actions
  • Detecting regressions on emulators

Phase 4: Dashboard & Scoring

  • Embedding Looker Studio dashboards into Notion
  • Performance scoring system
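Of the items above, the JankStats piece (Phase 2) can be sketched as follows. The API names come from the androidx.metrics:metrics-performance library; the activity name and the logging threshold handling are illustrative only, not Avvy's actual implementation.

```kotlin
// Sketch: track janky frames on the streaming screen with JankStats.
class StreamingActivity : ComponentActivity() {

    private lateinit var jankStats: JankStats

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // The listener is invoked per frame; frameData.isJank flags slow frames.
        jankStats = JankStats.createAndTrack(window) { frameData ->
            if (frameData.isJank) {
                val ms = frameData.frameDurationUiNanos / 1_000_000
                Log.w("JankStats", "Janky frame on streaming screen: $ms ms")
            }
        }
    }
}
```

In practice the jank events would be forwarded to Firebase Performance or a dashboard rather than just logged.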

Current Status

The implementation of Firebase Performance has been completed, and we are proceeding with the subsequent phases sequentially.

Key Points

  • You can't improve what you can't measure: Start with visualization.
  • Integrate into CI: Detect regressions before release.
  • Scoring: Quantifying performance changes the awareness of the entire team.

Common Philosophy

The following mindset is common to these four strategies.

1. Design with the assumption that "humans forget and overlook things"

No matter how talented an engineer is, they may forget quality checks when they are busy. That is why it is important to create a mechanism where it is okay to forget.

2. Integrate into CI/Automation to ensure continuity

Even if things are cleaned up once, they will return to the previous state without a mechanism. By integrating these processes into GitHub Actions, you can continuously monitor quality.

3. Do not slow down development speed

It is counterproductive if development stops for the sake of quality control.

  • Test only critical screens instead of all screens.
  • Update step-by-step based on risk.
  • Delegate routine tasks to AI.

4. Utilize AI to reduce the burden on humans

AI is not perfect, but by delegating routine tasks or the first steps of analysis, humans can focus on more important decisions.

Summary

The important thing is to create a "mechanism where quality is maintained even if humans forget."

The keyword for all four strategies introduced this time is "automation."

  1. Screenshot Testing → Automatically detect UI changes
  2. Auto Crash Fix → Automatically suggest fixes
  3. Library Update Management → Automate updates
  4. Performance Monitoring → Automatically detect degradation

Timing of Quality Checks

With these measures, quality checks are automatically executed at the following timings in Avvy.

| Timing | Execution Details |
| --- | --- |
| At PR creation | Unit tests, UI tests (Compose + Robolectric), Screenshot Test |
| Every Monday | Renovate (library update PRs), Auto Crash Fix (crash fix PRs) |
| Continuous | Firebase Performance (performance measurement) |

Since Avvy has a full Compose architecture, UI tests run fast on the JVM with Robolectric. Running CI without an emulator keeps it lightweight, which is a major advantage.
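For reference, running Compose UI tests with Robolectric mainly requires Android resources to be available in unit tests. A module-level Gradle sketch (a config fragment; dependency versions and other settings omitted):

```kotlin
// build.gradle.kts (app module)
android {
    testOptions {
        unitTests {
            // Robolectric needs Android resources available in local unit tests
            isIncludeAndroidResources = true
        }
    }
}
```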

The result is a system where quality checks keep running continuously without developers having to think about them.

The Combination of GitHub Actions + Claude Code is Powerful

What I felt through these initiatives is that the combination of GitHub Actions and Claude Code is extremely powerful.

  • GitHub Actions: Schedule execution, trigger management, secret management, PR creation
  • Claude Code: Code analysis, fix suggestions, issue creation, making decisions based on context

GitHub Actions handles "when and what to execute," while Claude Code handles "how to analyze and fix." This division of labor has made it possible to automate part of the intellectual work that previously could only be done by humans.

While some measures are still in progress, I believe that by advancing automation step by step, we can continue to provide high-quality apps.
