
Nurturing the Keys Before the Pet: Building a Secure Foundation for an AI Pet — Phase 0


Introduction

I never expected that the very first code I would write after deciding to "build an AI pet" would be unit tests for Argon2id.

I am not sure if this is the destiny of a modern architect or just an occupational disease for someone who has spent too long in the network security industry. Regardless, my conclusion for now is: to protect the soul of a pet, you must first raise the keys. It feels strangely on point, so I have decided to proceed with this approach for a while.

I am working on an Android app, currently codenamed "Digital AI Pet (Tentative)." The final name is not yet decided. I am the type who prefers to think while things are running.

By the way, I have no experience in Android app development. My main profession is network security, focusing on Zscaler, and when I do write code, it is mostly server-side Python. I am taking on Kotlin, Compose, and Gradle almost for the first time. This Phase 0 article is the record of getting to my first "Hello."

Why I insist on "Device-only"

Several cloud-based AI pet apps already exist. However, what I want to build is not a pet whose memories keep living on someone else's server forever after the owner passes away. That is not a soul; it is closer to a curse of data.

The essence of Zero Trust is often summarized as "never trust, always verify," but that is a bit too simplistic. The truth is more like "access is always verified, and trust is earned with every access."

Applying this directly to personal data architecture, even the cloud I operate is no exception. If data is sent to the cloud, the user should grant permission each time. Treating "once agreed, therefore send forever" as consent is not trust; it is inertia.

Based on that design philosophy, I set "The soul lives on the device" as the first principle for this pet. The cloud provides only insurance and catalysts. I have dubbed this the "Island and Lighthouse Model."

[Device (Island)] Pet's core — Memory, personality, dialogue, and inference live here
       ↑↓ (Limited communication)
[Cloud (Lighthouse)] Encrypted backup + Catalyst info only
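The "grant permission each time" rule can be sketched as a one-shot consent token. This is a minimal illustration, not the app's actual code: `ConsentGate`, `UploadGrant`, and `uploadBackup` are hypothetical names, and the network transport is left out entirely.

```kotlin
import java.util.UUID

/** One approval for one transfer. Hypothetical type, for illustration only. */
data class UploadGrant(val id: UUID, val purpose: String)

/** Hands out single-use grants. Not thread-safe; a real app would need synchronization. */
class ConsentGate {
    private val pending = mutableSetOf<UUID>()

    /** Call this only after the user has explicitly approved this one transfer. */
    fun grant(purpose: String): UploadGrant {
        val grant = UploadGrant(UUID.randomUUID(), purpose)
        pending += grant.id
        return grant
    }

    /** Consumes the grant. Returns false if it was never issued or was already used. */
    fun consume(grant: UploadGrant): Boolean = pending.remove(grant.id)
}

/** Returns true only if a fresh grant was consumed and there was something to send. */
fun uploadBackup(gate: ConsentGate, grant: UploadGrant, encryptedBlob: ByteArray): Boolean {
    if (!gate.consume(grant)) return false // no fresh consent: nothing leaves the island
    // Transport is out of scope; a real app would send encryptedBlob to the lighthouse here.
    return encryptedBlob.isNotEmpty()
}
```

Because a grant is removed when it is consumed, "once agreed, therefore send forever" cannot happen by accident: a second upload needs a second approval.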

What I built in Phase 0 — The three-layer key structure

In Phase 0, I implemented three encryption layers to show "Hello" on the screen. The order is important, as it directly reflects the hierarchy of the design.

| Layer | Technology | What it protects |
|---|---|---|
| ① Key Derivation | Argon2id (Bouncy Castle) | Passphrase → 32-byte key |
| ② Key Storage | Android Keystore + StrongBox | Protects keys with device hardware |
| ③ Persistence | SQLCipher + AES-256-GCM (Room) | Encrypted DB |

Why this order? Passphrases are about memory (human vulnerability), Keystore is about hardware (physical boundaries of the device), and SQLCipher is about persistence (boundary of the time axis). Each defends against a different type of "leakage," and since each layer is independent, they function as defense in depth. Defense in depth is a classic approach, but there is a reason it is a classic.

Argon2id parameters — And verifying my own hallucinations

Initially, I made my implementation plan thinking, "I seem to remember OWASP saying memory=256MB, iterations=3, parallelism=4," wrote the code, and even wrote tests. I wrote it that way in the first draft of this article as well. After receiving feedback and checking primary sources just in case, I realized it was a complete hallucination.

The OWASP Password Storage Cheat Sheet currently lists five equivalent configurations. The recommended minimum is m=19 MiB, t=2, p=1, and even the heaviest configuration is m=47,104 KiB (46 MiB), t=1, p=1. Nowhere does it say 256 MiB.

Come to think of it, the OWASP settings are tuned for servers that hash a large number of user passwords on every login. For a single client deriving one key, even with long-term storage as the premise, they should not be too strict. I was just mindlessly lining up stereotypes like "OWASP must be strict" and "smartphones must be weak" to create a conflict structure that felt comfortable. It is pathetic.

Here are the final implementation values:

val params = KdfArgon2id.Params(
    memoryKb = 65_536,    // 64 MiB, above the heaviest OWASP configuration (m=47,104 KiB, about 46 MiB)
    iterations = 2,
    parallelism = 4,
    keyLengthBytes = 32,
)

With memory above the heaviest OWASP configuration (47,104 KiB, about 46 MiB) and a parallelism of 4, derivation takes a few hundred milliseconds on an Xperia 1 VII. Since the premise is protecting data stored for the long term, I can now explain why I left some headroom rather than choosing the minimum. I could not have explained that during the first implementation.
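For reference, here is a minimal sketch of how these values might map onto Bouncy Castle's Argon2 API. The internals of `KdfArgon2id` are not shown in this article, so `deriveKey` and `newSalt` are hypothetical helpers, and the 16-byte salt length is my assumption. The block needs the Bouncy Castle provider on the classpath.

```kotlin
import org.bouncycastle.crypto.generators.Argon2BytesGenerator
import org.bouncycastle.crypto.params.Argon2Parameters
import java.security.SecureRandom

/** Derives a 32-byte key from a passphrase with Argon2id (m = 64 MiB, t = 2, p = 4). */
fun deriveKey(passphrase: CharArray, salt: ByteArray): ByteArray {
    val params = Argon2Parameters.Builder(Argon2Parameters.ARGON2_id)
        .withVersion(Argon2Parameters.ARGON2_VERSION_13)
        .withMemoryAsKB(65_536) // 64 MiB, expressed in KiB
        .withIterations(2)
        .withParallelism(4)
        .withSalt(salt)
        .build()

    val key = ByteArray(32)
    Argon2BytesGenerator().apply { init(params) }.generateBytes(passphrase, key)
    return key
}

/** Random per-installation salt. It is not secret, but it must be stored to re-derive the key. */
fun newSalt(): ByteArray = ByteArray(16).also { SecureRandom().nextBytes(it) }
```

The same passphrase and salt always yield the same key, which is what makes a deterministic unit test possible: derive twice and compare.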

Lesson: Do not trust the numbers in AI-written design documents (or your own) until you have verified them with primary sources. As long as you are dealing with security standards, this is a habit that must be followed. In technical articles, the more plausible a number looks, the more likely you are to have an accident if you don't cross-check it with primary sources.

StrongBox works silently

The Xperia 1 VII (Snapdragon 8 Elite) I used for verification supports StrongBox. You can verify this with pm list features, where feature:android.hardware.strongbox_keystore=300 appears. When passing KeyGenParameterSpec.Builder.setIsStrongBoxBacked(true), the Keystore silently creates the key in the hardware area. If it fails, a StrongBoxUnavailableException is thrown, so I can just fall back to the TEE.

return try {
    buildKey(alias, useStrongBox = true)
} catch (e: StrongBoxUnavailableException) {
    buildKey(alias, useStrongBox = false)
}

I was scolded by detekt for this fallback with a SwallowedException, but since this was an intentional suppression, I added it to the whitelist in detekt.yml. It is my principle to write a fallback wherever I can.
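The body of `buildKey` is not shown above. A minimal sketch of what it might look like follows, assuming an AES-256 key used in GCM mode; the purposes, block mode, and key size are my assumptions rather than the project's confirmed settings.

```kotlin
import android.security.keystore.KeyGenParameterSpec
import android.security.keystore.KeyProperties
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey

/**
 * Generates an AES key inside the Android Keystore.
 * With useStrongBox = true, generation throws StrongBoxUnavailableException
 * on devices without StrongBox, which the caller's fallback catches.
 */
private fun buildKey(alias: String, useStrongBox: Boolean): SecretKey {
    val spec = KeyGenParameterSpec.Builder(
        alias,
        KeyProperties.PURPOSE_ENCRYPT or KeyProperties.PURPOSE_DECRYPT,
    )
        .setBlockModes(KeyProperties.BLOCK_MODE_GCM)
        .setEncryptionPaddings(KeyProperties.ENCRYPTION_PADDING_NONE)
        .setKeySize(256)
        .setIsStrongBoxBacked(useStrongBox) // available from API 28
        .build()

    return KeyGenerator.getInstance(KeyProperties.KEY_ALGORITHM_AES, "AndroidKeyStore")
        .apply { init(spec) }
        .generateKey()
}
```

The key material never leaves the Keystore; the returned `SecretKey` is only a handle, so it can be used with `Cipher` but not exported.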

SQLCipher's loadLibs was dead

My implementation plan said to call net.zetetic.database.sqlcipher.SQLiteDatabase.loadLibs(context). When I actually ran it, I was told that such a method did not exist.

Upon investigation, this was an API from the old package net.sqlcipher:android-database-sqlcipher, and in the new package net.zetetic:sqlcipher-android:4.6.1, it had changed to System.loadLibrary("sqlcipher"). You should not trust design documents too much. A design document is also a collection of wishful thinking.
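With the new package, wiring SQLCipher into Room might look like the sketch below. `openEncryptedDb` is a hypothetical helper, and I am assuming the 32-byte key from the earlier layers is passed straight in as the database passphrase; the project's actual wiring is not shown in this article.

```kotlin
import android.content.Context
import androidx.room.Room
import androidx.room.RoomDatabase
import net.zetetic.database.sqlcipher.SupportOpenHelperFactory

/** Opens a Room database backed by SQLCipher, using the given key as the passphrase. */
fun <T : RoomDatabase> openEncryptedDb(
    context: Context,
    dbClass: Class<T>,
    name: String,
    key: ByteArray,
): T {
    // Replaces the old SQLiteDatabase.loadLibs(context) call.
    System.loadLibrary("sqlcipher")

    return Room.databaseBuilder(context.applicationContext, dbClass, name)
        .openHelperFactory(SupportOpenHelperFactory(key))
        .build()
}
```

Taking the database class as a parameter keeps the sketch independent of any particular entity or DAO definitions.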

The Trap of Android Gradle Plugin 9.x

The most time-consuming part of Phase 0 was not the three-layer encryption, but understanding the behavior of Android Gradle Plugin 9.x. Getting "Hello" on the screen was a matter of a few hours of trial and error, but I was blocked repeatedly leading up to that.

The first roadblock was the fact that I needed to bump to AGP 9.1 to use compileSdk 37 (Android 17), as AGP 8.7 was insufficient.

I intellectually understood that AGP versions are tied to compileSdk limits, but until I encountered it myself, I naively assumed that "latest compileSdk = latest AGP." It turns out that the latest Android is not always supported by the latest AGP. Google gave me a thorough education on that.

Here is the stack I finally settled on:

agp = "9.1.1"           # Supports compileSdk 37
gradle = "9.3.1"        # Required by AGP 9.1.x
kotlin = "2.2.10"       # Consistent with the KGP bundled with AGP 9.1.x
ksp = "2.2.10-2.0.2"    # Version compatible with Kotlin 2.2.10

The kotlin-android plugin disappeared in AGP 9.x

Kotlin compilation is now built into AGP 9.x. If you apply alias(libs.plugins.kotlin.android) in an Android module, you get:

Cannot add extension with name 'kotlin', as there is an extension already registered with that name.

While the kotlin-compose plugin is still necessary, kotlin-android is not only unnecessary but actually harmful. The kotlinOptions { jvmTarget = "21" } DSL block has also been removed. In its place:

import org.jetbrains.kotlin.gradle.dsl.JvmTarget

kotlin {
    compilerOptions {
        jvmTarget.set(JvmTarget.JVM_21)
    }
}
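Put together, the plugins block of an Android module under AGP 9.x might look like this sketch. The version catalog alias names are assumptions modeled on the `libs.plugins.kotlin.android` alias mentioned above.

```kotlin
// app/build.gradle.kts (sketch; catalog alias names are hypothetical)
plugins {
    alias(libs.plugins.android.application)
    alias(libs.plugins.kotlin.compose) // still required for Compose
    // alias(libs.plugins.kotlin.android) // do not apply: AGP 9.x already registers the 'kotlin' extension
}
```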

Searching online returns a flood of outdated information (telling you to apply kotlin-android), which is confusing. Since AGP 9.0 GA is from January 2026, the global pool of knowledge is still thin. I ended up in the position of being an early adopter—or perhaps a human sacrifice.

Strict inclusion in Gradle 9.x

If you write include(":core:crypto") in settings.gradle.kts, Gradle 9.x strictly checks for the existence of the corresponding directory. If it doesn't exist, the sync fails. It no longer permits including modules that you "plan to create later."

The workaround is to place a placeholder like core/crypto/.gitkeep beforehand, but it's the kind of trap that really hampers you if you don't know it.
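An alternative to placeholder files is to guard each include with a directory check. This is a sketch I have not tried in this project; the module list is hypothetical, and `rootDir` is the root directory exposed by the settings script.

```kotlin
// settings.gradle.kts (sketch; module list is hypothetical)
listOf(":app", ":core:crypto").forEach { path ->
    // ":core:crypto" -> "core/crypto"
    val dir = rootDir.resolve(path.removePrefix(":").replace(':', '/'))
    if (dir.isDirectory) {
        include(path)
    } else {
        println("Skipping $path: ${dir.path} does not exist yet")
    }
}
```

The trade-off is that a typo in a module path then produces a skipped module instead of a failed sync, so the `.gitkeep` approach is arguably the safer default.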

DEX checks the content of backticks

When I gave a Kotlin test method a backtick-quoted Japanese name containing ( ), it crashed during the D8 dexing process:

Method name '暗号化結果は毎回異なる(IV ランダム性)' in class '...'
cannot be represented in dex format.

Kotlin allows you to write almost anything inside backticks, but the DEX format does not permit ( ) ; [ ] / . < >. I solved this by replacing them with _. It is a strange specification where Japanese identifiers are allowed but symbols are strictly forbidden. It was a moment where I felt the gap between Kotlin's flexibility and the rigidity of the JVM/DEX.
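The replacement rule can be sketched as a small helper. The character set is taken from the list in the paragraph above, not from the DEX specification itself, and both function names are hypothetical.

```kotlin
// Characters this article lists as rejected during dexing (not verified against the DEX spec).
private val dexForbidden = setOf('(', ')', ';', '[', ']', '/', '.', '<', '>')

/** Replaces every listed character with an underscore and leaves everything else untouched. */
fun toDexSafeName(name: String): String =
    name.map { if (it in dexForbidden) '_' else it }.joinToString("")

/** True if the name contains any of the listed characters. */
fun hasDexForbiddenChar(name: String): Boolean = name.any { it in dexForbidden }
```

In practice the rename is done by hand in the test source; a helper like this would only be useful in a lint-style check over test names.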

The Engineering Experience of Being Taught Twice by AI

I proceeded with the implementation using a sub-agent-driven development flow with Claude Code. By assigning a new sub-agent for each task and running a two-stage review (specification compliance -> code quality) after implementation, I observed several interesting phenomena.

Claude Code itself makes mistakes

Particularly regarding the AGP issue mentioned earlier, Claude Code had set agp = "8.7.0" in the initial implementation plan. I accepted it without a second thought, which is something I reflect on personally.

Once we entered the implementation phase, the sync failed. The sub-agent acting as code reviewer pushed it back, stating, "8.7 does not support compileSdk 37; I checked the AGP compatibility matrix, and 9.1 is required."

So when I instructed Claude Code to "bump it to 8.11," the implementation sub-agent pushed back again: "8.11 only supports up to 36. After checking Maven Central, 9.1.1 is currently the only option."

In other words, Claude Code wrote AGP 8.7 in the plan, my correction instruction said 8.11, and in the end the implementation sub-agent's proposal of 9.1.1 was adopted. The human gets corrected by AI, and the AI gets corrected by AI.

The same pattern occurred with the KSP version. The plan initially had 2.1.0-1.0.29, which was rewritten to 2.2.10-2.0.4 when I updated Kotlin to 2.2.10, but the implementation agent corrected it, saying, "2.0.4 does not exist on Maven Central for Kotlin 2.2.10; 2.0.2 is the correct combination."

The SQLCipher loadLibs issue was similar; Claude Code had written the old API in the plan and swapped it for the new API during the implementation phase after checking the web. "Design documents are a collection of wishful thinking" is a phrase I thought of at that exact moment.

Reviewer's web searches provide support

Honestly, it is exhausting for a single human to track all the latest specifications for AGP 9.x. Only a few months have passed since AGP 9.0 GA, and there is almost nothing on Stack Overflow. Google's official documentation is comprehensive, but it lacks the concreteness of "what happens when you actually try this."

This is where the sub-agents shone, as they would directly query Maven Central, fetch AGP release notes via WebFetch, or read the JetBrains blog to confirm version compatibility when necessary. I just observed the process from the side as the answers converged.

Probably only people inside Google have the compatibility of "latest Android," "latest AGP," and "latest Kotlin" all stored in their heads. But an AI that goes to fetch the information in real-time can get pretty close.

Code review pushbacks accelerate thought

Because Claude Code automatically rotates the cycle of implementation -> specification compliance review -> quality review -> correction -> re-review, I only need to intervene in judgment and decision-making. The bug that caused the most pushbacks this time was a "task ordering bug" where app:assembleDebug failed to resolve :core:crypto.

The plan stated to write implementation(project(":core:crypto")) in Task 2, but the core:crypto module itself was not created until Task 6. The order was reversed. The reviewer pointed out, "./gradlew :app:tasks passes, but :app:assembleDebug does not, so the verify command should be changed to assembleDebug, and the project dependency should be commented out in Task 2." This led to a flow where the plan itself was updated.

This is a bug I could have prevented if I had noticed it during the planning stage, but it is also true that plans only reveal their holes once you write them down and try to run them. Do not expect a perfect plan followed by a perfect implementation. I want to say I didn't expect one, but I did, at least halfway.

Phase 0 Results

  • 19 commits (including a few correction commits)
  • 3 rounds of two-stage review cycles (AGP/Kotlin/Koin, AAPT themes, KSP compatibility)
  • Passed all 9 tests on Xperia 1 VII (3 Argon2id unit, 3 Keystore instrumented, 3 SQLCipher instrumented)
  • "Hello" is displayed on the screen
  • Tagged phase0-foundation
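The test name quoted in the DEX error earlier ("encryption results differ every time, IV randomness") hints at what one of these tests asserts. Here is a minimal sketch of that property, assuming a plain JVM AES key rather than a Keystore-backed one; `Encrypted`, `newAesKey`, `encrypt`, and `decrypt` are hypothetical names.

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

/** Ciphertext plus the IV needed to decrypt it. */
class Encrypted(val iv: ByteArray, val ciphertext: ByteArray)

fun newAesKey(): SecretKey = KeyGenerator.getInstance("AES").apply { init(256) }.generateKey()

/** Encrypts with AES-256-GCM using a fresh random 12-byte IV on every call. */
fun encrypt(key: SecretKey, plaintext: ByteArray): Encrypted {
    val iv = ByteArray(12).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(128, iv))
    return Encrypted(iv, cipher.doFinal(plaintext))
}

/** Decrypts and verifies the GCM tag; throws if the data was tampered with. */
fun decrypt(key: SecretKey, blob: Encrypted): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(128, blob.iv))
    return cipher.doFinal(blob.ciphertext)
}
```

Encrypting the same plaintext twice should give different IVs and different ciphertexts, while both results decrypt back to the original. With a Keystore-backed key the IV is normally generated by the Keystore itself rather than supplied by the caller, so the instrumented version would read it from `cipher.iv` instead.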

My honest feeling is: "Does it really take this much foundation just to put 'Hello' on the screen?" Regular Android development articles start by displaying Hello in the first 5 minutes, but since I started with security, the order was reversed. I couldn't reverse it, or rather, I didn't want to. Build the vessel before pouring the soul into it. This is non-negotiable.

Next Preview

Phase 1: Conversation Core

  • JNI integration of llama.cpp
  • Loading Qwen 2.5 3B Instruct (Q4_K_M) on-device
  • Basic Compose chat UI
  • Getting to the state where you can "talk to your pet" on-device

I expect the part about running a local LLM on a smartphone will be even more grueling than the key management in Phase 0. I hope you'll keep a warm, watchful eye on the progress next time too.


To love my pet, I first raised the keys. I think I'll continue with the next steps.

Note that the code repository for this project is not public. Instead, my public output will be these Zenn articles: near-real-time development updates and notes on the points where I got stuck. A sub-theme of the articles is how far an active Zscaler professional can implement their "personal security philosophy" in a side project.
