iTranslated by AI
How I stopped being called 10 times a day by the AI I asked to 'notify me if you fail'
How asking an AI to "call me if you fail" led to being called 10 times a day
During the first month of using an AI to write bot code for my side project, I had it set by default to "stop and call a human if an error occurs."
For about three weeks, I was getting called roughly 10 times a day.
I’d brew coffee in the morning and get called. I’d return from lunch and get called. I’d be watching TV at night and get called. I’d check my phone before bed and get called.
When I opened the logs, most of them were things that could be fixed in five seconds, such as "ImportError," "missing type hints," or "typos in test fixture names."
Shouldn't I just be letting the AI handle this itself?
Once I thought that, I wrote down rules for how to behave when errors occurred, and the number of times I was called dropped significantly. I'm recording what happened when I started using AI for my side project here.
The reason I had "call me after one failure" as the default
The reason I set it to "stop and call a human if an error occurs" was that I was afraid the AI would fix things incorrectly on its own, causing a disaster.
I’m running a bot for a side project. When something went wrong, I wanted to see it with my own eyes. I felt that if it fixed things on its own, I wouldn't be able to track what had happened.
As a result, I ended up reviewing every single error, even those that took five seconds to fix.
- Missing import → Add the import
- Missing type hint → Add the type
- Typo in test fixture name → Fix it
- Missing await in asynchronous processing → Add the await
I was the one typing "try again" every single time. Even though I was asking an AI to build the bot for me, I spent those three weeks acting as the AI's hands and feet.
My desires to "check it myself" and "not want to fix everything myself" were contradictory, but it took me three weeks to notice that contradiction. I was being called 10 times a day, and at the time, I just thought, "Well, I guess this can't be helped."
There were times when I couldn't even understand it after looking at git diff
Toward the end of those three weeks, there was a time when the bot stopped mid-rewrite.
I thought, "It called me again," as usual, and opened the logs, but everything was in a strange state. I tried to revert to the pre-fix state, but I couldn't tell "how far the rewriting had finished."
I looked at the diff in git, but the boundary between the AI's changes and the parts I had manually fixed had become blurred. Every time the AI called me, I had made a small manual adjustment, so the git log was filled with alternating "AI commits" and "commits after my manual fixes." I didn't know which state to revert to.
I ended up spending about an hour in a state of "I just want to revert it, but I have to start by finding what to revert." The feeling of time melting away just by trying to revert files is not a good one.
That was the first time I realized, "It's actually more important to prevent this kind of situation from happening in the first place." In that hour, I finally understood the meaning of creating a revertible state before thinking about how to revert it.
Giving up on typing "try again" every time
What I gave up on after three weeks was the operation of "having a human look at it after one failure."
Checking every single time an error occurred was simply not sustainable. I think I was making the situation complicated by trying to maintain an unsustainable operation and looking at every single error, even those that took five seconds to fix.
I scribbled down some rough rules.
If an error occurs, first analyze the cause. Retry up to two times using the same approach. If that fails, switch to a different approach. Only call a human if all three approaches fail.
I simply changed "call me after one failure" to "call me if all three approaches fail."
Before I wrote these down, I was afraid the AI would fix things incorrectly on its own, but the current situation where I was typing "try again" every time likely had a higher failure rate. In the sense that we were just fixing it and retrying—whether I did it manually or the AI did it—it was the same thing.
I also had to stop "doing the same thing three times"
Immediately after I implemented the retry rules, another problem arose. The AI sometimes entered a loop, retrying the same error three times.
curl failed → curl with a different URL → curl with modified headers → curl with another URL again...
It was clearly looping through things that wouldn't work. Since I had only written the retry rules, I hadn't specified that it should "give up if it fails twice with the same approach."
So, I added rules for switching approaches as well.
- If CLI doesn't work, try API.
- If API doesn't work, try directly from Python.
- If all of those fail, ask a human for "alternative methods."
Writing only one of either retry rules or anti-looping rules causes the system to break. It felt like both were necessary to finally make it work. This, too, couldn't be written perfectly on the first try; I had to add to it while getting stuck several times.
Deciding in advance what should not be retried
After adding the retry rules, one other thing I added was a "do not retry" list.
- Authentication errors (missing API keys, expired tokens)
- Failures requiring design decisions (specifications are ambiguous, cannot implement)
- Cases requiring destructive DB schema changes
- Already failed twice with the same error
Especially with the first one, I kept hitting it until I wrote it down. I would have it loop retries even when the API key was missing, and only realize "there's no key" after spending time on three different approaches. I did this several times.
You have to write down "this won't be fixed by retrying" in advance.
Ever since I wrote that, I stopped getting stuck in those ways. When I see an error, I judge, "This is an authentication issue," and escalate immediately; otherwise, I just retry.
Using timeout as a "boundary for human intervention"
What I got stuck on with timeout design was writing it with the mindset of "if time runs out, consider it a failure."
There are two reasons why AI processing takes a long time.
One is when the process is genuinely heavy. It will finish if you wait long enough. For this, it was easier to run it in the background and receive a "notification upon completion" rather than setting a timeout. Instead of "fail if it doesn't return in 30 seconds," it's "let me know when it's done."
The other is when the process is structurally blocked. It’s requesting authentication information in a loop, it's blocked waiting for human confirmation, or it can't proceed because a design decision is needed. This is an immediate escalation.
A timeout is not a "failure judgment," it is a "trigger for human judgment."
I realized this after a few cases where I received a "failed" notification due to a timeout, only to open the contents and find it was just "the API key wasn't set." It’s much clearer to see "cannot proceed because the API key is not set" than just "failed," as it tells me what to do next.
Rollback means "creating a revertible state first"
After that incident where I spent an hour lost in file rewrites, I only wrote down two rules:
- Commit immediately after tests pass (do not wait for the entire implementation to finish).
- "Irreversible changes," like DB schema changes or production data modifications, require human confirmation no matter how small.
I simply made only revertible changes subject to autonomous execution and kept irreversible changes out of it.
"Committing before committing" sounds like weird Japanese, but what I’m actually doing is "creating granular points in the git log that I can return to." Without this, you lose track of "how far you've finished" when the process stops in the middle. You look at the git diff, can't tell the difference, and end up spending an hour reverting. My motivation was that I didn't want to repeat that kind of morning over and over again.
You can tell just by looking at the "number of calls"
The number of calls dropped significantly once I wrote these down and started operating with them.
I haven't counted the exact numbers. However, I can feel that I now have some time between brewing my morning coffee and opening Slack. Before, "get called as soon as I brew coffee" was the routine, and work started before I even drank the coffee.
the content of the calls has also changed. Before, "missing import" or "missing type hint" were common, but now they are mostly "design decision needed" or "please update authentication info." It feels like only things that require my specific judgment remain.
It is impossible to have AI "do everything." There are parts that only I can judge. But it took me three weeks to realize that I didn't need to look at everything else, even simple ImportErrors that take five seconds to fix, every single time.
Somewhere along the way, it became a habit to think, "I hope I don't get called today" while brewing my coffee in the morning. I don't know if that's meaningful.
I hear stories about the same symptoms
Recently, I've been hearing similar stories little by little.
- "I'm woken up every time by 'failed' notifications from AI."
- "I get called for every minor code error, and my side project time is completely drained."
- "I feel like I've become the AI's hands and feet."
I felt the same way for three weeks while being called 10 times a day. Because I set it that way with the feeling of "I want to verify it myself," it was I who created that state. It wasn't the AI that decided I should review every single error that takes five seconds to fix; it was me.
If you don't decide how things should break in advance, you'll eventually end up being called for everything. It took me three weeks to realize that, so I suspect others might be running in the same state for a while too.
I just don't want to be called in the morning
Retry rules, anti-looping, no-retry lists for authentication, timeout usage, and how to create revertible states—I keep them all on one page for myself.
It's not really about wanting to do reliability engineering or building a reliable architecture; my honest motivation is just that I don't want to be called in the morning. I don't want to write about fancy design theories.
If someone wants to read it, I could release it as a paid article, but the contents are just rough rules and a breakdown of errors by type. It's not a polished design document.
I don't think this is the final answer. Probably, I'll encounter "being called too much" again in another place. I'll probably stumble again. But for now, it has decreased a little from 10 times a day. It's about as much of a change as coffee staying hot a little longer. Rather than wanting to design, I just don't want to be called in the morning.
Discussion