Translated by AI
My AI Partner #3: Designing for Safety and Clear Decision-Making through Responsible Delegation
An attempt to implement "Safe-Stop Design" and "Separation of Concerns" with GitHub Copilot Agent Mode
──
Writing collaboration: Tech Lead BOSS / Part 3

When I started building my AI companion, the first question I faced wasn't "What can it do?" but "Will it stop when I need it to stop?"
It was certainly convenient.
But every time I used it, part of my mind stayed on watch, wondering, "Is it going to be okay this time?"
That didn't make it a companion; it made it a high-performance tool that required constant supervision.
I realized that unless I solved this, I wouldn't be able to use it for long. In this installment, I'll write about how I designed a structure that allows for stopping, and the subsequent issue of "who decides what."
The problem wasn't "What it can do," but "Where it should stop"
When I first started building it, I had three questions in mind:
- Can it stop itself when the premises are ambiguous?
- Can it enter an approval-pending state before dangerous operations or crossing boundaries?
- Does adding stop conditions make operation too complex?
The third point was particularly tough. The more rules I added to force a stop, the more of the rule set itself I had to re-verify each time. A design meant to make it stop had introduced a different kind of complexity, and I couldn't honestly call that an increase in controllability.
In one task, my first request to the companion wasn't code generation. It was to enumerate the commands that require approval and to clarify the permission boundaries. Discussing "where to return to an approval-pending state" took precedence over "what to have it do." That made it clear that my biggest concern wasn't speed but where to stop it.
Place the "Safe-Stop Design" before execution
What I changed wasn't just the focus of tool selection; it was the order of the design itself.
Previously, I would look at "what it could do" and add "where to stop" later.
But what I really needed was the opposite.
First, decide where to stop. Only place execution inside that boundary.
In fact, I placed a few unassuming but effective statements at the beginning of the companion's rule document:
- Immediately stop execution if an uninstalled command appears
- Proceed to the next execution only after obtaining approval
- Do not guess or hard-code internal implementations of external systems
- Always return to verification if a decision is not based on facts
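As a minimal sketch, the first two rules could be expressed as a gate that runs before any command is executed. This is an illustration only, not the actual rule document: `gate`, `Outcome`, and `Decision` are hypothetical names, and the injectable `which` parameter is an assumption made so the check can be exercised without touching the real system.

```python
import shlex
import shutil
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    STOP = "stop"                      # do not execute; report back
    AWAIT_APPROVAL = "await_approval"  # propose only, wait for a human
    RUN = "run"                        # safe to execute


@dataclass(frozen=True)
class Decision:
    outcome: Outcome
    reason: str


def gate(command_line: str, approved: bool, which=shutil.which) -> Decision:
    """Decide what to do with a proposed command before anything runs."""
    parts = shlex.split(command_line)
    if not parts:
        return Decision(Outcome.STOP, "empty command")
    # Rule: stop immediately if the command is not installed.
    if which(parts[0]) is None:
        return Decision(Outcome.STOP, f"{parts[0]!r} is not installed")
    # Rule: proceed only after approval has been obtained.
    if not approved:
        return Decision(Outcome.AWAIT_APPROVAL, "approval not yet given")
    return Decision(Outcome.RUN, "installed and approved")
```

The gate only returns a decision and never executes anything itself, so the stop logic can be checked separately from execution.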
For a companion, the value of stopping reliably when things get risky is higher than running smoothly.
Writing the approval boundaries first makes the companion suddenly professional
After noticing behavior that felt like it was running out of control, the first thing I did in the GitHub Copilot design wasn't to pile on more capabilities. It was to write out the approval boundaries and stop conditions explicitly.
I decided on these three things first:
- What the companion can proceed with on its own
- What must stop at the proposal stage, waiting for human approval
- When to return to verification once ambiguous premises or boundary crossings are detected
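One way to picture these three decisions is as a small classifier that sorts every proposed action into one of three zones. This is a hedged sketch: the action names in `AUTONOMOUS_ACTIONS` are invented for illustration, and the choice to send anything unlisted to the proposal stage is an assumption, not something stated in the rules.

```python
from enum import Enum


class Zone(Enum):
    PROCEED = "proceed on its own"
    PROPOSE = "stop at proposal, wait for approval"
    VERIFY = "return to verification"


# Hypothetical allowlist; anything not listed here needs approval.
AUTONOMOUS_ACTIONS = frozenset({"read_file", "run_tests", "draft_summary"})


def classify(action: str, premises_clear: bool, within_boundary: bool) -> Zone:
    """Sort a proposed action into one of the three zones."""
    # Ambiguity or a boundary crossing always wins over the allowlist.
    if not premises_clear or not within_boundary:
        return Zone.VERIFY
    if action in AUTONOMOUS_ACTIONS:
        return Zone.PROCEED
    # Unknown or unlisted actions default to the stricter zone.
    return Zone.PROPOSE
```

Checking the verification condition first means an allowed action still goes back to verification when its premises are unclear.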
Designing in this order means the companion is no longer an "entity that does anything." Instead, it becomes "an entity that can move quite confidently within the allowed scope."
A companion that can stop as intended. This was probably what I wanted from the very beginning.
However, the next problem emerged here. Even with the stop conditions written down, if "who decides what" remains ambiguous, operations start to clog up again.
Designing from the organizational chart instead of code
That was when I expanded the design scope one level further.
The first thing I did was not to list in detail what to implement for the agent. Instead, I put at the center a structure defining who makes judgments, who executes, and who holds final responsibility.
If you start from the idea of "what is needed to make this function work," the structure ends up being shaped by the features. If you try to separate responsibilities later, it becomes a task of peeling apart things that are already mixed. Conversely, if you start from the organizational chart, code becomes a means of implementing those responsibilities. This order is less prone to collapse later.
Initially, I ran it in two layers
That said, it didn't take this form from the start. Initially, it was quite simple. The human held final approval, and the decision-maker organized the requests and passed them directly to each execution role. I ran it that way for a while.
I thought this was enough. The decision-maker organizes the requests and passes them to whichever role is needed. It looked clean. However, once I started running it, another type of burden appeared.
Three weeks later, I realized it wasn't being delegated
The problem was that the decision-maker held both "making the judgment" and "deciding which execution role to pass it to and how."
Whenever a request became slightly complex, the same kind of decomposition had to be worked out every time:
- This involves both the design role and the implementation role
- It is better to extract only the investigation first
- It is better to interject documentation work before implementation
On top of that, every time I had to ask myself, "Where is the right place to send this one?" The judgment itself wasn't the heavy part; the burden was this routing decision hanging over every request.
I intended to delegate, but the human and the decision-maker were still stuck thinking about distribution.
At that moment, I realized that the line defining "who decides what" was still insufficient.
The delegation role emerged for responsibility separation, not for efficiency
That is when I added a role dedicated solely to delegation and partitioning. The rules written down once the design had settled contain one very strongly worded statement:
BOSS makes only two choices: delegate or do it yourself. It is the job of the delegation role to decide which Worker to use and how to partition tasks; BOSS does not intervene.
With this single sentence, the structure suddenly became professional.
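The split described in that rule can be sketched as two functions that share no knowledge: one that only returns a binary choice, and one that owns Worker selection and partitioning. The Worker names and their ordering below are assumptions loosely based on the roles mentioned earlier (investigation first, documentation before implementation); `boss_choose` and `partition` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class BossChoice(Enum):
    DELEGATE = "delegate"
    DO_IT_YOURSELF = "do_it_yourself"


@dataclass(frozen=True)
class Assignment:
    worker: str
    task: str


# Hypothetical Worker names, listed in the order tasks should run.
WORKER_ORDER = ("investigation", "design", "documentation", "implementation")


def boss_choose(is_small_enough: bool) -> BossChoice:
    """BOSS sees only a binary choice; it never names a Worker."""
    return BossChoice.DO_IT_YOURSELF if is_small_enough else BossChoice.DELEGATE


def partition(request: str, needs: set[str]) -> list[Assignment]:
    """Delegation role: pick Workers and split the request into ordered tasks."""
    unknown = needs - set(WORKER_ORDER)
    if unknown:
        raise ValueError(f"no Worker for: {sorted(unknown)}")
    return [Assignment(w, f"{w}: {request}") for w in WORKER_ORDER if w in needs]
```

Because `boss_choose` has no access to `WORKER_ORDER`, routing decisions cannot leak back into BOSS in this sketch.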
The important thing is that the purpose of adding a delegation role was not just "to make things faster." It was to avoid mixing value judgment, distribution judgment, and execution judgment.
| Role | Decides | Does Not Decide |
|---|---|---|
| Human | Priority, value judgment, final approval | Details of execution procedures |
| Decision-maker | Organization of requests, whether to delegate | Concrete partitioning methods |
| Delegation role | Task partitioning, who to pass to | Value judgment, final approval |
| Execution role | Execution and proposal of assigned scope | Meaning of the overall request |
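The table can also be held as data, which makes it possible to ask "who owns this decision?" and to notice when a decision has no owner or more than one. This is a sketch under assumptions: the topic identifiers are my own shorthand for the table cells, and `owner_of` and `may_decide` are hypothetical helpers.

```python
# Each role maps to the topics it decides, paraphrased from the table.
DECIDES = {
    "human": {"priority", "value_judgment", "final_approval"},
    "decision_maker": {"request_organization", "whether_to_delegate"},
    "delegation_role": {"task_partitioning", "worker_selection"},
    "execution_role": {"assigned_scope_execution", "assigned_scope_proposal"},
}


def may_decide(role: str, topic: str) -> bool:
    """True only if the topic is listed under that role."""
    return topic in DECIDES.get(role, set())


def owner_of(topic: str) -> str:
    """Return the single role that decides a topic, or raise if it is unclear."""
    owners = [role for role, topics in DECIDES.items() if topic in topics]
    if len(owners) != 1:
        raise LookupError(f"{topic!r} has {len(owners)} owners, expected exactly 1")
    return owners[0]
```

A `LookupError` with zero owners is the data-level version of finding a gap in responsibilities.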
The delegation role was not a convenience feature, but a component to fill the gaps in responsibilities.
"Deciding beforehand" did not mean knowing the final form
"Deciding beforehand" here does not mean I had the completed organizational chart from the start.
To be precise, it is closer to having decided beforehand what must not be mixed.
- Do not mix value judgment and execution
- Do not mix final approval and task partitioning
- Do not mix the responsibility held by humans and the processes that can be passed to AI
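These three "do not mix" rules can be read as constraints on what a single role is allowed to hold. The sketch below is one possible encoding, assuming duties are tracked as plain strings; the duty names, `FORBIDDEN_PAIRS`, `HUMAN_ONLY`, and `check_role` are all hypothetical.

```python
# Pairs of duties that must never sit in the same role.
FORBIDDEN_PAIRS = (
    frozenset({"value_judgment", "execution"}),
    frozenset({"final_approval", "task_partitioning"}),
)

# Duties assumed to stay with the human and never pass to an AI role.
HUMAN_ONLY = frozenset({"priority", "value_judgment", "final_approval"})


def check_role(duties: set[str], is_ai: bool) -> list[str]:
    """List every way a role's duties break the 'do not mix' policy."""
    problems = [
        f"mixes {a} and {b}"
        for a, b in (sorted(pair) for pair in FORBIDDEN_PAIRS)
        if {a, b} <= duties
    ]
    if is_ai:
        problems += [
            f"AI role holds human-only duty {d}"
            for d in sorted(duties & HUMAN_ONLY)
        ]
    return problems
```

An empty list means the role respects the policy. The check says nothing about whether the role set is complete, which matches the point that the final form was not known in advance.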
Because this policy existed beforehand, when I ran the operation, I was able to notice, "this role is missing here." Without the policy, I think I would have been left with only a vague sense that things were slow, hard to use, and clogged.
When you decide on a structure beforehand, it becomes easier to see deficiencies in operation. The fact that the delegation role emerged later is not proof that the design failed. It meant that because there was a policy of responsibility separation, I could discover the necessity within the operation.
Next time, I will write about what the companion, with its structure now built, was actually doing in daily operations. As it turns out, it was not just code generation.
By the way, my companion also helped with the structure of this article. The first draft started mixing "stop design" and "responsibility separation," which was exactly a case of delegation not happening. Outside the main text, it reproduced the very reason an organizer role was needed.