Translated by AI
I Replaced tmux send-keys with Hooks and Nearly Killed My Lead Agent: Modernizing 14 Outdated AI Skills
v3.9 has been released.
In the previous hardcore CLAUDE.md article, I wrote that "every single line of CLAUDE.md is made by accident." This article continues that story. Uncharacteristically, I will give a serious explanation of the three technical changes made between v3.8 and v3.9.
- The story of eliminating tmux send-keys and unifying it under Hooks (v3.8)
- The story of a deadlock occurring the moment it was unified (v3.8 → v3.9)
- The story of re-investigating skills and finding 14 of them were fossils (v3.9)
This time, there's more code than usual. This is for those interested in multi-agent communication design.
Premise: Communication Issues in the Shogun System
As I've written many times in this series, the Shogun system runs 10 AI agents in parallel on tmux.
Shogun (Me)
└─ Karo ← Middle manager. Distributes tasks to 7 agents
   ├─ Gunshi ← In charge of quality review
   └─ Ashigaru 1-7 ← Practical work force
The problem is "how to deliver messages between agents."
Initially, we used tmux send-keys to type text directly into a pane, for example sending an Enter key to Karo's pane to wake it up.
# Old method: Waking up Karo
tmux send-keys -t multiagent:0.0 "inbox3" Enter
This was hell.
Chapter 1: Why send-keys Died
tmux send-keys is a command that "sends the same keystrokes a human would type on a keyboard to a pane." Using this for inter-agent communication was like kicking down the conference room door and shouting into someone's ear.
Problem 1: Collisions during input
If send-keys sends text while an Ashigaru is writing code, it gets injected into the prompt they are currently typing.
Ashigaru typing: "Fix this file|" ← Cursor here
send-keys arrives: "inbox3"
Result: "Fix this fileinbox3|" ← Broken
This led to 48 revisions. Sending text and Enter separately (0.3-second gap), sending Escape twice to reset the cursor position, etc. These were all workarounds.
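For a sense of what those workarounds looked like when stacked together, here is a rough reconstruction. It is not the actual script: the function name `send_nudge`, the `NUDGE_GAP` variable, and the exact order of the keystrokes are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the old send-keys workaround.
# send_nudge and NUDGE_GAP are illustrative names, not taken from the real scripts.

NUDGE_GAP="${NUDGE_GAP:-0.3}"   # seconds to wait between the text and Enter

send_nudge() {
  local pane="$1" text="$2"
  # Escape twice to reset the prompt state (assumed order of operations).
  tmux send-keys -t "$pane" Escape
  tmux send-keys -t "$pane" Escape
  # Send the text first, then Enter as a separate call after a short gap,
  # so the two are less likely to be swallowed together.
  tmux send-keys -t "$pane" "$text"
  sleep "$NUDGE_GAP"
  tmux send-keys -t "$pane" Enter
}
```

Called as `send_nudge multiagent:0.0 inbox3`, this still types into whatever the recipient happens to be doing, which is why no amount of tuning made it reliable.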
Problem 2: Killing during Thinking
If Claude Code receives keystrokes while it is thinking, it tries to interrupt its thought process and handle the input. There was an instance where sending Escape with send-keys immediately aborted an ongoing task.
CLAUDE.md, line 287:
| D006 | `kill`, `killall`, `pkill`, `tmux kill-server` | Terminates other agents |
The rule "Do not kill agents" had to be written in CLAUDE.md because of send-keys.
Problem 3: Impossible to determine busy status
Before sending send-keys, I wanted to know if the recipient was busy or idle. However, even by looking at tmux pane output with capture-pane, it was impossible to distinguish whether Claude Code was thinking or idle.
# Old method for busy determination (looking at the end of capture-pane)
tmux capture-pane -t "$PANE" -p | tail -5
# → ">" is displayed, so it's idle? No, maybe the Thinking display just hasn't updated.
48 revisions. All of them stemmed from this.
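To make the ambiguity concrete, here is a minimal sketch of that kind of screen-scraping check. The name `looks_idle` is hypothetical, and the rule "the last non-blank line starts with `>`" is an assumption about what the old heuristic boiled down to.

```shell
#!/usr/bin/env bash
# Sketch of the old screen-scraping heuristic. looks_idle is a hypothetical
# name; it reads captured pane text on stdin.

looks_idle() {
  # "Idle" here means: the last non-blank line starts with the ">" prompt.
  awk 'NF { last = $0 } END { exit (last ~ /^[[:space:]]*>/ ? 0 : 1) }'
}

# A pane that really is idle and a pane whose Thinking display has not
# been redrawn yet can contain exactly the same text:
printf 'Done.\n> \n' | looks_idle && echo "idle (maybe)"
```

In practice this would be fed by `tmux capture-pane -t "$PANE" -p | looks_idle`. The check itself works; the problem is that the text on screen is not a trustworthy signal.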
Chapter 2: Stop Hook as a Savior
In v3.8, send-keys was finally almost completely abolished. In its place, Claude Code's Stop Hook was introduced.
How Stop Hook works
Claude Code has a "Hook System." This feature allows any shell script to be called before/after tool execution or when an agent stops.
// .claude/settings.json
{
  "hooks": {
    "Stop": [{
      "hooks": [{
        "type": "command",
        "command": "bash /path/to/stop_hook_inbox.sh"
      }]
    }]
  }
}
The Stop Hook fires "the moment an agent finishes its task and enters an idle state." This means:
- If the agent is busy → The Hook does not fire → No interference
- If the agent becomes idle → The Hook fires → At that moment, it checks the inbox and continues if there are unread messages
# Core part of stop_hook_inbox.sh (simplified)
UNREAD_COUNT=$(grep -c 'read: false' "$INBOX")
if [ "$UNREAD_COUNT" -eq 0 ]; then
touch "/tmp/shogun_idle_${AGENT_ID}" # Set idle flag
exit 0 # Allow stopping
fi
# Unread messages exist → Block stopping, provide inbox content as feedback
echo '{"decision":"block","reason":"Unread messages in inbox. Process them."}'
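The same decision can be written as a self-contained function, which makes it easy to try against a temporary inbox. This is a sketch under assumptions: `stop_hook_decide` is a hypothetical name, the inbox is assumed to be a text file where each unread message has a line containing `read: false`, and removing the idle flag when blocking is my guess at where the flag gets cleared.

```shell
#!/usr/bin/env bash
# Sketch of the Stop hook decision. stop_hook_decide is a hypothetical helper,
# not the real stop_hook_inbox.sh.

stop_hook_decide() {
  local inbox="$1" agent_id="$2" flag_dir="${3:-/tmp}"
  local flag="${flag_dir}/shogun_idle_${agent_id}"
  local unread=0

  if [ -f "$inbox" ]; then
    # grep -c prints 0 but exits non-zero when nothing matches, so ignore its status.
    unread=$(grep -c 'read: false' "$inbox" || true)
  fi

  if [ "$unread" -eq 0 ]; then
    touch "$flag"     # nothing pending: mark the agent idle and allow the stop
    return 0
  fi

  rm -f "$flag"       # assumption: pending work means the agent is not idle
  printf '{"decision":"block","reason":"%s unread message(s) in inbox. Process them."}\n' "$unread"
}
```

Printing nothing and exiting 0 lets the agent stop. Printing the `block` decision keeps it running with the reason as feedback.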
Here's why this was revolutionary:
| Item | Old method (send-keys) | New method (Stop Hook) |
|---|---|---|
| Delivery timing | Determined by send-keys (recipient's state unknown) | Moment recipient becomes idle (certain) |
| Input collision | Occurred (48 revisions) | Does not occur (Hook is a separate path from input) |
| Busy determination | Estimated by capture-pane (inaccurate) | Determined reliably by flag file |
| Thinking interruption | Occurred | Does not occur (Stop Hook fires only when idle) |
We did not kill send-keys because tmux was bad. We killed it because it was impossible to know an agent's state accurately from the outside. The Stop Hook solved this fundamental problem by switching to a mechanism in which the agent itself reports its status.
Flag file method
Another core aspect is the flag file.
# The moment an agent enters idle (created by Stop Hook)
touch /tmp/shogun_idle_karo
# When inbox_watcher checks for busy status
if [ -f "/tmp/shogun_idle_karo" ]; then
  : # Idle → Okay to send nudge
else
  : # Busy → Stop Hook will deliver, so wait
fi
The method changed from parsing pane output with capture-pane and estimating "probably idle" to a binary determination where "the file exists only when idle." Simple. Reliable.
I released v3.8. All 63 tests passed. I thought it was perfect.
Chapter 3: Deadlock — Karo Died Permanently
The day after releasing v3.8. I woke up in the morning and launched the campaign (started all agents), but Karo wouldn't wake up.
$ tmux capture-pane -t multiagent:0.0 -p | tail -5
╭─────────────────────────────────────────╮
│ Welcome to Claude Code! │
│ │
│ /help for help │
╰─────────────────────────────────────────╯
Nothing happened, just the Claude Code welcome screen.
Looking at the inbox_watcher log.
[2026-02-28 11:15] 1 unread for karo but agent is busy (claude) — Stop hook will deliver
[2026-02-28 11:16] 1 unread for karo but agent is busy (claude) — Stop hook will deliver
[2026-02-28 11:17] 1 unread for karo but agent is busy (claude) — Stop hook will deliver
... (continues forever)
"The agent is busy, so the Stop Hook will deliver."
No, look at the screen. It's the welcome screen. It's not busy.
Cause: The blank zone during startup
The design of the flag file method is as follows:
1. Agent starts
2. Executes task (→ Deletes flag file)
3. Task completes (→ Stop Hook fires → Creates flag file → Checks inbox)
4. If there's a next task, go to 2; otherwise, idle
The problem is between steps 1 and 2.
Immediately after Claude Code starts, before it has received any tasks:
- The Stop Hook has never fired → The flag file does not exist
- inbox_watcher determines "no flag file = busy"
- Because it's busy, no nudge is sent
- No nudge means the agent does nothing
- Doing nothing means the Stop Hook doesn't fire either
- Nothing ever happens, forever
inbox_watcher: "It's busy, so let the Hook handle it."
Stop Hook: "I haven't been called, so I'll do nothing."
Agent: "I'm idle because nothing's coming..."
→ Everyone is waiting for someone else to move → Deadlock
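The cycle is easy to reproduce in a toy model. The sketch below is not the real inbox_watcher: `watcher_tick` and `simulate_startup` are hypothetical stand-ins, and one "tick" represents one polling pass.

```shell
#!/usr/bin/env bash
# Toy model of the v3.8 startup deadlock. watcher_tick is a hypothetical
# stand-in for one polling pass of inbox_watcher, not the real code.

watcher_tick() {
  local flag="$1" unread="$2"
  if [ "$unread" -eq 0 ]; then
    echo "nothing-to-do"
  elif [ -f "$flag" ]; then
    echo "nudge"      # idle flag present: safe to wake the agent
  else
    echo "wait"       # no flag: assume busy and leave it to the Stop hook
  fi
}

simulate_startup() {
  local flag="$1" ticks="$2" i=0 result
  while [ "$i" -lt "$ticks" ]; do
    result=$(watcher_tick "$flag" 1)
    if [ "$result" = "nudge" ]; then
      echo "delivered after $((i + 1)) tick(s)"
      return 0
    fi
    # The agent sits at the welcome screen, so no Stop hook runs and
    # nothing creates the flag between ticks.
    i=$((i + 1))
  done
  echo "still waiting after $ticks ticks"
  return 1
}
```

With no flag file, `simulate_startup /tmp/some_missing_flag 5` never gets past "wait", no matter how many ticks it is given.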
Why it wasn't found in tests
In E2E tests, the mock CLI (mock_cli.sh) immediately displays a prompt upon startup. The idle flag was created within the show_prompt() function that displays the prompt.
# show_prompt() in mock_cli.sh
show_prompt() {
touch "${IDLE_FLAG_DIR}/shogun_idle_${AGENT_ID}" # ← This!
echo -n "> "
}
In other words, the mock was "idle the moment it started." The real Claude Code just displays a welcome screen and doesn't create a flag file when displaying the prompt. The bug was hidden because the mock was smarter than reality.
Fix: Solved in 2 lines
# inbox_watcher.sh startup (around line 50)
# Fix: CLI starts at welcome screen = idle. Create idle flag so watcher
# doesn't false-busy deadlock waiting for a stop_hook that never fires.
if [[ "$CLI_TYPE" == "claude" ]]; then
touch "${IDLE_FLAG_DIR:-/tmp}/shogun_idle_${AGENT_ID}"
fi
When inbox_watcher starts, it creates a flag file indicating that the Claude CLI has just started and is therefore idle. Just two lines.
Additionally, there was a safety net bug. inbox_watcher has a safety mechanism that "forces a nudge if it remains busy for a certain period." However, the activation condition for this safety mechanism, FIRST_UNREAD_SEEN, was not initialized in the busy+claude code path.
# Before fix: FIRST_UNREAD_SEEN was passed without being set
if [[ "$busy_cli" == "claude" ]]; then
echo "agent is busy (claude) — Stop hook will deliver" >&2
# ← FIRST_UNREAD_SEEN was not set! The safety mechanism would never activate
fi
# After fix:
if [[ "$busy_cli" == "claude" ]]; then
if [ "${FIRST_UNREAD_SEEN:-0}" -eq 0 ]; then
FIRST_UNREAD_SEEN=$now # ← Start the safety mechanism timer
fi
echo "agent is busy (claude) — Stop hook will deliver" >&2
fi
Because the safety mechanism was broken, the fallback to "send /clear after 4 minutes of being busy" also didn't activate. It was broken in two ways.
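The timer logic behind the safety net can be sketched in isolation. `note_unread_while_busy`, `should_escalate`, and `ESCALATE_AFTER` are hypothetical names. The 240 seconds come from the "4 minutes" above, and timestamps are assumed to be epoch seconds.

```shell
#!/usr/bin/env bash
# Sketch of the safety-net timer. Function and variable names other than
# FIRST_UNREAD_SEEN are hypothetical.

ESCALATE_AFTER=240      # seconds of "busy with unread mail" before forcing action
FIRST_UNREAD_SEEN=0     # 0 means "timer not started"

note_unread_while_busy() {
  local now="$1"
  # Start the timer only once; later calls must not reset it.
  if [ "${FIRST_UNREAD_SEEN:-0}" -eq 0 ]; then
    FIRST_UNREAD_SEEN=$now
  fi
}

should_escalate() {
  local now="$1"
  [ "${FIRST_UNREAD_SEEN:-0}" -ne 0 ] &&
    [ $((now - FIRST_UNREAD_SEEN)) -ge "$ESCALATE_AFTER" ]
}
```

The pre-fix behavior corresponds to never calling `note_unread_while_busy`: `FIRST_UNREAD_SEEN` stays 0, so `should_escalate` can never return true.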
Addition: E2E Regression Test
I also fixed the mock not reflecting reality. A new test, E2E-008-F, was added.
E2E-008-F: Claude at welcome screen with no idle flag — nudge delivered via initial flag
This test:
- Starts inbox_watcher without an idle flag
- Confirms that inbox_watcher creates the initial flag
- Confirms that tasks are delivered correctly
- Confirms that the "agent is busy" log does not appear
When tests are smarter than reality, they miss real bugs. I think this is a lesson that applies to automated testing in general.
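A miniature version of what E2E-008-F checks might look like the sketch below. It does not run the real inbox_watcher: `watcher_startup` mirrors the two-line fix, while `watcher_tick` and `run_e2e_008_f` are hypothetical stand-ins.

```shell
#!/usr/bin/env bash
# Hypothetical miniature of E2E-008-F, using stand-in functions.

watcher_startup() {
  local cli_type="$1" agent_id="$2" flag_dir="${3:-/tmp}"
  # The fix: a Claude CLI at the welcome screen counts as idle.
  if [ "$cli_type" = "claude" ]; then
    touch "${flag_dir}/shogun_idle_${agent_id}"
  fi
}

watcher_tick() {
  local flag="$1"
  if [ -f "$flag" ]; then
    echo "nudge sent"
  else
    echo "agent is busy (claude)"
  fi
}

run_e2e_008_f() {
  local dir log
  dir=$(mktemp -d)
  # 1. No idle flag exists before the watcher starts.
  [ ! -e "$dir/shogun_idle_karo" ] || return 1
  # 2. The watcher creates the initial flag.
  watcher_startup claude karo "$dir"
  [ -f "$dir/shogun_idle_karo" ] || return 1
  # 3 and 4. The first tick delivers, and "agent is busy" is never logged.
  log=$(watcher_tick "$dir/shogun_idle_karo")
  rm -rf "$dir"
  case "$log" in *"agent is busy"*) return 1 ;; esac
  [ "$log" = "nudge sent" ] || return 1
  echo "E2E-008-F (sketch): PASS"
}
```

The important part is step 1: the test starts from the same empty state as the real CLI instead of from a mock that is already idle.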
Chapter 4: SKILL.md specification had evolved
After fixing the deadlock, I suddenly wondered.
The Shogun system has 16 skills (SKILL.md). Skills for writing articles, researching social media, managing deployments. I automate workflows using the official "skill" feature of Claude Code.
And I'd heard rumors that the Claude Code skill specification had changed quite a bit recently. So I looked into it.
SKILL.md specification as of February 2026
The fields available in the frontmatter (YAML part) had increased considerably.
---
name: skill-name # kebab-case, max 64 characters
description: | # The only information Claude uses to decide "when to activate"
  Clearly state What + When. Include trigger words.
argument-hint: "[target]" # Completion hint for /skill-name [target]
disable-model-invocation: false # true = activate only via manual /name (for skills with side effects)
user-invocable: true # false = menu hidden (for background knowledge skills)
allowed-tools: Read, Grep, Bash # Restrict usable tools
model: sonnet # Specify model at runtime
context: fork # fork = isolated execution in a sub-agent
agent: general-purpose # Agent type for forked context
hooks: # Skill-specific hook definitions
  PostToolUse:
    - matcher: "Edit|Write"
      hooks:
        - type: command
          command: "./scripts/lint.sh"
---
Particularly important new features:
1. context: fork — Sub-agent isolated execution
Skills can be executed in a separate sub-agent process. This doesn't contaminate the main conversation context. Suitable for heavy processing.
2. $ARGUMENTS / $0 — Argument substitution
Calling
/my-skill 結婚 kekkon
will substitute $0 in the skill body with 結婚 and $1 with kekkon. This allows for dynamic skills.
3. !`command` — Dynamic context
Execute a shell command before loading the skill and embed the result.
## Current Branch
!`git branch --show-current`
## Recent Commits
!`git log --oneline -5`
This allows for things like pre-fetching external API data and using it within the skill.
4. hooks — Skill-specific Hooks
Define hooks per skill. For example, automatically run a linter after editing a file.
Agent Skills Open Standard
One more thing. The SKILL.md specification now conforms to an open standard called agentskills.io. The same SKILL.md can run on AI tools other than Claude Code (Cursor, Codex CLI, etc.).
This is a big deal for multi-agent systems. In the Shogun system, ashigaru sometimes use CLIs other than Claude Code (like Codex CLI), and if they can run with the same skill definitions, switching will be easier.
Chapter 5: 14 Fossil Skills
So, I reviewed my own skills. 14 out of 16 were outdated.
$ ls ~/.claude/skills/shogun-*/SKILL.md | wc -l
16
What was wrong?
| Issue | Count | Example |
|---|---|---|
| allowed-tools unspecified | 12/16 | All tools usable → unintended behavior |
| argument-hint missing | 10/16 | No argument hint appears for /skill-name |
| Ambiguous description | 8/16 | Describes "what to do" but not "when to use" |
| context: fork unused | 6/16 | Heavy processing clogs main context |
| North Star (purpose) not specified | 5/16 | Unclear why this skill exists |
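The mechanical part of this audit (which frontmatter fields are present) is easy to script. The sketch below is not the tool I used: `skill_frontmatter` and `audit_skill` are hypothetical helpers, and they only look at top-level keys between the first pair of `---` markers.

```shell
#!/usr/bin/env bash
# Sketch of a frontmatter field audit. Helper names are hypothetical.

skill_frontmatter() {
  # Print the lines between the first pair of '---' markers.
  awk '/^---[[:space:]]*$/ { n++; next } n == 1 { print } n >= 2 { exit }' "$1"
}

audit_skill() {
  local file="$1" field missing=""
  for field in description allowed-tools argument-hint context; do
    if ! skill_frontmatter "$file" | grep -q "^${field}:"; then
      missing="${missing}${missing:+ }${field}"
    fi
  done
  # Prints "<file>: ok" or "<file>: <space-separated missing fields>".
  echo "${file}: ${missing:-ok}"
}
```

Running `for f in ~/.claude/skills/shogun-*/SKILL.md; do audit_skill "$f"; done` gives one line per skill. Whether a description is ambiguous still needs a human (or an Ashigaru) to judge.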
The description issue was the most severe.
Claude Code decides "whether or not to use this skill" based solely on the description. The main body text is not used for activation judgment. This means if the description is ambiguous, it won't be used when desired, and will activate when not desired.
Bad example:
description: "Write an article"
Good example:
description: |
  Discover and design new content ideas from keywords. Inventory existing articles → Research demand with X Research →
  Gap analysis → Output new article design document. North Star is maximizing KPI.
  Activates with "new article", "article idea", "discover article", "what should I write".
What (what to do) + When (when to use) + Trigger words (how to call it). Only when these three are aligned does a skill function properly.
7-item checklist
I organized a checklist for writing skill descriptions.
| # | Check | NG | OK |
|---|---|---|---|
| 1 | What: What to do | "Document processing" | "Extract tables from PDF and convert to CSV" |
| 2 | When: When to use | (none) | "Used in data analysis workflow" |
| 3 | Trigger words | (none) | "Activates with 'article QC', 'validation'" |
| 4 | Specific action verbs | "Manage" | "Extract, convert, validate" |
| 5 | Length: 50-200 characters | 1 word | 2-3 sentences for overview + trigger |
| 6 | Differentiate from existing skills | Overlaps with others | Clearly state unique scope |
| 7 | Do not use square brackets | "Process [PDF]" | "Process PDF" |
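Items 5 and 7 are mechanical enough to check automatically. Here is a minimal sketch, assuming the description has already been extracted into a string. `check_description` is a hypothetical helper, and the remaining items still need judgment.

```shell
#!/usr/bin/env bash
# Sketch of a description lint covering checklist items 5 and 7 only.
# check_description is a hypothetical helper name.

check_description() {
  local desc="$1" len=${#1} status=0

  # Item 5: length between 50 and 200 characters.
  if [ "$len" -lt 50 ] || [ "$len" -gt 200 ]; then
    echo "length: $len characters (expected 50-200)"
    status=1
  fi

  # Item 7: no square brackets.
  case "$desc" in
    *"["*|*"]"*)
      echo "brackets: remove square brackets"
      status=1
      ;;
  esac

  return "$status"
}
```

For example, `check_description "Write an article"` reports a length problem, since the string is only 16 characters long.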
Embedding North Star in skills
Another important discovery: Custom fields are ignored by Claude Code.
# This is ignored
---
north_star: "Maximize business KPI"
---
Even if you add your own fields to the frontmatter, Claude Code only reads the prescribed fields. If you want to write a North Star (the highest objective of this skill), it must be written in the Markdown body.
## North Star (Highest criterion for all decisions)
The North Star of this skill is to **maximize business KPI**.
Improvement priority is determined by "business impact":
- High impact improvements = very high impact
- Low impact improvements = low impact (not worth the effort)
It might seem trivial, but without a North Star, a skill returns "technically correct" output. With a North Star, it returns output that "leads to business goals." This difference is significant.
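Since custom frontmatter fields fail silently, it helps to flag them before they cause confusion. The sketch below lists top-level frontmatter keys that are not among the fields shown in the specification earlier in this article. `unknown_fields` is a hypothetical helper, and `KNOWN_FIELDS` is copied from that listing, so it may lag behind the actual spec.

```shell
#!/usr/bin/env bash
# Sketch: report frontmatter keys that are not in the known field list.
# KNOWN_FIELDS is taken from the frontmatter example in this article.

KNOWN_FIELDS="name description argument-hint disable-model-invocation user-invocable allowed-tools model context agent hooks"

unknown_fields() {
  local file="$1" key
  # Take the lines between the first pair of '---' markers, keep only
  # top-level keys (no leading indentation), and filter out known ones.
  awk '/^---[[:space:]]*$/ { n++; next } n == 1 { print } n >= 2 { exit }' "$file" |
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_-]*\):.*/\1/p' |
    while read -r key; do
      case " $KNOWN_FIELDS " in
        *" $key "*) ;;
        *) echo "$key" ;;
      esac
    done
}
```

Any key this prints is a candidate for moving into the Markdown body, as with the North Star section above.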
Operation Rewrite 14 Skills
Rewriting all of them myself would be too tough.
So I had four Ashigaru rewrite them in parallel via Karo, issued as cmd_231. I gave them the newly written skill creator (skill-creator/SKILL.md) as the specification and attached two previously written skills as examples. The 14 skills were divided among the four.
# cmd_231 — Rewrite 14 existing skills to latest spec
action: "rewrite_all_skills"
spec: "skills/skill-creator/SKILL.md"
exemplars:
  - "~/.claude/skills/shogun-content-discover/SKILL.md"
  - "~/.claude/skills/shogun-content-upgrade/SKILL.md"
ashigaru_split:
  ashigaru1: [ad-manager, cloudflare, github-reviewer, imagegen]
  ashigaru2: [jira, model-advisor, model-switch, ratelimit]
  ashigaru5: [screenshot, content-writer, x-en-writer, x-research]
  ashigaru6: [zenn-analyzer, zenn-writer]
Four Ashigaru rewrite the skills simultaneously. The rewritten skills are reviewed by Gunshi, and if they pass, I (the Lord) give final approval.
Summary: What changed in v3.8 ~ v3.9
| Version | Change | Effect |
|---|---|---|
| v3.8 | send-keys → Stop Hook + Flag File | Abolished all 48 workarounds |
| v3.8→v3.9 | Startup deadlock fix (2 lines) | Karo no longer dies permanently |
| v3.9 | Skill creator completely rewritten | Compliant with latest specification |
| v3.9 | Issued rewrite for 14 skills | All skills with North Star + fork + allowed-tools |
The complete abolition of send-keys is the biggest change. This finally ended the unproductive battle of "inferring agent state from the outside." The agent itself reports, "I am idle." It's a natural thing, but it took 48 revisions and one deadlock to get there.
Bonus: SKILL.md Anti-Pattern Collection
Here's a list of "what not to do" found during the audit of 14 skills. I'm leaving it here in case someone else makes the same mistakes.
| NG | Reason | Instead |
|---|---|---|
| SKILL.md over 1000 lines | Huge loading cost increase | Separate into reference.md (under 500 lines) |
| description is one word | Does not activate or misfires | What + When + Trigger words |
| context: fork + guidelines only | Sub-agent wanders with unknown task | Write clear tasks for fork |
| disable-model-invocation + user-invocable: false | No one can activate it | Only one or the other |
| Custom fields in frontmatter | Ignored by Claude Code | Write in Markdown body |
| Heavy processing without allowed-tools | Unintended tool usage | List only necessary tools |
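Two of these anti-patterns can be caught mechanically: the skill nobody can activate, and the oversized SKILL.md. A minimal sketch follows. `lint_skill` and `MAX_SKILL_LINES` are hypothetical names, and the default of 500 lines is taken from the "under 500 lines" suggestion in the table.

```shell
#!/usr/bin/env bash
# Sketch of a lint for two anti-patterns from the table. Names are hypothetical.

MAX_SKILL_LINES="${MAX_SKILL_LINES:-500}"

lint_skill() {
  local file="$1" fm lines status=0
  # Frontmatter = lines between the first pair of '---' markers.
  fm=$(awk '/^---[[:space:]]*$/ { n++; next } n == 1 { print } n >= 2 { exit }' "$file")

  # Model invocation disabled AND hidden from the user: nothing can trigger it.
  if printf '%s\n' "$fm" | grep -Eq '^disable-model-invocation:[[:space:]]*true' &&
     printf '%s\n' "$fm" | grep -Eq '^user-invocable:[[:space:]]*false'; then
    echo "unreachable: neither the model nor the user can invoke this skill"
    status=1
  fi

  lines=$(wc -l < "$file" | tr -d ' ')
  if [ "$lines" -gt "$MAX_SKILL_LINES" ]; then
    echo "too long: $lines lines (limit $MAX_SKILL_LINES)"
    status=1
  fi

  return "$status"
}
```

The function prints one line per finding and returns non-zero if anything was found, so it can be dropped into a pre-commit check.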
If you want to try it out, here it is:
v3.9 has significantly improved stability. The Hook and flag file code is in scripts/stop_hook_inbox.sh and scripts/inbox_watcher.sh. The example skill specification is skills/skill-creator/SKILL.md. All are open-sourced.
Next time, I'll probably report on the results of rewriting the 14 skills. It's the first time four Ashigaru are rewriting skills simultaneously, so I honestly don't know what will happen. Something might die again.