Translated by AI
River Review v0.30–v0.33: A Half-Month of Improvement Loops and Refining applyTo Scoping
This article is a reconstruction of the work logs for River Review v0.30.0 to v0.33.0 (as of 2026-05-08).
TL;DR
- Over the half-month from v0.30.0 to v0.33.0, River Review was redefined from an "agent that only executes reviews" to an "Improvement Loop OS" (an operational foundation for "Review → Verify → Institutionalize" that verifies review results and feeds them back into mechanisms like test cases, suppression rules, and references).
- Through Epic #743 (P1+P2), the responsibilities of the entry skill `river-review` (input classification, specialist skill selection, verification, feedback classification, and loop handover) were clearly defined.
- New `applyTo` scoping rules (`docs/development/skill-applyto-scoping.md`) were established, and overly broad globs that caused planner false-positive routing were reorganized across 13 skills in 2 batches.
- The planner-dataset eval (the evaluation set for routing; `coverage=1.0 / top1Match=1.0`, 23 cases) remained green throughout the entire period, with updates focused solely on increasing the ability to detect routing regressions.
What Changed (high-level)
Period: 2026-04-30 (v0.29.0) to 2026-05-06 (v0.33.0), a total of 4 releases.
| Release | Key Changes |
|---|---|
| v0.30.0 | Added rr-upstream-context-budget-tuning-001 skill (#736) |
| v0.31.0 | Epic #743 P1: Redefined `river-review` as an improvement-loop orchestrator + 3 references (VERIFICATION / FEEDBACK / IMPROVEMENT_LOOP) (#744 #745) |
| v0.32.0 | Epic #743 P2: routing/planner eval cases (#746) + feedback-to-fixture conversion workflow (#747) + suppression-feedback fixtures (#739) + eval-driven-skill-design skill (#737) |
| v0.33.0 | `applyTo` scoping rules + cleanup of 13 skills (Epic #762 / Implementation PRs #766 #767) |
In parallel, 4 dependabot PRs, Docusaurus version alignment, translations (#733 #734), and the addition of the code_search dependency (#738 PR-2) were also completed.
Epic #743: improvement-loop orchestrator
Before / After
Before: skills/agent-skills/river-review/SKILL.md was a "router that distributed tasks to specialist skills based on keywords." Misdetections and oversights were fixed via ad-hoc prompt corrections, leaving no trace in the repository.
After: The same SKILL.md expands the responsibilities of the entry skill into the following 6 items, each of which is explored in depth in a reference:
- Classify input intent: determines the target category based on user intent / phase / artifact / risk
- Select specialist skills: routing table and priority rules
- Create review execution plan: collects artifacts according to input priority
- Verify findings: 6 self-check items in `references/VERIFICATION.md`
- Classify feedback: 7-type taxonomy in `references/FEEDBACK.md`
- Hand off learnings: 9-step loop in `references/IMPROVEMENT_LOOP.md`
The Role of the 3 References
| File | Role |
|---|---|
| `VERIFICATION.md` | Self-checks before outputting findings (6 items, such as whether evidence is linked to diffs, whether impact is concrete, and whether severity and confidence are calibrated) and rejection conditions |
| `FEEDBACK.md` | Classifies feedback into 7 types: accepted / false_positive / missed_issue / not_actionable / duplicate / accepted_risk / unclear. Provides a one-to-one mapping for where each should be funneled (fixture / suppression / reference / routing) |
| `IMPROVEMENT_LOOP.md` | 9-step loop: Route → Review → Verify → Classify → Patch One Thing → Add Fixture → Run Eval → Record Learning → Promote Rule |
In addition, FEEDBACK_TO_FIXTURE.md was added in P2 as a supplement. For each feedback type it summarizes the primary destination, the secondary destination, the required eval command, and whether a rationale is required, all in a single table. It also documents the procedure for triaging missed_issue into 3 root causes: routing miss, missing context, and weak instructions.
Why This Is Beneficial
- Doesn't end with prompt fixes: every piece of feedback is guaranteed to result in an update to a fixture, a reference, a suppression entry, or the routing configuration.
- Conveys the meaning of HIGH_SEVERITY guards: docs and eval prompts now behave consistently, so `major`/`critical` findings reappear via the guard if they were suppressed without an `accepted_risk` classification.
- Enables regression detection of routing via planner-dataset: #746 added 3 cases ("architecture intent," "pre-mortem intent," and "multi-skill (security + observability)"), which are protected by `coverage` and `top1Match`.
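How the two metrics can act as a regression guard is sketched below. The article does not define `coverage` or `top1Match`, so the definitions here are assumptions: coverage is taken as the share of cases where every expected skill appears in the planner's ranking, and top1Match as the share where the top-ranked skill is the expected primary skill. The `Case` type and the skill IDs are hypothetical and are not taken from the repository.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Case:
    """One routing case (hypothetical shape, not the real dataset schema)."""
    name: str
    expected: list[str]  # skills that should be selected; first entry is the primary
    ranked: list[str]    # planner output, best match first


def coverage(cases: list[Case]) -> float:
    """Assumed definition: share of cases where all expected skills were selected."""
    hits = sum(1 for c in cases if set(c.expected) <= set(c.ranked))
    return hits / len(cases)


def top1_match(cases: list[Case]) -> float:
    """Assumed definition: share of cases whose top-ranked skill is the expected primary."""
    hits = sum(1 for c in cases if c.ranked and c.ranked[0] == c.expected[0])
    return hits / len(cases)


def gate(cases: list[Case]) -> bool:
    """Pass only when both metrics are exactly 1.0."""
    return coverage(cases) == 1.0 and top1_match(cases) == 1.0


# Hypothetical skill IDs, used only for illustration.
CASES = [
    Case("architecture intent", ["skill-architecture"], ["skill-architecture"]),
    Case(
        "multi-skill (security + observability)",
        ["skill-security", "skill-observability"],
        ["skill-security", "skill-observability"],
    ),
]
```

Under these assumptions, a case whose ranking still contains the right skills but puts the wrong one first keeps coverage at 1.0 while dropping top1Match, which is why tracking both numbers catches more than either one alone.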
eval-driven improvement loop
Where to Use npm run eval:fixtures / npm run planner:eval:dataset
Excerpt from the conversion table in FEEDBACK_TO_FIXTURE.md.
| Feedback type | Primary destination | Required eval command |
|---|---|---|
| `accepted` | (None) | `npm run eval:fixtures` |
| `false_positive` | Guard fixture (`<NN>-guard.md` / `*-should-not-detect`) | `npm run eval:fixtures` + `npm run eval:repo-context` |
| `missed_issue` | Happy-path fixture (`<NN>-happy.md` / `*-should-detect`) | `npm run eval:fixtures` + `npm run eval:repo-context` + `npm run planner:eval:dataset` |
| `not_actionable` | Fix template in reference / add example | `npm run skills:validate` |
| `duplicate` | Update routing (clarify owner skill) or logic within skill | `npm run planner:eval:dataset` + `npm run skills:validate` |
| `accepted_risk` | Suppression entry (rationale required) | `npm run skills:validate` |
| `unclear` | Improve wording in skill SKILL.md / reference | `npm run skills:validate` |
We made it standard practice to confirm locally that each required eval exits with code 0 before pushing, rather than relying on a CI pass alone. This is proving effective.
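The conversion table can also be read as a lookup from feedback type to the commands that must pass before pushing. The sketch below encodes only the rows excerpted above. The names `FEEDBACK_ROUTES`, `commands_for`, and `requires_rationale` are hypothetical helpers for illustration and are not part of River Review.

```python
from __future__ import annotations

# feedback type -> (primary destination, required npm scripts), copied from the table.
FEEDBACK_ROUTES: dict[str, tuple[str | None, list[str]]] = {
    "accepted": (None, ["eval:fixtures"]),
    "false_positive": ("guard fixture", ["eval:fixtures", "eval:repo-context"]),
    "missed_issue": (
        "happy-path fixture",
        ["eval:fixtures", "eval:repo-context", "planner:eval:dataset"],
    ),
    "not_actionable": ("reference template / example", ["skills:validate"]),
    "duplicate": ("routing or skill logic", ["planner:eval:dataset", "skills:validate"]),
    "accepted_risk": ("suppression entry", ["skills:validate"]),
    "unclear": ("SKILL.md / reference wording", ["skills:validate"]),
}


def commands_for(feedback_type: str) -> list[str]:
    """Return the full `npm run ...` commands required for a feedback type."""
    if feedback_type not in FEEDBACK_ROUTES:
        raise ValueError(f"unknown feedback type: {feedback_type!r}")
    _, scripts = FEEDBACK_ROUTES[feedback_type]
    return [f"npm run {script}" for script in scripts]


def requires_rationale(feedback_type: str) -> bool:
    """Per the table, only a suppression entry (accepted_risk) needs a rationale."""
    if feedback_type not in FEEDBACK_ROUTES:
        raise ValueError(f"unknown feedback type: {feedback_type!r}")
    return feedback_type == "accepted_risk"
```

Keeping the mapping in data rather than prose makes it simple to fail fast on an unknown feedback type instead of silently skipping an eval.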
New Skill rr-upstream-eval-driven-skill-design-001 (#737)
An upstream skill that checks if fixtures/ happy-path × guard pairs and eval/ wiring (e.g., promptfoo.yaml or cases.json) are present when a new skills/**/SKILL.md is included in a PR. If present, it remains silent; if missing, it issues a minor finding and guides the user to wire up npm run eval:fixtures / npm run eval:repo-context. Since the Pre-execution Gate reacts only to the addition of a new SKILL.md, it does not trigger for PRs editing existing skills.
applyTo scoping rules (#762)
What Was the Problem?
- Skills with bare extension globs such as `applyTo: ['**/*.ts', '**/*.tsx']` were firing on test files under `tests/` and on `*.config.ts`, files that were never within the skill's scope.
- This manifested as planner false-positive routing: tokens were consumed by the prompt, but the output often became noise due to domain mismatch.
New Rules (docs/development/skill-applyto-scoping.md)
Defines 3 cases where applyTo is considered "over-broad":
- The pattern is unconstrained (`'**/*'`) and the skill is not meta / process / sample.
- The pattern is bound only by extension, but the skill's review domain is stream-specific (upstream, midstream, or downstream).
- The pattern matches files outside the skill's domain for a typical project layout (e.g., a midstream code-quality skill matching `tests/**`).
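The first two cases can be checked from the pattern text alone, as the sketch below shows. The third case needs a sample project layout to match against, so it is left out here. The function name, the `phase` and `tags` parameters, and the regular expression are assumptions made for illustration. The actual audit tooling is not described in the article.

```python
from __future__ import annotations

import re

EXEMPT_TAGS = {"meta", "process", "sample"}
STREAM_PHASES = {"upstream", "midstream", "downstream"}

# Matches patterns constrained only by extension, e.g. **/*.ts or **/*.{yaml,yml,json}.
EXTENSION_ONLY = re.compile(r"^\*\*/\*\.(\w+|\{[\w,]+\})$")


def over_broad_reasons(pattern: str, phase: str, tags: list[str]) -> list[str]:
    """Return which of the first two over-broad cases a single pattern falls under."""
    reasons = []
    # Case 1: unconstrained pattern on a skill that is not meta / process / sample.
    if pattern == "**/*" and not (EXEMPT_TAGS & set(tags)):
        reasons.append("unconstrained")
    # Case 2: extension-only pattern on a stream-specific skill.
    if EXTENSION_ONLY.match(pattern) and phase in STREAM_PHASES:
        reasons.append("extension-only")
    return reasons
```

For example, `over_broad_reasons("**/*.ts", "midstream", [])` flags the pattern as extension-only, while the dir-bounded `src/**/*.{ts,tsx}` returns an empty list.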
Recommended applyTo by phase:
- upstream: `docs/architecture/**/*.md`, `docs/**/*architecture*.md`, `docs/**/*design*.md`, `**/*.adr`, etc.
- midstream: `src/**/*.{ts,tsx}`, `app/**/*.{ts,tsx}`, `lib/**/*.{ts,tsx}`, `packages/**/*.{ts,tsx}` (per extension)
- downstream: `tests/**/*.{ts,tsx,js,jsx}`, `__tests__/**/*.{ts,tsx,js,jsx}`, `**/*.test.{ts,tsx,js,jsx}`, `**/*.spec.{ts,tsx,js,jsx}`
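To see why the dir-bounded patterns stop the false-positive routing, the sketch below compares the old bare globs with the recommended midstream patterns on a few paths. The matcher is a deliberately small approximation of common glob semantics (`**/` spans zero or more directories, `*` stays within one path segment, `{a,b}` is alternation). It is an assumption that the planner's real matcher behaves the same way, and the sample paths are hypothetical.

```python
from __future__ import annotations

import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a small glob subset (**/, **, *, {a,b}) into a regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:[^/]+/)*")  # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "{":
            end = pattern.index("}", i)
            alternatives = pattern[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
            i = end + 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matches_any(path: str, patterns: list[str]) -> bool:
    """True if the path matches at least one glob pattern."""
    return any(glob_to_regex(p).match(path) for p in patterns)


BARE = ["**/*.ts", "**/*.tsx"]
MIDSTREAM = [
    "src/**/*.{ts,tsx}",
    "app/**/*.{ts,tsx}",
    "lib/**/*.{ts,tsx}",
    "packages/**/*.{ts,tsx}",
]

# Hypothetical paths in a typical project layout.
for path in ["src/app/main.ts", "tests/unit/main.test.ts", "vite.config.ts"]:
    print(path, matches_any(path, BARE), matches_any(path, MIDSTREAM))
```

With this matcher the bare globs match all three paths, while the midstream patterns match only `src/app/main.ts`. A colocated test such as `src/foo.test.ts` would still match the midstream patterns, so directory bounding narrows the scope without fully separating source from test files.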
Results (13 skills)
Implementation for Epic #762 was split into two PRs.
- Batch 1 (#766, 8 midstream skills): replaced `**/*.ts` / `**/*.tsx` with dir-bounded patterns for `src|app|lib|packages`
- Batch 2 (#767, 5 upstream skills): replaced `**/*.md` / `**/*.{yaml,yml,json}` with dir-bounded patterns for `docs|pages|specs|design|architecture`
Planner-dataset eval maintained 23 cases / coverage=1.0 / top1Match=1.0 throughout the entire period.
The initial audit estimate of "50 over-broad" skills diverged from the actual count of 13. The main reason discovered during measurement was that skills falling under excludedTags (sample / hello / policy / process / routing) were already being excluded by the planner and had no impact.
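The gap between the estimate and the measured count comes down to applying the planner's tag exclusion before counting. A minimal sketch of that filter follows, assuming each skill carries a list of tags and an over-broad flag from the audit. The skill records and the `effective_over_broad` helper are hypothetical and illustrative only, and the tiny sample list is not the real audit data.

```python
from __future__ import annotations

# Tags the article lists under excludedTags.
EXCLUDED_TAGS = {"sample", "hello", "policy", "process", "routing"}


def effective_over_broad(skills: list[dict]) -> list[str]:
    """IDs of over-broad skills that the planner would actually route to."""
    return [
        s["id"]
        for s in skills
        if s["over_broad"] and not (EXCLUDED_TAGS & set(s["tags"]))
    ]


# Hypothetical records for illustration.
SKILLS = [
    {"id": "skill-a", "tags": ["midstream"], "over_broad": True},
    {"id": "skill-b", "tags": ["sample"], "over_broad": True},
    {"id": "skill-c", "tags": ["policy", "upstream"], "over_broad": True},
    {"id": "skill-d", "tags": ["upstream"], "over_broad": False},
]
```

In this toy list three skills have over-broad patterns, but only `skill-a` remains after the exclusion, which mirrors how a raw pattern audit can overstate the amount of work that matters to routing.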
Learnings
- Institutionalize vs. prompt: as long as every piece of feedback is converted into a fixture, suppression, reference, or routing update, the same prompt corrections do not recur.
- Verify planning numbers with actual measurements: the audit's "50 over-broad" estimate turned out to be 13 once `excludedTags` were considered. It is faster to measure on the implementation side before committing to numbers in a plan.
- Use planner-eval as a guard: setting `coverage=1.0 / top1Match=1.0` as a merge gate mechanically stops routing regressions even when scoping is narrowed.
- Release-please should exclude chore commits from version bumps: the version is expected to stay put across consecutive merges of docs/chore PRs. This period landed on 4 releases because it included several feature PRs.
Related Links
- river-review: repository
- Epic #743: Skill Improvement Loop
- `docs/development/skill-applyto-scoping.md`: applyTo scoping rules
- `skills/agent-skills/river-review/references/IMPROVEMENT_LOOP.md`: 9-step loop
- `skills/agent-skills/river-review/references/FEEDBACK_TO_FIXTURE.md`: feedback type → destination mapping table