Addressing the 'Lost-in-the-Middle' Phenomenon in ClaudeCode Workflows

Record of Countermeasures for Lost in the Middle and ClaudeCode Workflows

⚠️ Note
The "Lost in the Middle" phenomenon discussed in this article is observed in LLMs, including Claude. However, its universality and severity vary depending on the situation.
"Long context length does not necessarily mean Lost in the Middle will occur."

These countermeasures are only a stopgap response to Lost in the Middle as it occurred in my personal development work. They are not a fundamental solution for LLMs in general.
A real fix requires deliberate prompt design, along with mechanisms for splitting tasks and controlling which documents are referenced.


Introduction

CLAUDE.md Specifications

When introducing Claude into a development workflow, using CLAUDE.md to have rules loaded at the beginning offers the following benefits:

  • ✅ Automatically reference project-specific rules and workflows
  • ✅ Explicitly state granular policies and approval flows in advance
  • ✅ Encourage "consistent judgment" immediately after startup

📚 Reference: CLAUDE.md Specifications and Memory Management

Furthermore, detailed documentation can be linked from CLAUDE.md, so that more granular rules are loaded as well.

The Problem of Bloated Context

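However, if you include too much, you may find that Claude forgets settings or premises you gave it.
A likely cause is a known weakness of LLMs called Lost in the Middle: information placed in the middle of a long context tends to be used less reliably than information near the beginning or the end.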

In small-scale projects or early stages, the volume of information is low and "the middle section is thin," so the impact of Lost in the Middle is not very visible.
However, as CLAUDE.md and the linked documentation grow, important information gets buried in the middle, is easily ignored, and incorrect behavior follows.


Chapter 1: The Reality of the Problem

In my personal development environment, the following "runaway behaviors" were repeatedly observed:

  • Unexpected Execution: Running scripts that had been marked as deprecated
  • Forgotten PR Request: Ignoring the request to open a Pull Request and only pushing
  • Misunderstanding Instructions: Arbitrarily replacing the specified "Strategy B" with a different approach
  • Discrepancy in Completion Reports: Reporting "100% complete" when the work was actually incomplete
  • Unauthorized Execution: Performing cleanup on its own when it was only permitted to create Google Sheets entries
  • Repetitive Forbidden Actions: Repeatedly running the forbidden git add -A

👉 The common thread is that Claude's own judgment of efficiency took precedence over user instructions.


Chapter 2: Quantitative Evidence

Semantic Overlap Analysis

To confirm the cause of the problem, I investigated the semantic overlap of CLAUDE.md and the linked documentation.

  • Semantic Overlap = "The degree to which the same meaning or concept is redundantly described in multiple files."
  • The higher the degree of overlap, the harder it is for the LLM to identify the correct source of information, and inference accuracy decreases.

| Category | Number of Files | Overlap | Impact |
| --- | --- | --- | --- |
| Approval Related | 21 | 43% | Direct cause of policy violations |
| Version Control | 33 | 68% | Induces indecision |
| Workflow Related | 48 | 99% | Overlap in almost all files |

  • Total 97 files / Estimated over 100,000 tokens
  • For many models, accuracy tends to decrease at this scale, and important information tends to be overlooked.
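
As a rough illustration, overlap between files can be approximated lexically. The sketch below is not the analysis behind the table above: it assumes overlap is measured as Jaccard similarity over three-word shingles, which only catches shared wording, not shared meaning. The function names and the 0.3 threshold are hypothetical.

```python
import re
from itertools import combinations


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams (shingles) in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_report(
    docs: dict[str, str], threshold: float = 0.3
) -> list[tuple[str, str, float]]:
    """List file pairs whose similarity meets the threshold, highest first."""
    sets = {name: shingles(body) for name, body in docs.items()}
    pairs = [
        (a, b, jaccard(sets[a], sets[b]))
        for a, b in combinations(sorted(sets), 2)
    ]
    return sorted((p for p in pairs if p[2] >= threshold), key=lambda p: -p[2])


if __name__ == "__main__":
    # Illustrative in-memory documents; real use would read the files from disk.
    docs = {
        "a.md": "approval is mandatory before starting work",
        "b.md": "approval is mandatory before starting work",
        "c.md": "update only minor versions of the package",
    }
    for left, right, score in overlap_report(docs):
        print(f"{left} <-> {right}: {score:.2f}")
```

A lexical measure like this will miss rules that say the same thing in different words, such as the approval examples below, so an embedding-based comparison would be closer to true semantic overlap.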

📸 Figure: Overlap Matrix (Example)

📸 Figure: Overlap Distribution Chart (Example)

For example, the types of overlaps are as follows:

Overlap Example: "Approval"

【CLAUDE.md】
Stop without fail during Phase transitions and before destructive operations, explicitly state "Please approve," and wait for the user's response

【docs/workflows/checklists/tracker_workflow_checklist.md】
SOW Approval Acquisition: SOW content approval is mandatory before starting work
Approval 1: SOW/Detailed Plan Approval

【docs/workflows/enhanced_approval_workflow.md】
Approval 1: Plan Approval
Approval 2: Implementation Policy Approval
Approval 3: Test Result Approval
Approval 1: Plan Approval (Duplicate Description)
Approval 2: Implementation Policy Approval (Duplicate Description)
Approval 3: Test Result Approval (Duplicate Description)

Reasons for decreased inference accuracy:

  • The same concept of "Approval" is described with different expressions in multiple places
  • Claude cannot determine "which approval rule takes precedence"
  • It is unclear whether a given "approval" refers to approval at a Phase transition, SOW approval, or plan approval

🔄 Version Control Overlap (68% Overlap, 33 Files)

Overlap Example: "Version"

【CLAUDE.md】
✅ Update only minor versions: v0.9.1 → v0.9.2 → v0.9.3
❌ Prohibition of middle version updates: v0.9.x → v0.10.0

【README.md】
Update only minor versions (v0.9.1 → v0.9.2)

【CHANGELOG.md】
v0.9.35, v0.9.34, v0.9.32 (Actual version history)

Reasons for decreased inference accuracy:

  • Identical versioning rules are scattered across multiple files
  • Claude is confused about which information source to trust
  • Confusion occurs due to the mix of actual CHANGELOG records and CLAUDE.md rules

Overlap Example: "Workflow"

【docs/workflows/checklists/tracker_workflow_checklist.md】
Improved 13-step, 4-phase workflow
Phase 0.5: Branch validation phase (Required/Independent execution)
Phase 1: Planning/Preparation phase (Steps 0-4)
13-step, 4-phase workflow (duplicate description at the end)

【docs/workflows/enhanced_approval_workflow.md】
Phase 1: Filing/Planning phase
Phase 2: Implementation/Testing phase
Phase 3: CI/Quality workflow phase

【docs_backup_20250903/workflows/enhanced_approval_workflow.md】
(The exact same content exists in the backup)

Reasons for decreased inference accuracy:

  • Almost all workflow files describe "13 steps" and "4 phases"
  • The same content is duplicated in the backup folder (docs_backup_20250903/)
  • Phase numbering is inconsistent (Phase 0.5, Phase 1, Phase 2...)
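
Exact copies such as the backup folder are the easiest kind of overlap to detect mechanically. The following is a minimal sketch, assuming that files count as duplicates when their content is identical after trailing whitespace and blank lines are stripped. The helper names are hypothetical, and the file contents in the usage example are placeholders.

```python
import hashlib
from collections import defaultdict


def normalized_digest(text: str) -> str:
    """Hash the text after dropping blank lines and trailing whitespace."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def find_duplicates(files: dict[str, str]) -> list[list[str]]:
    """Group paths with identical normalized content; unique files are dropped."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path, text in files.items():
        groups[normalized_digest(text)].append(path)
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)


if __name__ == "__main__":
    # Placeholder contents; real use would read each file from disk.
    files = {
        "docs/workflows/enhanced_approval_workflow.md": "Phase 1\nPhase 2\n",
        "docs_backup_20250903/workflows/enhanced_approval_workflow.md": "Phase 1\n\nPhase 2  \n",
        "README.md": "Update only minor versions\n",
    }
    for group in find_duplicates(files):
        print(" == ".join(group))
```

Anything this reports can be deleted or moved out of the directories Claude reads, leaving a single copy of each rule.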

Chapter 3: Solution Plan

Based on the analysis so far, ideally, several countermeasures are needed, such as "integrating extraction commands," "mandatory PRs," and "introducing execution permission levels." However, the response this time (PR #86, PR #89) was limited to a subset.
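
To make the idea of execution permission levels concrete, here is a minimal sketch. It is not what PR #86 or PR #89 implemented and it does not use any Claude Code API. The `Level` enum, the rule tables, and `is_allowed` are all hypothetical, and the level assigned to each command is an assumption made for illustration.

```python
import shlex
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical permission levels, ordered from least to most powerful."""
    READ_ONLY = 0
    WRITE = 1
    DESTRUCTIVE = 2


# Command prefixes that are never allowed, whatever level is granted.
FORBIDDEN = [("git", "add", "-A")]

# Assumed minimum level for each known command prefix.
REQUIRED = {
    ("git", "status"): Level.READ_ONLY,
    ("git", "commit"): Level.WRITE,
    ("git", "push"): Level.WRITE,
}


def is_allowed(command: str, granted: Level) -> bool:
    """Return True only for known, non-forbidden commands within the granted level."""
    tokens = tuple(shlex.split(command))
    if any(tokens[:len(rule)] == rule for rule in FORBIDDEN):
        return False
    for prefix, level in REQUIRED.items():
        if tokens[:len(prefix)] == prefix:
            return granted >= level
    return False  # unknown commands are denied by default


if __name__ == "__main__":
    for cmd in ["git status", "git add -A", "git push origin main"]:
        print(cmd, "->", is_allowed(cmd, Level.READ_ONLY))
```

Denying unknown commands by default is the important design choice here: an action that is not explicitly listed, such as the unrequested cleanup described in Chapter 1, is refused rather than left to the model's judgment.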

Primary Response (PR #86, #89)

  • Refreshing and integrating CLAUDE.md and related documentation
  • Explicitly indicating "what to do next" in conjunction with UI/UX (to reduce procedure skipping and incorrect reporting)

Fundamental Response (Future Tasks)

  • RAG: Retrieve necessary information on demand
  • Few-shot: Include good examples in the prompt to guide behavior
  • Summarization: Compress and organize important information
  • Context Engineering: Introduce information design philosophy
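
The on-demand retrieval idea can be sketched in a few lines. This is a deliberately simplified stand-in for RAG, assuming the rules are already split into titled sections and that plain word overlap is a good enough relevance score; a real setup would more likely use embeddings. The function names and sample sections are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, sections: dict[str, str], top_k: int = 2) -> list[str]:
    """Return titles of up to top_k sections sharing the most words with the query."""
    query_words = tokenize(query)
    scored = [
        (len(query_words & tokenize(body)), title)
        for title, body in sections.items()
    ]
    # Drop sections with no shared words, then rank by score and title.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [title for _, title in ranked[:top_k]]


if __name__ == "__main__":
    sections = {
        "approval": "Stop and wait for user approval before destructive operations",
        "versioning": "Update only minor versions",
        "git": "Never run git add -A; stage files explicitly",
    }
    print(retrieve("which git add command is allowed", sections, top_k=1))
```

Only the sections returned for the current task would be placed in the context, so the rule that matters is not buried among dozens of unrelated files.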

Furthermore, the research community is exploring the following approaches:

| Technology Name | Overview | Merits/Challenges |
| --- | --- | --- |
| Infini-attention | Combines compressed memory + local attention + linear attention (arXiv) | Retains long-range dependencies while improving computational efficiency. However, there is a risk of information loss during compression. |
| Sparse / Graph-based Attention | Limits attention to local areas or graph structures instead of all tokens (arXiv) | Reduces computational volume. However, there is a possibility of losing long-range dependencies. |
| Squeezed Attention | Skips attention by clustering inputs (ACL Anthology) | Effective for specific use cases. Unsuitable for general purposes. |
| MInference | Accelerates pre-filling with dynamic sparse attention (OpenReview) | Improves speed for large inputs. However, there is a risk of decreased accuracy. |
| SampleAttention | Dynamically selects sparse patterns during execution (ResearchGate) | Reduces processing while maintaining accuracy. However, additional computational costs arise. |

Conclusion and Notes

  • Lost-in-the-Middle is not a universal phenomenon
    → Context length does not necessarily cause issues; the impact varies depending on the model and design.

  • PR #86 and PR #89 are stopgap measures
    → They can improve accuracy, but a fundamental solution requires information design and prompt engineering.

Things to keep in mind:

  1. Long context alone cannot be assumed to be the cause of Lost-in-the-Middle.
  2. The response described here is a temporary mitigation, not a permanent fix.
