The content below is an AI-generated translation. This is an experimental feature and may contain errors.

Full Design of Pseudo-Metacognitive Architecture: Building the Brain of the "Non-Answering RAG" [RAG Development Diary Vol. 5]



Introduction

In Vol.1 to 4, I described the journey of the "RAG that doesn't provide answers ── Socrates" from its runaway state to being brought under control.

From here on, I will record a technical implementation guide so that you can build your own "Socratic RAG."

First, regarding the overall design: the LLM is not intended to be used as a standalone unit.


1. Limits of Standalone LLMs ── What Happened in Vol.3

Looking back at the "runaway" experienced in Vol.3, standalone LLMs have four structural limitations.

| Limit of Standalone LLM | Phenomenon in Socrates |
| --- | --- |
| Uncontrollable Scope | Faithfully answers even questions about cooking |
| Non-deterministic Output Quality | Returns a 148-character explanation even when instructed "within 100 characters" |
| Absence of Self-Monitoring | Provides long explanations even when told "don't answer" |
| Inability to Detect Deadlocks | Doesn't notice itself falling into a question loop |

Even if you write "don't answer" or "be brief" in the prompt, the LLM gets dragged by the context and starts a long-winded explanation.

Unless you enforce it as "must follow" in Python code, it often won't listen to a mere "request" in the prompt.
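The difference between a "request" and an enforced rule can be sketched in a few lines. This is an illustrative example, not code from Socrates itself; the function name `enforce_max_length` is hypothetical.

```python
def enforce_max_length(response_text: str, max_len: int = 100):
    """Hard gate: unlike a prompt instruction, this check cannot be ignored."""
    if len(response_text) > max_len:
        # Reject instead of trusting the LLM to self-limit
        return False, f"Rejected: {len(response_text)} chars > {max_len}"
    return True, "OK"

# A 148-character "explanation" slips past a prompt request, but not past code.
ok, msg = enforce_max_length("x" * 148, max_len=100)
```

A prompt saying "within 100 characters" can be forgotten; a `len()` check in Python cannot.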


2. Design Philosophy: Sandwich Structure

2.1 The Big Picture

Place the LLM in the center as the "thinking brain," and sandwich its input side and output side with a Python control layer.

Overall Diagram
        ┌────────────────────────────────────┐
        │      Python Control Layer           │
        │                                     │
        │  ┌────────────┐  ┌───────────────┐ │
        │  │ Pre-hooks   │  │  Post-hooks   │ │
        │  │ (Input Cens)│  │ (Output Cens) │ │
        │  └─────┬──────┘  └───────┬───────┘ │
        │        │                  │         │
        │        ▼                  ▲         │
        │  ┌────────────────────────────┐    │
        │  │    LLM (Claude 3 Haiku)    │    │
        │  │      "Thinking Brain"      │    │
        │  └────────────────────────────┘    │
        └────────────────────────────────────┘
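The sandwich above can be sketched as a single pipeline function. This is a simplified sketch, not the actual app.py: `llm_call` stands in for the Claude API, and the hook and variable names are illustrative.

```python
def sandwich_pipeline(user_input, llm_call, pre_hook, post_hook):
    """Input → Python pre-hook → LLM → Python post-hook → Output."""
    ok, reason = pre_hook(user_input)
    if not ok:
        return f"🚫 Blocked before the LLM: {reason}"

    draft = llm_call(user_input)

    ok, feedback = post_hook(draft)
    if not ok:
        # One-shot retry with the feedback appended (see section 5.4)
        draft = llm_call(f"{user_input}\n[Constraint] {feedback}")
    return draft

# Toy stand-ins for the real hooks and model
pre = lambda q: (len(q) > 0, "empty input")
post = lambda r: (r.endswith("?"), "end with a question")
fake_llm = lambda p: "What do you think?" if "[Constraint]" in p else "Long explanation."

result = sandwich_pipeline("What is RAG?", fake_llm, pre, post)  # → "What do you think?"
```

The LLM never sees unfiltered input, and the user never sees unvalidated output; that asymmetry of authority is the whole point of the sandwich.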

2.2 Why "Prompt Only" is Insufficient

| Approach | Pros | Limits |
| --- | --- | --- |
| Prompt Only | Easy, immediate implementation | LLM "forgets" or "misinterprets" instructions |
| Sandwich | Deterministic control | Higher implementation cost |

The runaway in Vol.3 occurred because I tried to control it with "prompts only."
That's why I introduced the Python control layer (HOOK mechanism) in Vol.4.


3. Tech Stack and Project Structure

3.1 Technologies Used

| Layer | Tech | Selection Reason |
| --- | --- | --- |
| Frontend | Streamlit | Can be completed with Python only, making it the fastest way to turn a prototype into a "working app." |
| Embedding | paraphrase-multilingual-MiniLM-L12-v2 | 384 dimensions, multi-language support. Sufficient accuracy for Scope determination in Japanese. |
| LLM | Claude 3 Haiku | Blazing fast and low cost. Speed is life for a design that assumes retries. |
| Storage | Google Firestore | Unified management of DB and Vector search. A dedicated Vector DB is over-engineering at this scale. |
| Similarity Calculation | scikit-learn (cosine_similarity) | For Scope Guard determination. High affinity with numpy arrays. |

3.2 Logical Structure of the Project

The current implementation is a single file (streamlit_app.py), but logically it is divided into the following four responsibilities. The individual explanations from Vol. 6 onwards will follow this structure.

Structure
socrates_rag/

├── app.py                  # 🖥️ Streamlit Frontend
│                           #    - Chat UI / Sidebar settings / Visualization
│                           #    - Main loop (Flow of Input → HOOK → Output)

├── hooks.py                # 🛡️ HOOK Mechanism (Control Layer)
│                           #    - pre_hook_scope_guard()
│                           #    - check_deadlock_breaker()
│                           #    - socratic_validation()
│                           #    - post_hook_exit_trigger()

├── prompts.py              # 📝 Prompt Definitions
│                           #    - System prompts for Teacher / Coaching
│                           #    - Level-specific constraints (prohibitions)
│                           #    - TOPIC_ANCHORS (Scope definitions)

├── vector_search.py        # 🔍 Search & Embedding
│                           #    - Firestore connection/query
│                           #    - Vectorization / Similarity calculation
│                           #    - search_documents()

└── data/
    └── docs/               # 📄 Markdown (Course of Study)
                            #    - Structured data with Frontmatter
                            #    → Detailed explanation in Vol. 6

4. System Architecture

4.1 Component Diagram

Component Diagram

4.2 Processing Flow (4 Phases)

Flowchart

5. HOOK Mechanism: 5 Layers of Control

This is the core of this article.

The five HOOKs are defense layers against different "failure modes." They aren't just filters; I arrived at this form through continuous debugging.

5.1 🛡️ Scope Guard ── Vectors Cannot Understand "Hello"

Purpose: To turn away questions outside the learning scope at the door.

User input is vectorized and evaluated using cosine similarity against "Anchor texts" defined for each category.

TOPIC_ANCHORS = {
    "Tech_Singularity": "Computer science, artificial intelligence, semiconductor engineering...",
    "Strategy_Mgmt":    "Corporate management, business strategy, organizational management...",
    "Life_Scaling":     "Personal health management, mental health..."
}
# Threshold less than 0.15 → Reject as "Out of scope"
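As a self-contained sketch of this check: toy 3-dimensional vectors stand in for the real 384-dimensional MiniLM embeddings, and a plain-Python `cosine` replaces sklearn's `cosine_similarity` so the example runs without dependencies.

```python
import math

def cosine(a, b):
    """Plain-Python cosine similarity (stand-in for sklearn's cosine_similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def max_anchor_similarity(query_vec, anchor_vecs):
    """The best similarity across all category anchors decides in/out of scope."""
    return max(cosine(query_vec, v) for v in anchor_vecs.values())

# Toy vectors; in reality these are embeddings of the anchor texts above
anchors = {
    "Tech_Singularity": [1.0, 0.0, 0.0],
    "Strategy_Mgmt":    [0.0, 1.0, 0.0],
}
tech_query = [0.9, 0.1, 0.0]   # clearly "Tech"-flavored → high similarity
offtopic   = [0.0, 0.0, 1.0]   # orthogonal to every anchor → rejected

in_scope = max_anchor_similarity(tech_query, anchors) >= 0.15   # True
rejected = max_anchor_similarity(offtopic, anchors) < 0.15      # True
```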

Is the latest vector search perfect?

No, reality is not that forgiving.

If you just type "Hello," the Scope Guard will ruthlessly reject it saying "Out of scope." This is because the concept of "greeting" does not exist in the technical document space.

What compensates for these limitations is, as ever, gritty rule-based code (if statements). I really wanted to avoid this, but...

# Bypass circuit: Short text + Conversation keywords → Skip vector check
safe_keywords = ["don't know", "tell me", "hint", "thanks", "hello", ...]

if len(query) < 30 and any(kw in query for kw in safe_keywords):
    return True, ""  # Pass without vector check

"Pass if it's short and contains keywords." This tiny two-line snippet of code drastically improves the user experience.

How was the threshold 0.15 decided?

It's not a magic number. It is the result of gritty fine-tuning.

At first, it was too strict, and it rejected even slightly unique phrasing. So, I settled on 0.15 through a process of "starting loose (0.1) and tightening it up whenever false positives (passing topics that shouldn't pass) occurred."

If you are introducing this to your own project, I recommend starting at 0.1 and adjusting while observing actual user input. Including a mechanism to display the Debug value (similarity score) upon rejection makes adjustment significantly easier.

5.2 ⚠️ Deadlock Breaker ── An Escape from the Infinite Question Loop

Purpose: To automatically provide hints when Socratic questioning reaches a deadlock.

Socrates' job is to "ask questions." However, continuing to ask when a user is truly stuck is harassment.

# Trigger condition: Previous AI response was a question + any of the following:
# 1. Give-up declaration: "don't know", "give up", "help"
# 2. Meaningless input: "...", "???" (Regex check)
# 3. Repetition: Exactly the same answer as the previous one

When triggered, an emergency instruction is injected into the system prompt.

[Emergency System Instruction]
The conversation is in a deadlock. Stop repeating questions and
present a "core hint for the answer."

It temporarily deactivates "Coaching Mode" to help the user. This switching between strictness and kindness is realized through code.
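The switch can be sketched as one prompt-assembly step: when the deadlock flag is set, the emergency instruction from above is appended to the system prompt. The function name `apply_deadlock_override` is illustrative, not from the actual code.

```python
EMERGENCY_OVERRIDE = (
    "[Emergency System Instruction]\n"
    "The conversation is in a deadlock. Stop repeating questions and "
    "present a 'core hint for the answer.'"
)

def apply_deadlock_override(system_prompt: str, is_deadlock: bool) -> str:
    """Temporarily lift Coaching Mode by appending the emergency order."""
    if is_deadlock:
        return f"{system_prompt}\n\n{EMERGENCY_OVERRIDE}"
    return system_prompt

normal = apply_deadlock_override("You are a strict Socratic coach.", False)
rescue = apply_deadlock_override("You are a strict Socratic coach.", True)
```

Because the override is injected by code rather than remembered by the model, it reliably wins over the "keep asking questions" persona for exactly one turn.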

5.3 🔍 Socratic Validation ── Censoring Output "Character Count" and "Endings"

Purpose: To censor whether the LLM is "teaching too much," only during Coaching Mode.

The strictness of the censorship changes depending on the "Socratic Level" selected via the sidebar slider.

| Socratic Level | Character Limit | UI Display |
| --- | --- | --- |
| L1: Companion | 160 chars | 🟢 Hints included |
| L2: Silence | 120 chars | 🟡 Metaphors/Questions |
| L3: Iron Mask | 80 chars | 🔴 Questions only |

Censorship rules are simple, with only two.

def socratic_validation(response_text, level=1):
    limits = {1: 160, 2: 120, 3: 80}
    max_len = limits.get(level, 160)

    # Rule 1: Character count
    if len(response_text) > max_len:
        return False, f"Too long ({len(response_text)} chars). Keep it within {max_len}."

    # Rule 2: Does it end with a question? (check both half- and full-width marks)
    if "?" not in response_text[-10:] and "？" not in response_text[-10:]:
        return False, "Always end with a 'question' to the user."

    return True, None

Why does it function with just these two?

"Character count" is a physical upper limit of information volume. There is a strong correlation: long answer = explaining.
"Question form" is a guarantee of the dialogue's direction. If it doesn't end with a question, it's the same as "teaching."

No complex NLP analysis is required. With just len() and string slicing, you can contain the LLM's "urge to teach."
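Here is the function in action on the two failure modes, restated so the example runs standalone (the sample responses are made up; the results follow directly from the two rules).

```python
def socratic_validation(response_text, level=1):
    limits = {1: 160, 2: 120, 3: 80}
    max_len = limits.get(level, 160)
    if len(response_text) > max_len:
        return False, f"Too long ({len(response_text)} chars). Keep it within {max_len}."
    if "?" not in response_text[-10:] and "？" not in response_text[-10:]:
        return False, "Always end with a 'question' to the user."
    return True, None

# Level 3 (Iron Mask, 80 chars): a long explanation fails Rule 1
ok, fb = socratic_validation("RAG is useful because ..." + "x" * 80, level=3)
# A short statement with no question mark fails Rule 2
ok2, fb2 = socratic_validation("That is the physical layer.", level=3)
# A short counter-question passes both rules
ok3, fb3 = socratic_validation("What do you think supports it?", level=3)
```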

Before / After ── The Moment the Execution Layer Intervenes

Let's look at what is actually happening here.

🔴 Before: Draft answer before correction (LLM "explained" it)

AI (Draft):
The reason RAG is useful for project management is mainly due to "information searchability"
and "context integration." Traditionally, project specifications and past meeting minutes
tend to be scattered, but with RAG, you can search across these and generate answers
tailored to current issues. This enables managers to speed up decision-making.

This answer is "correct." However, as Socrates, it is a failure. It has provided the answer.

⚙️ Intervention: Execution Layer Logs

[Socratic Validation] ⚠️ REJECTED
  Reason  : Length 148 chars > Level 3 limit (80 chars)
  Action  : Triggering One-shot Retry...
  Constraint: [Absolute Order] Explanations, hints, and empathy are strictly prohibited.
              Return only one short counter-question that strikes at their assumptions.

Validation detects the 148-character "explanation" and executes regeneration exactly once with the Level 3 prohibition order.

🟢 After: Final corrected answer (Level 3: Iron Mask)

AI (Final):
In project management, what information do you feel is
the most "wasteful to search for"?

The 148-character explanation changed into a "short question" with just one retry.

5.4 🔄 One-shot Retry ── Decisions Must Be Made in One Go

Purpose: When Socratic Validation fails, add level-specific "prohibitive instructions" and regenerate only once.

Users cannot wait until it passes. Socrates must get it right in one shot. All constraints are packed into that single retry.

level_constraints = {
    1: "Ask a question that provides a hint and guides the user.",
    2: "Refrain from explanations and throw a question that uses metaphors or doubts the user's common sense.",
    3: "No answers, explanations, or empathy required. Just one short counter-question that strikes at their assumptions."
}

Since we are forcing a retry, speed and cost are paramount. That's why we adopted the blazing-fast Claude 3 Haiku instead of Gemini Pro or Claude Opus. During retries, we narrow down to max_tokens=300 and fetch the response all at once using a synchronous call (messages.create) rather than streaming.
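The retry flow can be sketched end-to-end with a stubbed model call. `stub_generate` stands in for `messages.create`; everything else mirrors the generate → validate → regenerate-once loop described above.

```python
LEVEL_CONSTRAINTS = {
    3: ("[Absolute Order] No answers, explanations, or empathy. "
        "Return only one short counter-question."),
}

def one_shot_retry(prompt, generate, validate, level=3):
    """Generate once; if validation fails, regenerate exactly once with constraints."""
    draft = generate(prompt)
    ok, feedback = validate(draft)
    if ok:
        return draft
    retry_prompt = (
        f"{prompt}\n[Regeneration Instruction] {feedback}\n{LEVEL_CONSTRAINTS[level]}"
    )
    return generate(retry_prompt)  # the second result is final, pass or fail

# Stub model: explains at first, complies once it sees the absolute order
def stub_generate(p):
    if "[Absolute Order]" in p:
        return "What feels most wasteful to search for?"
    return "RAG is useful because " + "searchability " * 12

validate = lambda r: (len(r) <= 80 and r.endswith("?"), "Too long or not a question.")
final = one_shot_retry("Why is RAG useful?", stub_generate, validate)
```

Note there is no `while` loop: whatever the second call returns is shown to the user, which is exactly why speed and a strong constraint prompt matter.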

5.5 🎉 Exit Trigger ── Celebrating the Moment of the Correct Answer

Purpose: Detect when the user reaches the correct answer and perform a success effect.

success_keywords = [
    "Correct", "That's right", "You passed",
    "Good understanding", "Correct understanding", "Hitting the nail on the head", ...
]
# To prevent false positives: "Incorrect" and "Not right" are excluded

The moment the strict Socrates acknowledges the user, st.balloons() fly. This small effect multiplies the user's sense of achievement.

Ingenuity for preventing false positives

Since "That's a great question" is not a success, "great" is excluded from the keyword list. Negative forms like "That's not correct" or "You're not right" are also filtered out in a later check.
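The negative filtering can be sketched as follows; the keyword lists here are shortened illustrations of the ones above.

```python
SUCCESS_KEYWORDS = ["correct", "that's right", "you passed"]
NEGATIVE_MARKERS = ["not correct", "incorrect", "not right"]

def is_success(response_text: str) -> bool:
    """Success keywords count only when no negation marker is present."""
    text = response_text.lower()
    if any(neg in text for neg in NEGATIVE_MARKERS):
        return False
    return any(kw in text for kw in SUCCESS_KEYWORDS)

a = is_success("Correct! You grasped the core idea.")        # True → balloons
b = is_success("That's not correct. Think about the layer.")  # False: negated
c = is_success("That's a great question.")                    # False: "great" excluded
```

Checking negatives first matters: "incorrect" contains the substring "correct," so a keyword-only check would celebrate a wrong answer.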


6. Prompt Construction: Two Personas and Contextual Inertia

6.1 Teacher Mode ── The Explaining Persona

You are an "explanation-loving teacher."
Based on the reference data, explain logically and clearly.
Add simplified explanations for technical terms to support learning.

Socratic Validation does not trigger. The LLM's output is displayed as is.

6.2 Coaching Mode ── The Questioning Persona

You are a "strict Socratic coach."
[Absolute Dialogue Rules]
1. No explanations: Do not state definitions or explanations.
2. Counter-questions: Reply with a question.
3. Short answers: Within 100 characters.
4. Hint limits: Only one hint when the user "gives up."

By embedding good and bad examples (Few-shot) in the prompt, we suppress the LLM's "urge to explain."

[Bad Example]
User: What is L0?
AI:   L0 refers to the physical layer. Specifically...(Long explanation)

[Good Example]
User: What is L0?
AI:   What do you think is the "foundation" that supports the Singularity?

6.3 Contextual Inertia ── An AI's Persona Resides in the "Immediate History"

Even after switching the mode from "Teacher" to "Coaching," the AI might still continue to speak in an explanatory tone.

AI reads the room. If the past logs are in "explanation mode," it reads that atmosphere (context) and tries to continue explaining. This is contextual inertia.

To break this, we separate the UI-side presentation from the API-side transmission logic.

# UI side: Show the "change" to the user (Add notification to session_state)
st.session_state.messages.append({
    "role": "system",
    "content": "[System Notification] Mode has been changed to 'Coaching'.\nDiscard previous behaviors and stick to your new role."
})

# API side: Exclude system notifications and redefine roles in the System Prompt
messages_for_api = [m for m in messages if m["role"] != "system"]

Show the "change" to the user, and forcibly inject the "current role" into the AI via the system prompt.

This dual management is exactly what creates a sharp change in character.

Rather than making it read ambiguous logs, redefining the "current role" in the System Prompt (the highest-level command) is more effective for character changes.
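Putting the dual management together as one function: the history filtering mirrors the snippet above, while `build_api_payload` and the persona strings are illustrative stand-ins for the real prompt definitions.

```python
def build_api_payload(messages, mode):
    """UI keeps the system notifications; the API call gets a clean history
    plus the current role redefined in the system prompt."""
    personas = {
        "Teacher": "You are an 'explanation-loving teacher.'",
        "Coaching": "You are a 'strict Socratic coach.'",
    }
    # API side: strip UI-only system notifications from the history
    history = [m for m in messages if m["role"] != "system"]
    return personas[mode], history

ui_log = [
    {"role": "user", "content": "What is L0?"},
    {"role": "assistant", "content": "L0 is the physical layer..."},
    {"role": "system", "content": "[System Notification] Mode changed to 'Coaching'."},
    {"role": "user", "content": "Go on."},
]
system_prompt, history = build_api_payload(ui_log, "Coaching")
```

The notification exists only in `ui_log`; the model never reads it, and instead receives the new persona as a top-level command.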


7. Integrated Structure of the System Prompt

The system prompt ultimately sent to the LLM has a four-layer structure.

┌─────────────────────────────────────┐
│ Layer 1: Style Instruction          │
│ Definition of Teacher or Coaching   │
├─────────────────────────────────────┤
│ Layer 2: Past Context               │
│ Summary of compressed past dialogue │
├─────────────────────────────────────┤
│ Layer 3: RAG Context                │
│ Reference data from Firestore       │
│ (top_k=5, full text concatenated)    │
├─────────────────────────────────────┤
│ Layer 4: Emergency Override         │
│ Emergency instructions only injected │
│ during a Deadlock                   │
└─────────────────────────────────────┘

Against this four-layer prompt, the latest 10 entries of conversation history (excluding system roles) are sent as messages. To keep token costs down, older history exceeding 10 entries is compressed into Past Context by having Claude Haiku summarize it.
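The four-layer assembly and the 10-entry history window can be sketched as below. The layer labels and function names are illustrative; in the real app, Layer 2 comes from a Haiku-generated summary and Layer 3 from Firestore.

```python
def assemble_system_prompt(style, past_context, rag_context, emergency=None):
    """Stack the four layers; Layer 4 appears only during a deadlock."""
    layers = [
        style,                              # Layer 1: Teacher / Coaching definition
        f"[Past Context]\n{past_context}",  # Layer 2: compressed older dialogue
        f"[Reference Data]\n{rag_context}", # Layer 3: top_k=5 Firestore chunks
    ]
    if emergency:
        layers.append(emergency)            # Layer 4: deadlock override
    return "\n\n".join(layers)

def trim_history(messages, window=10):
    """Send only the latest N non-system entries as the messages array."""
    visible = [m for m in messages if m["role"] != "system"]
    return visible[-window:]

prompt = assemble_system_prompt(
    "You are a 'strict Socratic coach.'",
    "User is studying RAG basics.",
    "Doc chunk 1...\nDoc chunk 2...",
)
history = trim_history([{"role": "user", "content": str(i)} for i in range(15)])
```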


8. Summary ── Three Lessons

If you rely only on an LLM, it remains just a "thinking brain" and lacks a "mechanism for self-discipline."

That's why I implemented a five-layer HOOK mechanism in Python to create a Pseudo-Metacognitive Architecture with a "sandwich structure" that grips both the input and output.

| HOOK | Position | What it protects |
| --- | --- | --- |
| Scope Guard | Pre-input | Boundaries of the learning scope |
| Deadlock Breaker | Post-input | User's psychological safety |
| Socratic Validation | Post-output | The promise not to "teach" |
| One-shot Retry | Post-output | Response quality (deciding in one go) |
| Exit Trigger | Post-output | Experience at the moment of achievement |

The essence of the overall design comes down to three lessons:

1. An LLM on its own is only a "thinking brain"; it carries no mechanism for self-discipline.
2. Instructions written in a prompt are mere requests; only enforcement in the Python control layer makes behavior deterministic.
3. Each failure mode needs its own defense layer, which is why there are five HOOKs rather than a single filter.


🛠️ Appendix: Ready-to-use HOOK Function Set

Below is a minimal set of HOOK functions that you can copy and incorporate into your own project.

📋 HOOK Function Set (Python) ── Minimal setup to copy and try
"""
socrates_hooks.py
── Minimal implementation set for Socratic RAG HOOK mechanism
"""
import re
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


# =============================================
# 🛡️ 1. Scope Guard (Input Censorship)
# =============================================

# "Concept Definitions (Anchors)" per category
# → Rewrite these according to your domain
TOPIC_ANCHORS = {
    "Tech": "Computer science, artificial intelligence, semiconductor engineering, programming...",
    "Business": "Corporate management, business strategy, organizational management...",
}

# Bypass keywords for conversation
SAFE_KEYWORDS = [
    "don't know", "understand", "tell me", "hint", "correct", "answer",
    "thanks", "hello", "continue", "yes", "no"
]


def scope_guard(query, query_vec, anchor_vec, threshold=0.15):
    """
    Determines if input is within scope.
    
    Args:
        query: User input text (for bypass check)
        query_vec: Embedding vector of the query (shape: [1, dim])
        anchor_vec: Embedding vector of the Anchor text (shape: [1, dim])
        threshold: Similarity threshold (default 0.15, 0.1 recommended as starting point)
    
    Returns:
        (is_valid: bool, similarity: float, message: str)
    """
    # Bypass: Short text + conversation keywords → skip check
    if len(query) < 30 and any(kw in query for kw in SAFE_KEYWORDS):
        return True, 1.0, ""
    
    # Vector similarity check
    similarity = cosine_similarity(query_vec, anchor_vec)[0][0]
    
    if similarity < threshold:
        return False, similarity, f"🚫 Out of scope (Similarity: {similarity:.3f})"
    
    return True, similarity, ""


# =============================================
# ⚠️ 2. Deadlock Breaker (Deadlock Detection)
# =============================================

GIVE_UP_KEYWORDS = [
    "don't know", "no idea", "impossible", "can't do", "tell me", "answer",
    "give up", "pass", "hint", "help"
]


def check_deadlock(history, current_input):
    """
    Detects if the conversation is in a deadlock state.
    
    Args:
        history: List of chat history [{"role": "user"|"assistant", "content": "..."}]
        current_input: Current user input
    
    Returns:
        is_deadlock: bool
    """
    try:
        if not history or len(history) < 2:
            return False
        
        last_msg = history[-1]
        if last_msg.get("role") != "assistant":
            return False
        
        # Condition 1: Previous AI message ended with a question
        last_ai = last_msg.get("content", "").strip()
        if not last_ai.endswith(("?", "？")):
            return False
        
        # Condition 2: User gave up, input is meaningless, or is a repetition
        is_give_up = any(kw in current_input.lower() for kw in GIVE_UP_KEYWORDS)
        is_meaningless = bool(re.match(r"^[\s.。、？?！!]+$", current_input))
        is_repeating = (
            len(history) >= 3
            and current_input.strip() == history[-2].get("content", "").strip()
        )
        
        return is_give_up or is_meaningless or is_repeating
    
    except Exception:
        return False  # On error, pass safely with no detection


# =============================================
# 🔍 3. Socratic Validation (Output Censorship)
# =============================================

def socratic_validation(response_text, level=1):
    """
    Censors whether LLM output is "Socratic."
    
    Args:
        response_text: LLM response text
        level: Socratic level (1=Companion, 2=Silence, 3=Iron Mask)
    
    Returns:
        (is_valid: bool, feedback: str | None)
    """
    limits = {1: 160, 2: 120, 3: 80}
    max_len = limits.get(level, 160)
    
    # Rule 1: Character count
    if len(response_text) > max_len:
        return False, (
            f"Response is too long (currently {len(response_text)} chars). "
            f"Limit it to {max_len} chars and focus on an essential 'question.'"
        )
    
    # Rule 2: Does it end with a question?
    tail = response_text[-10:] if len(response_text) >= 10 else response_text
    if "?" not in tail and "？" not in tail:
        return False, "Always end with a 'question' to the user."
    
    return True, None


# =============================================
# 🎉 4. Exit Trigger (Success Detection)
# =============================================

SUCCESS_KEYWORDS = [
    "correct", "that's right", "passed", "perfect",
    "is right", "is correct", "good understanding",
    "understand correctly", "hitting the nail",
    "no mistake", "accurate", "next unit", "next step"
]


def check_exit_trigger(response_text):
    """
    Detects if LLM response contains keywords indicating a "correct answer."
    Performs negative check to prevent false positives.
    
    Returns:
        is_success: bool
    """
    text_lower = response_text.lower()
    for kw in SUCCESS_KEYWORDS:
        if kw in text_lower:
            # Negative check
            if "not correct" not in text_lower and "incorrect" not in text_lower:
                return True
    return False


# =============================================
# 🔄 5. Prompt Definitions for Retry
# =============================================

LEVEL_CONSTRAINTS = {
    1: (
        "[Instruction] Act as a teacher. "
        "Before giving the conclusion, provide a hint and ask a question to guide the user."
    ),
    2: (
        "[Instruction] Refrain from explanations. "
        "Throw a metaphor like 'It's like ~' or a question that doubts the user's common sense."
    ),
    3: (
        "[Absolute Order] You are a cold Socrates. "
        "No answers, explanations, or empathy required. "
        "Return only one short counter-question that strikes at the user's 'assumptions.'"
    ),
}


def build_retry_prompt(original_system_prompt, feedback, level=1):
    """
    Constructs system prompt for One-shot Retry.
    
    Args:
        original_system_prompt: Original system prompt
        feedback: Feedback from Socratic Validation
        level: Socratic level
    
    Returns:
        retry_system_prompt: str
    """
    constraint = LEVEL_CONSTRAINTS.get(level, LEVEL_CONSTRAINTS[1])
    return (
        f"{original_system_prompt}\n\n"
        f"[Regeneration Instruction]\n{feedback}\n{constraint}\n"
        f"*Make sure to complete the sentence so it doesn't cut off halfway."
    )

Coming Soon
