Full Design of Pseudo-Metacognitive Architecture: Building the Brain of the "Non-Answering RAG" [RAG Development Diary Vol. 5]
Overall Design of Pseudo-Metacognitive Architecture
── Building the Brain for the "RAG that Doesn't Answer"
Introduction
In Vol.1 to 4, I described the journey of the "RAG that doesn't provide answers ── Socrates" from its runaway state to being brought under control.
From here on, this series becomes a technical implementation guide so that you can build your own "Socratic RAG."
First, regarding the overall design: the LLM is not intended to be used as a standalone unit.
1. Limits of Standalone LLMs ── What Happened in Vol.3
Looking back at the "runaway" experienced in Vol.3, standalone LLMs have four structural limitations.
| Limit of Standalone LLM | Phenomenon in Socrates |
|---|---|
| Uncontrollable Scope | Faithfully answers even questions about cooking |
| Non-deterministic Output Quality | Returns a 148-character explanation even when instructed "within 100 characters" |
| Absence of Self-Monitoring | Provides long explanations even when told "don't answer" |
| Inability to Detect Deadlocks | Doesn't notice itself falling into a question loop |
Even if you write "don't answer" or "be brief" in the prompt, the LLM gets dragged by the context and starts a long-winded explanation.
Unless you enforce it as "must follow" in Python code, it often won't listen to a mere "request" in the prompt.
2. Design Philosophy: Sandwich Structure
2.1 The Big Picture
Place the LLM in the center as the "thinking brain," and sandwich its input side and output side with a Python control layer.
Overall Diagram
┌────────────────────────────────────┐
│ Python Control Layer │
│ │
│ ┌────────────┐ ┌───────────────┐ │
│ │ Pre-hooks │ │ Post-hooks │ │
│ │ (Input Cens)│ │ (Output Cens) │ │
│ └─────┬──────┘ └───────┬───────┘ │
│ │ │ │
│ ▼ ▲ │
│ ┌────────────────────────────┐ │
│ │ LLM (Claude 3 Haiku) │ │
│ │ "Thinking Brain" │ │
│ └────────────────────────────┘ │
└────────────────────────────────────┘
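The sandwich above can be sketched as a single turn-handling loop. This is a minimal sketch, not the actual app code: `pre_hook`, `post_hook`, and `call_llm` are simplified stand-ins for the real hooks and the Claude 3 Haiku call.

```python
def pre_hook(text):
    """Input-side censorship (stand-in for Scope Guard): reject empty input."""
    if not text.strip():
        return False, "Empty input is out of scope"
    return True, ""

def post_hook(text, max_len=160):
    """Output-side censorship (stand-in for Socratic Validation): hard length cap."""
    if len(text) > max_len:
        return False, f"Too long ({len(text)} chars). Keep it within {max_len}."
    return True, None

def run_turn(user_input, history, call_llm):
    """One conversation turn through the sandwich: pre-hook -> LLM -> post-hook."""
    ok, reason = pre_hook(user_input)
    if not ok:
        return f"🚫 {reason}"  # turned away at the door
    draft = call_llm(user_input, history)
    ok, feedback = post_hook(draft)
    if not ok:
        # One-shot retry: the feedback becomes a hard constraint on regeneration
        draft = call_llm(user_input, history, extra_constraint=feedback)
    return draft
```

The point is that both checks run in plain Python, outside the LLM, so they cannot be "forgotten."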
2.2 Why "Prompt Only" is Insufficient
| Approach | Pros | Limits |
|---|---|---|
| Prompt Only | Easy, immediate implementation | LLM "forgets" or "misinterprets" instructions |
| Sandwich | Deterministic control | Higher implementation cost |
The runaway in Vol.3 occurred because I tried to control it with "prompts only."
That's why I introduced the Python control layer (HOOK mechanism) in Vol.4.
3. Tech Stack and Project Structure
3.1 Technologies Used
| Layer | Tech | Selection Reason |
|---|---|---|
| Frontend | Streamlit | Can be completed with Python only, making it the fastest way to turn a prototype into a "working app." |
| Embedding | paraphrase-multilingual-MiniLM-L12-v2 | 384 dimensions, multi-language support. Sufficient accuracy for Scope determination in Japanese. |
| LLM | Claude 3 Haiku | Blazing fast and low cost. Speed is life for a design that assumes retries. |
| Storage | Google Firestore | Unified management of DB and Vector search. A dedicated Vector DB is over-engineering at this scale. |
| Similarity Calculation | scikit-learn (cosine_similarity) | For Scope Guard determination. High affinity with numpy arrays. |
3.2 Logical Structure of the Project
The current implementation is a single file (streamlit_app.py), but logically it is divided into the following four responsibilities. The individual explanations from Vol. 6 onwards will follow this structure.
Structure
socrates_rag/
│
├── app.py # 🖥️ Streamlit Frontend
│ # - Chat UI / Sidebar settings / Visualization
│ # - Main loop (Flow of Input → HOOK → Output)
│
├── hooks.py # 🛡️ HOOK Mechanism (Control Layer)
│ # - pre_hook_scope_guard()
│ # - check_deadlock_breaker()
│ # - socratic_validation()
│ # - post_hook_exit_trigger()
│
├── prompts.py # 📝 Prompt Definitions
│ # - System prompts for Teacher / Coaching
│ # - Level-specific constraints (prohibitions)
│ # - TOPIC_ANCHORS (Scope definitions)
│
├── vector_search.py # 🔍 Search & Embedding
│ # - Firestore connection/query
│ # - Vectorization / Similarity calculation
│ # - search_documents()
│
└── data/
└── docs/ # 📄 Markdown (Course of Study)
# - Structured data with Frontmatter
# → Detailed explanation in Vol. 6
4. System Architecture
4.1 Component Diagram
Component Diagram
4.2 Processing Flow (4 Phases)
Flowchart
5. HOOK Mechanism: 5 Layers of Control
This is the core of this article.
The five HOOKs are defense layers against different "failure modes." They aren't just filters; I arrived at this form through continuous debugging.
5.1 🛡️ Scope Guard ── Vectors Cannot Understand "Hello"
Purpose: To turn away questions outside the learning scope at the door.
User input is vectorized and evaluated using cosine similarity against "Anchor texts" defined for each category.
TOPIC_ANCHORS = {
    "Tech_Singularity": "Computer science, artificial intelligence, semiconductor engineering...",
    "Strategy_Mgmt": "Corporate management, business strategy, organizational management...",
    "Life_Scaling": "Personal health management, mental health..."
}
# Similarity below the 0.15 threshold → Reject as "Out of scope"
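Under the hood this is plain cosine similarity against each anchor's embedding. A minimal sketch with toy 2-D vectors (the real app uses 384-dimensional vectors from paraphrase-multilingual-MiniLM-L12-v2):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def scope_check(query_vec, anchor_vecs, threshold=0.15):
    """Compare the query against every category anchor and keep the best score."""
    best_cat, best = None, -1.0
    for cat, vec in anchor_vecs.items():
        score = cosine_sim(query_vec, vec)
        if score > best:
            best_cat, best = cat, score
    # Below the threshold on every anchor -> reject as "out of scope"
    return best >= threshold, best_cat, best
```

With real embeddings, `anchor_vecs` would hold one precomputed vector per TOPIC_ANCHORS entry.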
Is the latest vector search perfect?
No, reality isn't that sweet.
If you just type "Hello," the Scope Guard will ruthlessly reject it saying "Out of scope." This is because the concept of "greeting" does not exist in the technical document space.
What compensates for these limitations is, in the end, gritty rule-based code (if statements). I really wanted to avoid this, but...
# Bypass circuit: Short text + Conversation keywords → Skip vector check
safe_keywords = ["don't know", "tell me", "hint", "thanks", "hello", ...]
if len(query) < 30 and any(kw in query for kw in safe_keywords):
    return True, ""  # Pass without vector check
"Pass if it's short and contains keywords." This tiny two-line snippet of code drastically saves the user experience.
How was the threshold 0.15 decided?
It's not a magic number. It is the result of gritty fine-tuning.
At first, it was too strict, and it rejected even slightly unique phrasing. So, I settled on 0.15 through a process of "starting loose (0.1) and tightening it up whenever false positives (passing topics that shouldn't pass) occurred."
If you are introducing this to your own project, I recommend starting at 0.1 and adjusting while observing actual user input. Including a mechanism to display the Debug value (similarity score) upon rejection makes adjustment significantly easier.
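That tuning loop can even be scripted: collect real user queries, label each one in- or out-of-scope, and sweep candidate thresholds. This is a hypothetical helper, not part of the actual app; `similarity_fn` maps a query to its best anchor score.

```python
def sweep_thresholds(labeled_queries, similarity_fn, thresholds=(0.10, 0.15, 0.20)):
    """Count misclassifications per candidate threshold.
    labeled_queries: list of (query, in_scope) pairs from real user input."""
    report = {}
    for t in thresholds:
        # Out-of-scope query that would pass at this threshold
        false_pass = sum(1 for q, in_scope in labeled_queries
                         if not in_scope and similarity_fn(q) >= t)
        # In-scope query that would be rejected at this threshold
        false_reject = sum(1 for q, in_scope in labeled_queries
                           if in_scope and similarity_fn(q) < t)
        report[t] = {"false_pass": false_pass, "false_reject": false_reject}
    return report
```

Pick the threshold whose trade-off you can live with, then keep displaying the score on rejection as suggested above.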
5.2 ⚠️ Deadlock Breaker ── An Escape from the Infinite Question Loop
Purpose: To automatically provide hints when Socratic questioning reaches a deadlock.
Socrates' job is to "ask questions." However, continuing to ask when a user is truly stuck is harassment.
# Trigger condition: Previous AI response was a question + any of the following:
# 1. Give-up declaration: "don't know", "give up", "help"
# 2. Meaningless input: "...", "???" (Regex check)
# 3. Repetition: Exactly the same answer as the previous one
When triggered, an emergency instruction is injected into the system prompt.
[Emergency System Instruction]
The conversation is in a deadlock. Stop repeating questions and
present a "core hint for the answer."
It temporarily deactivates "Coaching Mode" to help the user. This switching between strictness and kindness is realized through code.
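The switching itself is just prompt concatenation gated by the deadlock flag. A minimal sketch (the instruction text mirrors the emergency instruction above):

```python
EMERGENCY_OVERRIDE = (
    "[Emergency System Instruction]\n"
    "The conversation is in a deadlock. Stop repeating questions and\n"
    'present a "core hint for the answer."'
)

def build_system_prompt(base_prompt, is_deadlock):
    """Append the override only while the Deadlock Breaker is firing,
    temporarily suspending Coaching Mode for that turn."""
    if is_deadlock:
        return f"{base_prompt}\n\n{EMERGENCY_OVERRIDE}"
    return base_prompt
```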
5.3 🔍 Socratic Validation ── Censoring Output "Character Count" and "Endings"
Purpose: To censor whether the LLM is "teaching too much," only during Coaching Mode.
The strictness of the censorship changes depending on the "Socratic Level" selected via the sidebar slider.
| Socratic Level | Character Limit | UI Display |
|---|---|---|
| L1: Companion | 160 chars | 🟢 Hints included |
| L2: Silence | 120 chars | 🟡 Metaphors/Questions |
| L3: Iron Mask | 80 chars | 🔴 Questions only |
Censorship rules are simple, with only two.
def socratic_validation(response_text, level=1):
    limits = {1: 160, 2: 120, 3: 80}
    max_len = limits.get(level, 160)
    # Rule 1: Character count
    if len(response_text) > max_len:
        return False, f"Too long ({len(response_text)} chars). Keep it within {max_len}."
    # Rule 2: Does it end with a question (half- or full-width)?
    if "?" not in response_text[-10:] and "？" not in response_text[-10:]:
        return False, "Always end with a 'question' to the user."
    return True, None
Why does it function with just these two?
"Character count" is a physical upper limit of information volume. There is a strong correlation: long answer = explaining.
"Question form" is a guarantee of the dialogue's direction. If it doesn't end with a question, it's the same as "teaching."
No complex NLP analysis is required. With just len() and string slicing, you can contain the LLM's "urge to teach."
Before / After ── The Moment the Execution Layer Intervenes
Let's look at what is actually happening here.
🔴 Before: Draft answer before correction (LLM "explained" it)
AI (Draft):
The reason RAG is useful for project management is mainly due to "information searchability"
and "context integration." Traditionally, project specifications and past meeting minutes
tend to be scattered, but with RAG, you can search across these and generate answers
tailored to current issues. This enables managers to speed up decision-making.
This answer is "correct." However, as Socrates, it is a failure. It has provided the answer.
⚙️ Intervention: Execution Layer Logs
[Socratic Validation] ⚠️ REJECTED
Reason : Length 148 chars > Level 3 limit (80 chars)
Action : Triggering One-shot Retry...
Constraint: [Absolute Order] Explanations, hints, and empathy are strictly prohibited.
Return only one short counter-question that strikes at their assumptions.
Validation detects the 148-character "explanation" and executes regeneration exactly once with the Level 3 prohibition order.
🟢 After: Final corrected answer (Level 3: Iron Mask)
AI (Final):
In project management, what information do you feel is
the most "wasteful to search for"?
The 148-character explanation changed into a "short question" with just one retry.
5.4 🔄 One-shot Retry ── Decisions Must Be Made in One Go
Purpose: When Socratic Validation fails, add level-specific "prohibitive instructions" and regenerate only once.
Users cannot wait until it passes. Socrates must get it right in one shot. All constraints are packed into that single retry.
level_constraints = {
    1: "Ask a question that provides a hint and guides the user.",
    2: "Refrain from explanations and throw a question that uses metaphors or doubts the user's common sense.",
    3: "No answers, explanations, or empathy required. Just one short counter-question that strikes at their assumptions."
}
Since we are forcing a retry, speed and cost are paramount. That's why we adopted the blazing-fast Claude 3 Haiku instead of Gemini Pro or Claude Opus. During retries, we narrow down to max_tokens=300 and fetch the response all at once using a synchronous call (messages.create) rather than streaming.
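A sketch of that retry call, with the client call injected so the decision logic stays testable. The commented SDK lines are an assumption about the anthropic client, not code lifted from the actual app:

```python
def one_shot_retry(create_fn, system_prompt, messages, feedback, constraint):
    """Regenerate exactly once: append the validation feedback and the
    level-specific prohibition to the system prompt, cap tokens for speed."""
    retry_system = (
        f"{system_prompt}\n\n[Regeneration Instruction]\n{feedback}\n{constraint}"
    )
    # Synchronous, non-streaming call. With the real SDK, create_fn could be
    # roughly: lambda **kw: client.messages.create(
    #     model="claude-3-haiku-20240307", **kw).content[0].text
    return create_fn(system=retry_system, messages=messages, max_tokens=300)
```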
5.5 🎉 Exit Trigger ── Celebrating the Moment of the Correct Answer
Purpose: Detect when the user reaches the correct answer and perform a success effect.
success_keywords = [
    "Correct", "That's right", "You passed",
    "Good understanding", "Correct understanding", "Hitting the nail on the head", ...
]
# To prevent false positives: "Incorrect" and "Not right" are excluded
The moment the strict Socrates acknowledges the user, st.balloons() fly. This small effect multiplies the user's sense of achievement.
Ingenuity for preventing false positives
Since "That's a great question" is not a success, "great" is excluded from the keyword list. Negative forms like "That's not correct" or "You're not right" are also filtered out in a later check.
6. Prompt Construction: Two Personas and Contextual Inertia
6.1 Teacher Mode ── The Explaining Persona
You are an "explanation-loving teacher."
Based on the reference data, explain logically and clearly.
Add simplified explanations for technical terms to support learning.
Socratic Validation does not trigger. The LLM's output is displayed as is.
6.2 Coaching Mode ── The Questioning Persona
You are a "strict Socratic coach."
[Absolute Dialogue Rules]
1. No explanations: Do not state definitions or explanations.
2. Counter-questions: Reply with a question.
3. Short answers: Within 100 characters.
4. Hint limits: Only one hint when the user "gives up."
By embedding good and bad examples (Few-shot) in the prompt, we suppress the LLM's "urge to explain."
[Bad Example]
User: What is L0?
AI: L0 refers to the physical layer. Specifically...(Long explanation)
[Good Example]
User: What is L0?
AI: What do you think is the "foundation" that supports the Singularity?
6.3 Contextual Inertia ── An AI's Persona Resides in the "Immediate History"
Even after switching the mode from "Teacher" to "Coaching," the AI might still continue to speak in an explanatory tone.
AI reads the room. If the past logs are in "explanation mode," it reads that atmosphere (context) and tries to continue explaining. This is contextual inertia.
To break this, we separate the UI-side presentation from the API-side transmission logic.
# UI side: Show the "change" to the user (Add notification to session_state)
st.session_state.messages.append({
    "role": "system",
    "content": "[System Notification] Mode has been changed to 'Coaching'.\nDiscard previous behaviors and stick to your new role."
})

# API side: Exclude system notifications and redefine roles in the System Prompt
messages_for_api = [m for m in messages if m["role"] != "system"]
Show the "change" to the user, and forcibly inject the "current role" into the AI via the system prompt.
This dual management is exactly what creates a sharp change in character.
Rather than making it read ambiguous logs, redefining the "current role" in the System Prompt (the highest-level command) is more effective for character changes.
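Put together, the dual management is a minimal sketch like this (using the same system-role filtering as above):

```python
def prepare_api_payload(session_messages, current_system_prompt):
    """The UI log keeps the [System Notification] entries; the API payload
    drops them and relies on the system prompt to define the current role."""
    history = [m for m in session_messages if m["role"] != "system"]
    return {"system": current_system_prompt, "messages": history}
```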
7. Integrated Structure of the System Prompt
The system prompt ultimately sent to the LLM has a four-layer structure.
┌─────────────────────────────────────┐
│ Layer 1: Style Instruction │
│ Definition of Teacher or Coaching │
├─────────────────────────────────────┤
│ Layer 2: Past Context │
│ Summary of compressed past dialogue │
├─────────────────────────────────────┤
│ Layer 3: RAG Context │
│ Reference data from Firestore │
│ (top_k=5, full text concatenated) │
├─────────────────────────────────────┤
│ Layer 4: Emergency Override │
│ Emergency instructions only injected │
│ during a Deadlock │
└─────────────────────────────────────┘
Against this four-layer prompt, the latest 10 entries of conversation history (excluding system roles) are sent as messages. To keep token costs down, older history exceeding 10 entries is compressed into Past Context by having Claude Haiku summarize it.
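Assembling the four layers is then straightforward concatenation, with Layer 4 present only during a deadlock. A minimal sketch (the bracketed layer labels are illustrative):

```python
def assemble_system_prompt(style, past_summary, rag_context, emergency=None):
    """Stack the four layers in order: style, compressed history,
    RAG reference data, and (optionally) the emergency override."""
    layers = [
        f"[Style Instruction]\n{style}",
        f"[Past Context]\n{past_summary}",
        f"[Reference Data]\n{rag_context}",
    ]
    if emergency:
        layers.append(f"[Emergency Override]\n{emergency}")
    return "\n\n".join(layers)

def trim_history(messages, keep_last=10):
    """Send only the latest N non-system entries; anything older is assumed
    to have been summarized into Past Context already."""
    history = [m for m in messages if m["role"] != "system"]
    return history[-keep_last:]
```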
8. Summary ── Three Lessons
If you rely only on an LLM, it remains just a "thinking brain" and lacks a "mechanism for self-discipline."
That's why I implemented a five-layer HOOK mechanism in Python to create a Pseudo-Metacognitive Architecture with a "sandwich structure" that grips both the input and output.
| HOOK | Position | What it protects |
|---|---|---|
| Scope Guard | Pre-input | Boundaries of the learning scope |
| Deadlock Breaker | Post-input | User's psychological safety |
| Socratic Validation | Post-output | The promise not to "teach" |
| One-shot Retry | Post-output | Response quality (deciding in one go) |
| Exit Trigger | Post-output | Experience at the moment of achievement |
The essence of the overall design is summarized in these three lessons.
🛠️ Appendix: Ready-to-use HOOK Function Set
Below is a minimal set of HOOK functions that you can copy and incorporate into your own project.
📋 HOOK Function Set (Python) ── Minimal setup to copy and try
"""
socrates_hooks.py
── Minimal implementation set for Socratic RAG HOOK mechanism
"""
import re
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# =============================================
# 🛡️ 1. Scope Guard (Input Censorship)
# =============================================
# "Concept Definitions (Anchors)" per category
# → Rewrite these according to your domain
TOPIC_ANCHORS = {
    "Tech": "Computer science, artificial intelligence, semiconductor engineering, programming...",
    "Business": "Corporate management, business strategy, organizational management...",
}
# Bypass keywords for conversation
SAFE_KEYWORDS = [
    "don't know", "understand", "tell me", "hint", "correct", "answer",
    "thanks", "hello", "continue", "yes", "no"
]
def scope_guard(query, query_vec, anchor_vec, threshold=0.15):
    """
    Determines if input is within scope.
    Args:
        query: User input text (for bypass check)
        query_vec: Embedding vector of the query (shape: [1, dim])
        anchor_vec: Embedding vector of the Anchor text (shape: [1, dim])
        threshold: Similarity threshold (default 0.15, 0.1 recommended as starting point)
    Returns:
        (is_valid: bool, similarity: float, message: str)
    """
    # Bypass: Short text + conversation keywords → skip check
    if len(query) < 30 and any(kw in query for kw in SAFE_KEYWORDS):
        return True, 1.0, ""
    # Vector similarity check
    similarity = cosine_similarity(query_vec, anchor_vec)[0][0]
    if similarity < threshold:
        return False, similarity, f"🚫 Out of scope (Similarity: {similarity:.3f})"
    return True, similarity, ""
# =============================================
# ⚠️ 2. Deadlock Breaker (Deadlock Detection)
# =============================================
GIVE_UP_KEYWORDS = [
    "don't know", "no idea", "impossible", "can't do", "tell me", "answer",
    "give up", "pass", "hint", "help"
]
def check_deadlock(history, current_input):
    """
    Detects if the conversation is in a deadlock state.
    Args:
        history: List of chat history [{"role": "user"|"assistant", "content": "..."}]
        current_input: Current user input
    Returns:
        is_deadlock: bool
    """
    try:
        if not history or len(history) < 2:
            return False
        last_msg = history[-1]
        if last_msg.get("role") != "assistant":
            return False
        # Condition 1: Previous AI message ended with a question (half- or full-width)
        last_ai = last_msg.get("content", "").strip()
        if not last_ai.endswith(("?", "？")):
            return False
        # Condition 2: User gave up, input is meaningless, or is a repetition
        is_give_up = any(kw in current_input.lower() for kw in GIVE_UP_KEYWORDS)
        is_meaningless = bool(re.match(r"^[\s\.。、?？!！]+$", current_input))
        is_repeating = (
            len(history) >= 3
            and current_input.strip() == history[-2].get("content", "").strip()
        )
        return is_give_up or is_meaningless or is_repeating
    except Exception:
        return False  # On error, pass safely with no detection
# =============================================
# 🔍 3. Socratic Validation (Output Censorship)
# =============================================
def socratic_validation(response_text, level=1):
    """
    Censors whether LLM output is "Socratic."
    Args:
        response_text: LLM response text
        level: Socratic level (1=Companion, 2=Silence, 3=Iron Mask)
    Returns:
        (is_valid: bool, feedback: str | None)
    """
    limits = {1: 160, 2: 120, 3: 80}
    max_len = limits.get(level, 160)
    # Rule 1: Character count
    if len(response_text) > max_len:
        return False, (
            f"Response is too long (currently {len(response_text)} chars). "
            f"Limit it to {max_len} chars and focus on an essential 'question.'"
        )
    # Rule 2: Does it end with a question (half- or full-width)?
    tail = response_text[-10:] if len(response_text) >= 10 else response_text
    if "?" not in tail and "？" not in tail:
        return False, "Always end with a 'question' to the user."
    return True, None
# =============================================
# 🎉 4. Exit Trigger (Success Detection)
# =============================================
SUCCESS_KEYWORDS = [
    "correct", "that's right", "passed", "perfect",
    "is right", "is correct", "good understanding",
    "understand correctly", "hitting the nail",
    "no mistake", "accurate", "next unit", "next step"
]
def check_exit_trigger(response_text):
    """
    Detects if LLM response contains keywords indicating a "correct answer."
    Performs negative check to prevent false positives.
    Returns:
        is_success: bool
    """
    text_lower = response_text.lower()
    for kw in SUCCESS_KEYWORDS:
        if kw in text_lower:
            # Negative check
            if "not correct" not in text_lower and "incorrect" not in text_lower:
                return True
    return False
# =============================================
# 🔄 5. Prompt Definitions for Retry
# =============================================
LEVEL_CONSTRAINTS = {
    1: (
        "[Instruction] Act as a teacher. "
        "Before giving the conclusion, provide a hint and ask a question to guide the user."
    ),
    2: (
        "[Instruction] Refrain from explanations. "
        "Throw a metaphor like 'It's like ~' or a question that doubts the user's common sense."
    ),
    3: (
        "[Absolute Order] You are a cold Socrates. "
        "No answers, explanations, or empathy required. "
        "Return only one short counter-question that strikes at the user's 'assumptions.'"
    ),
}
def build_retry_prompt(original_system_prompt, feedback, level=1):
    """
    Constructs system prompt for One-shot Retry.
    Args:
        original_system_prompt: Original system prompt
        feedback: Feedback from Socratic Validation
        level: Socratic level
    Returns:
        retry_system_prompt: str
    """
    constraint = LEVEL_CONSTRAINTS.get(level, LEVEL_CONSTRAINTS[1])
    return (
        f"{original_system_prompt}\n\n"
        f"[Regeneration Instruction]\n{feedback}\n{constraint}\n"
        f"*Make sure to complete the sentence so it doesn't cut off halfway."
    )