Code Mode MCP: Compressing Multiple MCP Servers into Two Tools

Introduction

Using multiple MCP servers (Notion, Playwright, Chrome DevTools) with Claude Code consumes 36.6k tokens just for tool definitions.
By building a meta-MCP server called "Code Mode" that exposes only 2 tools in their place, I reduced this to 4.4k tokens (an 88% reduction).
In this article, I will introduce the approach and share some implementation pitfalls. I hope this serves as a reference for those facing similar challenges.

The Story Behind Code Mode

The name "Code Mode" and the approach itself are based on the concepts introduced in Anthropic's article "Code execution with MCP".

Rather than passing a large number of tool definitions to the LLM, you let it write and execute code that calls the tools. This cuts the overhead of tool definitions while still allowing flexible operations.

For this project, I connected Notion, Playwright, and Chrome DevTools as child servers. Playwright and Chrome DevTools have overlapping features, but I included both for testing purposes. The size of the reduction is therefore partly a product of this specific setup, but the approach is clearly effective.

The MCP Context Problem

Using MCP servers with Claude Code is convenient, but the tool definitions are included in the initial context.
(You can check this with /context)

MCP tools: 36.6k tokens (18.3%)
└ mcp__chrome-devtools__click: 636 tokens
└ mcp__chrome-devtools__close_page: 624 tokens
└ mcp__chrome-devtools__drag: 638 tokens
... (26 tools)
└ mcp__playwright__browser_close: 573 tokens
└ mcp__playwright__browser_click: 753 tokens
... (22 tools)
└ mcp__notionApi__notion-fetch: ...
... (15 tools)

15 tools for Notion, 22 for Playwright, 26 for Chrome DevTools: 63 tool definitions in total were being loaded every time. Of the 200k tokens in the context, 18.3% was occupied by tool definitions.

Solving it with Code Mode

Instead of teaching the LLM the usage (definition) of every tool, we only teach it "how to search" and "how to execute."

Code Mode MCP (2 tools)
├── search_docs: Search API documentation of child MCP servers
└── execute_code: Call child MCP tools using TypeScript code

search_docs

// List all bindings if query is omitted
await search_docs({})
// => notion: { notion-search, notion-fetch, ... }
// => playwright: { browser_click, browser_navigate, ... }
// => chrome: { click, navigate_page, ... }

// Search for a specific feature
await search_docs({ query: 'navigate' })
// => Returns detailed schema for playwright.browser_navigate
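
To make the two behaviors above concrete, here is a minimal sketch of how a search_docs handler could work. It is not the author's implementation: the ToolDoc and Registry shapes, the searchDocs function, and the substring matching are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real server may store tool metadata differently.
interface ToolDoc {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

// Binding name (e.g. "playwright") -> tools exposed by that child server.
type Registry = Record<string, ToolDoc[]>;

type SearchResult =
  | { kind: "list"; bindings: Record<string, string[]> }
  | { kind: "matches"; tools: Array<ToolDoc & { binding: string }> };

function searchDocs(registry: Registry, query?: string): SearchResult {
  // No query: return only tool names per binding, which keeps the reply small.
  if (!query) {
    const bindings: Record<string, string[]> = {};
    for (const [binding, docs] of Object.entries(registry)) {
      bindings[binding] = docs.map((doc) => doc.name);
    }
    return { kind: "list", bindings };
  }

  // With a query: case-insensitive substring match on name and description,
  // returning the full schema so the caller can write a correct tool call.
  const needle = query.toLowerCase();
  const tools: Array<ToolDoc & { binding: string }> = [];
  for (const [binding, docs] of Object.entries(registry)) {
    for (const doc of docs) {
      const haystack = `${doc.name} ${doc.description}`.toLowerCase();
      if (haystack.includes(needle)) tools.push({ ...doc, binding });
    }
  }
  return { kind: "matches", tools };
}

// Tiny in-memory registry for demonstration (descriptions are made up).
const demoRegistry: Registry = {
  playwright: [
    { name: "browser_navigate", description: "Open a URL", inputSchema: { url: "string" } },
    { name: "browser_click", description: "Click an element", inputSchema: {} },
  ],
  chrome: [{ name: "click", description: "Click an element", inputSchema: {} }],
};

console.log(searchDocs(demoRegistry));
console.log(searchDocs(demoRegistry, "navigate"));
```

Returning names only for the empty query and full schemas only for matches is what keeps the context cost proportional to what the LLM actually needs.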

execute_code

This is the core functionality of Code Mode. Within the provided code, child server names (playwright, notion, chrome, and so on) can be used directly as objects, because Code Mode injects a binding for each server at runtime.

await execute_code({
  code: `
    // playwright, notion, and chrome are available
    await playwright.browser_navigate({ url: "https://example.com" });
    const snapshot = await playwright.browser_snapshot({});
    console.log(snapshot);
  `,
})

Multiple operations can be combined into a single call.
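
One way such runtime bindings could be built is with a Proxy that turns any property access into a tool call. The sketch below is an assumption about the mechanism, not the author's code: createBinding, the CallTool signature, and the recording stub are hypothetical names used only for illustration.

```typescript
// Hypothetical signature for "forward this call to a child MCP server".
type CallTool = (server: string, tool: string, args: unknown) => Promise<unknown>;

type Binding = Record<string, (args?: unknown) => Promise<unknown>>;

// Build an object on which every property is a function forwarding to callTool.
function createBinding(server: string, callTool: CallTool): Binding {
  return new Proxy({} as Binding, {
    get: (_target, prop) => {
      // Ignore symbol lookups and "then", so the binding object itself is
      // never mistaken for a promise when it is awaited or returned.
      if (typeof prop !== "string" || prop === "then") return undefined;
      return (args: unknown = {}) => callTool(server, prop, args);
    },
  });
}

// Stub that records calls instead of talking to a real child server.
const recordedCalls: string[] = [];
const recordingCallTool: CallTool = (server, tool, args) => {
  recordedCalls.push(`${server}.${tool} ${JSON.stringify(args)}`);
  return Promise.resolve({ ok: true });
};

const playwrightBinding = createBinding("playwright", recordingCallTool);
playwrightBinding.browser_navigate({ url: "https://example.com" });
console.log(recordedCalls);
```

With this design no per-tool code is generated, so adding a child server does not require new wrappers. Tool names containing hyphens, such as notion-search, would need bracket access in the executed code, for example notion["notion-search"]({ ... }).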

Implementation Points

Execution Environment for execute_code

In execute_code, it is necessary to execute arbitrary TypeScript generated by the LLM. However, since allowing it to do anything would be dangerous, I wanted to implement a certain degree of sandboxing.

I chose Deno to meet this requirement, for four reasons:

  1. Native TypeScript Execution: LLM-generated TypeScript code can be executed directly without needing transpilation.
  2. Built-in Worker API: Web Worker-based sandbox execution is available by default.
  3. Permission Model: Deno's permissions (no network, no file access) can be easily applied to the Worker.
  4. MCP SDK Compatibility: @modelcontextprotocol/sdk works with Deno.

When code within the Worker calls an MCP tool, it sends a request to the parent process via postMessage, and the parent process performs the actual MCP call and returns the result. Since the Worker itself does not have external access permissions, it can only interact with the outside world through MCP tools.
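
The postMessage relay between the Worker and the parent process could be sketched as follows. This shows only the Worker-side half, and the message shapes (ToolRequest, ToolResponse) and the createRpcClient helper are hypothetical; the actual protocol in the Gist may differ.

```typescript
// Hypothetical message shapes exchanged between the Worker and the parent.
interface ToolRequest {
  type: "call";
  id: number;
  server: string;
  tool: string;
  args: unknown;
}
interface ToolResponse {
  type: "result";
  id: number;
  ok: boolean;
  value?: unknown;
  error?: string;
}

// Worker side: turns tool calls into messages and matches replies by id.
function createRpcClient(post: (msg: ToolRequest) => void) {
  let nextId = 0;
  const pending = new Map<
    number,
    { resolve: (value: unknown) => void; reject: (err: Error) => void }
  >();

  return {
    // Inside the Worker, `post` would be (msg) => self.postMessage(msg).
    call(server: string, tool: string, args: unknown): Promise<unknown> {
      const id = nextId++;
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject });
        post({ type: "call", id, server, tool, args });
      });
    },
    // Inside the Worker, wire this to self.onmessage.
    handleResponse(msg: ToolResponse): void {
      const entry = pending.get(msg.id);
      if (!entry) return; // unknown or already settled id
      pending.delete(msg.id);
      if (msg.ok) entry.resolve(msg.value);
      else entry.reject(new Error(msg.error ?? "tool call failed"));
    },
    pendingCount: (): number => pending.size,
  };
}
```

Matching replies by id lets the executed code issue several tool calls concurrently, since responses may come back from the parent in any order.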

Execution Environment Constraints

The code executed in execute_code has the following constraints:

Item            Constraint
Timeout         30 seconds
Available APIs  console.log, basic JS/TS syntax, MCP bindings

Code is executed within a Deno Worker, and MCP tool calls are delegated to the parent process via postMessage. Because the Worker is created with restricted Deno permissions, APIs such as fetch or Deno.readFile cannot be used.

The code can still read and write Notion data through the authenticated Notion session and operate a browser through Playwright, because those operations go through MCP tools. The Worker's permission restrictions are intended to prevent code generated by the LLM from directly accessing arbitrary URLs; they do not restrict access via MCP. Since the assumption is that I am the one using this in my own environment, I believe this is acceptable.

const worker = new Worker(workerUrl, {
  type: 'module',
  deno: {
    permissions: {
      net: false,
      read: false,
      write: false,
      env: false,
      run: false,
      ffi: false,
    },
  },
})
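
The 30-second timeout could be enforced from the parent process by racing the Worker's result against a timer and terminating the Worker when the timer wins. This is a sketch under assumptions: runWithTimeout and the Terminable interface are hypothetical names, and the Gist may implement the timeout differently.

```typescript
// Minimal structural type: anything with terminate(), such as a Web Worker.
interface Terminable {
  terminate(): void;
}

function runWithTimeout<T>(worker: Terminable, work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      worker.terminate(); // stop runaway code, e.g. an infinite loop
      reject(new Error(`execute_code timed out after ${ms} ms`));
    }, ms);
  });

  // Whichever settles first wins; the timer is always cleared afterwards
  // so a finished run does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Because the Worker runs on its own thread, the parent's timer still fires even if the generated code is stuck in a synchronous loop. A call might look like runWithTimeout(worker, resultPromise, 30_000), where resultPromise resolves when the Worker posts its final result.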

Connecting to child servers

Code Mode MCP connects to child MCP servers internally and relays tool calls.

Claude Code
    ↓ stdio
Code Mode MCP (2 tools: search_docs, execute_code)
    ├── stdio + mcp-remote → Notion MCP (SSE)
    ├── stdio → Chrome DevTools MCP
    └── stdio → Playwright MCP

Notion MCP requires OAuth authentication and cannot be connected directly via stdio. I use mcp-remote to bridge it via SSE. Since mcp-remote handles the authentication flow, the Code Mode side only needs to communicate with mcp-remote via stdio.

await Promise.all([
  this.connectToNotion(), // SSE connection via mcp-remote
  this.connectToChromeDevTools(), // Direct stdio connection
  this.connectToPlaywright(), // Direct stdio connection
])

If a connection fails, the corresponding server is skipped, and other servers operate normally. Unconnected servers do not appear in the search_docs list, and calling a tool from such a server in execute_code results in an Unknown tool error. Automatic reconnection upon disconnect is not yet implemented; the current workflow is to manually reconnect using /mcp restart.
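
The skip-on-failure behavior and the Unknown tool error could be sketched like this. The ChildClient shape, connectAll, and resolveTool are hypothetical names for illustration; the sketch only assumes that each connect function rejects when its server cannot be reached.

```typescript
// Hypothetical minimal view of a connected child server.
interface ChildClient {
  toolNames: string[];
  callTool(name: string, args: unknown): Promise<unknown>;
}

type Connector = () => Promise<ChildClient>;

// Try every server in parallel; a failure skips that server only.
async function connectAll(
  connectors: Record<string, Connector>,
): Promise<Map<string, ChildClient>> {
  const connected = new Map<string, ChildClient>();
  await Promise.all(
    Object.entries(connectors).map(async ([name, connect]) => {
      try {
        connected.set(name, await connect());
      } catch (err) {
        // Log to stderr (stdout carries the MCP protocol) and carry on.
        console.error(`[code-mode] skipping ${name}:`, err);
      }
    }),
  );
  return connected;
}

// Look up a tool; servers that never connected are simply absent from the map.
function resolveTool(
  connected: Map<string, ChildClient>,
  server: string,
  tool: string,
): ChildClient {
  const client = connected.get(server);
  if (!client || !client.toolNames.includes(tool)) {
    throw new Error(`Unknown tool: ${server}.${tool}`);
  }
  return client;
}
```

Catching inside each mapped function is what keeps Promise.all from rejecting as a whole when a single child server fails to start.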

Pitfall: os error 35

When it occurs: when starting another MCP server with StdioClientTransport from inside an MCP server that is itself communicating with its parent via stdio.

Resource temporarily unavailable (os error 35)

Cause: The default setting of StdioClientTransport, stderr: 'inherit', interferes with the stdio communication of the parent process (Claude Code).

Solution: Specify stderr: 'pipe' to isolate the child process's stderr. This setting is required for all child server connections.

const transport = new StdioClientTransport({
  command: 'npx',
  args: ['-y', '@playwright/mcp@latest'],
  stderr: 'pipe', // Required for all child servers
})
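
Since the setting has to be present on every child connection, one option is to build the transport options through a small helper whose type only allows 'pipe'. This is a sketch: childServerParams and the ChildServerParams type are hypothetical, and the second server entry is a placeholder rather than a real package.

```typescript
// Local copy of the option fields used in the snippet above.
// The literal type "pipe" makes any other stderr value a compile error.
interface ChildServerParams {
  command: string;
  args: string[];
  stderr: "pipe";
}

// Build options for a child server, always isolating its stderr.
function childServerParams(command: string, args: string[]): ChildServerParams {
  return { command, args, stderr: "pipe" };
}

const childServers: Record<string, ChildServerParams> = {
  playwright: childServerParams("npx", ["-y", "@playwright/mcp@latest"]),
  // Placeholder entry; replace with the real command for your own server.
  example: childServerParams("npx", ["-y", "example-child-mcp"]),
};

// Each entry would then be passed to: new StdioClientTransport(params)
console.log(childServers);
```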

Optimizing Tool Definitions

In the initial implementation, the description for execute_code included a list of bindings and code examples.

// Before: 1.3k tokens
description: `Executes TypeScript code to call the tools of child MCP servers.

Available bindings:
- notion: { notion-search, notion-fetch, ... }
- playwright: { browser_click, ... }
...`

The description grows larger as more child servers are added. As an improvement, I minimized the description and moved to a workflow where bindings are looked up using search_docs.

// After: 687 tokens
description: 'Operate child MCP servers using TypeScript code. Check bindings via search_docs.'

Results of Introducing Code Mode

Item               Before        After
MCP tools          36.6k tokens  4.4k tokens
Number of tools    63            2
Context occupancy  18.3%         2.2%
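
The percentages follow directly from the token counts and the 200k-token context window. As a quick check of the arithmetic (the variable names are only for this example):

```typescript
const contextWindow = 200000; // tokens
const tokensBefore = 36600;
const tokensAfter = 4400;

// Convert a ratio to a percentage rounded to one decimal place.
const toPercent = (ratio: number): number => Math.round(ratio * 1000) / 10;

const shareBefore = toPercent(tokensBefore / contextWindow); // 18.3
const shareAfter = toPercent(tokensAfter / contextWindow); // 2.2
const reduction = toPercent((tokensBefore - tokensAfter) / tokensBefore); // 88

console.log({ shareBefore, shareAfter, reduction });
```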

Challenges of Code Mode

Usability for the LLM

With standard MCP tools, the LLM can usually understand what to do just by looking at the tool names. In contrast, Code Mode requires a two-step process: first searching via search_docs, then writing TypeScript and running it via execute_code. You need to explain this workflow to the LLM when it first encounters it.

Designed for Personal Use

The configuration of child servers is hardcoded, so anyone else who wants to use it is expected to modify the code to suit their own environment.

Fragile Connection Management

It only attempts to connect once at startup, so if a connection is lost, it remains disconnected. If the Notion OAuth session expires or Chrome is closed, the tools will be unavailable until you manually run /mcp restart.

Summary

  • MCP tool definitions were consuming a large amount of context.
  • I managed to compress them into two tools—"how to search" and "how to execute"—using a meta-MCP server.
  • Using Deno allows for executing TypeScript directly.
  • Forgetting stderr: 'pipe' leads to os error 35.
  • There are still challenges regarding usability for the LLM and general versatility.

The implementation uses Deno and @modelcontextprotocol/sdk.
The code is available as a Gist. I hope this serves as a reference for those facing similar challenges.

https://gist.github.com/tgfjt/537f79fed12c580814bce4818f508556
