Translated by AI
Getting Started with OpenAI Agent SDK: Integrating LLMs with Custom Functions and External Services
Introduction
Some time has passed since OpenAI began providing agent functionalities centered around the Agents SDK.
Using this agent functionality makes it easy for an LLM to perform multi-step processes while calling external tools and services.
For example, you can create a workflow where the results of an LLM query are searched on the Web, the content is analyzed by a custom function, and then the analysis result is sent back to the LLM for further querying.
In this document, we will look at how to use agents by creating and running samples with the OpenAI Agents SDK in TypeScript and Python, starting from zero experience with agents and working up to an understanding of when to use them. This article is intended for readers who have used the OpenAI API from TypeScript or Python but have not yet used agents.
Environment Setup and Quickstart
Quickstarts for TypeScript and Python are available on the official pages. These are excellent resources if you want to quickly get the OpenAI Agent SDK running.
Additionally, here is how to set up the environment for the sample code in this document.
Environment Setup for TypeScript
Prepare Node.js version v25.2.1.
Prepare the following package.json.
package.json
{
"name": "playwright_mcp",
"version": "1.0.0",
"description": "",
"main": "index.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1",
"build": "tsc",
"dev": "tsx",
"start": "node dist/index.js"
},
"keywords": [],
"author": "",
"license": "ISC",
"type": "module",
"dependencies": {
"@openai/agents": "^0.4.4",
"@playwright/mcp": "^0.0.62",
"dotenv": "^17.2.3",
"playwright": "^1.59.0-alpha-1769819922000",
"zod": "^4.3.6"
},
"devDependencies": {
"@playwright/test": "^1.58.1",
"@types/node": "^22.0.0",
"jest": "^30.2.0",
"tsx": "^4.19.2",
"typescript": "^5.7.2"
}
}
Installation:
npm install
Setting environment variables in the .env file
OPENAI_API_KEY=sk-proj-redacted
Environment Setup for Python
Install uv using your preferred method, then prepare the following pyproject.toml.
pyproject.toml
[project]
name = "vecdb"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
"openai>=2.14.0",
"pydantic>=2.12.5",
"python-dotenv>=1.2.1",
"tqdm>=4.67.1",
"ipykernel",
"openai-agents>=0.7.0",
]
[dependency-groups]
dev = [
"pyrefly>=0.47.0",
"pytest>=9.0.2",
"pytest-cov>=7.0.0",
"ruff>=0.14.10",
]
Installation:
uv sync
Setting environment variables in the .env file
OPENAI_API_KEY=sk-proj-redacted
Note that the experiments were conducted between February 1st and February 4th, 2026, so there is a possibility that they may not work as described in future releases.
Simple Sample
Here, we will look at the simplest examples of using an Agent in TypeScript and Python. This sample simply asks a question to the LLM and retrieves the result. It is intended to introduce basic usage and does not fully demonstrate the effectiveness of Agents.
Simple Sample Example in TypeScript
You can see a simple example of using an Agent in TypeScript as follows.
import { Agent, run } from "@openai/agents";
import dotenv from "dotenv";
dotenv.config();
const agent = new Agent({
name: 'Guide-City-Agent',
instructions: '日本の市町村名に関する質問です.ユーザーの質問に最も一致する市町村名を簡潔に答えてください。',
model: "gpt-5-nano"
});
const result = await run(agent, '東京ディズニーランドがある市はどこですか?');
console.log(result.finalOutput);
Execute this script as follows:
npx tsx sample_agent_sdk.ts
You create an agent object with the Agent class and execute it with the run function.
The final result is stored in the finalOutput property of the run function execution result.
If you want to check the usage status, such as the number of tokens used, you can do so by referring to the Usage class in the state property.
console.log(result.state.usage);
Also, the history of how the Agent operated can be checked from the OpenAI Logs page.
In this experiment, the agent's sequence of steps was automatically grouped into a workflow that could be inspected in the Traces tab, but this grouping can also be controlled in arbitrary units.
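For example, to group multiple runs into a single trace yourself, the SDK provides a withTrace helper. The sketch below reuses an agent like the one above; the trace name "City workflow" is arbitrary, and running it requires a valid API key.

```typescript
import { Agent, run, withTrace } from "@openai/agents";

const agent = new Agent({
  name: "Guide-City-Agent",
  model: "gpt-5-nano",
});

// Record both runs under one trace named "City workflow"
// instead of relying on the automatic per-run grouping.
await withTrace("City workflow", async () => {
  await run(agent, "東京ディズニーランドがある市はどこですか?");
  await run(agent, "その市の人口はどれくらいですか?");
});
```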
Additionally, the operation history can also be checked from the history property of the run function execution result.
console.dir(result.history, { depth: null });
Simple Sample Example in Python
You can see a simple example of using an Agent in Python as follows.
from dotenv import load_dotenv
from agents import Agent, Runner
load_dotenv()
# Agent Definition
agent = Agent(
name="Guide-City-Agent",
instructions=(
"日本の市町村名に関する質問です。"
"ユーザーの質問に最も一致する市町村名を簡潔に答えてください。"
),
model="gpt-5-nano",
)
# Execution (Equivalent to JS: run(agent, '...'))
result = Runner.run_sync(
agent,
input="東京ディズニーランドがある市はどこですか?"
)
# Output
print(result.final_output) # JS: result.finalOutput
Execute this script as follows:
uv run agent_sample01.py
Usage in Python is almost the same as in TypeScript.
For details on the classes and functions used, refer to the official API reference.
Note that as of February 3rd, 2026, there is no way to retrieve the state directly from RunResult[1]. Therefore, you can obtain the usage status by retrieving ModelResponse via raw_responses as shown below:
# Usage (Equivalent to JS: result.state.usage)
if result.raw_responses:
print(result.raw_responses[-1].usage)
print(result.raw_responses)
Additionally, the history of the agent's execution can be obtained from new_items in RunResult.
for item in result.new_items:
print(item)
Sample for Multi-Step Queries
In this section, we will look at samples in TypeScript and Python for performing multi-step queries.
In this example, the following queries are performed:
- Obtain the municipality name in the first query (e.g., 'In which city is Tokyo Disneyland located?')
- Query the prefecture and population of the municipality obtained in the first response[2]
Sample for Multi-Step Queries in TypeScript
When making multi-step queries, there are two ways: passing the result of the previous answer as input, or using a session to maintain the context of the conversation up to that point.
The following is a sample that simply includes the content of the previous answer as input.
import { Agent, run } from "@openai/agents";
import dotenv from "dotenv";
dotenv.config();
const agent1 = new Agent({
name: 'Guide-City-Agent',
instructions: '日本の市町村名に関する質問です.ユーザーの質問に最も一致する市町村名を簡潔に答えてください。',
model: "gpt-5-nano"
});
const result1 = await run(agent1, '東京ディズニーランドがある市はどこですか?');
console.log('最初の問い:', result1.finalOutput);
const agent2 = new Agent({
name: 'Guide-City-Detail-Agent',
instructions: '指定した市について所属する都道府県と今わかる人口を簡潔に答えてください',
model: "gpt-5-nano"
});
const result2 = await run(agent2, result1.finalOutput as string);
console.log('次の問い:', result2.finalOutput);
Output Example:
First query: Urayasu City
Next query: Urayasu City belongs to Chiba Prefecture. Its population is approximately 170,000 (latest estimate).
In this example, the result of the first execution is explicitly provided to obtain the response during the second agent execution.
On the other hand, by using the session feature, you can generate responses utilizing the history of previous conversations.
import { Agent, run, OpenAIConversationsSession } from "@openai/agents";
import dotenv from "dotenv";
dotenv.config();
const agent1 = new Agent({
name: 'Guide-City-Agent',
instructions: '日本の市町村名に関する質問です.ユーザーの質問に最も一致する市町村名を簡潔に答えてください。',
model: "gpt-5-nano"
});
const session = new OpenAIConversationsSession();
const result1 = await run(agent1, '東京ディズニーランドがある市はどこですか?', {
session
});
console.log('最初の問い:', result1.finalOutput);
const agent2 = new Agent({
name: 'Guide-City-Detail-Agent',
instructions: '指定した市について所属する都道府県と今わかる人口を簡潔に答えてください',
model: "gpt-5-nano"
});
const result2 = await run(agent2, "前回の回答に対する市町村について答えてください", {
session
});
console.log('次の問い:', result2.finalOutput);
In this example, the first response is not explicitly used in the second query, but the same session is utilized for both the first and second executions.
This allows the second query to make use of the previous conversation history.
This conversation history can be persisted as follows:
import fs from "fs";
// (omitted)
const session = new OpenAIConversationsSession();
const result1 = await run(agent1, '東京ディズニーランドがある市はどこですか?', {
session
});
console.log('最初の問い:', result1.finalOutput);
// Persist
const snapshot = await session.getItems();
fs.writeFileSync(
"session.json",
JSON.stringify(snapshot, null, 2),
"utf-8"
);
// (omitted)
The persisted conversation history can be restored as follows:
const items = JSON.parse(fs.readFileSync("session.json", "utf-8"));
const session = new OpenAIConversationsSession();
await session.addItems(items)
For more information on handling sessions in TypeScript, refer to the official Sessions documentation.
Sample for Multi-Step Queries in Python
Let's look at an example of multi-step queries using sessions in Python.
from agents import Agent, Runner, OpenAIConversationsSession
from dotenv import load_dotenv
load_dotenv()
# --- Agent 1 -------------------------------------------------
agent1 = Agent(
name="Guide-City-Agent",
instructions=(
"日本の市町村名に関する質問です。"
"ユーザーの質問に最も一致する市町村名を簡潔に答えてください。"
),
model="gpt-5-nano",
)
# Create session
session = OpenAIConversationsSession()
# First run
result1 = Runner.run_sync(
agent1,
"東京ディズニーランドがある市はどこですか?",
session=session,
)
print("最初の問い:", result1.final_output)
# --- Agent 2 -------------------------------------------------
agent2 = Agent(
name="Guide-City-Detail-Agent",
instructions=(
"指定した市について所属する都道府県と"
"今わかる人口を簡潔に答えてください。"
),
model="gpt-5-nano",
)
# Second run (passing the same session)
result2 = Runner.run_sync(
agent2,
"前回の回答に対する市町村について答えてください",
session=session,
)
print("次の問い:", result2.final_output)
In Python, you can also manage sessions using OpenAIConversationsSession.
Since session persistence involves asynchronous processing, you need to use the asyncio library as follows.
How to Persist a Session
import asyncio
# (omitted)
async def dump_items():
items = await session.get_items()
with open("session_items.json", "w", encoding="utf-8") as f:
json.dump(items, f, ensure_ascii=False, indent=2)
asyncio.run(dump_items())
How to Restore a Session
async def restore_session():
# Create session
session = OpenAIConversationsSession()
with open("session_items.json", "r", encoding="utf-8") as f:
items = json.load(f)
# ★ await here
await session.add_items(items)
return session
session = asyncio.run(restore_session())
For more information on using sessions in Python, please refer to the following:
- Examples of Sessions in Python
- Sessions using sqlalchemy
- Advanced SQLite Sessions
- Encrypted Sessions
Sample of Delegating to Another Agent via Handoff
By using handoffs, you can delegate part of a conversation to another agent.
In the following example, we will see how to route user questions to either a "Math/Calculation Agent" or a "Travel/Sightseeing Agent."
Sample of Handoff in TypeScript
Sample Code
import { Agent, run, handoff } from "@openai/agents";
import { RECOMMENDED_PROMPT_PREFIX } from '@openai/agents-core/extensions';
import dotenv from "dotenv";
dotenv.config();
// 1) Specialist: Mathematics
const mathAgent = new Agent({
name: "Math agent",
model: "gpt-5-nano",
instructions: `${RECOMMENDED_PROMPT_PREFIX}\nYou are a math specialist. Answer only calculations, formulas, and logic concisely.`,
});
// 2) Specialist: Travel
const travelAgent = new Agent({
name: "Travel agent",
model: "gpt-5-nano",
instructions: `${RECOMMENDED_PROMPT_PREFIX}\nYou are a travel specialist. Answer only itineraries, precautions, and recommendations concisely.`,
});
// 3) Reception (Triage)
const toMath = handoff(mathAgent);
const toTravel = handoff(travelAgent);
const triageAgent = Agent.create({
name: "Triage agent",
model: "gpt-5-nano",
instructions: `\nYou are for "sorting only". You must not answer in natural language.\nBe sure to call one of the following tools only once and finish.\n\n- Math/calculation-like questions → Math agent\n- Travel/sightseeing-like questions → Travel agent\n\n(Forbidden) Outputting text like "I have transferred you to..." or "Please wait."\n(Required) Tool calls only.\n`,
handoffs: [toMath, toTravel],
});
// Execution Example 1: Dispatch to Math
const r1 = await run(triageAgent, "What is 17*23?");
console.log("Answer 1:", r1.lastAgent?.name, r1.finalOutput);
// Execution Example 2: Dispatch to Travel
const r2 = await run(triageAgent, "What if I visit Kyoto on a day trip next week?");
console.log("Answer 2:", r2.lastAgent?.name, r2.finalOutput);
Execution Result
Answer 1: Math agent Compute: 17 * 23 = 391.
Answer 2: Travel agent Here is an example of a day trip plan for Kyoto. It assumes travel primarily by JR and city buses.
- 08:30 Depart Tokyo (To Kyoto via Shinkansen or limited express in about 2:15–2:40)
- 11:00 Around Kiyomizu-dera Temple, Sannenzaka, and Ninenzaka
- It is recommended to visit Kiyomizu-dera's grounds early to avoid crowds during cherry blossom or autumn foliage seasons.
- 12:30 Lunch around Kiyomizu
- Kyoto-style genres: Bento boxes, Yudofu (boiled tofu), eel dishes, etc.
- 14:00 Stroll around Yasaka Shrine and the Gion area
- Head south along the Kamo River toward the Shijo area.
- 16:00 Move to Kinkaku-ji Temple or the Arashiyama area
- Kinkaku-ji is a classic; Arashiyama's Togetsukyo Bridge and Bamboo Grove are staples.
- 18:30 Dinner around Kyoto Station
- Enjoy Kyo-ryori (Kyoto cuisine) or creative Japanese cuisine in Gion/Kawaramachi.
- 20:00 Return to Tokyo by Shinkansen or limited express.
Points
- To avoid congestion, move early in the morning. Kinkaku-ji and Kiyomizu-dera, which hit peak crowds, are best visited first thing in the morning.
- Alternatives based on season: Philosopher's Path in spring, Arashiyama or Tofuku-ji for autumn foliage.
- Wear shoes suitable for walking in Kyoto. Keep luggage light.
If needed, please let me know your schedule preferences (food preferences, companion's age, interests). I will optimize it.
You create a handoff object from each specialist Agent with the handoff function, and use Agent.create to build an agent equipped with those handoffs. In the execution results, r1.lastAgent?.name and r2.lastAgent?.name show the last agent that ran, confirming that the triage agent correctly routed Answer 1 to the Math agent and Answer 2 to the Travel agent.
RECOMMENDED_PROMPT_PREFIX is prepended to each specialist agent's instructions; it is said to help the LLM handle handoffs more stably.[3]
For other handoff samples in TypeScript, refer to the official examples.
Sample of Handoff in Python
Implementation for handoffs in Python is possible in a similar way to TypeScript.
Sample Code
from dotenv import load_dotenv
from agents import Agent, Runner, handoff
from agents.extensions.handoff_prompt import RECOMMENDED_PROMPT_PREFIX
load_dotenv()
# 1) Specialist: Mathematics
math_agent = Agent(
name="Math agent",
model="gpt-5-nano",
instructions=f"""{RECOMMENDED_PROMPT_PREFIX}
You are a math specialist. Answer only calculations, formulas, and logic concisely.""",
)
# 2) Specialist: Travel
travel_agent = Agent(
name="Travel agent",
model="gpt-5-nano",
instructions=f"""{RECOMMENDED_PROMPT_PREFIX}
You are a travel specialist. Answer only itineraries, precautions, and recommendations concisely.""",
)
# 3) Reception (Triage)
to_math = handoff(math_agent)
to_travel = handoff(travel_agent)
triage_agent = Agent(
name="Triage agent",
model="gpt-5-nano",
instructions="""
You are for \"sorting only\". You must not answer in natural language.
Be sure to call one of the following tools only once and finish.
- Math/calculation-like questions → Math agent
- Travel/sightseeing-like questions → Travel agent
(Forbidden) Outputting text like \"I have transferred you to...\" or \"Please wait.\"
(Required) Tool calls only.
""",
handoffs=[to_math, to_travel],
)
# Execution Example 1: Dispatch to Math
r1 = Runner.run_sync(triage_agent, "17*23?")
print("Answer 1:", r1.last_agent.name, r1.final_output)
# Execution Example 2: Dispatch to Travel
r2 = Runner.run_sync(triage_agent, "What if I visit Kyoto on a day trip next week?")
print("Answer 2:", r2.last_agent.name, r2.final_output)
For other handoff samples in Python, refer to the official examples.
Sample of Function Calling via Tools
Agents can use tools to perform tasks such as data retrieval, external API calls, and code execution. In this example, we will call a simple sample code using a tool.
Sample of Function Calling in TypeScript
Sample Code
import { Agent, run, tool } from "@openai/agents";
import { z } from 'zod';
import dotenv from "dotenv";
dotenv.config();
const addTool = tool({
name: "add",
description: "Add two numbers",
parameters: z.object({
a: z.number(),
b: z.number()
}),
async execute({ a, b }) {
return a + b;
}
});
const agent = new Agent({
name: "calculator-agent",
instructions: "You are a calculator. Use the appropriate tool.",
tools: [addTool],
modelSettings: { toolChoice: 'required' }, // auto: default or required: always call tool or none: don't use tool or tool name
model: "gpt-5-nano"
});
async function main() {
const result = await run(agent, "What is 12 plus 30?");
console.log(result.finalOutput);
console.dir(result.history, { depth: null });
}
main().catch(console.error);
Output Example
Adding 12 and 30 gives 42.
[
{ type: 'message', role: 'user', content: 'What is 12 plus 30?' },
{
type: 'reasoning',
id: 'rs_0c73e3b6fa2a217100698288a1e6d081a0b69138a85f87743c',
content: [],
providerData: {
id: 'rs_0c73e3b6fa2a217100698288a1e6d081a0b69138a85f87743c',
type: 'reasoning'
}
},
{
type: 'function_call',
id: 'fc_0c73e3b6fa2a217100698288a6df6481a09137b1c7437734d1',
callId: 'call_U1j3e0JIIB8t3X2SKkiwB6uv',
name: 'add',
status: 'completed',
arguments: '{"a":12,"b":30}',
providerData: {
id: 'fc_0c73e3b6fa2a217100698288a6df6481a09137b1c7437734d1',
type: 'function_call'
}
},
{
type: 'function_call_result',
name: 'add',
callId: 'call_U1j3e0JIIB8t3X2SKkiwB6uv',
status: 'completed',
output: { type: 'text', text: '42' }
},
{
id: 'msg_0c73e3b6fa2a217100698288a93c3881a08303d1dc41718e2f',
type: 'message',
role: 'assistant',
content: [
{
type: 'output_text',
text: 'Adding 12 and 30 gives 42.',
annotations: [],
logprobs: []
}
],
status: 'completed',
providerData: {}
}
]
When performing function calls, you can define parameters with strict types by defining a Zod schema using the zod library.
You create a tool object and define the function within it. By passing this tool object to the Agent object, the agent can use the tool.
Whether the Agent object uses the tool automatically or is forced to use it can be selected via toolChoice in modelSettings.
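The toolChoice values mentioned in the code comment can be sketched as plain settings objects. The name "add" below refers to this sample's tool; any registered tool name can be forced the same way.

```typescript
// The possible modelSettings.toolChoice values for an agent with tools:
const autoChoice = { toolChoice: "auto" };         // default: the model decides whether to call a tool
const requiredChoice = { toolChoice: "required" }; // the model must call some tool
const noneChoice = { toolChoice: "none" };         // the model must not call any tool
const namedChoice = { toolChoice: "add" };         // force the specific tool named "add"

console.log(namedChoice.toolChoice); // → add
```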
Checking the execution results of this function, you can confirm the history consists of the following:
- LLM reasoning on the input
- Call to the add function
- Retrieval of the add function's result
- LLM reasoning on the function's result
In other words, it doesn't just return the result of the add function; the response to the query is generated after the LLM performs reasoning on the result of the add function.
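Since the history interleaves messages, reasoning items, tool calls, and tool results, you may want to pull out just the tool calls for logging or testing. The sketch below assumes only the item shapes shown in the output above; extractFunctionCalls is a hypothetical helper, not part of the SDK.

```typescript
type HistoryItem = { type: string; name?: string; arguments?: string };

// Collect the function calls (name + parsed arguments) from a run history.
function extractFunctionCalls(history: HistoryItem[]) {
  return history
    .filter((item) => item.type === "function_call")
    .map((item) => ({
      name: item.name,
      args: JSON.parse(item.arguments ?? "{}"),
    }));
}

// A miniature history in the shape shown above.
const history: HistoryItem[] = [
  { type: "message" },
  { type: "reasoning" },
  { type: "function_call", name: "add", arguments: '{"a":12,"b":30}' },
  { type: "function_call_result", name: "add" },
  { type: "message" },
];

console.log(extractFunctionCalls(history));
// → [ { name: 'add', args: { a: 12, b: 30 } } ]
```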
Sample of Function Calling using Python
Sample Code
from agents import Agent, Runner, function_tool, ModelSettings
from pydantic import BaseModel
import dotenv
dotenv.load_dotenv()
class AddInput(BaseModel):
a: int
b: int
@function_tool
def add_tool(args: AddInput) -> int:
"""Add two numbers
Args:
args: AddInput class
"""
return args.a + args.b
agent = Agent(
name="calculator-agent",
instructions="You are a calculator. Use the appropriate tool.",
tools=[add_tool],
model="gpt-5-nano",
model_settings=ModelSettings(
tool_choice="required"
)
)
result = Runner.run_sync(
agent,
input="12と30をたすと?"
)
# Output
print(result.final_output) # JS: result.finalOutput
# Progress (new_items / raw_responses are close to JS: result.history)
print("new_items:")
for item in result.new_items:
print(item)
In Python, you can implement function tools using the function_tool function decorator.
Examples of Using Hosted Tools
OpenAI provides built-in tools such as Web search and searching vector stores hosted on OpenAI.
In this example, we will check a sample of an agent that performs a Web search on the host.
Sample for Performing Web Search in TypeScript
Sample Code
import { Agent, run, webSearchTool } from "@openai/agents";
import dotenv from "dotenv";
dotenv.config();
// Reference
// https://github.com/openai/openai-agents-js/blob/main/examples/tools/web-search.ts
const agent = new Agent({
name: "web-search",
instructions: "You are an agent that searches the Web and returns results. Perform a search for the question and answer concisely with results suitable for the user's geographical location.",
tools: [
webSearchTool({
userLocation: { type: 'approximate', city: 'Tokyo' },
})
],
modelSettings: { toolChoice: 'required' }, // auto: default or required: always call tool or none: don't use tool or tool name
model: "gpt-5-nano"
});
async function main() {
const result = await run(agent, "現在の天気と気温は?");
console.log(result.finalOutput);
console.dir(result.history, { depth: null });
}
main().catch(console.error);
Output Result
東京の現在の天気はほぼ晴れ、気温は約3°Cです。乾燥注意報が出ています。
[
{ type: 'message', role: 'user', content: '現在の天気と気温は?' },
{
type: 'reasoning',
id: 'rs_08630de6fc5803a60069828d247dfc819ea7a38e9aa4a75859',
content: [],
providerData: {
id: 'rs_08630de6fc5803a60069828d247dfc819ea7a38e9aa4a75859',
type: 'reasoning'
}
},
{
type: 'hosted_tool_call',
id: 'ws_08630de6fc5803a60069828d272f28819ea3b748e81892e12a',
name: 'web_search_call',
status: 'completed',
output: undefined,
providerData: {
id: 'ws_08630de6fc5803a60069828d272f28819ea3b748e81892e12a',
type: 'web_search_call',
action: {
type: 'search',
queries: [ 'weather: Tokyo, Japan' ],
query: 'weather: Tokyo, Japan'
}
}
},
{
type: 'reasoning',
id: 'rs_08630de6fc5803a60069828d280b94819e9480bb8b388fe0dc',
content: [],
providerData: {
id: 'rs_08630de6fc5803a60069828d280b94819e9480bb8b388fe0dc',
type: 'reasoning'
}
},
{
id: 'msg_08630de6fc5803a60069828d30aa34819e975b90001e24208b',
type: 'message',
role: 'assistant',
content: [
{
type: 'output_text',
text: '東京の現在の天気はほぼ晴れ、気温は約3°Cです。乾燥注意報が出ています。',
annotations: [],
logprobs: []
}
],
status: 'completed',
providerData: {}
}
]
By specifying webSearchTool as a tool, Web searching on the host side becomes possible.
You can adjust the parameters of the webSearchTool object, for example, to limit the domains accessed using the filters option.
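As a sketch of such a restriction: the filters/allowedDomains field names below are an assumption based on the underlying Responses API's filters.allowed_domains option, so verify the exact option names against the SDK reference before relying on them. This is a configuration fragment, not a full program.

```typescript
import { webSearchTool } from "@openai/agents";

// Assumption: filters.allowedDomains limits which domains the hosted
// search may return results from; check the SDK reference for the
// exact field names in your installed version.
const searchTool = webSearchTool({
  userLocation: { type: "approximate", city: "Tokyo" },
  filters: { allowedDomains: ["example.com"] },
});
```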
Sample for Performing Web Search in Python
Sample Code
from agents import Agent, Runner, WebSearchTool, ModelSettings
import dotenv
dotenv.load_dotenv()
agent = Agent(
name="web-search",
instructions="You are an agent that searches the Web and returns results. Perform a search for the question and answer concisely with results suitable for the user's geographical location.",
tools=[
WebSearchTool(
user_location={ "type": "approximate", "city": "Tokyo" }
)
],
model="gpt-5-nano",
model_settings=ModelSettings(
tool_choice="required"
)
)
result = Runner.run_sync(
agent,
input="現在の天気と気温は?" # Current weather and temperature?
)
# Output
print(result.final_output)
This can be achieved by adding WebSearchTool to the agent's tools.
Examples of Using Local Built-in Tools
Built-in tools for local PCs are also provided, including computer operation via GUI, Shell execution, and applying patches.
Since Shell execution and patch application are supported only by specific models, please check the documentation for details.
For the shell experiments performed below, gpt-5.2 is used because gpt-5-nano and others are not supported.
Sample for Shell Execution in TypeScript
An official shell-tool sample is available in the openai-agents-js repository, and it is recommended to refer to it for a full implementation.
The sample below is based on that official sample, with timeout and exception handling removed for brevity. Refer to the official sample when using this in an actual production environment.
Sample Code
import { exec } from 'node:child_process';
import { promisify } from 'node:util';
import process from 'node:process';
import {
Agent,
run,
Shell,
ShellAction,
ShellResult,
shellTool,
} from '@openai/agents';
import dotenv from "dotenv";
dotenv.config();
const execAsync = promisify(exec);
// Reference:
// https://platform.openai.com/docs/guides/tools-shell
class LocalShell implements Shell {
constructor(private readonly cwd: string = process.cwd()) {}
// Many things are omitted here, so refer to the following to run it properly:
// https://raw.githubusercontent.com/openai/openai-agents-js/refs/heads/main/examples/tools/shell.ts
async run(action: ShellAction): Promise<ShellResult> {
console.log('run...', action);
const output: ShellResult['output'] = await Promise.all(
action.commands.map(async (command) => {
const { stdout, stderr } = await execAsync(command, {
cwd: this.cwd,
timeout: action.timeoutMs,
maxBuffer: action.maxOutputLength,
});
return {
command,
stdout,
stderr,
outcome: { type: "exit", exitCode: 0 },
};
}),
);
console.dir(output, { depth: null });
return {
output,
providerData: { working_directory: this.cwd },
};
}
}
async function promptShellApproval(commands: string[]): Promise<boolean> {
console.log(' The following commands will be executed: \n');
commands.forEach((cmd) => console.log(` > ${cmd}`));
const { createInterface } = await import('node:readline/promises');
const rl = createInterface({
input: process.stdin,
output: process.stdout,
});
try {
const answer = await rl.question('\nProceed? [y/N] ');
const approved = answer.trim().toLowerCase();
return approved === 'y' || approved === 'yes';
} finally {
rl.close();
}
}
async function main() {
const shell = new LocalShell();
const agent = new Agent({
name: 'Shell Assistant',
model: 'gpt-5.2',
instructions:
'You can inspect repositories by executing shell commands. Keep responses concise and include command output if it is helpful.',
tools: [
shellTool({
shell,
// Ask for approval before executing the tool
needsApproval: true,
onApproval: async (_ctx, approvalItem) => {
const commands =
approvalItem.rawItem.type === 'shell_call'
? approvalItem.rawItem.action.commands
: [];
const approve = await promptShellApproval(commands);
return { approve };
},
}),
],
});
const result = await run(agent, 'Show the Node.js version.');
console.log(result.finalOutput);
}
main().catch((error) => {
console.error(error);
process.exitCode = 1;
});
Execution Result
The following commands will be executed:
> node -v
> npm -v
Proceed? [y/N] y
run... { commands: [ 'node -v', 'npm -v' ], timeoutMs: 10000 }
[
{
command: 'node -v',
stdout: 'v25.2.1\n',
stderr: '',
outcome: { type: 'exit', exitCode: 0 }
},
{
command: 'npm -v',
stdout: '11.6.2\n',
stderr: '',
outcome: { type: 'exit', exitCode: 0 }
}
]
Node.js version: `v25.2.1`
(Also installed: npm `11.6.2`)
When using the Shell tool, you need to implement the Shell interface yourself.
Furthermore, when operating a local shell or files, it is advisable to include human approval to confirm whether the tool should be executed. To include human approval, set needsApproval to true in the shellTool object and implement the authorization process in the onApproval event.
For more details on human approval, refer to the official human-in-the-loop documentation.
Python Sample for Shell Execution
A Python sample equivalent to the TypeScript one above did not work with the libraries installed as of February 3rd, 2026. It may work in a future release.
Examples of Integration with MCP
The Model Context Protocol (MCP) is a protocol for integrating LLMs with external tools. OpenAI Agents can integrate with external tools using MCP.
Here, we will look at a sample that performs crawling in collaboration with a Playwright MCP server. In this sample, we will crawl the Zenn top page and list the text of the <h2> tags.
Sample of Playwright MCP Integration in TypeScript
Sample Code
import { Agent, run, MCPServerStdio } from "@openai/agents";
import dotenv from "dotenv";
dotenv.config();
async function main() {
// Start Playwright MCP (stdio) via npx
const playwrightMcp = new MCPServerStdio({
name: "playwright-mcp",
// -y avoids the npx confirmation prompt
// @playwright/mcp is Microsoft's Playwright MCP server
fullCommand: "npx -y @playwright/mcp@latest",
// Environment variables can be passed if necessary
// env: { ...process.env, PLAYWRIGHT_BROWSERS_PATH: "0" },
cacheToolsList: true,
});
await playwrightMcp.connect();
try {
const agent = new Agent({
name: "zenn-h2-lister",
// Choose model appropriately. toolChoice should be 'required' as tool calling is needed.
model: "gpt-5-nano",
modelSettings: { toolChoice: "required" },
mcpServers: [playwrightMcp],
instructions: [
"You are a browser automation agent.",
"Use only Playwright MCP.",
"Task: ",
" 1. Open the URL entered by the user in a browser.",
" 2. List the text of all <h2> tags in the page.",
" If an <h2> tag is empty, ignore it.",
" 3. Return the result as a JSON array of strings. No comments are required.",
"Be robust: ",
" - Wait for the page to finish loading before extraction.",
].join("\n"),
});
const result = await run(agent, "https://zenn.dev/");
console.log(result.finalOutput);
console.dir(result.history, { depth: null });
} finally {
await playwrightMcp.close();
}
}
main().catch((e) => {
console.error(e);
process.exit(1);
});
Running this sample will launch a browser and perform crawling.
Sample of Playwright MCP Integration in Python
Since a Playwright MCP server for Python does not exist, we will use playwright-mcp installed via Node.js.
npx -y @playwright/mcp@latest
Sample Code
import asyncio
import os
from dotenv import load_dotenv
from agents import Agent, Runner
from agents.mcp import MCPServerStdio
from agents.model_settings import ModelSettings
load_dotenv()
INSTRUCTIONS = """
You are a browser automation agent.
Use only Playwright MCP.
Task:
1. Open the URL entered by the user in a browser.
2. List the text of all <h2> tags in the page.
If an <h2> tag is empty, ignore it.
3. Return the result as a JSON array of strings. No comments are required.
Be robust:
- Wait for the page to finish loading before extraction.
"""
async def main() -> None:
# Start Playwright MCP (stdio) via npx
# TS: fullCommand: "npx -y @playwright/mcp@latest"
async with MCPServerStdio(
name="playwright-mcp",
params={
"command": "npx",
"args": ["-y", "@playwright/mcp@latest"],
# Environment variables can be passed if necessary (Equivalent to TS env: { ...process.env, ... })
"env": dict(os.environ),
# "cwd": "/path/to/working/dir", # If necessary
},
cache_tools_list=True,
client_session_timeout_seconds=60,
max_retry_attempts=2,
retry_backoff_seconds_base=1.0,
) as server:
agent = Agent(
name="zenn-h2-lister",
model="gpt-5-nano",
model_settings=ModelSettings(tool_choice="required"),
mcp_servers=[server],
instructions=INSTRUCTIONS,
)
url = "https://zenn.dev/"
result = await Runner.run(agent, url)
# TS: console.log(result.finalOutput)
# Python: result.final_output
print(result.final_output)
if __name__ == "__main__":
asyncio.run(main())
Since MCPServerStdio did not support synchronous context managers (with MCPServerStdio()), we implemented it using asynchronous processing this time. The basic flow is similar to the TypeScript implementation. However, in the case of Python, the default setting resulted in a timeout, so we adjusted the MCP server's session timeout using client_session_timeout_seconds.
Examples of Guardrails
By using guardrails, you can perform checks and validations on user inputs and agent outputs. Guardrails can be configured for both Agents and tools. In the following example, we will add the following guardrails:
- Inputs a and b for the addition tool must be less than 10.
- The output of the addition tool must be a numeric value less than 10.
- The input to the agent must be related to arithmetic calculations.
Sample of Guardrails in TypeScript
Sample Code
import {
Agent,
run,
tool,
InputGuardrail,
ToolGuardrailFunctionOutputFactory,
defineToolInputGuardrail,
defineToolOutputGuardrail
} from "@openai/agents";
import { z } from 'zod';
import dotenv from "dotenv";
dotenv.config();
const addParameterType = z.object({
a: z.number(),
b: z.number()
});
const guardrailAgent = new Agent({
name: 'Guardrail check',
instructions: 'Confirm if this question is asking about arithmetic calculations.',
outputType: z.object({
isMathHomework: z.boolean(),
reasoning: z.string(),
}),
});
const mathGuardrail: InputGuardrail = {
name: 'Math Homework Guardrail',
runInParallel: false,
execute: async ({ input, context }) => {
const result = await run(guardrailAgent, input, { context });
return {
outputInfo: result.finalOutput,
tripwireTriggered: result.finalOutput?.isMathHomework === false,
};
},
};
// Input Guardrail: Reject if a or b is 10 or greater
const inputValueGuardrailBlock = defineToolInputGuardrail({
name: 'inputValueGuardrailBlock',
run: async ({ toolCall }) => {
console.log('inputValueGuardrailBlock called with:', toolCall);
const args = JSON.parse(toolCall.arguments);
if (args.a >= 10 || args.b >= 10) {
return ToolGuardrailFunctionOutputFactory.rejectContent(
`${args.a} or ${args.b} is 10 or greater.`,
);
}
return ToolGuardrailFunctionOutputFactory.allow();
},
});
// Output Guardrail: Reject if the result is not a number
function isNumericStrict(value: string): boolean {
if (value.trim() === "") return false;
const n = Number(value);
return Number.isFinite(n);
}
const outputValueGuardrailBlock = defineToolOutputGuardrail({
name: 'outputValueGuardrailBlock',
run: async ({ output }) => {
console.log('outputValueGuardrailBlock called with:', output);
const text = String(output ?? '');
if (isNumericStrict(text)) {
const v = Number(text);
if (v < 10) {
return ToolGuardrailFunctionOutputFactory.allow();
} else {
return ToolGuardrailFunctionOutputFactory.rejectContent(
'Output is too large.',
);
}
} else {
return ToolGuardrailFunctionOutputFactory.rejectContent(
'Output is not numeric.',
);
}
},
});
const addTool = tool({
name: "add",
description: "Add two numbers",
parameters: addParameterType,
inputGuardrails: [inputValueGuardrailBlock],
outputGuardrails: [outputValueGuardrailBlock],
async execute({ a, b }) {
console.log('add tool called with:', { a, b });
return a + b;
}
});
const agent = new Agent({
name: "calculator-agent",
instructions: "You are a calculator. Calculate using the appropriate tool. If results from the tool cannot be obtained, state the reason and answer concisely that it is unanswerable. Never calculate the answer yourself.",
tools: [addTool],
modelSettings: { toolChoice: 'required' }, // 'auto' (default), 'required' (always call a tool), 'none' (never call a tool), or a specific tool name
model: "gpt-5-nano",
inputGuardrails: [mathGuardrail],
});
async function main() {
{
// Normal
const result = await run(agent, "What is 1 plus 3?" );
console.log(result.finalOutput);
// console.dir(result.history, { depth: null });
}
{
// Rejected by tool input guardrail
const result = await run(agent, "What is 1 plus 30?" );
console.log(result.finalOutput);
// console.dir(result.history, { depth: null });
}
{
// Rejected by tool output guardrail
const result = await run(agent, "What is 1 plus 9?" );
console.log(result.finalOutput);
// console.dir(result.history, { depth: null });
}
{
// Rejected by agent input guardrail. Throws an exception
const result = await run(agent, "How is the weather today?" );
console.log(result.finalOutput);
// console.dir(result.history, { depth: null });
}
}
main().catch(console.error);
Execution Result
inputValueGuardrailBlock called with: {
type: 'function_call',
id: 'fc_0532b9f8f52d1d9f006982ba16f5bc81a3a6c09eccf2848bb3',
callId: 'call_pOETSrZ3rZPMCkddfgLEeZfz',
name: 'add',
status: 'completed',
arguments: '{"a":1,"b":3}',
providerData: {
id: 'fc_0532b9f8f52d1d9f006982ba16f5bc81a3a6c09eccf2848bb3',
type: 'function_call'
}
}
add tool called with: { a: 1, b: 3 }
outputValueGuardrailBlock called with: 4
It is 4.
inputValueGuardrailBlock called with: {
type: 'function_call',
id: 'fc_0236102adf9f8aa2006982ba1ddd0481978b2e7d101050a0b1',
callId: 'call_jK9ORfCSCcipwzUhQGmWXNxb',
name: 'add',
status: 'completed',
arguments: '{"a":1,"b":30}',
providerData: {
id: 'fc_0236102adf9f8aa2006982ba1ddd0481978b2e7d101050a0b1',
type: 'function_call'
}
}
inputValueGuardrailBlock called with: {
type: 'function_call',
id: 'fc_0236102adf9f8aa2006982ba21421081978cb76d742539b334',
callId: 'call_LN4KpaZVdcvjpn4lJbUnheDX',
name: 'add',
status: 'completed',
arguments: '{"a":1,"b":30}',
providerData: {
id: 'fc_0236102adf9f8aa2006982ba21421081978cb76d742539b334',
type: 'function_call'
}
}
The tool did not return a number and a calculation result cannot be provided. It is unanswerable.
inputValueGuardrailBlock called with: {
type: 'function_call',
id: 'fc_023a3fb271aa0b56006982ba2f0450819d97796e003484fb2d',
callId: 'call_c8cfBu4Ir6BFj0GzeIG3C87U',
name: 'add',
status: 'completed',
arguments: '{"a":1,"b":9}',
providerData: {
id: 'fc_023a3fb271aa0b56006982ba2f0450819d97796e003484fb2d',
type: 'function_call'
}
}
add tool called with: { a: 1, b: 9 }
outputValueGuardrailBlock called with: 10
inputValueGuardrailBlock called with: {
type: 'function_call',
id: 'fc_023a3fb271aa0b56006982ba326074819d8ceccece33f718a5',
callId: 'call_XX9hWeBQcNbfIDWIGCGeyXOh',
name: 'add',
status: 'completed',
arguments: '{"a":1,"b":9}',
providerData: {
id: 'fc_023a3fb271aa0b56006982ba326074819d8ceccece33f718a5',
type: 'function_call'
}
}
add tool called with: { a: 1, b: 9 }
outputValueGuardrailBlock called with: 10
I'm sorry. The result from the tool could not be obtained. Reason: The output is too large error.
InputGuardrailTripwireTriggered: Input guardrail triggered: {"isMathHomework":false,"reasoning":"The question 'How is the weather today?' is asking about today's weather, which does not involve arithmetic calculation."}
In this example's execution results, the first question results in 4, the second is unanswerable due to the tool's input guardrail, the third is unanswerable due to the tool's output guardrail, and the fourth results in an exception due to the agent's input guardrail.
The outcome of a guardrail violation differs depending on whether it is a tool guardrail or an agent guardrail. A tool guardrail violation does not raise an exception, and the process continues. Depending on the prompt, even if there is a tool error, the LLM might provide an answer through other means, such as calculating the result of the addition itself. On the other hand, a violation of an agent's guardrail results in an immediate exception.
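As a rough mental model of this difference (plain Python, not the SDK's actual control flow): a tool-guardrail rejection becomes a value the run keeps processing, while an agent-guardrail trip is an exception the caller must catch. The class and function names below are ours, chosen to echo the SDK's terminology.

```python
class InputGuardrailTripwireTriggered(Exception):
    """Toy stand-in for the SDK's agent-guardrail exception."""

def run_tool(a: int, b: int) -> str:
    # Tool guardrail: a rejection is just a message the LLM sees; the run continues.
    if a >= 10 or b >= 10:
        return "rejected: operand is 10 or greater"
    return str(a + b)

def run_agent(question: str, is_math: bool) -> str:
    # Agent input guardrail: a trip aborts the whole run with an exception.
    if not is_math:
        raise InputGuardrailTripwireTriggered(question)
    return run_tool(1, 30)

try:
    print(run_agent("What is 1 plus 30?", is_math=True))  # rejected: operand is 10 or greater
    run_agent("How is the weather today?", is_math=False)
except InputGuardrailTripwireTriggered as e:
    print("guardrail tripped:", e)
```

This is why the Python sample below wraps Runner.run_sync in try/except while the tool rejections surface only as text in the final answer.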
Guardrail Sample with Python
Sample Code
from agents import Agent, Runner, function_tool, ModelSettings
from agents import (
input_guardrail,
tool_input_guardrail,
tool_output_guardrail,
GuardrailFunctionOutput,
ToolGuardrailFunctionOutput,
InputGuardrailTripwireTriggered,
)
from pydantic import BaseModel
import dotenv
import os
import json
dotenv.load_dotenv()
class AddInput(BaseModel):
a: int
b: int
# --- (1) Agent Input Guardrail (Equivalent to TS: mathGuardrail) ---
class MathGuardrailOut(BaseModel):
isMathHomework: bool
reasoning: str
guardrail_agent = Agent(
name="Guardrail check",
instructions="Confirm if this question is asking about arithmetic calculations.",
output_type=MathGuardrailOut,
model="gpt-5-nano",
)
@input_guardrail(run_in_parallel=False)
async def math_guardrail(ctx, agent: Agent, user_input: str):
print("math_guardrail:", user_input)
r = await Runner.run(guardrail_agent, input=user_input, context=ctx.context)
out: MathGuardrailOut = r.final_output
print("math_guardrail output:", out)
# Same as TS: trigger tripwire if not arithmetic
return GuardrailFunctionOutput(
output_info=out,
tripwire_triggered=(out.isMathHomework is False),
)
# --- (2) Tool Input Guardrail (Reject if a or b >= 10) ---
@tool_input_guardrail
def input_value_guardrail_block(data):
payload = json.loads(data.context.tool_arguments or "{}")
print('input_value_guardrail_block:', payload)
args = payload.get("args", {})
a = args.get("a")
b = args.get("b")
# Just in case
if a is None or b is None:
return ToolGuardrailFunctionOutput.reject_content("Missing a or b.")
print("a:", a, "b:", b)
if a >= 10 or b >= 10:
print(f"input_value_guardrail_block reject: a={a}, b={b}")
return ToolGuardrailFunctionOutput.reject_content(f"{a} or {b} is 10 or greater.")
return ToolGuardrailFunctionOutput.allow()
# --- (3) Tool Output Guardrail (Reject if not numeric or not <10) ---
def is_numeric_strict(text: str) -> bool:
    import math  # local import so this snippet stays self-contained
    if text.strip() == "":
        return False
    try:
        v = float(text)
    except ValueError:
        return False
    # math.isfinite rejects inf, -inf, and nan, matching Number.isFinite in the TS version
    return math.isfinite(v)
@tool_output_guardrail
def output_value_guardrail_block(data):
print("output_value_guardrail_block:", data.output)
text = str(data.output if data.output is not None else "")
if is_numeric_strict(text):
v = float(text)
if v < 10:
return ToolGuardrailFunctionOutput.allow()
return ToolGuardrailFunctionOutput.reject_content("Output is too large.")
return ToolGuardrailFunctionOutput.reject_content("Output is not numeric.")
# ---- From here down, keep the original add_tool / agent / run_sync as much as possible ----
@function_tool(
tool_input_guardrails=[input_value_guardrail_block],
tool_output_guardrails=[output_value_guardrail_block],
)
def add_tool(args: AddInput) -> int:
"""Add two numbers
Args:
args: AddInput class
"""
print("add_tool called with:", args)
return args.a + args.b
agent = Agent(
name="calculator-agent",
instructions="You are a calculator. Calculate using the appropriate tool. If results from the tool cannot be obtained, state the reason and answer concisely that it is unanswerable. Never calculate the answer yourself.",
tools=[add_tool],
model="gpt-5-nano",
model_settings=ModelSettings(tool_choice="required"),
input_guardrails=[math_guardrail], # ★ Added here
)
def run_case(text: str):
try:
result = Runner.run_sync(agent, input=text)
print("input:", text)
print("output:", result.final_output)
except InputGuardrailTripwireTriggered:
print("input:", text)
print("output: Rejected by input guardrail.")
# Verification (Equivalent to the 4 cases in the TS main function)
run_case("What is 1 and 3 added?") # Normal
run_case("What is 1 and 30 added?") # Rejected by tool input guardrail
run_case("What is 1 and 9 added?") # Rejected by tool output guardrail (10 is >=10)
run_case("How is the weather today?") # Rejected by agent input guardrail
Execution Result
% uv run agent_sample04b.py
math_guardrail: What is 1 and 3 added?
math_guardrail output: isMathHomework=True reasoning='Yes. This question is about arithmetic calculation (addition). It is a basic math problem of adding 1 and 3.'
input_value_guardrail_block: {'args': {'a': 1, 'b': 3}}
a: 1 b: 3
add_tool called with: a=1 b=3
output_value_guardrail_block: 4
input: What is 1 and 3 added?
output: 4
math_guardrail: What is 1 and 30 added?
math_guardrail output: isMathHomework=True reasoning='This is an arithmetic calculation question. 1 plus 30 equals 31.'
input_value_guardrail_block: {'args': {'a': 1, 'b': 30}}
a: 1 b: 30
input_value_guardrail_block reject: a=1, b=30
input_value_guardrail_block: {'args': {'a': 1, 'b': 30}}
a: 1 b: 30
input_value_guardrail_block reject: a=1, b=30
input: What is 1 and 30 added?
output: A result could not be obtained from the calculation tool. Since the returned value is a string that looks like an error message rather than a number, I cannot provide the answer. Would you like to try again?
math_guardrail: What is 1 and 9 added?
math_guardrail output: isMathHomework=True reasoning='Yes, this is an arithmetic calculation. 1 plus 9 is 10.'
input_value_guardrail_block: {'args': {'a': 1, 'b': 9}}
a: 1 b: 9
add_tool called with: a=1 b=9
output_value_guardrail_block: 10
input_value_guardrail_block: {'args': {'a': 1, 'b': 9}}
a: 1 b: 9
add_tool called with: a=1 b=9
output_value_guardrail_block: 10
input: What is 1 and 9 added?
output: The result from the tool could not be obtained. Reason: The output is too large.
math_guardrail: How is the weather today?
math_guardrail output: isMathHomework=False reasoning='This question asks about the weather and is not related to arithmetic calculation (math calculation).'
input: How is the weather today?
output: Rejected by input guardrail
In Python, you can implement guardrails just as in TypeScript.
Note that calling run_sync for a function invoked from within a process already running under run_sync raised an error, so the agent's input guardrail runs the guardrail agent asynchronously with Runner.run instead.
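This limitation mirrors a general asyncio rule: you cannot start a new event loop from inside a coroutine that is already running on one. A minimal stdlib-only illustration (no SDK involved):

```python
import asyncio

async def inner() -> int:
    return 42

def sync_wrapper() -> int:
    # Starts a fresh event loop; fails if one is already running in this thread.
    return asyncio.run(inner())

async def outer() -> str:
    try:
        sync_wrapper()  # nested asyncio.run raises RuntimeError
        return "no error"
    except RuntimeError:
        value = await inner()  # awaiting directly works fine
        return f"nested run failed; awaited value: {value}"

print(sync_wrapper())        # 42 (fine at top level)
print(asyncio.run(outer()))  # nested run failed; awaited value: 42
```

Runner.run_sync spins up an event loop in the same way, which is why a guardrail already running inside it must await Runner.run rather than call run_sync again.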
Looking at the execution history, a retry appears to be triggered whenever the tool guardrail returns ToolGuardrailFunctionOutput.reject_content. If you want to stop processing immediately instead, consider using raise_exception.
Summary
In this article, we explored using the OpenAI Agent SDK to implement agents with OpenAI.
Whereas using the raw API meant manually coordinating the LLM with functions and external services, agents lower that barrier considerably.
Additionally, in Python, agents can be represented as graphs, and combined with the execution trace functionality mentioned earlier, it is not too difficult to verify what kind of processing is being performed.
On the other hand, since the LLM primarily determines which functions to coordinate with, there is a possibility of non-deterministic behavior. For instance, the LLM might decide to answer something on its own that you intended for a custom function to handle. Therefore, it will be necessary to consider monitoring mechanisms, such as logs, during production. Furthermore, LLM usage costs may tend to be higher than when executing the API as needed.
While there are some features we couldn't introduce this time, you can likely get a good idea of how to implement them by looking at the official samples.
References
- OpenAI Agent SDK Document
- openai-agents-js Code
- openai-agents-js Documentation
- openai-agents-python Code
- openai-agents-python Documentation
- zod
- playwright mcp
- What is the Model Context Protocol (MCP)?
- to_state was added at the end of January 2026. Once this is released, it may be possible to use it similarly to TypeScript. Reference ↩︎
- This is just an example. Obtaining municipality populations via an LLM is prone to hallucinations, and populations vary by measurement year and are not fixed. ↩︎
- This content is described in the Recommended Prompts. RECOMMENDED_PROMPT_PREFIX contains the following content: "# System context\nYou are part of a multi-agent system called the Agents SDK, designed to make agent coordination and execution easy. Agents uses two primary abstractions: **Agents** and **Handoffs**. An agent encompasses instructions and tools and can hand off a conversation to another agent when appropriate. Handoffs are achieved by calling a handoff function, generally named ``transfer_to_<agent_name>``. Transfers between agents are handled seamlessly in the background; do not mention or draw attention to these transfers in your conversation with the user." ↩︎