
Getting Started with the Deep Research API! Advanced Tips for Deeper Research


OpenAI has released the long-awaited Deep Research API. For users of ChatGPT Plus/Pro, the powerful "Deep Research" feature that investigates and summarizes complex topics is a familiar one. While it was previously only available on ChatGPT, it has finally been released as an API, allowing you to integrate it into your own applications and workflows.

In this article, I will explain everything from the basic usage of the API to practical techniques for obtaining better results, complete with code examples, based on the official documentation.

https://platform.openai.com/docs/guides/deep-research

How to Call

This is the simplest way to call the API.
At the time of writing, you can choose between o3-deep-research or o4-mini-deep-research as the model.
The research topic is a currently hot one: "Please search the web and tell me about the differences between Gemini CLI and Claude code."

from openai import OpenAI
client = OpenAI()

input_text = """
Please search the web and tell me about the differences between Gemini CLI and Claude code.
"""

response = client.responses.create(
  model="o4-mini-deep-research",
  input=input_text,
  tools=[
    {"type": "web_search_preview"},
  ],
)

print(response.output_text)

However, detailed research such as Deep Research can take several minutes, which might lead to timeouts with standard API requests. Therefore, OpenAI recommends using background mode to execute time-consuming tasks asynchronously and reliably.

Background Mode

https://platform.openai.com/docs/guides/background

You just need to add stream=True and background=True to the request parameters.

from openai import OpenAI
client = OpenAI()

input_text = """
Please search the web and tell me about the differences between Gemini CLI and Claude code.
"""

stream = client.responses.create(
  model="o4-mini-deep-research",
  input=input_text,
  tools=[
    {"type": "web_search_preview"},
  ],
  stream=True,
  background=True,
)

cursor = None
for event in stream:
  print(event)
  cursor = event.sequence_number
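The cursor tracked in the loop above exists so you can re-attach if the connection drops mid-stream: the background-mode guide describes resuming with something like client.responses.retrieve(response_id, stream=True, starting_after=cursor) (check the current docs for the exact signature). Below is an offline sketch of that resume logic with the network call mocked out; the Event class and fetch_stream callable are stand-ins of my own, not SDK types.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

@dataclass
class Event:
    sequence_number: int
    type: str

def consume_with_resume(
    fetch_stream: Callable[[Optional[int]], Iterable[Event]],
    max_retries: int = 3,
) -> int:
    """Consume a background stream, re-attaching from the last cursor on failure."""
    cursor = None
    for _ in range(max_retries):
        try:
            for event in fetch_stream(cursor):
                cursor = event.sequence_number  # remember the last event we saw
                if event.type == "response.completed":
                    return cursor
        except ConnectionError:
            continue  # re-attach, passing the cursor as the resume point
    raise RuntimeError("stream did not complete")

# Mock stream: drops after event 2 on the first attempt, then finishes.
calls = {"n": 0}
def mock_fetch(cursor):
    calls["n"] += 1
    events = [Event(1, "response.created"), Event(2, "response.in_progress"),
              Event(3, "response.output_text.delta"), Event(4, "response.completed")]
    start = 0 if cursor is None else cursor
    if calls["n"] == 1:
        yield from events[start:2]
        raise ConnectionError
    yield from events[start:]

print(consume_with_resume(mock_fetch))  # prints 4
```

In real use you would replace mock_fetch with a call back into the API, so a twenty-minute research run survives a dropped connection instead of starting over.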

I executed the code above, and it took about 20 minutes and cost $4... it's quite expensive! 😂
Here is the report.

Report (1st time)

Report

Overview

Both Gemini CLI and Claude Code are tools that leverage large language models (LLMs), but there are significant differences in their providers, usage methods, and applications. In a nutshell, Gemini CLI is a command-line interface for the Gemini models provided by Google, while Claude Code has the character of a code generation/analysis environment using the Claude models provided by Anthropic.

Provider and Model

Gemini CLI: A tool for handling Gemini-based language models (e.g., Gemini Pro/Ultra) developed by Google (Google AI and DeepMind) via the command line. It allows the use of Gemini models through the Google Cloud generative AI platform.
Claude Code: This can be considered a function/environment specialized for programming and code-related tasks using the "Claude" language model developed by Anthropic (founded by former OpenAI executives). Claude itself is a conversational AI, characterized by a design that emphasizes safety and cooperation with humans (the code version specifically focuses on "generating code safely").

Access Method and Usage Form

Gemini CLI: As the name suggests, it is a CLI (Command-Line Interface) tool, so you execute commands in the terminal to query the Gemini model. It operates through commands built into the Google Cloud SDK (gcloud CLI) or unique CLI tools. You set up an API key or Google Cloud authentication information to run it, and input/output is handled via standard I/O (console) or files.
Claude Code: Anthropic Claude is primarily provided through a user-friendly web interface (chat format) or a paid API. As the name "Claude Code" suggests, it is thought to be a mode or service strong in developer-oriented tasks such as code generation and code review. Access is gained through Anthropic's platform (Web UI or API) by entering dialogues or prompts. It is generally used via GUI or API rather than as a CLI tool.

Main Uses and Functional Differences

Gemini CLI: Gemini models support general conversation and text generation, with no restrictions to specific tasks. The CLI version also basically responds to text prompts, so it can be used multi-purposely for translation, summarization, brainstorming, and simple code generation. It is easy to integrate with Google Cloud and can be used as part of automation scripts or workflows combined with other cloud services.
Claude Code: While Claude itself is a general conversational AI, the "Code" suffix implies functions specialized for "code generation, analysis, and modification." For example, it writes program code for given problems or proposes bug fixes and efficiency improvements for existing code. Anthropic also emphasizes the safety and ethical aspects of Claude, with measures in place to suppress the generation of incorrect code or dangerous operations. Therefore, it is suitable for programming education and assisting in automated code reviews of large-scale systems.

Technical and Operational Differences

Gemini CLI (Google): It depends on the Google Cloud environment, requiring a Google Cloud account and billing setup for use. Since it's a CLI operation, it's easy to integrate into scripts and CI/CD, and integrates well with other GCP services (such as BigQuery or Cloud Storage). The Gemini model itself is very large-scale and said to perform highly across diverse tasks (for example, handling long contexts of tens of thousands of tokens), but it is not necessarily dedicated to code.
Claude Code (Anthropic): Provided on a platform hosted by Anthropic, it requires registration with Anthropic and an API key. Its use centers on interactions in the Web UI or API calls rather than CLI-like integration. While Claude is famous for its safety design, users must verify the catch-up of generation quality (such as delays relative to the latest trends). Due to Anthropic's business model, commercial use or integration into commercial apps also involves specific contracts.

Summary

Gemini CLI is a tool for using Google's Gemini models from the command line. Through Google Cloud's Generative AI features, it can execute a wide range of language tasks in a CLI environment.
Claude Code can be viewed as a feature for code generation and analysis using Anthropic's Claude models. It uses Anthropic's safety-oriented conversational AI platform, specifically for code-related tasks.
Simply put, the two differ in terms of the model provider (Google vs. Anthropic) and usage form (CLI tool vs. Web/API). In terms of application, Gemini CLI is a general-purpose language processing tool, while Claude Code is a tool more specialized for coding assistance.
Because of these differences, usability, integrable services, and licensing forms also differ greatly. For example, if you are already developing in a Google Cloud environment, Gemini CLI is a natural choice, whereas if you seek safety-oriented code assistance pushed by Anthropic, Claude Code becomes an option.


Impressions

The generated report feels a bit vague compared to what I usually get in ChatGPT. $4 might be a bit expensive for this... 🤨
The cause is that I passed only a bare question to the API. To unlock the true potential of the Deep Research API, you need to build more detailed instructions into the prompt first!

For Deeper Deep Research

Deepening the Questions

Those who have used Deep Research in ChatGPT may have noticed that follow-up questions are requested after sending a query. ChatGPT's detailed research follows a three-step process:

  1. Clarification: An intermediate model (such as gpt-4.1) clarifies the user's intent and gathers more context (preferences, goals, constraints, etc.) before starting the research process.
  2. Prompt Rewriting: The intermediate model receives the original user input and explanation to generate a more detailed prompt.
  3. Deep Research: The detailed and expanded prompt is passed to the deep research model, and the research is conducted.

However, deep research via the Responses API does not include the clarification or prompt-rewriting steps. As a developer, you need to implement these yourself, asking the user a series of clarification questions and then rewriting their prompt.

Implementation Example of Clarification

The following is the official OpenAI prompt translated into English. The question is the same as before: the differences between Gemini CLI and Claude code.

from openai import OpenAI
client = OpenAI()

instructions = """
You are conversing with a user who has requested a research task.  
Your job is to elicit additional information from the user that is necessary to successfully carry out that task.

Guidelines:
- Gather all necessary information while remaining **concise**  
- Collect information required for the research task in a clear and organized manner  
- Use bullet points or numbered lists appropriately for clarity  
- Do not ask for unnecessary information or information the user has already provided  

Important: **Do not perform the research yourself**; focus solely on gathering information for a researcher to perform the research later.
"""

input_text = "Please search the web and tell me about the differences between Gemini CLI and Claude code."

response = client.responses.create(
  model="gpt-4.1",
  input=input_text,
  instructions=instructions,
)

print(response.output_text)

The following follow-up questions were received. They are just like the follow-up questions you get when using Deep Research in ChatGPT!

To conduct that research, the following information would be useful:

1. **Points of connection or comparison between Gemini CLI and Claude code**  
   - Which specific aspects are you interested in? (e.g., features, ease of use, performance, etc.)

2. **Use case or context**  
   - How do you plan to use each? (e.g., is there a specific project or purpose?)

3. **Specific questions or areas of interest**
   - What information do you already have, and what further details are you expecting?

Providing this information will help the researcher provide more accurate information.

Implementation Example of Prompt Rewriting

As the next step, we rewrite the prompt for the researcher. The following is also the official OpenAI prompt translated into English.

from openai import OpenAI
client = OpenAI()

instructions = """
You receive a research task from a user. Your job is to create instructions for a researcher to complete that task. **Do not execute the task yourself**. Simply present the steps to complete the task.

GUIDELINES:
1. **Maximize specificity and detail**
   - Include all user preferences and conditions, and explicitly list key attributes or perspectives to be considered  
   - It is most important to reflect information from the user into the instructions without omission

2. **Treat items not indicated by the user but necessary as \"unspecified\"**
   - If attributes essential for a meaningful output are not specified by the user, clearly state them as \"unspecified (open-ended)\" or with no specific constraints

3. **Avoid unfounded assumptions**
   - Do not make up details that the user has not provided  
   - If there is no specification, state so clearly and instruct the researcher to respond flexibly

4. **Write in the first person**
   - Describe the request from the user's perspective

5. **When using tables**
   - If you determine that a table would be useful for organizing or visualizing information, explicitly ask the researcher to present it in table format  
     Examples:  
     - Product comparison (for consumers): A comparison table listing features, prices, and ratings for each smartphone  
     - Project management (business): A table listing tasks, deadlines, owners, and progress  
     - Budget planning (for consumers): A table summarizing income sources, monthly expenses, and savings goals  
     - Competitor analysis (business): A table showing market share, pricing, and key differentiators

6. **Headings and Formatting**
   - Specify the expected output format  
   - If a structured format like a report or plan is desirable, specify a report format with appropriate headings

7. **Language**
   - If the user input is in a language other than English, instruct the researcher to respond in that language unless the user has requested a different language

8. **Sources**
   - Explicitly state any preferred sources  
   - For product/travel research, recommend official websites or reliable e-commerce sites (e.g., Amazon reviews)  
   - In academic/scientific fields, prioritize original papers or official journals over review papers  
   - For inquiries in a specific language, emphasize sources published in that language
"""

input_text = """Please search the web and tell me about the differences between Gemini CLI and Claude code.
1. **Points of connection or comparison between Gemini CLI and Claude code**  
   - Which specific aspects are you interested in? (e.g., features, ease of use, performance, etc.)
⇨Features, performance

2. **Use case or context**  
   - How do you plan to use each? (e.g., is there a specific project or purpose?)
⇨General programming

3. **Specific questions or areas of interest**
   - What information do you already have, and what further details are you expecting?
⇨Expecting a general-purpose comparison"""

response = client.responses.create(
    model="gpt-4.1",
    input=input_text,
    instructions=instructions,
)

print(response.output_text)

input_text contains the initial instruction and the answers to the follow-up questions. I had the prompt for the researcher created as follows.

Here are the steps to complete the task.

### Steps

1. **Confirm Objective**
   - Compare the features and characteristics of Gemini CLI and Claude Code and clarify their differences.

2. **Keyword Setting**
   - Search using keywords such as "Gemini CLI features," "Claude Code features," and "Gemini CLI vs Claude Code."

3. **Visit Official Websites**
   - Check product information and documentation on each official website.
   - Also refer to company introduction pages and official blogs of the developers.

4. **Investigation of Reviews and Forums**
   - Visit technical forums and review sites (e.g., GitHub, Reddit, Stack Overflow) to gather user opinions and experiences.
   - Specifically refer to the latest forums and active discussions.

5. **Creation of Feature Comparison Table**
   - Organize the information and create a comparison table based on the following perspectives:
     - Basic features
     - User interface
     - Compatibility
     - Supported programming languages
     - Use cases
     - Pricing (Free/Paid)
     - Ecosystem

6. **Report Creation**
   - Summarize the research findings in text aligned with the task's objective.
   - Create a structured report format including sections for "Overview," "Detailed Comparison," and "Conclusion."

7. **Recording Sources**
   - List the main sources used for the research at the end of the document.

### Important Notes

- Prioritize reliable sources.
- If search results do not match or information is insufficient, delve deeper into the investigation.
- Ensure the research results are up-to-date.

Please proceed with the research according to the steps above.
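Earlier I pasted the follow-up answers under the original question by hand. If you want to automate that assembly, a small helper keeps it tidy; the numbered Q/A layout here is my own convention, not anything the API requires.

```python
def build_rewriter_input(question: str, qa_pairs: list[tuple[str, str]]) -> str:
    """Combine the original question with follow-up Q&A into a single
    input string for the prompt-rewriting model."""
    lines = [question.strip(), ""]
    for i, (q, a) in enumerate(qa_pairs, start=1):
        lines.append(f"{i}. {q}")
        lines.append(f"   => {a}")
    return "\n".join(lines)

text = build_rewriter_input(
    "Please search the web and tell me about the differences between Gemini CLI and Claude code.",
    [
        ("Which aspects are you interested in?", "Features, performance"),
        ("Use case or context?", "General programming"),
        ("What details are you expecting?", "A general-purpose comparison"),
    ],
)
print(text)
```

This produces the same shape of input as the manual version above, so the rewriting model sees the question followed by each clarification and its answer.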

Integrated Implementation Example

Here is the final code integrating the flow so far and executing Deep Research. It reproduces ChatGPT's three steps (Clarification → Prompt Rewriting → Deep Research): it first clarifies the research requirements with follow-up questions to the user, then generates detailed instructions for the researcher (AI) based on those answers, aiming for more targeted research results.

from openai import OpenAI
client = OpenAI()

instructions = """
You are conversing with a user who has requested a research task.  
Your job is to elicit additional information from the user that is necessary to successfully carry out that task.

Guidelines:
- Gather all necessary information while remaining **concise**  
- Collect information required for the research task in a clear and organized manner  
- Use bullet points or numbered lists appropriately for clarity  
- Do not ask for unnecessary information or information the user has already provided  

Important: **Do not perform the research yourself**; focus solely on gathering information for a researcher to perform the research later.
"""

input_text = input("Enter your question: ")

follow_up_question = client.responses.create(
  model="gpt-4.1",
  input=input_text,
  instructions=instructions,
)

print("Follow-up question:")
print(follow_up_question.output_text)

follow_up_answer = input("Enter your answer: ")

instructions = """
You receive a research task from a user. Your job is to create instructions for a researcher to complete that task. **Do not execute the task yourself**. Simply present the steps to complete the task.

GUIDELINES:
1. **Maximize specificity and detail**
   - Include all user preferences and conditions, and explicitly list key attributes or perspectives to be considered  
   - It is most important to reflect information from the user into the instructions without omission

2. **Treat items not indicated by the user but necessary as \"unspecified\"**
   - If attributes essential for a meaningful output are not specified by the user, clearly state them as \"unspecified (open-ended)\" or with no specific constraints

3. **Avoid unfounded assumptions**
   - Do not make up details that the user has not provided  
   - If there is no specification, state so clearly and instruct the researcher to respond flexibly

4. **Write in the first person**
   - Describe the request from the user's perspective

5. **When using tables**
   - If you determine that a table would be useful for organizing or visualizing information, explicitly ask the researcher to present it in table format  
     Examples:  
     - Product comparison (consumer): A comparison table listing features, prices, and ratings for each smartphone  
     - Project management (business): A table listing tasks, deadlines, owners, and progress  
     - Budget planning (consumer): A table summarizing income sources, monthly expenses, and savings goals  
     - Competitor analysis (business): A table showing market share, pricing, and key differentiators

6. **Headings and Formatting**
   - Specify the expected output format  
   - If a structured format like a report or plan is desirable, specify a report format with appropriate headings

7. **Language**
   - If the user input is in a language other than English, instruct the researcher to respond in that language unless the user has requested a different language

8. **Sources**
   - Explicitly state any preferred sources  
   - For product/travel research, recommend official websites or reliable e-commerce sites (e.g., Amazon reviews)  
   - In academic/scientific fields, prioritize original papers or official journals over review papers  
   - For inquiries in a specific language, emphasize sources published in that language
"""

prompt_to_researcher = client.responses.create(
  model="gpt-4.1",
  input=input_text + follow_up_answer,
  instructions=instructions,
)

print("Instructions to the researcher:")
print(prompt_to_researcher.output_text)

stream = client.responses.create(
  model="o4-mini-deep-research",
  input=prompt_to_researcher.output_text,
  tools=[
    {"type": "web_search_preview"},
  ],
  stream=True,
  background=True,
)

cursor = None
for event in stream:
  print(event)
  cursor = event.sequence_number
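Printing raw events is fine for debugging, but what you ultimately want is the finished report. In the Responses API stream, the response.completed event carries the full response object, so you can filter for it. Here is a minimal offline sketch with the event objects mocked via SimpleNamespace; adapt the attribute access to the real SDK event types.

```python
from types import SimpleNamespace as NS

def extract_report(events):
    """Pick the finished report text out of a stream of response events."""
    for event in events:
        if getattr(event, "type", "") == "response.completed":
            # the completed event carries the full response object
            return event.response.output_text
    return None

# Offline mock of the event shapes seen in the stream above.
mock_events = [
    NS(type="response.created", sequence_number=1),
    NS(type="response.output_text.delta", sequence_number=2),
    NS(type="response.completed", sequence_number=3,
       response=NS(output_text="## Report\n...")),
]
print(extract_report(mock_events))  # prints the mocked report text
```

In the real loop you would call extract_report(stream) in place of the print-everything loop, or keep both: log each event for progress while saving the completed response's output_text as the report.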

The result of running the above is as follows!

Report

Report (2nd time)

Comparison Table

| Perspective | Gemini CLI | Claude Code | Comments/Notes |
| --- | --- | --- | --- |
| Features | Open-source CLI-based AI agent powered by Google Gemini 1.5 Pro (cloud.google.com). Supports multi-purpose use such as coding, debugging, content generation, problem-solving, and task management (developers.google.com) (cloud.google.com).<br>Enhances prompts with built-in tools like web search, shell commands, and file operations (cloud.google.com).<br>Supports a large context window of 1 million tokens (blog.google). | Agentic CLI tool powered by Anthropic Claude 3.7 Sonnet (github.com). Provides developer-oriented features such as file editing, bug fixes, test execution, code analysis, and Git operations (history search, merge, commit/PR creation) (github.com) (docs.anthropic.com).<br>Allows natural language instructions in an interactive terminal (REPL). | Gemini CLI is a versatile agent for general tasks and is fully open-source (blog.google). Claude Code is specialized for coding tasks and is a self-contained CLI that doesn't require an interpreter, but it is closed-source (requires an API key). |
| Supported Languages | Leverages the multi-language support of Gemini models; expected to work with major programming languages (Python, Java, JavaScript, C#, C++, etc.; official list not released). | Claude models also support multiple languages (primarily English), allowing code operations in major languages. Uses a configuration file to specify project context. | Both treat code as text and are assumed to handle many languages depending on the model (no specific language support list officially stated). |
| I/O Formats | Interactive text input/output (supports Markdown/code formatting). Supports streaming responses.<br>Outputs results to the terminal in a simple format (rich display). | Interactive text input/output (interactive REPL).<br>Can output JSON using --output-format json (docs.anthropic.com). Supports script calling and automation. | Gemini CLI focuses on colorful standard output, while Claude Code is characterized by structured output like JSON, making it easier to use in programs (docs.anthropic.com). |
| Extensibility | Open source (Apache-2.0), allowing community contributions (blog.google).<br>Allows addition of plugins/tools via Model Context Protocol (MCP) or GEMINI.md (blog.google) (cloud.google.com). | Although closed-source, extensibility options like SDKs for Node/Python and GitHub Action integration are available (docs.anthropic.com).<br>MCP compliant, allowing remote server integration (via /mcp command with OAuth support) (docs.anthropic.com). | Gemini CLI can be flexibly extended with external tools/prompts, and the source code can be freely checked and modified (blog.google). Claude Code depends on Anthropic's officially supported SDKs and API integration features (docs.anthropic.com). |
| Performance | Model is Gemini 1.5 Pro. Fast processing of many requests with a large context window (1M tokens) (blog.google). Free tier includes 60 requests/min and 1,000/day (blog.google).<br>Latency involves waiting for cloud calls (several seconds). Local load is low. | Model is Claude 3.7 Sonnet (Max plan) for high performance (docs.anthropic.com). Available on paid plans (Pro/Max). Responds in seconds due to cloud calls.<br>Output quality tends to be high, but results can be unstable depending on the task (good for Python/C, reported failures in JavaScript (www.thoughtworks.com)). | Gemini CLI's free usage tier and large-capacity context are suitable for developers. Claude Code is paid but offers higher precision using top-tier models (some reports evaluate it higher than Gemini in benchmarks (medium.com)). Both depend on the cloud, so the local resource burden is small. |

Report

Gemini CLI is an open-source CLI-based AI agent provided by Google, supporting multi-purpose use beyond code generation, such as research and content creation (cloud.google.com) (developers.google.com). It offers a large-capacity context (1M tokens) and a generous free quota (60 req/min) for individuals, and can integrate with tools like web search (blog.google) (cloud.google.com). On the other hand, Claude Code is a code-specific agent from Anthropic, optimized for development tasks like file editing, bug fixing, test execution, and Git operations (github.com) (docs.anthropic.com). While both operate on the terminal, Gemini CLI can be freely modified, whereas Claude Code depends on the Anthropic API (paid) and features a security-conscious design for enterprises (docs.anthropic.com) (blog.google). In terms of performance, Gemini leverages its free tier and massive context for versatility, while Claude excels in complex code tasks with high-precision models, though some reports indicate varying results depending on the task (www.thoughtworks.com) (medium.com). These are the main differences between the two tools.

References

Google official blogs and documentation (blog.google) (cloud.google.com), Anthropic official documentation (github.com) (docs.anthropic.com), and third-party technical articles (www.thoughtworks.com) (medium.com), etc.

Impressions

The quality of the second report has improved significantly compared to the first. While the first one was abstract and a bit hard to read as a long text, the second is organized in a comparison table and includes specific technical specifications and numerical data. This difference shows that prompt design is crucial for effectively utilizing the Deep Research API. By going through a two-step process of follow-up questions and detailed instructions, a report worth the $4 cost was generated. It was a result that once again reminded me that AI tools are truly "all about how you ask."

Conclusion

What did you think?
In this article, I introduced the OpenAI Deep Research API, from basic usage to practical techniques for obtaining higher-quality reports.
While you can get a decent answer just by throwing a simple question, I hope I've conveyed that the key to unlocking the true potential of Deep Research lies in the "quality of the prompt." As we tried this time, digging deeper into the user's intent and giving specific instructions on how to proceed with the research will dramatically improve the quality of the report.

This time I only used the web search tool, but it seems you can also hook up search over your own documents via MCP, as well as a code interpreter! I haven't tried the MCP-related features yet, so I'd like to give them a go.

Although it costs a bit at the moment, being able to freely handle this powerful research function via API is very attractive. I hope you'll refer to this article and use the Deep Research API in various situations! Happy Coding🚀
