
Getting Started with OpenAI Realtime API Embedded SDK - Part 2


Note: Notice of SDK Discontinuation (2025.04.01)
https://github.com/openai/openai-realtime-embedded

Previous Article
https://zenn.dev/goroman/articles/c5f1013d81f756

How to Install the OpenAI Realtime Embedded SDK

The OpenAI Realtime Embedded SDK is an SDK for embedded devices that runs on the ESP32-S3 or in a Linux environment. Although it is called an "SDK," it is closer to a sample implementation. Arduino IDE and PlatformIO are common choices for ESP32 development, but this SDK assumes ESP-IDF, Espressif's official development framework.

What is ESP-IDF?
ESP-IDF (Espressif IoT Development Framework) is Espressif's official development framework for ESP32-series microcontrollers. It supports development in C and C++.

Additionally, the OpenAI Realtime Embedded SDK uses WebRTC for communication with the OpenAI Realtime API.

What is WebRTC?
WebRTC (Web Real-Time Communication) is an open-source technology and set of protocols for real-time voice, video, and data communication.

Structure

The OpenAI Realtime Embedded SDK is organized as follows:

.
├── src/                      # Main source code
│   ├── main.cpp              # Entry point
│   ├── media.cpp             # Media processing
│   ├── webrtc.cpp            # WebRTC implementation
│   └── wifi.cpp              # Wi-Fi connection processing
│
└── deps/                     # Dependency libraries
    └── libpeer/              # Core library for WebRTC implementation
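The file names suggest a division of responsibilities: media.cpp handles audio, wifi.cpp joins the network, webrtc.cpp runs the session, and main.cpp ties them together. The portable C++ sketch below illustrates only that idea of ordered, dependent startup stages. The Stage type, run_startup, default_stages, and the stage order are hypothetical assumptions of this sketch, not the SDK's actual functions.

```cpp
// Hypothetical sketch of ordered startup stages; NOT the SDK's real API.
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// One startup stage, tagged with the source file that would own it.
struct Stage {
    std::string file;           // e.g. "wifi.cpp"
    std::function<bool()> run;  // returns false if the stage fails
};

// Runs stages in order and stops at the first failure, because later
// stages (e.g. WebRTC) depend on earlier ones (e.g. Wi-Fi).
// Returns the files whose stage completed successfully.
std::vector<std::string> run_startup(const std::vector<Stage>& stages) {
    std::vector<std::string> done;
    for (const auto& stage : stages) {
        assert(stage.run && "every stage needs a function");
        if (!stage.run()) break;
        done.push_back(stage.file);
    }
    return done;
}

// Assumed order for illustration only: audio, then network, then session.
// The lambdas are stubs standing in for real initialization code.
std::vector<Stage> default_stages() {
    return {
        {"media.cpp", [] { return true; }},   // set up capture/playback
        {"wifi.cpp", [] { return true; }},    // join the access point
        {"webrtc.cpp", [] { return true; }},  // connect to the Realtime API
    };
}
```

Stopping at the first failed stage keeps the WebRTC step from running without a network connection.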

Major Dependencies

  • ESP-IDF - Espressif IoT Development Framework

    • Version: >= 4.1.0
    • Development framework for ESP32 series microcontrollers
  • libpeer (Included)

    • Core library for WebRTC implementation
    • Implements protocol stacks such as ICE, DTLS-SRTP, and STUN
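ICE uses STUN Binding Requests both to discover public (server-reflexive) addresses and to run connectivity checks. The sketch below shows what that looks like at the byte level. It builds the fixed 20-byte header of a STUN Binding Request as defined in RFC 5389: message type 0x0001, a length field, magic cookie 0x2112A442, and a 96-bit transaction ID. It is a standalone illustration written for this article, not code from libpeer. The name make_binding_request is hypothetical.

```cpp
// Standalone illustration of an RFC 5389 STUN Binding Request header.
// Not taken from libpeer; make_binding_request is a hypothetical name.
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint16_t kBindingRequest = 0x0001;  // class=request, method=Binding
constexpr std::uint32_t kMagicCookie = 0x2112A442;  // fixed value from RFC 5389

// Builds the 20-byte header of a Binding Request with no attributes.
std::array<std::uint8_t, 20> make_binding_request(
        const std::array<std::uint8_t, 12>& transaction_id) {
    std::array<std::uint8_t, 20> msg{};
    // Message type, big-endian (network byte order)
    msg[0] = static_cast<std::uint8_t>(kBindingRequest >> 8);
    msg[1] = static_cast<std::uint8_t>(kBindingRequest & 0xFF);
    // Message length excludes the header; 0 because no attributes follow
    msg[2] = 0;
    msg[3] = 0;
    // Magic cookie, big-endian
    for (int i = 0; i < 4; ++i) {
        msg[4 + i] = static_cast<std::uint8_t>(kMagicCookie >> (24 - 8 * i));
    }
    // 96-bit transaction ID chosen by the sender
    for (int i = 0; i < 12; ++i) {
        msg[8 + i] = transaction_id[i];
    }
    return msg;
}
```

A real client should fill the transaction ID with cryptographically random bytes. ICE connectivity checks also append attributes such as USERNAME and MESSAGE-INTEGRITY (among others), which changes the length field.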

If ESP-IDF is not installed yet, install it by following a guide such as the one below.

ESP-IDF Installation Steps

Supported Platforms/Devices

This SDK is primarily developed and tested on the Espressif ESP32-S3 (target name esp32s3).

Operation has been officially confirmed on the following microcontroller boards:

Note: Microcontrollers equipped with ESP32-S3, such as Seeed Studio XIAO ESP32-S3 or M5Stack ATOMS3R, are recommended.

For voice conversations, you need to connect an I2S microphone and speaker separately. PDM microphones have also been confirmed to work after modifying the source code. The M5Stack Atomic Echo Base, which integrates a microphone and speaker, looks like a good option.

What is I2S?
I2S (Inter-IC Sound) is a synchronous serial communication standard for transferring digital audio data. It uses three signal lines (a bit clock, a word-select line that indicates the left or right channel, and a serial data line) to transfer high-quality audio. The ESP32-S3 provides up to two I2S interfaces, supporting resolutions up to 32 bits and sampling rates up to 96 kHz.
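The bit-clock rate of an I2S link follows directly from those signal lines: one clock pulse per bit, per channel, per sample. This assumes the slot width equals the sample width. For example, 32-bit stereo at 96 kHz needs 96,000 × 32 × 2 = 6,144,000 Hz (6.144 MHz).

The portable C++ sketch below computes that value. It also splits an interleaved left/right buffer into two channels, which is the usual in-memory layout for stereo I2S data. The function names are hypothetical, and on the device the ESP-IDF I2S driver handles clocking for you.

```cpp
// Generic I2S arithmetic and stereo de-interleaving; not SDK code.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Bit clock (Hz) = sample rate x bits per sample x channels,
// assuming each slot is exactly as wide as a sample.
std::uint32_t i2s_bclk_hz(std::uint32_t sample_rate_hz,
                          std::uint32_t bits_per_sample,
                          std::uint32_t channels) {
    return sample_rate_hz * bits_per_sample * channels;
}

// Splits an interleaved stereo buffer (L, R, L, R, ...) into two channels.
// On the wire, the word-select line tells the receiver which slot is which.
std::pair<std::vector<std::int32_t>, std::vector<std::int32_t>>
deinterleave_stereo(const std::vector<std::int32_t>& frames) {
    assert(frames.size() % 2 == 0);  // each frame holds one L and one R sample
    std::vector<std::int32_t> left, right;
    left.reserve(frames.size() / 2);
    right.reserve(frames.size() / 2);
    for (std::size_t i = 0; i < frames.size(); i += 2) {
        left.push_back(frames[i]);
        right.push_back(frames[i + 1]);
    }
    return {left, right};
}
```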

What is PDM?
PDM (Pulse Density Modulation) is a way of representing an analog signal as a 1-bit digital stream. The amplitude is encoded in the density of pulses, that is, how many 1s occur in a given interval. Compared with conventional PCM, the circuitry is simpler and more resistant to noise. On the ESP32-S3, input from a PDM microphone can be received through I2S and converted to PCM internally for processing.
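The sketch below makes "amplitude as pulse density" concrete. It converts a 1-bit PDM stream into 16-bit PCM by counting the 1s in each non-overlapping block of bits, a simple integrate-and-dump decimator. This is a deliberately simplified illustration, and pdm_to_pcm is a hypothetical name. Practical PDM-to-PCM converters use proper low-pass decimation filters instead of plain block counting.

```cpp
// Simplified PDM-to-PCM conversion by block counting (integrate-and-dump).
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// bits: PDM stream as 0/1 values; decimation: PDM bits per PCM sample.
// A trailing partial block is dropped.
std::vector<std::int16_t> pdm_to_pcm(const std::vector<std::uint8_t>& bits,
                                     std::size_t decimation) {
    assert(decimation > 0);
    std::vector<std::int16_t> pcm;
    for (std::size_t start = 0; start + decimation <= bits.size();
         start += decimation) {
        std::size_t ones = 0;
        for (std::size_t i = start; i < start + decimation; ++i) {
            ones += bits[i] ? 1 : 0;
        }
        // Pulse density in [0, 1] mapped linearly to [-32767, 32767]
        double density = static_cast<double>(ones) / decimation;
        pcm.push_back(static_cast<std::int16_t>((density * 2.0 - 1.0) * 32767.0));
    }
    return pcm;
}
```

For example, with decimation 4, the block 1,1,1,1 becomes 32767, the block 0,0,0,0 becomes -32767, and the block 1,0,1,0 (density 0.5) becomes 0.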

Installation Steps

The following steps are for an environment where ESP-IDF is already installed.

  1. Clone from GitHub

    git clone --recursive https://github.com/openai/openai-realtime-embedded-sdk.git
    
  2. Setting the target platform

    • Currently, only linux and esp32s3 are supported.
    • Set the target with the following command:
    idf.py set-target esp32s3
    
  3. Device-specific settings

    • You can review device-specific options with the command below, but no changes are required at this time.
    idf.py menuconfig
    
  4. Setting environment variables

    • Set your Wi-Fi SSID, password, and OpenAI API key:
    export WIFI_SSID=your_wifi_ssid
    export WIFI_PASSWORD=your_wifi_password
    export OPENAI_API_KEY=your_openai_api_key
    

    Keep these values private; in particular, do not commit them to a repository or expose them in screenshots or logs.

  5. Build

    idf.py build
    
  6. Flashing to the device or executing

    For ESP32-S3:

    idf.py flash
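Step 4 relies on three environment variables, presumably read when the firmware is built. They must therefore be exported in the shell session that runs idf.py build. As an optional safeguard, a small pre-build check like the one below reports which variables are unset or empty without ever printing their values, which also keeps the API key out of logs. The missing_env helper is hypothetical and not part of the SDK; an equivalent shell one-liner would do the same job.

```cpp
// Hypothetical pre-build check; not part of the SDK.
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Returns the names of required variables that are unset or empty.
// Only names are returned, never values, so secrets are not echoed.
std::vector<std::string> missing_env(const std::vector<std::string>& names) {
    std::vector<std::string> missing;
    for (const auto& name : names) {
        const char* value = std::getenv(name.c_str());
        if (value == nullptr || *value == '\0') {
            missing.push_back(name);
        }
    }
    return missing;
}
```

Calling missing_env({"WIFI_SSID", "WIFI_PASSWORD", "OPENAI_API_KEY"}) before building returns an empty list when all three are set.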
    

In the next article, I will explain how to actually run it.
