Translated by AI
Getting Started with OpenAI Realtime API Embedded SDK - Part 2
Note: Notice of SDK Discontinuation (2025.04.01)
Previous Article
How to Install the OpenAI Realtime Embedded SDK
The OpenAI Realtime Embedded SDK is an SDK for embedded devices running on ESP32-S3 or Linux environments. While it is named an "SDK," it is closer to a sample implementation. For ESP32 development, Arduino IDE and PlatformIO are commonly used; however, this SDK is built with the assumption of using ESP-IDF, which is Espressif's official development framework.
What is ESP-IDF?
ESP-IDF (Espressif IoT Development Framework) is Espressif's official development framework for ESP32-series microcontrollers. It supports development in C/C++.
Additionally, the OpenAI Realtime Embedded SDK uses WebRTC for communication with the OpenAI Realtime API.
What is WebRTC?
WebRTC (Web Real-Time Communication) is an open-source technology that enables real-time voice, video, and data communication between peers.
Structure
The OpenAI Realtime Embedded SDK is organized as follows:
.
├── src/                 # Main source code
│   ├── main.cpp         # Entry point
│   ├── media.cpp        # Media processing
│   ├── webrtc.cpp       # WebRTC implementation
│   └── wifi.cpp         # Wi-Fi connection processing
│
└── deps/                # Dependency libraries
    └── libpeer/         # Core library for WebRTC implementation
Major Dependencies
- ESP-IDF - Espressif IoT Development Framework
  - Version: >= 4.1.0
  - Development framework for ESP32 series microcontrollers
- libpeer (Included)
  - Core library for WebRTC implementation
  - Implements protocol stacks such as ICE, DTLS-SRTP, and STUN
If ESP-IDF is not yet installed, refer to articles like the following to install it.
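If ESP-IDF is already installed, it may be worth confirming that it meets the `>= 4.1.0` requirement before going further. The following is a minimal shell sketch of such a comparison. `version_ge` is a hypothetical helper name, and the commented usage assumes that `idf.py --version` prints something like `ESP-IDF v5.1.2`, which may differ between releases.

```shell
# version_ge A B: succeeds when version A >= version B.
# Hypothetical helper: compares up to three dot-separated numeric fields.
version_ge() {
  local a b a1 a2 a3 b1 b2 b3 old_ifs
  # Drop a leading "v" and any suffix, e.g. "v5.1.2-dirty" -> "5.1.2".
  a="${1#v}"; a="${a%%[!0-9.]*}"
  b="${2#v}"; b="${b%%[!0-9.]*}"
  old_ifs="$IFS"; IFS=.
  # Split on dots; the trailing zeros pad missing fields ("4.1" -> 4 1 0).
  set -- $a 0 0 0; a1=$1 a2=$2 a3=$3
  set -- $b 0 0 0; b1=$1 b2=$2 b3=$3
  IFS="$old_ifs"
  if [ "$a1" -ne "$b1" ]; then [ "$a1" -gt "$b1" ]; return; fi
  if [ "$a2" -ne "$b2" ]; then [ "$a2" -gt "$b2" ]; return; fi
  [ "$a3" -ge "$b3" ]
}

# Usage (assumes output of the form "ESP-IDF v5.1.2"):
#   installed="$(idf.py --version | awk '{print $2}')"
#   version_ge "$installed" 4.1.0 && echo "ESP-IDF version OK"
```

Comparing field by field avoids the trap of plain string comparison, where `4.10` would sort before `4.9`.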
Supported Platforms/Devices
This SDK is primarily developed and tested targeting the Espressif esp32s3.
Recommended Hardware
Operation has been officially confirmed on the following microcontroller boards:
Note: Microcontrollers equipped with ESP32-S3, such as Seeed Studio XIAO ESP32-S3 or M5Stack ATOMS3R, are recommended.
For voice calls, you will need to separately connect an I2S-compatible microphone and speaker. It has been confirmed that PDM-type microphones can also work by modifying the source code. The M5Stack Atomic Echo Base, which comes with an integrated microphone and speaker, seems like a good option.
What is I2S?
I2S (Inter-IC Sound) is a synchronous serial communication standard for transferring digital audio data. It uses three signal lines—clock signal, word select (L/R), and data signal—to achieve high-quality audio transfer. On the ESP32-S3, you can use up to two I2S interfaces, supporting up to 32-bit resolution and up to 96kHz sampling rates.
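To make the relationship between the three signal lines concrete: in standard I2S framing, word select toggles once per sample period, and the bit clock runs at the sample rate multiplied by the bits per slot and the number of slots (two, for left and right). The sketch below only does that arithmetic. `i2s_bclk_hz` is a hypothetical helper, and the 16 kHz / 16-bit values are illustrative rather than settings taken from the SDK.

```shell
# i2s_bclk_hz RATE BITS [CHANNELS]: print the bit-clock frequency in Hz.
# BCLK = sample rate x bits per slot x slots per frame (2 for standard I2S).
i2s_bclk_hz() {
  local rate="$1" bits="$2" channels="${3:-2}"
  echo $(( rate * bits * channels ))
}

# Illustrative values only (not taken from the SDK):
i2s_bclk_hz 16000 16    # prints 512000  (16 kHz, 16-bit, stereo frame)
i2s_bclk_hz 96000 32    # prints 6144000 (the upper limits quoted above)
```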
What is PDM?
PDM (Pulse Density Modulation) is a method for converting analog signals into digital signals. It represents analog values with a 1-bit digital signal, where the amplitude is represented by the density of the signal (number of pulses). Compared to conventional PCM methods, the circuitry is simpler and more resistant to noise. On the ESP32-S3, input from a PDM microphone can be received via I2S and processed internally by converting it to PCM.
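The idea that amplitude is carried by pulse density can be illustrated by counting the 1s in a fixed window of the bitstream. The following is a deliberately crude sketch, not how the ESP32-S3 does it: real PDM-to-PCM conversion uses proper low-pass filtering and decimation. `pdm_window_to_pcm` is a hypothetical name.

```shell
# pdm_window_to_pcm BITS: map a window of PDM bits (a string of 0s and 1s)
# to a signed value in [-N, N], where N is the window length.
# More 1s (higher pulse density) means a higher amplitude.
pdm_window_to_pcm() {
  local bits="$1" ones
  ones="$(printf '%s' "$bits" | tr -d '0')"   # delete the 0s; only 1s remain
  echo $(( 2 * ${#ones} - ${#bits} ))
}

pdm_window_to_pcm 11111111   # prints 8  (all pulses: positive peak)
pdm_window_to_pcm 10101010   # prints 0  (half density: mid-scale)
pdm_window_to_pcm 00010000   # prints -6 (sparse pulses: near negative peak)
```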
Installation Steps
The following steps are for an environment where ESP-IDF is already installed.
- Clone from GitHub
    git clone --recursive https://github.com/openai/openai-realtime-embedded-sdk.git
- Setting the target platform
  - Currently, only linux and esp32s3 are supported.
  - Set the target with the following command:
      idf.py set-target esp32s3
- Device-specific settings
  - No special settings are required at this time.
      idf.py menuconfig
- Setting environment variables
  - Set your Wi-Fi SSID, password, and OpenAI API key:
      export WIFI_SSID=your_wifi_ssid
      export WIFI_PASSWORD=your_wifi_password
      export OPENAI_API_KEY=your_openai_api_key
  - Be careful not to let this information leak externally.
- Build
    idf.py build
- Flashing to the device or executing
  - For ESP32-S3:
      idf.py flash
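
Since the build depends on the three environment variables from the steps above, a small pre-flight check can catch a missing one before a long build starts. This is a minimal sketch: `check_build_env` is a hypothetical helper and is not part of the SDK. It prints only variable names, never their values, so the secrets do not end up in terminal logs.

```shell
# check_build_env: report each required variable that is unset or empty.
# Succeeds only when all three are set. (Hypothetical helper, not from the SDK.)
check_build_env() {
  local missing=0 name value
  for name in WIFI_SSID WIFI_PASSWORD OPENAI_API_KEY; do
    eval "value=\${$name:-}"       # look up the variable whose name is in $name
    if [ -z "$value" ]; then
      echo "missing: $name" >&2    # print the name only, never the value
      missing=1
    fi
  done
  return "$missing"
}

# Usage, run from the cloned repository:
#   check_build_env && idf.py set-target esp32s3 && idf.py build && idf.py flash
```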
In the next article, I will explain how to actually run it.