Building a Device to Display Spotify Cover Art on a Circular Display
What I Made
Earlier this year, I came across a circular display on X that can be powered by MagSafe and thought it looked cool. I bought it, but I was so busy that I left it untouched until I recently rediscovered it. I decided to turn it into a device that displays the cover art of whatever is currently playing on Spotify. Since simply displaying it would be boring, I took advantage of its circular shape to make it look like a record label and set it to rotate during playback. It's quite cute.
Development Environment
- macOS 26.1
- Cursor 2 (mostly Opus 4.5)
- arduino-cli
- ESP32 Core: 3.0.1 (The latest 3.3.3 has some issues with TLS, causing Spotify API calls to fail)
- ESP32_Display_Panel 0.1.4 (Must use the version included in the Demo ZIP downloadable from the product page above)
- ESP32_IO_Expander 0.0.2 (Same as above)
Spotify Integration
To retrieve Spotify's playback status and cover art images, you need to create your own Spotify app and use the Web API. Authentication is performed via OAuth 2.0 (PKCE). PKCE does not require a client secret, making it a secure authentication method even for public applications.
The flow from obtaining the token to using it is as follows:
- Client ID → Obtained from the Spotify Developer Dashboard (public info, no secret required)
- Refresh Token → Obtained via the PKCE authentication flow for the first time only (persistent)
- Access Token → Obtained from the Refresh Token (valid for 1 hour, automatically updated every 50 minutes)
- API Request → Call the Spotify API using the Access Token
1. Creating a Spotify App
Create an app on the Spotify Developer Dashboard and obtain a Client ID. Set the Redirect URI to http://127.0.0.1:8888/callback.
2. Obtaining a Refresh Token
Normally, for a web app, you would deploy an authentication page somewhere, have the user sign in there, and obtain a token. But since all I need here is to write the refresh token into the device, I decided to obtain it with a local server instead.
PKCE Flow:
- Generate code_verifier and code_challenge on the local server (token-retriever/).
- Redirect to the Spotify authorization page in the browser (including the code_challenge).
- The user logs in and approves the permissions.
- An authorization code (code) is returned.
- Send code and code_verifier to the Spotify API to exchange them for a token.
- Automatically write the obtained Refresh Token into config.h.
Since it was tedious to copy and paste, I set up the server to write the token directly into the source code once it was obtained.
3. Token Refreshing
The access token required for API calls is obtained using the refresh token. Since the access token expires in 1 hour, it is automatically updated every 50 minutes (refreshed before it expires).
Refresh tokens also have an expiration, but if the Spotify API returns a new refresh token, it is automatically saved (token rotation).
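As a sketch, the refresh schedule above boils down to a timestamp comparison. The function name and the millis()-style millisecond timebase here are my own assumptions, not the project's actual code:

```cpp
#include <cassert>
#include <cstdint>

// Refresh the access token every 50 minutes, comfortably before
// its 1-hour expiry. Timestamps are in milliseconds, as returned
// by something like Arduino's millis().
const uint32_t REFRESH_INTERVAL_MS = 50UL * 60UL * 1000UL;

bool accessTokenNeedsRefresh(uint32_t nowMs, uint32_t obtainedAtMs) {
    // Unsigned subtraction stays correct even when the millisecond
    // counter wraps around (which it does after ~49 days).
    return (uint32_t)(nowMs - obtainedAtMs) >= REFRESH_INTERVAL_MS;
}
```

The unsigned-wraparound trick matters on a device that may stay powered for weeks at a time.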
Tokens are stored in a special area of the ESP32 called NVS (Non-Volatile Storage), ensuring they are retained even after a reboot. On the first boot, the refresh token embedded in config.h is used, and after that, the one saved in NVS takes priority.
4. API Calls
Endpoints Used
- GET /v1/me/player/currently-playing: Retrieves current track information and the cover art URL.
- PUT /v1/me/player/play: Starts playback.
- PUT /v1/me/player/pause: Pauses playback.
- POST /v1/me/player/next: Skips to the next track.
- POST /v1/me/player/previous: Goes back to the previous track.
To support podcasts, the ?additional_types=track,episode parameter is added to currently-playing. Without it, the displayed playback state gets stuck (changes are not recognized) while a podcast is playing.
Polling Strategy
To retrieve the playback status, API calls are made every 3 seconds. Ideally, it would be best to have server-side notifications via WebSocket or SSE, but since the Spotify API doesn't provide such features, I have to rely on periodic API calls.
However, a 3-second interval causes a delay in change detection. Therefore, specifically for track transitions, I've implemented a logic that considers the track length and shortens the polling interval as it nears the end. By adjusting the next call timing based on the remaining time right before the end, I try to get information as close to the track transition as possible.
Specifically:
- Remaining time < 3 seconds: Poll 300ms before the end, then poll again after 500ms (two-stage detection).
- Remaining time < 8 seconds: Poll at 2-second intervals.
- Otherwise: 3-second interval (default).
This allows track transitions to be detected within approximately 200-300ms.
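The schedule above can be sketched as a single function that maps the track's remaining time to the delay before the next poll. The thresholds match the article; the function name and the exact second-stage behavior are my own reading of it:

```cpp
#include <cassert>
#include <cstdint>

// Given the remaining time of the current track (ms), return the
// delay until the next currently-playing poll.
uint32_t nextPollDelayMs(int32_t remainingMs) {
    if (remainingMs < 3000) {
        // Aim for a poll 300 ms before the track ends; once past
        // that point, retry after 500 ms (the second stage).
        int32_t untilPoll = remainingMs - 300;
        return untilPoll > 0 ? (uint32_t)untilPoll : 500;
    }
    if (remainingMs < 8000) return 2000; // near the end: 2 s interval
    return 3000;                         // default: 3 s interval
}
```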
Rate Limit Handling
The specific limits for the Spotify API rate limiting are not disclosed, but an HTTP 429 error is returned when the limit is reached. I've implemented logic to read the Retry-After header included in the response and wait for the specified number of seconds. This allows for automatic recovery even if the rate limit is hit.
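A minimal sketch of that 429 handling, assuming the header value arrives as a plain string. The function name and the 5-second fallback for a missing or unparsable header are my own choices:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Convert a Retry-After header value (delay in whole seconds, per
// the HTTP spec) into a wait in milliseconds.
uint32_t retryAfterToMs(const char* headerValue) {
    if (headerValue == nullptr) return 5000; // header absent: fall back to 5 s
    char* end = nullptr;
    long secs = strtol(headerValue, &end, 10);
    if (end == headerValue || secs <= 0) return 5000; // unparsable value
    return (uint32_t)secs * 1000u;
}
```

Note that Retry-After may also be an HTTP date rather than a delta; the sketch only handles the delta-seconds form, which is what rate limiting typically returns.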
Executing on a Different Core
An HTTP request blocks the calling task for 200-1000 ms until the response arrives. Doing this inside the rendering loop makes the rotation animation stutter.
The ESP32-S3 has dual cores (Core 0 and Core 1), and Arduino's loop() runs on Core 1. By executing Spotify API calls as a FreeRTOS task on Core 0, I can keep the rendering loop running smoothly without blocking.
// Create Spotify task on Core 0
xTaskCreatePinnedToCore(spotifyTask, "SpotifyTask", 8192, NULL, 1, &spotifyTaskHandle, 0);
Mutual exclusion is managed using a mutex for data sharing between cores (such as new album art images).
Caching
This device features an SD card slot (it even comes with a 512MB SD card).
Downloading from the network takes 500-1000ms, while reading from an SD card only takes 7-16ms. This difference significantly impacts the user experience, so I decided to cache downloaded cover art images on the SD card.
Caching Strategy
- Cache Key: A hash of the image URL. This is efficient because multiple songs from the same album share a single cache entry.
- Expiration: Respects the HTTP Cache-Control header. The Spotify CDN returns max-age=15780000 (about 182 days), so once cached, re-downloading is almost never necessary.
- stale-while-revalidate: Even if the cache has expired, it is displayed immediately while the latest version is downloaded and updated in the background. This prioritizes user experience while keeping data fresh. (I implemented this, but the cache period is so long that it never actually gets triggered lol.)
Storage Management
Since the cache could grow indefinitely and consume all the SD card space, I set a limit of 100MB (approximately 2800 album covers). I implemented a mechanism to remove the oldest items (LRU) to keep usage below 80% if the limit is exceeded.
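The eviction policy above reduces to "sort by last use, delete oldest until usage drops below 80% of the limit". A sketch under assumed names (the article doesn't show its actual structures):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct CacheEntry {
    uint32_t lastUsed; // e.g. file mtime or seconds since boot
    uint32_t size;     // bytes
};

// Returns how many of the least-recently-used entries to delete.
// Sorts `entries` oldest-first as a side effect, for simplicity.
size_t entriesToEvict(std::vector<CacheEntry>& entries, uint64_t limitBytes) {
    uint64_t total = 0;
    for (const auto& e : entries) total += e.size;
    if (total <= limitBytes) return 0; // under the limit: nothing to do

    std::sort(entries.begin(), entries.end(),
              [](const CacheEntry& a, const CacheEntry& b) {
                  return a.lastUsed < b.lastUsed; // oldest first
              });
    uint64_t target = limitBytes * 8 / 10; // bring usage below 80%
    size_t n = 0;
    while (total > target && n < entries.size()) {
        total -= entries[n].size;
        n++;
    }
    return n;
}
```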
Circular Display
The documentation package downloadable from the product page includes demo code, but V2.0 seems to have a bug where the stride is off, so I referred to V1.0, which works correctly.
Library versions are critical; you must use ESP32_Display_Panel 0.1.4 included with V1.0. The newer versions (1.0.x) available in the Arduino Library Manager have completely different APIs and won't work.
While it seems common to use LVGL, I needed to optimize rendering at a very granular level, as explained in the next section, so I'm writing everything directly to the buffer myself.
The pixel format is RGB565 (16-bit/pixel), consisting of R: 5 bits, G: 6 bits, and B: 5 bits. Although the color depth is lower than RGB888 (24-bit), it only requires 2 bytes per pixel, which is memory-efficient. For the entire 360×360 screen, this is about 253KB. (Even though it's a circular display, the internal buffer needs to be 360x360; there just aren't physical pixels in the corners.)
Initialization
I referred to the V1.0 demo code. The following points are important:
- Do not use configVendorCommands() (using it causes part of the screen to go black).
- invertColor(true) is mandatory (colors are inverted by default on the panel).
- RGB565 requires a byte swap.
Since the ESP32 is little-endian, the lower byte comes first in memory. However, the ST77916 expects big-endian, so it is necessary to swap the upper and lower bytes of each pixel.
// When swapping one pixel at a time
uint16_t swapped = (color >> 8) | (color << 8);

// Speeding up by processing two pixels at a time (32-bit units)
for (int j = 0; j < pixelCount / 2; j++) {
    uint32_t v = src32[j];                       // Read 2 pixels
    uint32_t lo = __builtin_bswap16(v & 0xFFFF); // Swap the lower pixel
    uint32_t hi = __builtin_bswap16(v >> 16);    // Swap the upper pixel
    dst32[j] = (hi << 16) | lo;                  // Write both back
}
Since 360×360 = 129,600 pixels are swapped every frame, processing them in 32-bit units for two pixels at a time halves the number of loops and improves memory access efficiency.
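Assuming GCC/Clang's __builtin_bswap16, here is a quick host-side check that the two-pixel path really produces the same bytes as swapping each RGB565 pixel individually (helper names are mine):

```cpp
#include <cassert>
#include <cstdint>

// Reference: swap one RGB565 pixel.
uint16_t swapPixel(uint16_t c) { return (uint16_t)((c >> 8) | (c << 8)); }

// Fast path: swap two packed RGB565 pixels in one 32-bit word.
uint32_t swapTwoPixels(uint32_t v) {
    uint32_t lo = __builtin_bswap16((uint16_t)(v & 0xFFFF));
    uint32_t hi = __builtin_bswap16((uint16_t)(v >> 16));
    return (hi << 16) | lo;
}
```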
Immediately after power-on, the VRAM is in an indeterminate state, and turning on the backlight right away results in a momentary flash of noise on the screen. To prevent this, I follow a sequence of turning the backlight OFF right after GPIO initialization → sending a black frame → then turning the backlight ON.
Tearing Effect

It's not connected!
In this device, the TE (Tearing Effect) signal is not connected to the ESP32. TE is a signal that tells the display controller when it's "about to read a frame"—similar to VSync on a PC. If this were available, writing could be synchronized with the refresh, completely preventing tearing.
However, since TE is absent, I minimize the visibility of tearing by transmitting chunks as large as possible at high speed. Due to internal SRAM constraints, an entire frame (253KB) cannot be DMA-transferred at once, so the transmission is split into the top and bottom halves of the screen (180 lines x 2). While tearing can still occur, the chunks are large enough that it isn't noticeable in practice.
Rotating the Image
Rotating an image in Web or Unity is not difficult at all these days and can be done without much thought. However, achieving smooth rotation animation with decent image quality on a low-powered ESP32 CPU is quite a challenge and requires various techniques.
Nearest Neighbor or Bilinear Interpolation
Nearest Neighbor (NN) is a method that interpolates using the closest pixel, while Bilinear Interpolation uses the surrounding four pixels.
Nearest Neighbor results in rougher image quality but is fast due to low computational complexity. Bilinear Interpolation provides better image quality but is slower due to higher computational complexity.

Status at 10 degrees rotation. Left: Nearest Neighbor, Right: Bilinear Interpolation
By the way, I used NN for the outer black record part because the quality difference wasn't very noticeable, and it helped with speed. Reducing the area processed with Bilinear Interpolation has a significant impact.
Edge Anti-Aliasing
The cover art needs to be cropped into a circle, but naively checking whether each pixel is inside the circle naturally results in aliasing (jaggies).
On the other hand, applying full anti-aliasing (AA) would cause the computational load to explode, making animation impossible.

Left: Without AA, Right: With AA
To solve this, I pre-calculate which pixels lie on the edge of the circle and to what extent they cover the pixel (Supersampling). I then use this to blend the cover art with the background to make it look smooth.
Additionally, since both the record background and the cover art rotate simultaneously, this edge-blending process is performed only once when the cover art is first downloaded. By rotating the resulting composite image, I reduce the load during the rotation process.
Cover Art Transition
Simply swapping the image when the song changes feels abrupt and dull. Therefore, I implemented an animation where the new cover art expands in a circle from the center. This expansion takes 500ms.
In terms of implementation, I calculate the "display radius" according to the transition progress (0.0 to 1.0). I draw the new cover inside that circle and the previous cover (or record surface) outside it.
// easeOutQuad: Easing that starts fast and gradually slows down
float easeOutQuad(float t) {
    return 1.0f - (1.0f - t) * (1.0f - t);
}

// Calculate the transition radius from the progress (clamped so the
// radius doesn't overshoot once 500 ms have elapsed)
float t = elapsed / 500.0f;
if (t > 1.0f) t = 1.0f;
float progress = easeOutQuad(t);
int transitionRadius = (int)(COVER_ART_RADIUS * progress);
Drawing is done line by line. For each line (y-coordinate), I find the x-coordinate range where the transition circle intersects using x = ±√(r² - y²), and sample the new cover only within that range. Outside that range, the previous cover (or record background) is used as is.
Pre-calculated LUT
The rotation process uses sin(θ) and cos(θ) for every pixel, but calculating floating-point trigonometric functions every time on the ESP32 is too slow. Therefore, I pre-calculate a Look-Up Table (LUT) for sin/cos with 3600 entries at 0.1° increments, so I only need to look up the table according to the rotation angle.
// Pre-calculate 0.0° to 359.9° in 0.1° increments
static int32_t sinLut[3600];
static int32_t cosLut[3600];
for (int i = 0; i < 3600; i++) {
    float ang = (i / 10.0f) * PI / 180.0f;
    sinLut[i] = (int32_t)(sinf(ang) * 256); // 8-bit fixed point
    cosLut[i] = (int32_t)(cosf(ang) * 256);
}
The circular mask boundaries are pre-calculated in the same way. From the circle equation x² + y² = r², the intersection with the circle at each line y is found by x = ±√(r² - y²). However, since sqrtf() is slow, I calculate all 360 lines at startup and store them in an array.
// Pre-calculate circle boundaries for all lines at startup
for (int y = 0; y < SCREEN_HEIGHT; y++) {
    int dy = y - CENTER_Y;
    int dySq = dy * dy;
    if (dySq >= radiusSq) {
        xStart[y] = SCREEN_WIDTH; // This line is outside the circle
        xEnd[y] = -1;
    } else {
        int xRange = (int)sqrtf((float)(radiusSq - dySq));
        xStart[y] = CENTER_X - xRange;
        xEnd[y] = CENTER_X + xRange;
    }
}
Since the circle's radius is fixed during normal rendering, the drawing loop doesn't call sqrtf() at all and just references the array. Eliminating 360 sqrtf() calls per frame is significant.
Fixed-point Arithmetic
Since floating-point operations themselves are slow, all coordinate calculations are performed using fixed-point arithmetic. By treating them as integers with an 8-bit fractional part, coordinate transformations can be done using only multiplication and bit-shifting.
In Bilinear interpolation, the precision of the fractional part normally affects the smoothness of the interpolation. However, 8-bit precision (256 levels) results in many multiplications for weight calculations. Therefore, I reduced the precision of the Bilinear interpolation weights specifically to 4-bit (16 levels).
// Extract only the upper 4 bits from 8-bit fixed point
uint32_t fracX = (fx >> 4) & 0x0F; // Range 0-15
uint32_t fracY = (fy >> 4) & 0x0F;
// 4-bit x 4-bit fits in 8-bit
uint32_t w00 = (16 - fracX) * (16 - fracY); // Top-left weight
uint32_t w10 = fracX * (16 - fracY); // Top-right weight
// ...
Considering the color depth of RGB565 (R:5bit, G:6bit, B:5bit), 16 levels are visually sufficient. Reducing from 8-bit to 4-bit halves the bit width of the multiplications and speeds up the calculation.
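Putting the 4-bit weights together into a full bilinear sample (shown here for the 5-bit red channel of RGB565; the function and variable names are illustrative, not the project's code): the four weights always sum to 16 × 16 = 256, so the weighted sum is normalized with a single 8-bit shift instead of a division.

```cpp
#include <cassert>
#include <cstdint>

// Bilinear interpolation of one RGB565 channel with 4-bit weights.
// r00/r10/r01/r11 are the channel values of the four surrounding
// pixels; fracX, fracY are the 4-bit fractional coordinates (0-15).
uint32_t bilinearRed5(uint32_t r00, uint32_t r10, uint32_t r01, uint32_t r11,
                      uint32_t fracX, uint32_t fracY) {
    uint32_t w00 = (16 - fracX) * (16 - fracY); // Top-left weight
    uint32_t w10 = fracX * (16 - fracY);        // Top-right weight
    uint32_t w01 = (16 - fracX) * fracY;        // Bottom-left weight
    uint32_t w11 = fracX * fracY;               // Bottom-right weight
    // w00 + w10 + w01 + w11 == 256, so >> 8 normalizes exactly.
    return (r00 * w00 + r10 * w10 + r01 * w01 + r11 * w11) >> 8;
}
```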
Coordinate Transformation via Incremental Calculation
Calculating the rotation matrix naively would require the multiplication cos(θ)*x + sin(θ)*y for each pixel. Doing this for 360×360 = 129,600 pixels is heavy.
Here, I use the properties of the rotation matrix. The source coordinates are found by:
srcX = cos(θ) * x + sin(θ) * y
srcY = -sin(θ) * x + cos(θ) * y
When x increases by 1:
srcX' = cos(θ) * (x+1) + sin(θ) * y = srcX + cos(θ)
srcY' = -sin(θ) * (x+1) + cos(θ) * y = srcY - sin(θ)
In other words, since y is the same and only x increases by 1 for adjacent pixels on the same line, I can perform the full calculation only for the first pixel of the line, and then find the coordinates of the next pixel using just the additions += cos(θ) and -= sin(θ).
Since I'm calculating with fixed-point (integers), rounding errors like those in floating-point do not accumulate. Also, because the full calculation is reset at the start of each line, errors do not build up between lines.
// Full calculation only at the start of the line
int32_t srcX = cosA * dxStart + sinA * dy + centerFP;
int32_t srcY = -sinA * dxStart + cosA * dy + centerFP;

// Only additions from here on
for (int x = xStart; x <= xEnd; x++) {
    sample(srcX, srcY);
    srcX += cosA; // No multiplication!
    srcY -= sinA;
}
This optimization drastically reduces the number of multiplications and improves rendering speed.
Double Buffering
To display an image on the screen, the CPU needs to create a buffer in advance, fill it with pixel data, and then send it to the display controller via QSPI. The process of filling the buffer with pixel data must be done by the CPU, but the QSPI transfer part can be handled by DMA to send data to the display controller without CPU intervention. However, writing to a buffer while it's being transmitted causes tearing, so I avoid this by preparing another separate buffer and alternating between the drawing side and the transmission side.
Furthermore, by taking advantage of the ESP32-S3's dual cores and running rendering on Core 1 (Arduino's loop()) and DMA transmission on Core 0 (background task) in parallel, rendering and transmission are completely parallelized, achieving smooth animation at 25-28 FPS. (I'd honestly like to go higher, but this seems to be the limit...)
IRAM and PSRAM
The ESP32-S3 has internal SRAM (512KB) and external PSRAM. The area within internal SRAM that allows for high-speed access is called IRAM (Instruction RAM), and placing code or data there allows for high-speed execution. On the other hand, PSRAM (Pseudo-Static RAM) is external memory connected via OPI (Octal SPI); while it has a large capacity (8MB for this board), its access speed is slower than internal SRAM.
In this project, I allocate about 1.3MB of buffers (two frame buffers, album art, background record texture, etc.) in PSRAM. Since more than half of the drawing time was spent on PSRAM reads and writes, I tried moving some of it to internal SRAM, but internal SRAM is used by the Wi-Fi stack and other functions, leaving almost no room. Moving just a small amount sometimes even made it slower, so I gave up.
Details of the rotation rendering (memory layout, optimization process, etc.) are summarized in ROTATION_RENDERING.md.
Touch Panel
This device is equipped with an I2C-connected capacitive touch panel (CST816S).
In this app, the following gestures are implemented:
- Center tap: Play/Pause
- Edge tap (left/right 40px): Previous track/Next track
- Horizontal swipe: Next track/Previous track
The touch controller is periodically polled to retrieve coordinates, and the gesture is determined based on the tap duration and movement distance at the moment the finger is released. A horizontal movement of 50px or more is treated as a swipe, while anything less is processed as a tap. Short touches (less than 30ms) or chattering are ignored.
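The release-time decision can be sketched as a pure function of the touch-down position, the release position, and the press duration. The thresholds (50 px swipe, 30 ms debounce, 40 px edge zones on the 360 px wide screen) come from the article; the enum, the function, and the swipe-direction mapping are my own assumptions:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

enum Gesture { NONE, PLAY_PAUSE, NEXT_TRACK, PREV_TRACK };

// Classify a completed touch. downX/upX are the x-coordinates at
// press and release; pressedMs is how long the finger was down.
Gesture classify(int downX, int upX, uint32_t pressedMs) {
    if (pressedMs < 30) return NONE;           // ignore chattering
    int dx = upX - downX;
    if (abs(dx) >= 50)                         // horizontal swipe
        return dx > 0 ? PREV_TRACK : NEXT_TRACK;
    if (downX < 40) return PREV_TRACK;         // left edge tap
    if (downX >= 360 - 40) return NEXT_TRACK;  // right edge tap
    return PLAY_PAUSE;                         // center tap
}
```

Keeping the classifier side-effect-free like this also makes it trivial to unit-test on the host, away from the I2C polling code.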
While gesture recognition is easy to do with LVGL, as mentioned before, I had already done the rendering independently and the gestures weren't complex, so I went with a custom implementation.
arduwrap.py
In previous projects, I found it tedious that the serial port would be busy if the serial monitor was left open, causing errors when trying to upload newly built firmware. To solve this, I created a wrapper script for arduino-cli.
With this tool, you can keep the serial monitor open in a separate terminal, and it will automatically close it when a compile or upload is needed, then reopen it once the upload is finished. It's very convenient.
./tools/arduwrap serve --port /dev/cu.usbmodem5301 --baud 115200
serve opens the serial monitor. This process opens a UNIX domain socket and waits for compilation instructions.
./tools/arduwrap compile --fqbn esp32:esp32:esp32s3:USBMode=hwcdc,CDCOnBoot=cdc,FlashMode=qio,FlashSize=16M,PartitionScheme=huge_app,PSRAM=opi ../SPNFY/SPNFY.ino
Use the compile subcommand when compiling. This process sends a compilation instruction to the serve process and waits for the result. The arguments are passed almost directly to arduino-cli. The stdout/stderr are also streamed as-is.
When the serve side receives the compile subcommand, it closes the serial monitor, calls arduino-cli to perform the compilation, and sends the results back to the compile process. Once compilation is complete, it reopens the serial monitor and displays the output. There’s a little trick here: if it simply waited for the arduino-cli process to finish, the serial monitor would open several seconds after the device starts. By opening it immediately after the Hard resetting via RTS pin message appears in the arduino-cli output, you can capture the serial output almost from the very beginning of the device boot.
Additionally, I've implemented a log subcommand, which allows you to retrieve logs (from a 64KB buffer) since startup. You can also filter them by any string.
By describing how to use these in .cursorrules or AGENTS.md, the AI itself can handle the entire cycle—from fixing code to building, uploading, and checking execution results. This makes it possible to ask the AI to "keep working on this complex calculation implementation until it's correct." Very handy.
Source Code
What I Learned
- By continuing to give appropriate instructions to the AI, even quite complex rendering optimizations can be achieved in a short time (actual coding for the whole project, including other parts, took about 3 days).
- How to use the dual cores of the ESP32-S3.
- The slowness of PSRAM.
- Passing drafts of this article and the code to Cursor for review is excellent (it even taught me detailed algorithms I didn't know).