Translated by AI
How to Pass Clipboard Images to Gemini CLI on WSL2
Introduction
Recently, I've been hooked on the terminal-based AI, Gemini CLI, but I've found it a bit inconvenient in some ways. Specifically, it lacks a feature to pass images from the clipboard.
There are many situations where you might want to take a screenshot and send it, such as when you want to convey the details of an error displayed on the screen. While it is possible to specify the path of a screenshot image and pass it, it's quite a hassle.
When I looked for a better way, I found this piece of information:
In the Windows version, you can paste clipboard images with ALT+v.
Apparently, this is a recently added feature. So, the issue is resolved for the Windows version.
However, this method cannot be used when you run Gemini CLI in a WSL environment. This is because text on the clipboard is shared between Windows and WSL, but image data is not.
So, this time, I'd like to share a method for passing clipboard images in a WSL environment!
Target Audience
- Gemini CLI users who usually develop in a WSL environment.
- People who don't want to leave the terminal.
- Efficiency lovers who want to minimize mouse operations and window switching as much as possible.
Benefits of Reading This Article
- You will be able to pass clipboard images to Gemini CLI at lightning speed.
- You will gain technical insights into the integration between WSL and Windows.
Finished Behavior (How to Use)
The workflow is as follows:
- Windows side: Take a screenshot.
- Gemini CLI: Execute /c. → The image is imported into WSL.
- Gemini CLI: Execute /v. → The image is attached to the prompt, allowing you to converse with the AI.
The actual terminal screen looks like this:

Usage Example 1
Mechanism
To access the Windows clipboard from WSL, we go through powershell.exe. This time, I've adopted a two-step approach:
- Capture (/c): Save the clipboard image to Windows Temp via PowerShell, then move it to .gemini/tmp/ under WSL.
- View (/v): Read the saved image using the @{...} syntax and attach it to the prompt.
Why did I split it into two separate commands? I'll explain the reason later.
Implementation Steps
A total of three files will be created and configured.
Step 1. Create the Image Acquisition Script
File 1: get-clip-img
A Bash script that works with PowerShell to save images.
- Path: ~/.local/bin/get-clip-img
#!/bin/bash
# Default save destination (automatically resolves the username using $HOME)
OUTPUT_FILE="${1:-$HOME/.cache/clipboard_image.png}"
OUTPUT_DIR=$(dirname "$OUTPUT_FILE")
mkdir -p "$OUTPUT_DIR"

# PowerShell command: Save clipboard image to Windows Temp
PS_COMMAND='
try {
    Add-Type -AssemblyName System.Windows.Forms
    if ([System.Windows.Forms.Clipboard]::ContainsImage()) {
        $image = [System.Windows.Forms.Clipboard]::GetImage()
        $tempPath = [System.IO.Path]::GetTempFileName()
        $imagePath = $tempPath + ".png"
        $image.Save($imagePath, [System.Drawing.Imaging.ImageFormat]::Png)
        Remove-Item $tempPath
        Write-Output $imagePath
        exit 0
    } else {
        exit 1
    }
} catch {
    exit 1
}
'

# Execute PowerShell and get the path.
# The exit status is captured before stripping "\r": after a pipeline,
# $? would report the status of tr, not of powershell.exe.
WIN_TEMP_PATH=$(powershell.exe -NoProfile -Command "$PS_COMMAND")
PS_STATUS=$?
WIN_TEMP_PATH=${WIN_TEMP_PATH//$'\r'/}

# Error check
if [ "$PS_STATUS" -ne 0 ] || [ -z "$WIN_TEMP_PATH" ]; then
    echo "Error: No image found."
    exit 1
fi

# Convert to WSL path and move
WSL_TEMP_PATH=$(wslpath -u "$WIN_TEMP_PATH")
if [ -f "$WSL_TEMP_PATH" ]; then
    mv "$WSL_TEMP_PATH" "$OUTPUT_FILE"
    echo "Image saved to: $OUTPUT_FILE"
else
    echo "Error: File not found."
    exit 1
fi
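As a rough illustration of what the wslpath -u step does, the sketch below converts a simple drive-letter path by hand. The function name win_to_wsl_path and the sample path are hypothetical, and the sketch assumes the default /mnt automount root. The script itself should keep using wslpath, which handles the cases this sketch ignores.

```shell
#!/bin/bash
# Illustrative only: approximates `wslpath -u` for plain drive-letter paths.
# Assumption: drives are mounted under the default automount root /mnt.
win_to_wsl_path() {
    local win_path="$1"
    local drive="${win_path:0:1}"   # drive letter, e.g. "C"
    local rest="${win_path:2}"      # everything after "C:"
    rest="${rest//\\//}"            # turn backslashes into forward slashes
    printf '/mnt/%s%s\n' "${drive,,}" "$rest"
}

# Hypothetical temp path of the kind PowerShell might print
win_to_wsl_path 'C:\Users\me\AppData\Local\Temp\tmpA1B2.tmp.png'
# prints: /mnt/c/Users/me/AppData/Local/Temp/tmpA1B2.tmp.png
```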
Step 2. Configure Gemini CLI Custom Commands
File 2: c.toml
A command for saving images.
- Path: ~/.gemini/commands/c.toml
description = "Capture clipboard image (Step 1)"
prompt = """
!{/home/<your-username>/.local/bin/get-clip-img .gemini/tmp/clipboard_image.png > /dev/null && echo \"Image captured.\" || echo \"Failed to capture image.\"}
"""
File 3: v.toml
A command for reading and sending images.
- Path: ~/.gemini/commands/v.toml
description = "View captured image (Step 2)"
prompt = """
@{.gemini/tmp/clipboard_image.png}
{{args}}
"""
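Before trying the commands, it may help to confirm that all three files are where this article expects them and that the script is executable. The following is a minimal sketch; check_setup is a hypothetical helper name, and the paths are the ones used in the steps above.

```shell
#!/bin/bash
# Reports which of the three files from this article are missing.
# The optional first argument overrides the home directory (useful for testing).
check_setup() {
    local home_dir="${1:-$HOME}"
    local status=0
    local script="$home_dir/.local/bin/get-clip-img"
    local name toml

    # The script must exist and carry the executable bit (chmod +x)
    if [ ! -x "$script" ]; then
        echo "missing or not executable: $script"
        status=1
    fi

    # Both custom command definitions must exist
    for name in c v; do
        toml="$home_dir/.gemini/commands/$name.toml"
        if [ ! -f "$toml" ]; then
            echo "missing: $toml"
            status=1
        fi
    done
    return "$status"
}

if check_setup; then
    echo "All three files are in place."
fi
```

If the script is reported as not executable, running chmod +x ~/.local/bin/get-clip-img should resolve it.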
Technical Hurdles and the Solutions Adopted
Actually, I initially wanted to handle everything from saving to sending with a single command. However, I ran into several technical hurdles and settled on the current format.
Hurdle 1: Security Restrictions
Initially, I tried to save images to /tmp/ or ~/.cache/, but due to the security restrictions of Gemini CLI, I encountered an error stating, "Files outside the workspace (current directory) cannot be read."
Solution
I changed the save destination to a hidden folder named .gemini/tmp/ under the current directory. This allows Gemini CLI to recognize the files as part of the project and gain access.
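To see in advance whether a save destination falls under the current directory, a small pre-check like the one below can be used. This is only a sketch and not how Gemini CLI itself enforces the restriction; is_inside_workspace is a hypothetical name, and the code assumes GNU realpath (with the -m option), which is available on typical WSL distributions.

```shell
#!/bin/bash
# Returns 0 if the target path resolves to a location inside the workspace directory.
# Usage: is_inside_workspace <target> [workspace]   (workspace defaults to $PWD)
is_inside_workspace() {
    local target workspace
    # -m resolves the path even if the file does not exist yet
    target=$(realpath -m -- "$1")
    workspace=$(realpath -m -- "${2:-$PWD}")

    # Everything is inside the filesystem root
    if [ "$workspace" = "/" ]; then
        return 0
    fi
    [[ "$target" == "$workspace" || "$target" == "$workspace"/* ]]
}

if is_inside_workspace ".gemini/tmp/clipboard_image.png"; then
    echo "inside workspace"
else
    echo "outside workspace"
fi
```

With this check, a destination such as /tmp/clipboard_image.png is reported as outside the workspace unless the current directory is /tmp or one of its parents.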
Hurdle 2: Race Conditions
At first, I wrote !{save image} and @{read image} side by side within a single command. However, this caused a race condition in which the read ran before the image had finished saving. I suspect that Gemini CLI resolves @{...} files at the prompt parsing stage, so it tries to read a file that hasn't been saved yet, resulting in an error.
Solution
To avoid issues with asynchronous process control and parsing order, I decided to separate the design into "Save (/c)" and "Read (/v)". The few seconds between a human typing /c and then /v functions as a reliable wait time.
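If get-clip-img is called from your own shell scripts instead, the same "wait until the file is saved" idea can be made explicit with a small polling helper. The sketch below uses a hypothetical function name, wait_for_file, and makes no claim to fix the single-command case inside Gemini CLI, where the parsing order appears to be the obstacle.

```shell
#!/bin/bash
# Polls until the file exists and is non-empty, or the timeout (in seconds) expires.
# Usage: wait_for_file <path> [timeout_seconds]   (default timeout: 5)
wait_for_file() {
    local file="$1"
    local timeout="${2:-5}"
    local max_tries=$((timeout * 10))   # one check every 0.1 seconds
    local tries=0

    while [ ! -s "$file" ]; do
        if [ "$tries" -ge "$max_tries" ]; then
            return 1                    # gave up: file never appeared
        fi
        sleep 0.1
        tries=$((tries + 1))
    done
    return 0
}

# Example usage (commented out so that sourcing this file has no side effects):
# get-clip-img .gemini/tmp/clipboard_image.png > /dev/null &
# wait_for_file .gemini/tmp/clipboard_image.png 3 && echo "Image is ready."
```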
Supplement: Autonomous Behavior of the Gemini CLI Agent
After executing the /c command, Gemini may autonomously recognize the image and start a conversation even without explicitly executing the /v command.
- Phenomenon: Immediately after executing /c, Gemini automatically reads .gemini/tmp/clipboard_image.png using the read_file tool and mentions the contents of the image.
- Reason: Gemini CLI acts as an agent and receives the output of the /c command (such as "Image captured." or the file path) as context. This allows the model to infer that "the user probably wants to talk about this newly saved image" and to call the read_file tool on its own initiative.
- Operation: This behavior depends on the model's reasoning and is not guaranteed.
  - If Gemini reads the image (you see ✓ ReadFile ... in the logs), you can just continue the conversation.
  - If it doesn't read it, execute the /v command as instructed.
Configuration to Enhance Autonomous Behavior (GEMINI.md)
To increase the probability of this "initiative-taking" behavior, it is recommended to add the following rules to the system prompt file (.gemini/GEMINI.md).
---
## Clipboard Image Integration (/c command)
- If the user executes the `/c` command and the output contains "Image captured.", you must **immediately** use the `read_file` tool to read the image path specified in the command (usually `.gemini/tmp/clipboard_image.png`).
- There is no need to wait for a `/v` command (image reading instruction) from the user. Proactively read the image and understand its contents.
- After reading, return a short response such as "I have confirmed the image" and wait for the user's question.
With these rules in place, Gemini is more likely to read the image file on its own when /c is executed, although this is still not guaranteed. In practice, this often lets you work with the /c command alone, as shown below.

Usage Example 2
Conclusion
Now, you can easily pass screenshots of GUI error screens and more to Gemini CLI on WSL. If you are a terminal user using Gemini CLI in a WSL environment, please give it a try!