Translated by AI
Gemini Robotics: Next-Generation Robot AI by Google
Introduction
Gemini Robotics, announced by Google DeepMind in March 2025, is an AI model specialized for robotics. Built on Gemini 2.0, it is a VLA (Vision-Language-Action) model that integrates vision, language, and action.
This article provides an overview of Gemini Robotics and its primary features.
By reading this article, you will understand the following:
- Basic mechanisms of Gemini Robotics
- Features and performance of the VLA model
- The On-Device version
- Real-world use cases and future outlook
What is Gemini Robotics?
Gemini Robotics is an AI model for robots operating in the physical world. Built upon the multimodal reasoning capabilities of Gemini 2.0, it takes visual information and natural language instructions as input and translates them into robot action commands.
VLA (Vision-Language-Action) Model
At the core of Gemini Robotics is the VLA architecture:
- Vision: Understanding images and videos from cameras and sensors
- Language: Interpreting instructions in natural language
- Action: Generating specific robotic movements
This approach allows robots to understand natural instructions like "Please pick up the cup on the table," identify the target object from visual information, and execute the appropriate action.
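The observe, predict, act cycle described above can be sketched as a closed control loop. This is a minimal illustration, not the Gemini Robotics API: the `Observation`, `Action`, `ScriptedPolicy`, and `run_episode` names are hypothetical, and returning a short chunk of actions per prediction is an assumption borrowed from common VLA designs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    image: bytes       # camera frame (encoding is an assumption)
    instruction: str   # natural-language command


@dataclass
class Action:
    joint_targets: List[float]  # hypothetical: one target value per joint


class ScriptedPolicy:
    """Hypothetical stand-in for a VLA model: emits two actions per call, `chunks` times."""

    def __init__(self, chunks: int = 3):
        self.remaining = chunks

    def predict(self, obs: Observation) -> List[Action]:
        if self.remaining == 0:
            return []  # empty chunk: the policy considers the task done
        self.remaining -= 1
        return [Action([0.0] * 7), Action([0.1] * 7)]


def run_episode(policy, get_frame: Callable[[], bytes], instruction: str,
                apply_action: Callable[[Action], None], max_steps: int = 10) -> int:
    """Observe, predict an action chunk, execute it, repeat; returns actions executed."""
    executed = 0
    for _ in range(max_steps):
        obs = Observation(image=get_frame(), instruction=instruction)
        actions = policy.predict(obs)
        if not actions:
            break
        for action in actions:
            apply_action(action)
            executed += 1
    return executed


log: List[Action] = []
n = run_episode(ScriptedPolicy(), lambda: b"", "Please pick up the cup on the table", log.append)
print(n)  # 6
```

Re-observing the scene before every prediction is what lets the policy react when objects move, instead of executing a fixed plan open-loop.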
Key Features
1. High Generality
On a comprehensive generalization benchmark, Gemini Robotics more than doubles the score of other state-of-the-art VLA models. It can handle objects and environments it never saw during training.
2. Dexterity
It can perform complex manipulation tasks such as:
- Folding origami
- Packing items into a bag
- Opening and closing zippers
- Assembling industrial products
These tasks require precise control and multi-step planning.
3. Interactivity
- Responds to natural language instructions
- Adapts to environmental changes in real-time
- Allows instruction changes during task execution
- Multilingual support
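Changing the instruction mid-task can be modeled by checking for a new command between control steps. The sketch below assumes a hypothetical `step(instruction)` callback that runs one policy step and returns `True` when the task is finished; the queue-based design is an illustration, not how Gemini Robotics is actually wired.

```python
import queue
from typing import Callable, List


def run_interactive(step: Callable[[str], bool], commands: "queue.Queue[str]",
                    initial: str, max_steps: int = 20) -> List[str]:
    """Run control steps, switching to the newest instruction whenever one arrives.

    Returns the instruction that conditioned each executed step.
    """
    current = initial
    history: List[str] = []
    for _ in range(max_steps):
        while True:  # drain the queue so only the most recent command is kept
            try:
                current = commands.get_nowait()
            except queue.Empty:
                break
        history.append(current)
        if step(current):
            break
    return history


commands: "queue.Queue[str]" = queue.Queue()
calls: List[str] = []


def fake_step(instruction: str) -> bool:
    """Hypothetical policy step: the user changes their mind after step 2; done after step 4."""
    calls.append(instruction)
    if len(calls) == 2:
        commands.put("Put the cup in the sink instead")
    return len(calls) == 4


print(run_interactive(fake_step, commands, "Pick up the cup"))
```

Here the first two steps follow the original instruction and the last two follow the new one, without restarting the task.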
Gemini Robotics-ER (Embodied Reasoning)
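Gemini Robotics-ER is a companion model specialized for spatial understanding and embodied reasoning. Rather than producing motor commands directly, it outputs reasoning results such as points, bounding boxes, and plans that a robot controller can act on.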
Spatial Awareness Features
Gemini Robotics-ER excels at understanding physical space:
- Object Position Recognition: Generates 2D coordinates and bounding boxes for objects in images
- 3D Spatial Understanding: Supports experimental 3D reasoning features
- Spatial Relationship Understanding: Understands the positional relationships between objects
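To make "positional relationship" concrete: given two bounding boxes, coarse relations such as "left of" or "above" follow from comparing their edges. This is a hypothetical post-processing helper, not part of Gemini Robotics-ER (which reasons about relations itself); it assumes boxes in `[ymin, xmin, ymax, xmax]` order normalized to 0-1000, with y increasing downward.

```python
from typing import List, Sequence

# Assumed box format: [ymin, xmin, ymax, xmax], normalized to 0-1000, y grows downward.
Box = Sequence[float]


def relations(a: Box, b: Box) -> List[str]:
    """Return coarse image-space relations of box `a` relative to box `b`."""
    a_ymin, a_xmin, a_ymax, a_xmax = a
    b_ymin, b_xmin, b_ymax, b_xmax = b
    found = []
    if a_xmax <= b_xmin:
        found.append("left of")
    if a_xmin >= b_xmax:
        found.append("right of")
    if a_ymax <= b_ymin:
        found.append("above")  # smaller y means higher in the image
    if a_ymin >= b_ymax:
        found.append("below")
    return found or ["overlapping"]


cup = [300, 100, 400, 200]
plate = [300, 500, 400, 700]
print(relations(cup, plate))  # ['left of']
```

Image-space "above" is not the same as "on top of" in 3D, which is one reason 3D spatial reasoning matters for manipulation.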
Task Decomposition and Planning
It can break a complex instruction down into concrete steps. For example:
Instruction: "Please clean up the kitchen"
→ Decomposed steps:
1. Identify dishes on the table
2. Move dishes to the cupboard
3. Wipe the counter
4. Take out the trash
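A decomposed plan like the one above is easy to consume if the model is asked to return it as structured data. The sketch below assumes a hypothetical JSON schema (`step`, `skill`, `target`, `destination`) and a hypothetical registry of robot skills; it executes steps in order and stops at the first failure so that a planner could re-plan from there.

```python
import json
from typing import Callable, Dict, List

# Hypothetical structured plan for "Please clean up the kitchen".
PLAN_JSON = """
[
  {"step": 1, "skill": "detect", "target": "dishes on the table"},
  {"step": 2, "skill": "move", "target": "dishes", "destination": "cupboard"},
  {"step": 3, "skill": "wipe", "target": "counter"},
  {"step": 4, "skill": "take_out", "target": "trash"}
]
"""

Skill = Callable[[dict], bool]  # returns True if the step succeeded


def execute_plan(plan_json: str, skills: Dict[str, Skill]) -> List[int]:
    """Run plan steps in order; return the step numbers that completed."""
    steps = sorted(json.loads(plan_json), key=lambda s: s["step"])
    completed: List[int] = []
    for step in steps:
        skill = skills.get(step["skill"])
        if skill is None:
            raise KeyError(f"no skill registered for {step['skill']!r}")
        if not skill(step):
            break  # stop here; later steps may depend on this one
        completed.append(step["step"])
    return completed


always_ok: Dict[str, Skill] = {name: (lambda s: True) for name in ["detect", "move", "wipe", "take_out"]}
print(execute_plan(PLAN_JSON, always_ok))  # [1, 2, 3, 4]
```

Stopping at the first failed step, rather than skipping it, keeps the robot from, say, wiping a counter that still has dishes on it.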
Integration via API
Gemini Robotics-ER is available through Google AI Studio and the Gemini API. Coordinates are returned in a normalized 0-1000 space rather than in pixels, so the output does not depend on the image resolution.
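Because coordinates come back in the 0-1000 space, they must be rescaled to the actual image size before use. The sketch below assumes the point format described in the Gemini API robotics documentation (a JSON list of `{"point": [y, x], "label": ...}` objects, y first); the model ID is the preview name at the time of writing and may change. Only `query_points` needs the `google-genai` package and an API key; the parsing and conversion helpers run offline.

```python
import json
from typing import List, Tuple

MODEL_ID = "gemini-robotics-er-1.5-preview"  # preview model name at time of writing; may change
FENCE = "`" * 3  # Markdown code fence the model may wrap its JSON in


def query_points(image_bytes: bytes, prompt: str) -> str:
    """Send a JPEG image and a prompt to the model; needs `pip install google-genai` and an API key."""
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=[types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"), prompt],
    )
    return response.text


def parse_points(text: str) -> List[dict]:
    """Parse the JSON point list, stripping a surrounding Markdown code fence if present."""
    text = text.strip()
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1]    # drop the opening fence line
        text = text.rsplit(FENCE, 1)[0]  # drop the closing fence
    return json.loads(text)


def to_pixels(point: List[float], width: int, height: int) -> Tuple[int, int]:
    """Convert a normalized [y, x] point (0-1000) to (x_px, y_px) for a width x height image."""
    y, x = point
    return round(x / 1000 * width), round(y / 1000 * height)


sample = FENCE + 'json\n[{"point": [500, 250], "label": "cup"}]\n' + FENCE
for item in parse_points(sample):
    print(item["label"], to_pixels(item["point"], width=640, height=480))  # cup (160, 240)
```

Note the axis order: the assumed format puts y before x, so swapping them is the most common bug when drawing the points on an image.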
Gemini Robotics On-Device
In June 2025, Google DeepMind announced Gemini Robotics On-Device, a version optimized to run locally on the robot's own hardware.
Advantages of On-Device
1. No Cloud Connection Required
Robots can operate without an internet connection:
- No network round-trip latency
- Resilience to network failures
- Improved privacy
- Real-time control
2. Maintaining High Performance
Despite running locally, it performs close to the cloud version:
- Significantly outperforms previous on-device models
- High success rate in complex tasks
- Operation with low latency
3. Rapid Adaptation
It can adapt to a new task with as few as 50-100 demonstrations, far fewer than the 500 or more that earlier models required.
SDK Provision
The Gemini Robotics SDK includes the following:
- Evaluation tools: Testing in the MuJoCo physics simulator
- Fine-tuning pipeline: Adaptation to new tasks
- Lifecycle management: CLI and Python tools
- Agent framework: Building robot agents
```bash
# Install the SDK from PyPI
pip install safari_sdk
```
Training Platform: ALOHA 2
Gemini Robotics was trained primarily on data from the ALOHA 2 platform.
Features of ALOHA 2
- Bimanual: Two arms operate in coordination
- Open Source: Hardware designs and software are publicly available
- Teleoperation: Humans can control the arms intuitively
- Low Cost: Affordable for research and development
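On a bimanual platform, a single policy output drives both arms at once, which is what makes coordinated two-arm motions possible. The sketch assumes an ALOHA-style 14-value action vector (six joint targets plus one gripper value per arm, left arm first); `ArmCommand` and `split_bimanual` are hypothetical names, and the actual platform configuration should be checked before relying on this layout.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

ARM_JOINTS = 6            # assumption: 6 joints per arm
PER_ARM = ARM_JOINTS + 1  # plus one gripper value


@dataclass
class ArmCommand:
    joints: List[float]
    gripper: float


def split_bimanual(action: Sequence[float]) -> Tuple[ArmCommand, ArmCommand]:
    """Split one joint action vector into (left, right) arm commands."""
    if len(action) != 2 * PER_ARM:
        raise ValueError(f"expected {2 * PER_ARM} values, got {len(action)}")
    left, right = action[:PER_ARM], action[PER_ARM:]
    return (ArmCommand(list(left[:ARM_JOINTS]), left[ARM_JOINTS]),
            ArmCommand(list(right[:ARM_JOINTS]), right[ARM_JOINTS]))


left, right = split_bimanual([float(i) for i in range(14)])
print(left.gripper, right.gripper)  # 6.0 13.0
```

Because both halves come from the same prediction, one arm can hold a bag open while the other places an item inside it, without a separate coordination layer.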
Demonstration Examples
At Google I/O 2025, Gemini Robotics was demonstrated on ALOHA 2:
- Packing a lunch box
- Dunking a basketball
- Responding to voice instructions
- Adapting to new objects
The robot completed these tasks without having been trained specifically on them.
Partnership with Boston Dynamics
In January 2026, Boston Dynamics and Google DeepMind announced a strategic partnership.
Partnership Details
- Integration of Gemini Robotics into the Atlas humanoid robot
- Addition of AI capabilities to the Spot quadruped robot
- Pilot testing in manufacturing (at Hyundai plants)
Expected Effects
While conventional robots were limited to pre-programmed tasks, Gemini Robotics enables:
- Understanding of instructions in natural language
- Adaptation to unstructured environments
- Complex planning and reasoning
- Manipulation of new objects
The partnership aims to bring these capabilities to industrial robots.
Performance Benchmarks
Gemini Robotics significantly outperforms existing VLA models.
Comparison Table
| Aspect | Gemini Robotics | Conventional VLA Models |
|---|---|---|
| Generality Benchmark | Over 2x | Baseline |
| Dexterity | High | Moderate |
| Instruction Understanding | Excellent | Good |
| Demos Required for Adaptation | 50-100 | 500+ |
| On-Device Performance | High | Low to Moderate |
Real-world Verification
Testing in real-world environments is underway in collaboration with several companies:
- Apptronik (Humanoid robots)
- Boston Dynamics (Atlas, Spot)
- Agility Robotics (Logistics robots)
- Enchanted Tools (Service robots)
Use Cases
Manufacturing
- Assembly of industrial products
- Quality inspection
- Inventory management
- Flexible production lines
Logistics
- Product picking
- Packing operations
- Movement within warehouses
- Delivery preparation
Medical and Nursing Care
- Preparation of medical instruments
- Patient support
- Environmental organization and tidying
- Rehabilitation support
Home
- Household chore assistance
- Tidying and organizing
- Transporting items
- Everyday tasks
Technical Mechanisms
Multimodal Processing
Gemini Robotics processes multiple inputs simultaneously:
Input:
- Camera images/videos
- Voice instructions
- Text instructions
- Sensor data
↓ Integrated understanding by Gemini 2.0
Output:
- Robot control commands
- Action sequences
- Execution plans
Continual Learning
The model can be improved iteratively by feeding the results of execution back into training:
- Execute the task
- Feed back the results
- Fine-tune the model
- Improve in the next execution
Repeated over many cycles, this loop can improve performance the more the robot is used.
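The execute, feed back, fine-tune cycle can be sketched as a success-filtered data-collection loop, a common technique sometimes called filtered behavior cloning. Whether Gemini Robotics uses this exact scheme is not stated here; `run_task` and `fine_tune` are hypothetical callbacks standing in for the robot and the training pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Episode:
    task: str
    success: bool
    actions: List[List[float]] = field(default_factory=list)


@dataclass
class Dataset:
    episodes: List[Episode] = field(default_factory=list)


def improvement_round(run_task: Callable[[str], Episode], tasks: List[str],
                      dataset: Dataset, fine_tune: Callable[[Dataset], None]) -> int:
    """Execute tasks, keep only successful episodes, then fine-tune on the grown dataset."""
    episodes = [run_task(task) for task in tasks]  # 1. execute
    kept = [ep for ep in episodes if ep.success]   # 2. feed back: filter by outcome
    dataset.episodes.extend(kept)
    fine_tune(dataset)                             # 3. fine-tune
    return len(kept)                               # number of episodes added this round


def fake_run(task: str) -> Episode:
    """Hypothetical robot run: only the "fold" task succeeds."""
    return Episode(task=task, success=(task == "fold"))


data = Dataset()
trained_sizes: List[int] = []
kept = improvement_round(fake_run, ["fold", "pack", "fold"], data,
                         lambda d: trained_sizes.append(len(d.episodes)))
print(kept, trained_sizes)  # 2 [2]
```

Step 4 corresponds to calling `improvement_round` again with the fine-tuned model; filtering by success keeps failed attempts from being imitated.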
Safety Considerations
Google emphasizes ethics and safety through:
- Collaboration with experts
- Development of safety guidelines
- Conducting risk assessments
- Ensuring transparency
Access Methods
Gemini Robotics-ER 1.5
- Available through Google AI Studio
- Integration via Gemini API
- Python and REST interfaces
Gemini Robotics On-Device
- Application to the Trusted Tester Program is required
- Documentation published on GitHub
- The SDK (safari_sdk) is provided
Summary
Gemini Robotics represents a significant leap forward in robotics. Through the VLA model integrating vision, language, and action, robots are now able to understand human instructions and execute complex tasks.
Main points:
- VLA model based on Gemini 2.0
- Over twice the generality of conventional models
- Support for on-device execution
- Collaboration with major companies like Boston Dynamics
- Wide range of applications from manufacturing to home use
The emergence of the On-Device version has enabled high-performance robot control without the need for a cloud connection. The partnership with Boston Dynamics is expected to accelerate practical implementation in industrial sectors.
As part of the initiative to integrate AI into the physical world, Gemini Robotics is opening up new possibilities in the field of robotics.