
Action Chunking Transformer (ACT) policies and ALOHA: what are they?


What is ACT?

1. Action Chunking Transformer (ACT) Policies

The Action Chunking Transformer (ACT) is a transformer-based imitation-learning policy that predicts short sequences of actions (“action chunks”) rather than single steps, which helps capture longer‐range dependencies and reduces compounding errors during execution (Tony Z. Zhao).

ACT policies are typically trained and evaluated on the ALOHA suite of simulated bimanual manipulation tasks (e.g., cube transfer and insertion) provided via the gym-aloha package, and they can also be deployed on low-cost physical ALOHA robot kits to learn from human demonstrations shared through a community Data Pool (GitHub, Community).

1.1 Concept and Motivation

  • Action Chunks vs. Single Actions
    Instead of predicting one action at a time, ACT generates a fixed-length sequence of future actions—an “action chunk”—in a single forward pass, giving the policy awareness of short-term temporal context and improving stability (Tony Z. Zhao); a minimal execution-loop sketch follows this list.
  • Conditional Variational Modeling
    ACT is implemented as the decoder of a Conditional VAE: a transformer encoder processes multi-view camera images, robot joint states, and a latent style variable 𝑧, and the transformer decoder outputs the entire action chunk (Tony Z. Zhao).
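To make the execute-then-requery pattern concrete, here is a minimal Python sketch of a chunked control loop. The `policy` callable and the `chunk_size` argument are hypothetical stand-ins for whatever interface a given ACT implementation exposes; the only point is that the environment is stepped N times per policy query.

```python
import numpy as np

def run_chunked(env, policy, chunk_size=10, max_steps=400):
    """Roll out a policy that predicts a chunk of future actions per query.

    Assumes `policy(obs)` returns an array of shape (chunk_size, action_dim)
    and that `env` follows the gymnasium step API.
    """
    obs, _ = env.reset()
    steps = 0
    while steps < max_steps:
        # One forward pass -> a whole chunk of future actions.
        action_chunk = np.asarray(policy(obs))[:chunk_size]
        for action in action_chunk:  # execute the whole chunk before re-querying
            obs, reward, terminated, truncated, _ = env.step(action)
            steps += 1
            if terminated or truncated or steps >= max_steps:
                return steps
    return steps
```

The original ACT paper goes one step further with temporal ensembling, where overlapping chunks are predicted at every step and averaged; the sketch above shows only the simpler execute-then-requery variant.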

1.2 Architecture

  • Encoder
    Aggregates visual observations and proprioceptive inputs into a unified representation, enabling the model to reason over multiple modalities (Tony Z. Zhao).
  • Decoder
    Decodes a sequence of 𝑁 future actions (e.g., 10 steps) in one pass, which the robot can execute in batch before querying the next chunk, reducing decision overhead and error compounding (Radek Osmulski); a toy decoder sketch follows this list.
  • Open-Source Implementation
    The tonyzhaozh/act GitHub repository provides code to train and evaluate ACT on both simulated (gym-aloha) and real ALOHA hardware, supporting tasks like Transfer Cube and Bimanual Insertion (GitHub).
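To make the decoder idea more tangible, here is a heavily simplified PyTorch sketch (not the reference tonyzhaozh/act implementation): a fixed set of learned chunk queries cross-attends to already-fused observation tokens and is projected to joint-space actions. The image backbone, CVAE encoder, and all real hyperparameters are omitted.

```python
import torch
import torch.nn as nn

class TinyACTDecoder(nn.Module):
    """Toy stand-in for ACT's action decoder (not the official architecture)."""

    def __init__(self, d_model=256, chunk_size=10, action_dim=14):
        super().__init__()
        # One learned query embedding per future action in the chunk.
        self.chunk_queries = nn.Parameter(torch.randn(chunk_size, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, obs_tokens):
        # obs_tokens: (batch, num_tokens, d_model) fused image/state/latent features
        batch = obs_tokens.shape[0]
        queries = self.chunk_queries.unsqueeze(0).expand(batch, -1, -1)
        # All chunk positions are decoded in a single pass (no autoregression).
        decoded = self.decoder(tgt=queries, memory=obs_tokens)
        return self.action_head(decoded)  # (batch, chunk_size, action_dim)

# Example: 3 fused observation tokens -> a 10-step chunk of 14-D actions.
chunk = TinyACTDecoder()(torch.randn(1, 3, 256))
print(chunk.shape)  # torch.Size([1, 10, 14])
```

Because the queries are fixed learned embeddings rather than previously generated actions, the whole chunk comes out of one decoder pass, which is what keeps per-step decision overhead low.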

1.3 Applications and Performance

  • Fine-Grained Bimanual Manipulation
    ACT has demonstrated strong results on tasks requiring precise coordination of two arms, such as cube transfer and peg insertion, even when using low-cost, imprecise hardware (arXiv).
  • Success Rates in Simulation
    For example, the act_aloha_sim_transfer_cube_human policy trained via LeRobot achieved around an 83% success rate on the AlohaTransferCube task by learning 10-step action chunks from human demonstrations (Hugging Face).
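A checkpoint like the one above can, in principle, be pulled straight from the Hugging Face Hub and queried step by step. The sketch below assumes the module layout and observation keys used by recent lerobot releases (`lerobot.common.policies.act`, `observation.images.top`, `observation.state`); both have shifted between versions, so treat the exact paths as assumptions to verify against your installed version.

```python
import torch
from lerobot.common.policies.act.modeling_act import ACTPolicy  # path varies by lerobot version

policy = ACTPolicy.from_pretrained("lerobot/act_aloha_sim_transfer_cube_human")
policy.eval()
policy.reset()  # clears the internal action-chunk queue before a new rollout

# Dummy observation with the shapes used by the ALOHA sim tasks
# (one 480x640 top-camera image and a 14-D joint-state vector).
batch = {
    "observation.images.top": torch.zeros(1, 3, 480, 640),
    "observation.state": torch.zeros(1, 14),
}
with torch.no_grad():
    # Returns one action per call, replaying the internally buffered chunk.
    action = policy.select_action(batch)
print(action.shape)  # expected: torch.Size([1, 14])
```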

2. ALOHA: Simulated and Physical Robotic Environments

ALOHA (A Low-cost Open-source Hardware system for bimanual teleoperation) is the name of a family of robotics environments that exists in both simulated and physical form.

2.1 Gym-ALOHA Simulation

  • Two Core Tasks
    The gym-aloha package offers TransferCube and Insertion environments for bimanual robots, each featuring a 14-dimensional continuous action space (six joint commands plus one gripper command for each of the two arms) (GitHub); a short usage sketch follows this list.
  • Demonstration Datasets
    Hugging Face hosts human-collected demo datasets (e.g., lerobot/aloha_sim_transfer_cube_human), which are essential for behavior-cloning and ACT training pipelines (Hugging Face).
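Here is a minimal sketch of creating one of these environments and stepping it with random 14-D actions. The environment ID follows the gym-aloha README, but check the installed package for the exact registered names.

```python
import gymnasium as gym
import gym_aloha  # noqa: F401  (importing registers the ALOHA environments)

env = gym.make("gym_aloha/AlohaTransferCube-v0")
obs, info = env.reset()
print(env.action_space.shape)  # (14,) -- 7 commands per arm

for _ in range(5):
    action = env.action_space.sample()  # random 14-D bimanual command
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```

The matching human demonstrations can be pulled from the Hub with lerobot's dataset loader (e.g., `LeRobotDataset("lerobot/aloha_sim_transfer_cube_human")`), though, as with the policy classes, the exact import location depends on the installed lerobot version.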

2.2 Physical ALOHA Kit and Data Pool

  • Low-Cost Hardware
    The ALOHA system, originally developed by Tony Zhao and collaborators and now sold as a kit built from Trossen Robotics arms, is an affordable bimanual robot platform that hobbyists and researchers can assemble, enabling real-world data collection at low cost (Community).
  • Collaborative Data Sharing
    Users contribute their demonstration recordings to the Aloha Data Pool on Hugging Face Spaces, fostering community-driven improvements to policies trained with ACT and related methods (Community).

2.3 Real-World Success Stories

  • Beyond Simulation
    Real ALOHA robots have been trained to perform complex tasks—such as precise object transfer, garment manipulation, and even threading a hanger—showcasing the ability to learn dexterous skills on inexpensive hardware (newyorker.com).
