Hybrid ACT+Diffusion: ALOHA Single-Arm (Left), 13.4k steps

Custom HybridACTDiffusion policy: an ACT visual encoder (ResNet18 + 4-layer Transformer, mean-pooled) feeds a Diffusion U-Net decoder (FiLM conditioning, DDPM training, DDIM 10-step inference). There is no VAE; diffusion handles multimodal action distributions directly.

This is the initial 13.4k-step Hybrid baseline (S002). For the longer 40k retrain, see JHeisler/aloha_solo_left_act_diffusion_40k.

Architecture

Images (cam_high, cam_left_wrist) + State (dim=9)
     │
     ▼
ACT Encoder (ResNet18 → 4-layer Transformer) → mean-pool → (B, 512) global cond vector
     │
     ▼
Diffusion U-Net (DiffusionConditionalUnet1d, FiLM modulation, down_dims=(256,512))
     │  DDPM training / DDIM 10-step inference
     ▼
Action chunks (chunk_size=100, action_dim=9)
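
The FiLM modulation step in the diagram can be sketched in a few lines. This is a minimal pure-Python illustration, not the project's code: it assumes the global conditioning vector has already been projected to one scale and one bias per channel, and the names `film_modulate`, `scale`, and `bias` are illustrative only.

```python
def film_modulate(features, scale, bias):
    """Apply FiLM: out[c][t] = scale[c] * features[c][t] + bias[c].

    features: list of C rows, each with T values (one row per channel).
    scale, bias: per-channel lists of length C, assumed to come from a
    learned projection of the global conditioning vector (not shown here).
    """
    if not (len(features) == len(scale) == len(bias)):
        raise ValueError("scale and bias need one entry per channel")
    return [
        [s * x + b for x in row]
        for row, s, b in zip(features, scale, bias)
    ]


# Toy example: 2 channels, 3 time steps.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
modulated = film_modulate(features, scale=[2.0, 0.5], bias=[1.0, -1.0])
```

Each channel is scaled and shifted uniformly across the time axis, which is how a single conditioning vector can steer every step of the denoised action chunk.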

Training Config

| Field | Value |
| --- | --- |
| Architecture | HybridACTDiffusion (ACT encoder + Diffusion U-Net); see lerobot/common/policies/hybrid_act_diffusion/ |
| Dataset | JHeisler/aloha_solo_left_4_6_26: 50 episodes, 29,785 samples, 30 fps |
| State / action dim | 9 / 9 |
| Cameras | cam_high, cam_left_wrist (3×480×640 each) |
| Steps | 13,400 |
| Batch size | 24 (DOE winner) |
| Learning rate | 3e-5 |
| Total samples seen | 321K (10.6 epochs) |
| AMP | enabled |
| torch.compile | enabled |
| Diffusion scheduler | DDPM training (100 timesteps, squaredcos_cap_v2), DDIM at inference (10 steps) |
| Final loss (DDPM noise-pred MSE) | 0.011–0.020 |
| Final grad norm | 0.2–0.7 |
| Wall clock | ~1h 16min on RTX A4500 |
| LeRobot pin | 96c7052777aca85d4e55dfba8f81586103ba8f61 (with custom hybrid_act_diffusion policy added) |
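
To make the scheduler settings concrete, here is a hedged pure-Python sketch of a squared-cosine beta schedule with a 0.999 cap (the formula commonly associated with the name `squaredcos_cap_v2`) and of an evenly strided 10-step subset of the 100 training timesteps. The helper names are illustrative, and the exact timestep spacing used by this project's DDIM sampler is an assumption.

```python
import math


def squaredcos_cap_v2_betas(num_train_timesteps, max_beta=0.999):
    """Squared-cosine beta schedule, with each beta capped at max_beta."""
    def alpha_bar(t):
        # Cumulative signal level at normalized time t in [0, 1].
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_train_timesteps):
        t1 = i / num_train_timesteps
        t2 = (i + 1) / num_train_timesteps
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


def ddim_timesteps(num_train_timesteps, num_inference_steps):
    """Evenly strided inference timesteps, highest noise level first.

    Assumption: simple integer striding starting from 0.
    """
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in reversed(range(num_inference_steps))]


betas = squaredcos_cap_v2_betas(100)   # 100 training timesteps
timesteps = ddim_timesteps(100, 10)    # 10 inference steps
```

Under these assumptions the sampler visits timesteps 90, 80, ..., 0, so inference runs 10 denoising passes through the U-Net instead of 100.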

Project Lineage

| Workstream | Model | Steps | Samples | HF |
| --- | --- | --- | --- | --- |
| S001 | ACT | 13,400 | 640K | act_left |
| S002 | Hybrid ACT+Diffusion | 13,400 | 321K | this repo |
| S003 | ACT (shipped) | 40,000 | 1.92M | act_left_40k |
| S004 | Hybrid ACT+Diffusion | 40,000 | 1.12M | act_diffusion_40k |
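
The Samples column is consistent with samples seen = steps × batch size. The short sketch below backs out the batch size implied by each row. Only the S002 value (24) is stated in this card; the others are estimates derived from the rounded figures above, not confirmed settings.

```python
# (workstream, steps, approximate samples seen) copied from the lineage table
runs = [
    ("S001", 13_400, 640_000),
    ("S002", 13_400, 321_000),
    ("S003", 40_000, 1_920_000),
    ("S004", 40_000, 1_120_000),
]


def implied_batch_size(steps, samples_seen):
    """Samples seen = steps * batch size, so batch size ~= samples / steps."""
    return round(samples_seen / steps)


implied = {name: implied_batch_size(steps, samples) for name, steps, samples in runs}
```

For S002 this gives 24, matching the batch size in the training config.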

Notes on loss comparability

DDPM noise-prediction MSE (this model) and ACT's L1+KL combination (S001/S003) are different objectives, so absolute loss values are NOT directly comparable across architectures. The right comparison is offline action L1 on held-out episodes or real-robot rollout success rate.
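
As an illustration of the offline metric, the sketch below computes the mean absolute error between a predicted action chunk and the recorded actions. It is a minimal pure-Python example, not the project's evaluation code; `mean_action_l1` is a hypothetical helper and the toy inputs are made up.

```python
def mean_action_l1(predicted, target):
    """Mean absolute error over all time steps and action dimensions.

    predicted, target: lists of action vectors with matching shapes,
    e.g. a chunk of T steps with 9 values each.
    """
    if len(predicted) != len(target):
        raise ValueError("chunks must have the same number of steps")
    total, count = 0.0, 0
    for pred_step, true_step in zip(predicted, target):
        if len(pred_step) != len(true_step):
            raise ValueError("action dimensions must match")
        for p, t in zip(pred_step, true_step):
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("no actions to compare")
    return total / count


# Toy chunk: 2 steps, 2 action dimensions.
predicted = [[0.0, 1.0], [2.0, 3.0]]
target = [[0.5, 1.0], [2.0, 2.0]]
error = mean_action_l1(predicted, target)
```

Because both policies output actions in the same space, this number can be compared across ACT and the hybrid model, unlike their training losses.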

Usage

The custom policy class lives in this project's LeRobot fork. To use:

# Requires lerobot pinned to 96c7052 with hybrid_act_diffusion policy package added
from lerobot.common.policies.hybrid_act_diffusion.modeling_hybrid_act_diffusion import HybridACTDiffusionPolicy
policy = HybridACTDiffusionPolicy.from_pretrained("JHeisler/aloha_solo_left_act_diffusion")

Citation / Course

EN.525.681 school project, JHU Whiting School of Engineering. Team: Jake Heisler, Laura Kroening, Purushottam Shukla.

Code reference: HuggingFace LeRobot at commit 96c7052 with custom hybrid policy package.

Model size: 47.2M params (Safetensors, tensor type F32)