	---
	datasets:
	- lerobot/aloha_sim_insertion_human
	library_name: lerobot
	license: apache-2.0
	model_name: diffusion
	pipeline_tag: robotics
	tags:
	- lerobot
	- robotics
	- diffusion
	- aloha
	- imitation-learning
	- benchmark
	---

	# 🦾 Diffusion Policy for Aloha Insertion (200k Steps)

	[![LeRobot](https://img.shields.io/badge/Library-LeRobot-yellow)](https://github.com/huggingface/lerobot)
	[![Task](https://img.shields.io/badge/Task-Aloha_Insertion-blue)](https://huggingface.co/datasets/lerobot/aloha_sim_insertion_human)
	[![UESTC](https://img.shields.io/badge/Author-UESTC_Graduate-red)](https://www.uestc.edu.cn/)
	[![License](https://img.shields.io/badge/License-Apache_2.0-green)](https://www.apache.org/licenses/LICENSE-2.0)

	## 🎯 Research Purpose

	Important Note: This model was trained primarily for academic comparison: it evaluates the performance difference between the Diffusion Policy and ACT algorithms under identical training conditions (the same `lerobot/aloha_sim_insertion_human` dataset). It is a benchmark experiment for analyzing how well each algorithm learns a complex 3D manipulation task under limited computational resources (Batch Size=8), not an attempt to train a practical, high-success-rate model.

	> Summary: This model represents a benchmark experiment for Diffusion Policy on the challenging Aloha Insertion task (Simulated). It was trained using the [LeRobot](https://github.com/huggingface/lerobot) framework to evaluate the algorithm's performance on complex, high-dimensional 3D manipulation tasks compared to baseline methods.

	- 🧩 Task: Aloha Insertion (Simulated, 3D)
	- 🧠 Algorithm: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
	- 🔄 Training Steps: 200,000
	- 🎓 Author: Graduate Student, UESTC (University of Electronic Science and Technology of China)

	---

	## 🔬 Benchmark Results (vs ACT)

	This experiment highlights how difficult the Aloha Insertion task is for generative policies under limited compute (Batch Size=8). While the ACT baseline achieved a 2% success rate (1/50), the Diffusion Policy achieved 0% (0/50): it learned stable trajectories and reached the grasping stage, but struggled with the final insertion alignment.

	### 📊 Evaluation Metrics (50 Episodes)

	| Metric | Value | Comparison to ACT Baseline | Status |
	| :--- | :---: | :--- | :---: |
	| Success Rate | 0.0% | Slightly Lower (ACT: 2.0%) | 📉 |
	| Avg Max Reward | 0.10 | Partial Success (Grasping achieved) | 🚧 |
	| Avg Sum Reward | 8.20 | Stable Trajectories | ✅ |

	> Note: The Aloha Insertion task involves high-dimensional inputs (3 cameras) and precise 3D spatial reasoning. The results indicate that under low batch-size constraints (Batch Size=8), ACT's deterministic policy may converge faster than Diffusion Policy, which likely requires longer training or larger batches for this specific domain.
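
To put the 0/50 versus 1/50 gap in perspective, the sketch below computes Wilson score intervals for both success rates. It is an illustrative calculation and was not part of the original evaluation; the helper name `wilson_interval` and the choice of a 95% interval (`z=1.96`) are assumptions made for this example.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p_hat = successes / trials
    denom = 1.0 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


dp_low, dp_high = wilson_interval(0, 50)    # Diffusion Policy: 0 successes in 50 episodes
act_low, act_high = wilson_interval(1, 50)  # ACT baseline: 1 success in 50 episodes

print(f"Diffusion Policy: [{dp_low:.3f}, {dp_high:.3f}]")
print(f"ACT baseline:     [{act_low:.3f}, {act_high:.3f}]")
```

Under these assumptions the two intervals overlap, so a one-episode difference across 50 rollouts should be read as "both policies rarely succeed" rather than as a reliable ranking of the two algorithms.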

	---

	## ⚙️ Model Details

	| Parameter | Description |
	| :--- | :--- |
	| Architecture | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
	| Input | 3 Camera Views (Top, Left, Right) |
	| Prediction Horizon | 16 steps |
	| Observation History | 2 steps |
	| Action Steps | 8 steps |
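
The three temporal parameters in the table interact in a receding-horizon loop: the policy conditions on the last 2 observations, predicts 16 actions, and executes 8 of them before planning again. The sketch below shows which slice of the predicted horizon would be executed, assuming the convention from the original Diffusion Policy implementation that the executed window starts at the index aligned with the most recent observation. `executed_window` is a hypothetical helper for illustration, not a LeRobot API; the 50 fps value comes from the `env.fps` setting in the training config.

```python
def executed_window(horizon: int, n_obs_steps: int, n_action_steps: int) -> range:
    """Indices of the predicted action sequence that get executed before re-planning.

    Assumes the executed window starts at the step of the latest observation,
    i.e. index n_obs_steps - 1 of the predicted horizon.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    if end > horizon:
        raise ValueError("n_action_steps does not fit inside the prediction horizon")
    return range(start, end)


window = executed_window(horizon=16, n_obs_steps=2, n_action_steps=8)
fps = 50  # env.fps from the training config
seconds_per_replan = len(window) / fps

print(f"executed indices: {list(window)}")
print(f"re-planning every {seconds_per_replan:.2f} s of simulated time")
```

With these values, only half of each predicted 16-step sequence is used before the policy runs a fresh denoising pass, which trades extra inference cost for more frequent feedback from the cameras.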

	---

	## 🔧 Training Configuration

	For reproducibility, here are the key parameters used during the training session.

	- Source: Configuration adapted from [CSCSX/LeRobotTutorial-CN](https://github.com/CSCSX/LeRobotTutorial-CN).
	- Batch Size: 8 (Limited by 8GB VRAM)
	- Optimizer: AdamW (`lr=1e-4`)
	- Scheduler: Cosine with warmup
	- Vision: ResNet18 with GroupNorm (Cropped to 420x560)
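
As a rough picture of what "cosine with warmup" means for this run, the sketch below computes the learning rate at a given step. It assumes the common form of this schedule (linear warmup from 0, then a half-cosine decay to 0 at the final step) together with the values from the config (`lr=1e-4`, 500 warmup steps, 200,000 total steps). `lr_at_step` is a hypothetical helper, and the scheduler actually used during training may differ in detail.

```python
import math


def lr_at_step(
    step: int,
    base_lr: float = 1e-4,
    warmup_steps: int = 500,
    total_steps: int = 200_000,
) -> float:
    """Learning rate under linear warmup followed by half-cosine decay (assumed form)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 250, 500, 100_000, 200_000):
    print(f"step {step:>7}: lr = {lr_at_step(step):.2e}")
```

Because warmup covers only 500 of 200,000 steps, nearly the entire run is spent on the decaying part of the curve.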

	### Original Training Command (My Resume Mode)

	```bash
	python -m lerobot.scripts.lerobot_train \
	  --config_path diffusion_aloha.yaml \
	  --env.type aloha \
	  --env.task AlohaInsertion-v0 \
	  --dataset.repo_id lerobot/aloha_sim_insertion_human \
	  --wandb.enable true \
	  --job_name DP_Aloha_Insertion \
	  --policy.repo_id Lemon-03/DP_Aloha_Insertion_test
	```

	### diffusion_aloha.yaml
	<details>
	<summary>📄 <strong>Click to view full <code>diffusion_aloha.yaml</code> used for training</strong></summary>

	```yaml
	# @package _global_

	# Random seed
	seed: 100000
	job_name: Diffusion-Aloha-Insertion

	# Training parameters
	steps: 200000 # Original file states 200k steps (Aloha is difficult to train)
	eval_freq: 20000 # Slightly increased frequency to monitor progress
	save_freq: 20000
	log_freq: 200
	batch_size: 8 # ⚠️ Crucial: Aloha requires small batch size, otherwise 8GB VRAM is insufficient

	# Dataset
	dataset:
	  repo_id: lerobot/aloha_sim_insertion_human

	# Evaluation settings
	eval:
	  n_episodes: 50
	  batch_size: 8 # Keep consistent with training

	# Environment settings
	env:
	  type: aloha
	  task: AlohaInsertion-v0
	  fps: 50

	# Policy configuration
	policy:
	  type: diffusion

	  # --- Vision processing ---
	  vision_backbone: resnet18
	  # Aloha images are rectangular, using specific crop dimensions here
	  crop_shape: [420, 560]
	  crop_is_random: true
	  pretrained_backbone_weights: null # Original config specifies not to load pretrained weights
	  use_group_norm: true
	  spatial_softmax_num_keypoints: 32

	  # --- Diffusion core architecture (U-Net) ---
	  down_dims: [512, 1024, 2048]
	  kernel_size: 5
	  n_groups: 8
	  diffusion_step_embed_dim: 128
	  use_film_scale_modulation: true

	  # --- Action prediction parameters ---
	  n_action_steps: 8
	  n_obs_steps: 2
	  horizon: 16

	  # --- Noise scheduler (DDPM) ---
	  noise_scheduler_type: DDPM
	  num_train_timesteps: 100
	  num_inference_timesteps: 100
	  beta_schedule: squaredcos_cap_v2
	  beta_start: 0.0001
	  beta_end: 0.02
	  prediction_type: epsilon
	  clip_sample: true
	  clip_sample_range: 1.0

	  # --- Optimizer ---
	  optimizer_lr: 1e-4
	  optimizer_weight_decay: 1e-6
	  #grad_clip_norm: 10

	  scheduler_name: cosine
	  scheduler_warmup_steps: 500

	  use_amp: true
	```
	</details>
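
The `beta_schedule: squaredcos_cap_v2` setting in the config refers to the cosine noise schedule from the "Improved DDPM" line of work. The sketch below reproduces that schedule in plain Python for `num_train_timesteps: 100`, assuming the usual formulation with an offset of 0.008 and a per-step cap of 0.999. `squaredcos_cap_v2_betas` is a hypothetical helper written for illustration; it is not taken from the training code.

```python
import math


def squaredcos_cap_v2_betas(num_train_timesteps: int = 100, max_beta: float = 0.999) -> list[float]:
    """Cosine ("squaredcos_cap_v2") beta schedule, assuming the standard formulation."""

    def alpha_bar(t: float) -> float:
        # Cumulative signal level at normalized time t in [0, 1].
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_train_timesteps):
        t1 = i / num_train_timesteps
        t2 = (i + 1) / num_train_timesteps
        # beta_i is chosen so the running product of (1 - beta) tracks alpha_bar,
        # capped at max_beta to avoid a singular final step.
        betas.append(min(1.0 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


betas = squaredcos_cap_v2_betas()

# Running product of (1 - beta): the fraction of the clean action signal left at each step.
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alphas_cumprod.append(running)

print(f"first beta: {betas[0]:.6f}, last beta: {betas[-1]:.3f}")
print(f"signal remaining after the final step: {alphas_cumprod[-1]:.2e}")
```

In this formulation the betas are derived entirely from the cosine curve, so `beta_start` and `beta_end` would only matter if the schedule were switched to a linear-type variant.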

	---

	## 🚀 Evaluate (My Evaluation Mode)

	To evaluate this model locally, run the following command:

	```bash
	python -m lerobot.scripts.lerobot_eval \
	--policy.type diffusion \
	--policy.pretrained_path Lemon-03/DP_Aloha_Insertion_test \
	--eval.n_episodes 50 \
	--eval.batch_size 8 \
	--env.type aloha \
	--env.task AlohaInsertion-v0
	```