---
datasets:
- lerobot/aloha_sim_insertion_human
library_name: lerobot
license: apache-2.0
model_name: diffusion
pipeline_tag: robotics
tags:
- lerobot
- robotics
- diffusion
- aloha
- imitation-learning
- benchmark
---

# 🦾 Diffusion Policy for Aloha Insertion (200k Steps)

[LeRobot](https://github.com/huggingface/lerobot)
[Dataset: lerobot/aloha_sim_insertion_human](https://huggingface.co/datasets/lerobot/aloha_sim_insertion_human)
[UESTC](https://www.uestc.edu.cn/)
[License: Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## 🎯 Research Purpose

**Important Note:** This model was trained primarily for **academic comparison**: evaluating the performance difference between the **Diffusion Policy** and **ACT** algorithms under identical training conditions (both use the `lerobot/aloha_sim_insertion_human` dataset). It is a benchmark experiment designed to analyze how well different algorithms learn complex 3D manipulation tasks under limited computational resources (Batch Size=8), **not an attempt to train a highly successful practical model**.

> **Summary:** This model is a benchmark experiment for **Diffusion Policy** on the challenging simulated **Aloha Insertion** task. It was trained with the [LeRobot](https://github.com/huggingface/lerobot) framework to evaluate the algorithm's performance on complex, high-dimensional 3D manipulation tasks compared to baseline methods.

- **🧩 Task**: Aloha Insertion (Simulated, 3D)
- **🧠 Algorithm**: [Diffusion Policy](https://huggingface.co/papers/2303.04137) (DDPM)
- **Training Steps**: 200,000
- **Author**: Graduate Student, **UESTC** (University of Electronic Science and Technology of China)

---

## 🔬 Benchmark Results (vs ACT)

This experiment highlights how difficult the Aloha Insertion task is for generative policies under limited compute (Batch Size=8). The ACT baseline achieved a **2%** success rate (1/50). The Diffusion Policy achieved **0%** (0/50): it learned stable trajectories and partial grasping but struggled with the final insertion alignment.

### Evaluation Metrics (50 Episodes)

| Metric | Value | Comparison to ACT Baseline | Status |
| :--- | :---: | :--- | :---: |
| **Success Rate** | **0.0%** | **Slightly Lower** (ACT: 2.0%) | 📉 |
| **Avg Max Reward** | **0.10** | **Partial Success** (Grasping achieved) | 🚧 |
| **Avg Sum Reward** | **8.20** | **Stable Trajectories** | ✅ |

> **Note:** The Aloha Insertion task involves high-dimensional inputs (3 cameras) and precise 3D spatial reasoning. The results suggest that with a small batch size (Batch Size=8), ACT's deterministic policy may converge faster than Diffusion Policy, which likely needs longer training or larger batches in this domain.
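
With only 50 evaluation episodes, a gap of 0/50 versus 1/50 is small. One way to gauge it is a Wilson score interval on each success rate. The sketch below is illustrative only: the helper name `wilson_interval` and the 95% level (`z = 1.96`) are assumptions and were not part of the original evaluation.

```python
import math


def wilson_interval(successes: int, episodes: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a binomial success rate.

    z = 1.96 corresponds to a 95% interval (an assumption for this sketch).
    """
    p_hat = successes / episodes
    denom = 1.0 + z**2 / episodes
    center = (p_hat + z**2 / (2 * episodes)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / episodes + z**2 / (4 * episodes**2)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Success counts reported in this card: Diffusion Policy 0/50, ACT 1/50.
for name, wins in [("Diffusion Policy", 0), ("ACT", 1)]:
    low, high = wilson_interval(wins, 50)
    print(f"{name}: {wins}/50, 95% interval {low:.3f} to {high:.3f}")
```

The two intervals overlap substantially, so "Slightly Lower" should be read as a difference that 50 episodes cannot resolve, not as a confirmed ranking.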

---

## ⚙️ Model Details

| Parameter | Description |
| :--- | :--- |
| **Architecture** | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
| **Input** | 3 Camera Views (Top, Left, Right) |
| **Prediction Horizon** | 16 steps |
| **Observation History** | 2 steps |
| **Action Steps** | 8 steps |
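
The three step counts above interact: each inference call denoises a 16-step action sequence, but only 8 of those actions are executed before the policy replans. The sketch below shows the indexing under one assumption, namely that the executed window starts at index `n_obs_steps - 1` (the step aligned with the most recent observation), which is how Diffusion Policy implementations commonly slice the horizon. The helper `executed_slice` is hypothetical and not a LeRobot function.

```python
HORIZON = 16         # Prediction Horizon
N_OBS_STEPS = 2      # Observation History
N_ACTION_STEPS = 8   # Action Steps


def executed_slice(horizon: int, n_obs_steps: int, n_action_steps: int) -> tuple[int, int]:
    """Return (start, end) indices of the actions executed from one predicted sequence.

    Assumption: the window starts at the step aligned with the latest observation.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    if end > horizon:
        raise ValueError("horizon is too short for the requested action window")
    return start, end


start, end = executed_slice(HORIZON, N_OBS_STEPS, N_ACTION_STEPS)
predicted_indices = list(range(HORIZON))      # stand-in for a predicted action sequence
executed_indices = predicted_indices[start:end]
print(f"executed indices: {executed_indices}")
print(f"discarded tail:   {predicted_indices[end:]}")
```

Under this assumption, the remaining predicted actions are discarded and recomputed at the next inference call.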

---

## 🔧 Training Configuration

For reproducibility, here are the key parameters used during training.

- **Source**: Configuration adapted from [CSCSX/LeRobotTutorial-CN](https://github.com/CSCSX/LeRobotTutorial-CN).
- **Batch Size**: 8 (Limited by 8GB VRAM)
- **Optimizer**: AdamW (`lr=1e-4`)
- **Scheduler**: Cosine with warmup
- **Vision**: ResNet18 with GroupNorm (Cropped to 420x560)
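
To make the "Cosine with warmup" entry concrete, here is a minimal sketch of the learning-rate curve, using the base rate from this card (`1e-4`), the 200,000 training steps, and the 500 warmup steps from the training config. It assumes a linear warmup followed by a half-cosine decay to zero, which is the common form of this schedule. The exact formula used by the training script was not verified here, and `lr_at` is a hypothetical helper.

```python
import math

BASE_LR = 1e-4
WARMUP_STEPS = 500
TOTAL_STEPS = 200_000


def lr_at(step: int) -> float:
    """Learning rate at a given step: linear warmup, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 250, 500, 100_000, 200_000):
    print(f"step {step:>7}: lr = {lr_at(step):.2e}")
```

Because the warmup is only 500 of 200,000 steps, almost the entire run is spent on the cosine decay.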

### Original Training Command (My Resume Mode)

```bash
python -m lerobot.scripts.lerobot_train \
  --config_path diffusion_aloha.yaml \
  --env.type aloha \
  --env.task AlohaInsertion-v0 \
  --dataset.repo_id lerobot/aloha_sim_insertion_human \
  --wandb.enable true \
  --job_name DP_Aloha_Insertion \
  --policy.repo_id Lemon-03/DP_Aloha_Insertion_test
```

### diffusion_aloha.yaml

<details>
<summary><strong>Click to view the full <code>diffusion_aloha.yaml</code> used for training</strong></summary>

```yaml
# @package _global_

# Random seed
seed: 100000
job_name: Diffusion-Aloha-Insertion

# Training parameters
steps: 200000      # Original file states 200k steps (Aloha is difficult to train)
eval_freq: 20000   # Slightly increased frequency to monitor progress
save_freq: 20000
log_freq: 200
batch_size: 8      # ⚠️ Crucial: Aloha requires a small batch size, otherwise 8GB VRAM is insufficient

# Dataset
dataset:
  repo_id: lerobot/aloha_sim_insertion_human

# Evaluation settings
eval:
  n_episodes: 50
  batch_size: 8    # Keep consistent with training

# Environment settings
env:
  type: aloha
  task: AlohaInsertion-v0
  fps: 50

# Policy configuration
policy:
  type: diffusion

  # --- Vision processing ---
  vision_backbone: resnet18
  # Aloha images are rectangular, using specific crop dimensions here
  crop_shape: [420, 560]
  crop_is_random: true
  pretrained_backbone_weights: null  # Original config specifies not to load pretrained weights
  use_group_norm: true
  spatial_softmax_num_keypoints: 32

  # --- Diffusion core architecture (U-Net) ---
  down_dims: [512, 1024, 2048]
  kernel_size: 5
  n_groups: 8
  diffusion_step_embed_dim: 128
  use_film_scale_modulation: true

  # --- Action prediction parameters ---
  n_action_steps: 8
  n_obs_steps: 2
  horizon: 16

  # --- Noise scheduler (DDPM) ---
  noise_scheduler_type: DDPM
  num_train_timesteps: 100
  num_inference_timesteps: 100
  beta_schedule: squaredcos_cap_v2
  beta_start: 0.0001
  beta_end: 0.02
  prediction_type: epsilon
  clip_sample: true
  clip_sample_range: 1.0

  # --- Optimizer ---
  optimizer_lr: 1e-4
  optimizer_weight_decay: 1e-6
  #grad_clip_norm: 10
  scheduler_name: cosine
  scheduler_warmup_steps: 500
  use_amp: true
```

</details>
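
The `beta_schedule: squaredcos_cap_v2` setting in the config refers to the squared-cosine noise schedule. As a rough illustration, the sketch below computes the per-timestep betas the way the `diffusers` library defines this schedule, as far as I know: betas are derived from a cosine-shaped cumulative alpha curve and capped at 0.999. The function names are hypothetical, and whether the training run used exactly this formula is an assumption.

```python
import math


def alpha_bar(t: float) -> float:
    """Cumulative signal level at normalized time t in [0, 1] (squared-cosine curve)."""
    return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2


def squaredcos_cap_v2_betas(num_train_timesteps: int, max_beta: float = 0.999) -> list[float]:
    """Per-step betas: 1 - alpha_bar(t_next) / alpha_bar(t_now), capped at max_beta."""
    betas = []
    for i in range(num_train_timesteps):
        t_now = i / num_train_timesteps
        t_next = (i + 1) / num_train_timesteps
        betas.append(min(1.0 - alpha_bar(t_next) / alpha_bar(t_now), max_beta))
    return betas


betas = squaredcos_cap_v2_betas(100)  # num_train_timesteps: 100 in the config
print(f"first beta: {betas[0]:.6f}")
print(f"last beta:  {betas[-1]:.6f}")
```

Note that this formula does not use `beta_start` or `beta_end`. If the schedule is computed this way, those two config values only matter for the linear-style schedules.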

---

## Evaluate (My Evaluation Mode)

To evaluate this model locally, run the following command:

```bash
python -m lerobot.scripts.lerobot_eval \
  --policy.type diffusion \
  --policy.pretrained_path Lemon-03/DP_Aloha_Insertion_test \
  --eval.n_episodes 50 \
  --eval.batch_size 8 \
  --env.type aloha \
  --env.task AlohaInsertion-v0
```