LAP-3B

Language-Action Pre-Training Enables Zero-Shot Cross-Embodiment Transfer

🌐 Website: https://lap-vla.github.io/
📄 Paper: https://arxiv.org/abs/2602.10556
💻 Code: https://github.com/lihzha/lap

Download

You can download the LAP checkpoint directly from the Hugging Face Hub.

Using the Hugging Face CLI (recommended)

Install the Hugging Face Hub CLI:

pip install -U huggingface_hub

Download the checkpoint to the expected directory:

hf download lihzha/LAP-3B --local-dir ./checkpoint/lap

After downloading, the checkpoint will be located at:

./checkpoint/lap

This matches the default path expected by the LAP codebase.
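As a quick sanity check after downloading, you can count the files under the checkpoint directory. This helper is purely illustrative and is not part of the LAP codebase:

```python
from pathlib import Path

def count_checkpoint_files(ckpt_dir="./checkpoint/lap"):
    """Return the number of files under the checkpoint directory,
    or -1 if the directory does not exist."""
    ckpt = Path(ckpt_dir)
    if not ckpt.is_dir():
        return -1
    return sum(1 for p in ckpt.rglob("*") if p.is_file())

n = count_checkpoint_files()
if n < 0:
    print("Checkpoint directory not found; re-run the download.")
else:
    print(f"Found {n} files under ./checkpoint/lap")
```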

Alternative: Python API

You can also download the checkpoint programmatically:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="lihzha/LAP-3B",
    local_dir="./checkpoint/lap"
)

Model Summary

LAP-3B is a Vision-Language-Action (VLA) model trained with Language-Action Pre-Training (LAP). LAP represents robot actions in language form ("language-actions"), which lets the model preserve the semantic reasoning capabilities of large vision-language models while it learns robot control.

This design enables strong zero-shot transfer across robot embodiments: the same model generalizes to different robots without architecture changes or embodiment-specific fine-tuning.
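To make the "actions as language" idea concrete, the sketch below discretizes a continuous action vector into text-like tokens that a language model could predict as ordinary text, then decodes them back. This is an illustrative toy only; the token format, bin count, and normalization here are assumptions, not the actual LAP tokenization (see the paper for details):

```python
import numpy as np

def action_to_language(action, low=-1.0, high=1.0, bins=256):
    """Map each action dimension to an integer bin and render it
    as a sequence of placeholder tokens like <act_140>."""
    clipped = np.clip(action, low, high)
    ids = np.round((clipped - low) / (high - low) * (bins - 1)).astype(int)
    return " ".join(f"<act_{i}>" for i in ids)

def language_to_action(text, low=-1.0, high=1.0, bins=256):
    """Invert the mapping: parse bin tokens back to continuous values."""
    ids = np.array([int(tok[5:-1]) for tok in text.split()])
    return low + ids / (bins - 1) * (high - low)

# Round-trip a 7-DoF action through the language representation.
action = np.array([0.1, -0.5, 0.0, 0.9, -1.0, 0.3, 1.0])
text = action_to_language(action)
recovered = language_to_action(text)
print(text)
print(recovered)
```

The round-trip error is bounded by half the bin width, which is why coarse discretization can still support precise control when the bin count is large enough.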

For model and training details, please refer to our paper.


Key Capabilities

LAP-3B demonstrates strong cross-embodiment generalization across multiple real robot platforms.

Highlights from the paper:

  • >50% average zero-shot success rate on unseen robots
  • ~2× improvement over prior VLA models on cross-embodiment benchmarks
  • Successful deployment on multiple robot platforms, including:
    • Franka Panda
    • Kinova
    • YAM
    • DROID

Supported manipulation tasks include:

  • pick and place
  • object sorting
  • container placement
  • towel manipulation

Limitations

While LAP-3B demonstrates strong cross-embodiment transfer, several limitations remain:

  • Current experiments primarily focus on single-arm manipulation.
  • Performance may degrade in settings involving highly dexterous manipulation or rich visual distractors.

Future work includes extending LAP to more complex embodiments and tasks, including:

  • bimanual robots
  • dexterous hands
  • mobile manipulation systems

Intended Use

LAP-3B is intended for:

  • research in robot learning
  • vision-language-action models
  • cross-embodiment policy learning
  • manipulation policy research

The model is not intended for safety-critical deployments without additional validation.

Citation

If you use LAP-3B in your research, please cite:

@article{zha2026lap,
  title={LAP: Language-Action Pre-Training Enables Zero-Shot Cross-Embodiment Transfer},
  author={Zha, Lihan and Hancock, Asher and Zhang, Mingtong and Yin, Tenny and Huang, Yixuan and Shah, Dhruv and Ren, Allen Z. and Majumdar, Anirudha},
  journal={arXiv preprint arXiv:2602.10556},
  year={2026}
}