---
license: apache-2.0
pipeline_tag: robotics
library_name: transformers
---

# Mantis

> This is the official checkpoint of **Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight**

- **Paper:** https://arxiv.org/pdf/2511.16175
- **Code:** https://github.com/zhijie-group/Mantis

### 🔥 Highlights

- **Disentangled Visual Foresight** augments action learning without overburdening the backbone.
- **Progressive Training** preserves the understanding capabilities of the backbone.
- **Adaptive Temporal Ensemble** reduces inference cost while maintaining stable control.
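
For readers unfamiliar with temporal ensembling, the sketch below shows the generic, fixed-weight form of the idea: when a policy predicts overlapping chunks of future actions, every prediction made for the same timestep is averaged before execution. This is not the adaptive variant used by Mantis, which is described in the paper and repository. The class name `TemporalEnsembler`, the `decay` parameter, and the exponential weighting (oldest prediction weighted highest) are assumptions chosen for illustration.

```python
from __future__ import annotations

import math
from collections import defaultdict


class TemporalEnsembler:
    """Fixed-weight temporal ensembling over overlapping action chunks.

    Generic illustration only; this is NOT Mantis's adaptive variant.
    """

    def __init__(self, decay: float = 0.01):
        # Hypothetical hyperparameter: larger values favor older predictions more.
        self.decay = decay
        # Maps an absolute timestep to the predictions made for it, oldest first.
        self._pending = defaultdict(list)

    def add_chunk(self, start_step: int, chunk: list[list[float]]) -> None:
        """Register a chunk whose k-th action targets timestep start_step + k."""
        for offset, action in enumerate(chunk):
            self._pending[start_step + offset].append(list(action))

    def action_at(self, step: int) -> list[float]:
        """Return the weighted average of all predictions stored for `step`."""
        preds = self._pending.pop(step, None)
        if not preds:
            raise KeyError(f"no prediction covers step {step}")
        # The oldest prediction gets weight exp(0) = 1; newer ones decay.
        weights = [math.exp(-self.decay * i) for i in range(len(preds))]
        total = sum(weights)
        dim = len(preds[0])
        return [
            sum(w * p[d] for w, p in zip(weights, preds)) / total
            for d in range(dim)
        ]


# Two overlapping chunks of 1-D actions; with decay=0 the result is a plain mean.
demo = TemporalEnsembler(decay=0.0)
demo.add_chunk(0, [[0.0], [1.0], [2.0]])
demo.add_chunk(1, [[3.0], [4.0]])
print(demo.action_at(1))  # [2.0]
```

In this fixed form the policy is queried at every step, so the averaging smooths control at the price of more forward passes. An adaptive scheme would instead decide when the extra predictions are worth computing.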

### How to use

This is the base Mantis model. For detailed usage please refer to [our repository](https://github.com/zhijie-group/Mantis).
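
As a starting point, the checkpoint files can be fetched locally with `snapshot_download` from the `huggingface_hub` package. This is a minimal sketch that assumes `huggingface_hub` is installed and network access is available. The helper name `download_checkpoint` is hypothetical, and how the Mantis code expects the checkpoint path to be passed is not covered here, so consult the repository for that step.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

REPO_ID = "Yysrc/Mantis-Base"  # repository id taken from this model card


def download_checkpoint(local_dir: Optional[str] = None) -> Path:
    """Download all files of the checkpoint and return the local directory.

    Hypothetical helper for illustration; not part of the Mantis codebase.
    """
    # Imported lazily so this module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    # With local_dir=None the files go to the default Hugging Face cache.
    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_checkpoint()` returns the directory that holds the downloaded files, which can then be handed to the loading code in the repository.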

### 📝 Citation

If you find our code or models useful in your work, please cite [our paper](https://arxiv.org/pdf/2511.16175):

```
@article{yang2025mantis,
  title={Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight},
  author={Yang, Yi and Li, Xueqi and Chen, Yiyang and Song, Jin and Wang, Yihan and Xiao, Zipeng and Su, Jiadi and Qiaoben, You and Liu, Pengfei and Deng, Zhijie},
  journal={arXiv preprint arXiv:2511.16175},
  year={2025}
}
```