HuggingFaceTB/SmolVLM2-256M-Video-Instruct Image-Text-to-Text • 0.3B • Updated Apr 8, 2025 • 139k • 96
DiffusionNFT: Online Diffusion Reinforcement with Forward Process Paper • 2509.16117 • Published Sep 19, 2025 • 22
view article Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data +7 Jun 3, 2025 • 321
MiMo-Embodied: X-Embodied Foundation Model Technical Report Paper • 2511.16518 • Published Nov 20, 2025 • 26
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model Paper • 2510.10274 • Published Oct 11, 2025 • 16