StableVLA: Towards Robust Vision-Language-Action Models without Extra Data Paper • 2605.18287 • Published 6 days ago • 15
HumanNet: Scaling Human-centric Video Learning to One Million Hours Paper • 2605.06747 • Published 17 days ago • 51
Wan2.2 14B Preview 🐌 • Running on Zero • MCP • 2.64k • Generate a video from an image with a text prompt