RealWonder: Real-Time Physical Action-Conditioned Video Generation Paper • 2603.05449 • Published 5 days ago
Excited to see Alibaba DAMO Academy release a multimodal dataset for vision-language pretraining on the Hub 🔥
Paper: 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining (2501.00958)
Dataset: DAMO-NLP-SG/multimodal_textbook
✨ 6.5M images + 0.8B text from 22k hours of instructional videos
✨ Covers subjects like math, physics, and chemistry
✨ Apache 2.0
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices Paper • 2411.10640 • Published Nov 16, 2024