Running • Featured • Voxtral Mini Realtime • 110 • Transcribe speech instantly with real-time captions
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding • Paper • 2601.10611 • Published 27 days ago • 28
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models • Paper • 2601.21639 • Published 13 days ago • 49
Running on Zero • Featured • Qwen3-TTS Demo • 1.36k • Generate speech from text with voice design, cloning, or speakers
LightOnOCR-2 • Collection • LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family • 12 items • Updated 21 days ago • 22
zai-org/GLM-4.1V-9B-Thinking • Image-Text-to-Text • 10B • Updated Oct 25, 2025 • 231k • 770