Innovator-VL: A Multimodal Large Language Model for Scientific Discovery • Paper • 2601.19325 • Published Jan 27 • 81
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents • Paper • 2601.16973 • Published Jan 23 • 40
Paused • 238 • Omnilingual ASR Media Transcription • Transcribe audio/video files into text instantly
Running • 111 • Qwen3 TTS Voice Design • Generate custom voices from text using natural language prompts