SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding Paper • 2604.13023 • Published 4 days ago
openai/whisper-large-v3 Automatic Speech Recognition • 2B • Updated Aug 12, 2024 • 4.81M • 5.59k
Multi-Agent System for Comprehensive Soccer Understanding Paper • 2505.03735 • Published May 6, 2025 • 25
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 6 items • Updated Mar 2 • 166