Tell What You Hear From What You See -- Video to Audio Generation Through Text • Paper • 2411.05679 • Published Nov 8, 2024 • 1
Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization • Paper • 2503.22200 • Published Mar 28, 2025 • 1
SAVVY: Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing • Paper • 2506.05414 • Published Jun 4, 2025 • 2
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering • Paper • 2310.06238 • Published Oct 10, 2023 • 1
MuseChat: A Conversational Music Recommendation System for Videos • Paper • 2310.06282 • Published Oct 10, 2023 • 1
Do Joint Audio-Video Generation Models Understand Physics? • Paper • 2605.07061 • Published 14 days ago • 12