OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains
Paper • 2606.14702 • Published • 19
HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers