An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges Paper • 2512.11362 • Published 15 days ago • 20
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition Paper • 2503.06984 • Published Mar 10 • 5
Beyond Description: Cognitively Benchmarking Fine-Grained Action for Embodied Agents Paper • 2511.18685 • Published Nov 24 • 3
Video Generation Models Are Good Latent Reward Models Paper • 2511.21541 • Published about 1 month ago • 45