V2M-Zero: Zero-Pair Time-Aligned Video-to-Music Generation Paper • 2603.11042 • Published about 1 month ago • 3
See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation Paper • 2509.22653 • Published Sep 26, 2025 • 25
Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery Paper • 2411.02136 • Published Nov 4, 2024 • 1
Multi-Source Urban Traffic Flow Forecasting with Drone and Loop Detector Data Paper • 2501.03492 • Published Jan 7, 2025 • 1
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2, 2025 • 157
Music ControlNet: Multiple Time-varying Controls for Music Generation Paper • 2311.07069 • Published Nov 13, 2023 • 44
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation Paper • 2405.20289 • Published May 30, 2024 • 11
Presto! Distilling Steps and Layers for Accelerating Music Generation Paper • 2410.05167 • Published Oct 7, 2024 • 18
DITTO: Diffusion Inference-Time T-Optimization for Music Generation Paper • 2401.12179 • Published Jan 22, 2024 • 21
CNN-based MultiChannel End-to-End Speech Recognition for everyday home environments Paper • 1811.02735 • Published Nov 7, 2018
Weakly Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation Paper • 1807.01126 • Published Jul 3, 2018