ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs Paper • 2506.10128 • Published Jun 11, 2025
This&That: Language-Gesture Controlled Video Generation for Robot Planning Paper • 2407.05530 • Published Jul 8, 2024
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement Paper • 2504.07934 • Published Apr 10, 2025
Community Forensics: Using Thousands of Generators to Train Fake Image Detectors Paper • 2411.04125 • Published Nov 6, 2024
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web Paper • 2004.14973 • Published Apr 30, 2020
Self-Supervised Any-Point Tracking by Contrastive Random Walks Paper • 2409.16288 • Published Sep 24, 2024
EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata Paper • 2301.04647 • Published Jan 11, 2023
VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator Paper • 2105.11589 • Published May 25, 2021
Sim-to-Real Transfer for Vision-and-Language Navigation Paper • 2011.03807 • Published Nov 7, 2020