αDepth: Learning Single-Pass Soft Boundary Decomposition for Stereo Conversion Paper • 2606.00386 • Published 6 days ago • 3
Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs Paper • 2605.30611 • Published 7 days ago • 169
PANDO: Efficient Multimodal AI Agents via Online Skill Distillation Paper • 2605.24785 • Published 9 days ago • 10
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published 21 days ago • 145
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 23 days ago • 195
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents Paper • 2605.12481 • Published 23 days ago • 27
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published Apr 8 • 121
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 326
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 504
PixelPrune: Pixel-Level Adaptive Visual Token Reduction via Predictive Coding Paper • 2604.00886 • Published Apr 1 • 6
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 343
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models Paper • 2603.17051 • Published Mar 17 • 109
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models Paper • 2603.16859 • Published Mar 17 • 248