Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models Paper • 2606.03988 • Published 7 days ago • 110
stefanocarrera/autophagycode_D_he_train-mercury_Qwen3-8B_strategy_surplexity_t1_g5_run2_metrics Viewer • Updated 7 days ago • 164 • 28 • 1
Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs Paper • 2605.30611 • Published 13 days ago • 192
OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration Paper • 2605.28805 • Published 14 days ago • 11
GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation Paper • 2605.21605 • Published 21 days ago • 13
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? Paper • 2605.22109 • Published 20 days ago • 169
Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization Paper • 2605.13641 • Published 28 days ago • 50
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published Mar 26 • 155
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models Paper • 2603.16859 • Published Mar 17 • 249