WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Paper • 2605.10912 • Published 6 days ago • 39
EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents Paper • 2605.13941 • Published 4 days ago • 21
Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 10 days ago • 182
Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling Paper • 2605.05922 • Published 10 days ago • 4
When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels Paper • 2605.06652 • Published 10 days ago • 5
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents Paper • 2605.05185 • Published 11 days ago • 97
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation Paper • 2604.28196 • Published 17 days ago • 71
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 503
daaxila/twitter-xiaolei4O4-2025.12.13-1999816392032538829-p_xBdlpzIoufFZiM-part1 Viewer • Updated Apr 3 • 1 • 10 • 1
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 342
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 350
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding Paper • 2603.19235 • Published Mar 19 • 95
Believe Your Model: Distribution-Guided Confidence Calibration Paper • 2603.03872 • Published Mar 4 • 40