PromptBridge: Cross-Model Prompt Transfer for Large Language Models • Paper • 2512.01420 • Published 7 days ago • 8
M3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark • Paper • 2511.17729 • Published 17 days ago • 16
R-WoM: Retrieval-augmented World Model For Computer-use Agents • Paper • 2510.11892 • Published Oct 13 • 21
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play • Paper • 2509.25541 • Published Sep 29 • 140
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning • Paper • 2509.22576 • Published Sep 26 • 134
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers • Paper • 2508.20453 • Published Aug 28 • 63
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training • Paper • 2504.09710 • Published Apr 13 • 19
Scaling Laws in Scientific Discovery with AI and Robot Scientists • Paper • 2503.22444 • Published Mar 28 • 12
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering • Paper • 2502.03628 • Published Feb 5 • 12
MLLM-as-a-Judge for Image Safety without Human Labeling • Paper • 2501.00192 • Published Dec 31, 2024 • 31