How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity Paper • 2511.08487 • Published Nov 11, 2025 • 2
ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning Paper • 2511.14366 • Published Nov 18, 2025 • 16
CF-CAM: Cluster Filter Class Activation Mapping for Reliable Gradient-Based Interpretability Paper • 2504.00060 • Published Mar 31, 2025 • 1
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21, 2025 • 259
Rethinking Verification for LLM Code Generation: From Generation to Testing Paper • 2507.06920 • Published Jul 9, 2025 • 28
Coding Triangle: How Does Large Language Model Understand Code? Paper • 2507.06138 • Published Jul 8, 2025 • 21
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Paper • 2507.04447 • Published Jul 6, 2025 • 44
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models Paper • 2506.03135 • Published Jun 3, 2025 • 40
Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective Paper • 2505.19815 • Published May 26, 2025 • 36
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published May 26, 2025 • 104
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Paper • 2504.08672 • Published Apr 11, 2025 • 55
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving Paper • 2503.16905 • Published Mar 21, 2025 • 54
Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models Paper • 2501.18119 • Published Jan 30, 2025 • 25
$φ$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation Paper • 2503.13288 • Published Mar 17, 2025 • 51