DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published 5 days ago • 172
InnoGym: Benchmarking the Innovation Potential of AI Agents Paper • 2512.01822 • Published 6 days ago • 33
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published Oct 21 • 110
Towards Personalized Deep Research: Benchmarks and Evaluations Paper • 2509.25106 • Published Sep 29 • 29
OceanGym: A Benchmark Environment for Underwater Embodied Agents Paper • 2509.26536 • Published Sep 30 • 34
From Knowledge Updating to Behavior Steering: An LLM Knowledge Editing Framework Based on EasyEdit Article • Jul 15 • 5
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark Paper • 2506.10960 • Published Jun 12 • 12
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science Paper • 2506.10974 • Published Jun 12 • 19
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training Paper • 2505.14681 • Published May 20 • 10