THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models • Paper 2505.22113 • Published May 28, 2025
Large Language Model Evaluation via Matrix Nuclear-Norm • Paper 2410.10672 • Published Oct 14, 2024
Rethinking Data Selection at Scale: Random Selection is Almost All You Need • Paper 2410.09335 • Published Oct 12, 2024