THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models Paper • 2505.22113 • Published May 28, 2025 • 2
Large Language Model Evaluation via Matrix Nuclear-Norm Paper • 2410.10672 • Published Oct 14, 2024 • 19
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models Paper • 2406.13542 • Published Jun 19, 2024 • 17
Rethinking Data Selection at Scale: Random Selection is Almost All You Need Paper • 2410.09335 • Published Oct 12, 2024 • 16