LLM4Math - a shuoxing Collection

updated 20 days ago

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs

Paper • 2510.04721 • Published Oct 6

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models

Paper • 2505.02735 • Published May 5 • 34

PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts

Paper • 2504.18428 • Published Apr 25

MathConstruct: Challenging LLM Reasoning with Constructive Proofs

Paper • 2502.10197 • Published Feb 14

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

Paper • 2407.03203 • Published Jul 3, 2024 • 12

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

Paper • 2503.21934 • Published Mar 27

Solving Inequality Proofs with Large Language Models

Paper • 2506.07927 • Published Jun 9 • 20

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

Paper • 2505.05758 • Published May 9

MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark

Paper • 2405.12209 • Published May 20, 2024

ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

Paper • 2505.23851 • Published May 28

Theorem Prover as a Judge for Synthetic Data Generation

Paper • 2502.13137 • Published Feb 18 • 1

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

Paper • 2505.23754 • Published May 29 • 15

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published May 23, 2024 • 43

GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

Paper • 2511.11134 • Published 23 days ago • 31