Theorem Prover as a Judge for Synthetic Data Generation • Paper • 2502.13137 • Published Feb 18, 2025 • 1
PiCSAR: Probabilistic Confidence Selection And Ranking • Paper • 2508.21787 • Published Aug 29, 2025 • 4
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly • Paper • 2505.10610 • Published May 15, 2025 • 54
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning • Paper • 2410.10336 • Published Oct 14, 2024 • 2
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering • Paper • 2410.15999 • Published Oct 21, 2024 • 20