GRM2
A powerful 3B general-purpose reasoning model built for strong multi-domain performance and long-chain reasoning.
This 3B-parameter model is designed for general-purpose reasoning, with a strong emphasis on multi-domain reasoning across code, mathematics, science, and complex knowledge tasks. It is optimized for long chains of thought, which supports more structured, accurate, and reliable reasoning on difficult problems.
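Despite its compact size, the model posts strong benchmark results: in the table below it scores higher than Qwen/Qwen3-32B on all five benchmarks and higher than Qwen/Qwen3.5-9B on LiveCodeBench v6 and GPQA. This makes it an efficient choice for users who want a balance between reasoning quality, versatility, and deployability.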
| Benchmark | OrionLLM/GRM2-3b | Qwen/Qwen3-32B | Qwen/Qwen3.5-4B | Qwen/Qwen3.5-9B |
|---|---|---|---|---|
| LiveCodeBench v6 | 76.9 | 55.7 | 55.8 | 65.6 |
| HMMT Nov 25 | 77.92 | 57.08 | 76.8 | 82.9 |
| GPQA / GPQA Diamond | 83.8 | 68.4 | 76.2 | 81.7 |
| MultiChallenge | 52.21 | 38.72 | 49.0 | 54.5 |
| AIME 2026 | 87.40 | 75.83 | N/D | 92.50 |
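
To make the comparison easier to read, the sketch below computes the score margin between GRM2-3b and each other model, using only the numbers from the table above. The helper names `margins` and `wins` are illustrative and not part of any released tooling, and the `N/D` entry is treated as missing rather than as a score.

```python
# Scores copied from the benchmark table; None stands for the "N/D" entry.
MODELS = ["OrionLLM/GRM2-3b", "Qwen/Qwen3-32B", "Qwen/Qwen3.5-4B", "Qwen/Qwen3.5-9B"]
ROWS = {
    "LiveCodeBench v6": [76.9, 55.7, 55.8, 65.6],
    "HMMT Nov 25": [77.92, 57.08, 76.8, 82.9],
    "GPQA / GPQA Diamond": [83.8, 68.4, 76.2, 81.7],
    "MultiChallenge": [52.21, 38.72, 49.0, 54.5],
    "AIME 2026": [87.40, 75.83, None, 92.50],
}
SCORES = {bench: dict(zip(MODELS, row)) for bench, row in ROWS.items()}
BASELINE = "OrionLLM/GRM2-3b"


def margins(scores, baseline=BASELINE):
    """Return baseline-minus-other score differences per benchmark (hypothetical helper)."""
    result = {}
    for bench, row in scores.items():
        base = row[baseline]
        result[bench] = {
            model: round(base - score, 2)
            for model, score in row.items()
            if model != baseline and score is not None  # skip missing entries
        }
    return result


def wins(scores, baseline, other):
    """Count benchmarks where baseline scores strictly higher than other.

    Returns (wins, compared), where compared excludes benchmarks
    with no reported score for the other model.
    """
    pairs = [
        (row[baseline], row[other])
        for row in scores.values()
        if row[other] is not None
    ]
    return sum(b > o for b, o in pairs), len(pairs)


if __name__ == "__main__":
    for other in MODELS[1:]:
        won, compared = wins(SCORES, BASELINE, other)
        print(f"{BASELINE} vs {other}: higher on {won} of {compared} benchmarks")
```

Positive margins mean GRM2-3b scored higher on that benchmark. The differences are taken at face value from the reported figures and say nothing about evaluation settings or variance.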