Scale or Reason?

Papers

When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance

Scale or Reason?

Distilling reasoning traces from strong teacher models has become the standard recipe for building capable small language models. Yet reasoning traces are 5-20x longer than standard instruction fine-tuning (IFT) outputs, meaning every practitioner who chooses reasoning distillation implicitly forgoes training a larger IFT model on the same compute budget. Whether this trade-off is worthwhile remains unaddressed. We study it with a controlled experiment: a single teacher generates paired IFT and reasoning outputs for identical prompts by toggling only its reasoning mode, isolating supervision format as the sole variable. Training students at five scales (0.5B to 14B) and evaluating on 18 benchmarks, we find that at matched FLOPs, IFT lies on or near the Pareto frontier across the majority of configurations. Reasoning reaches the Pareto frontier only on open-ended tasks at 7B and above. Even there, a sequential curriculum mixing just 25-50% reasoning data with IFT captures most of the accuracy benefit at far lower compute cost.
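The compute trade-off described above can be sketched in a few lines. This is a minimal illustration, not the paper's accounting: it assumes the common `6 * N * D` rule of thumb for training FLOPs, treats the 5-20x output-length ratio as the ratio of total training tokens (ignoring prompt tokens), and uses made-up accuracy numbers. The names `train_flops`, `matched_ift_params`, `pareto_frontier`, and `points` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, str]  # (training FLOPs, accuracy, label)


def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb (an assumption)."""
    return 6.0 * n_params * n_tokens


def matched_ift_params(reasoning_params: float, length_ratio: float) -> float:
    """Size of the IFT student that costs the same FLOPs as a reasoning student.

    Assumes both students see the same prompts and that reasoning data holds
    `length_ratio` times more training tokens than the IFT data.
    """
    return reasoning_params * length_ratio


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep the points for which no cheaper-or-equal point is at least as accurate."""
    frontier: List[Point] = []
    best_acc = float("-inf")
    # Sort by cost ascending; at equal cost, visit the most accurate point first.
    for flops, acc, label in sorted(points, key=lambda p: (p[0], -p[1])):
        if acc > best_acc:
            frontier.append((flops, acc, label))
            best_acc = acc
    return frontier


# A 0.5B reasoning student with traces 5x longer than the IFT outputs costs
# about as much as a 2.5B IFT student trained on the same prompts.
ift_tokens = 1e9  # hypothetical IFT token count
ift_size = matched_ift_params(0.5e9, 5)
print(ift_size, train_flops(0.5e9, 5 * ift_tokens), train_flops(ift_size, ift_tokens))

# Hypothetical (FLOPs, accuracy) results; the numbers are invented for illustration.
points: List[Point] = [
    (1.0e18, 0.40, "ift-small"),
    (5.0e18, 0.42, "reasoning-small"),
    (5.0e18, 0.55, "ift-large"),
    (2.5e19, 0.60, "reasoning-large"),
]
print([label for _, _, label in pareto_frontier(points)])
```

In this toy setup, `reasoning-small` falls off the frontier because an IFT student at the same cost is more accurate, which is the kind of comparison the matched-FLOPs analysis makes.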

If you use the resources ↓ below ↓ in your work, please cite: Scale or Reason?

@misc{boizard2025doesreasoningmattercontrolled,
      title={When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance}, 
      author={Nicolas Boizard and Hippolyte Gisserot-Boukhlef and Kevin El-Haddad and Céline Hudelot and Pierre Colombo},
      year={2025},
      eprint={2509.22193},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.22193}, 
}

datasets 2

Scale-or-Reason/math-reasoning-ift-pairs

Viewer • Updated Oct 27, 2025 • 458k • 215 • 8

Scale-or-Reason/general-reasoning-ift-pairs

Viewer • Updated Sep 29, 2025 • 2.97M • 78 • 4

Collections 3

Scale-or-Reason/general-reasoning-ift-pairs

Scale-or-Reason/Qwen2.5-14B-reasoning

Scale-or-Reason/Qwen2.5-7B-reasoning

Scale-or-Reason/Qwen2.5-3B-reasoning

Scale-or-Reason/general-reasoning-ift-pairs

Scale-or-Reason/Qwen2.5-14B-ift

Scale-or-Reason/Qwen2.5-7B-ift

Scale-or-Reason/Qwen2.5-3B-ift

models 59

Scale-or-Reason/gemma3-4B_1_split

Scale-or-Reason/gemma3-4B_0_split

Scale-or-Reason/gemma3-4B_0.75_split

Scale-or-Reason/gemma3-4B_0.5_split

Scale-or-Reason/gemma3-4B_0.25_split

Scale-or-Reason/gemma3-1B_1_split

Scale-or-Reason/gemma3-1B_0_split

Scale-or-Reason/gemma3-12B_1_split

Scale-or-Reason/gemma3-12B_0_split

Scale-or-Reason/Qwen2.5-7B-0.75_split
