gpt-oss-140b Ren-2

gpt-oss-140b, codename Ren-2, is an agentic, software-engineering (SWE) oriented derivative of OpenAI's GPT-OSS 120B.

This release takes the 120B base model and adds roughly 20B more parameters oriented toward agentic coding and SWE-style behavior.

Overview

  • Base model: openai/gpt-oss-120b
  • Release name: gpt-oss-140b Ren-2
  • Format: MXFP4
  • Intended use: coding, agentic coding, SWE-style assistant workflows
  • Status: research preview

Ren-2 is meant to feel like a more agentic version of GPT-OSS 120B rather than a generic continuation of the base checkpoint.

Training

  • Built on a custom framework
  • Roughly 3 hours of combined pre-training and post-training work for this release path
  • Expanded from the 120B base with roughly 20B additional parameters

This is an iterative open release. More sizes and follow-up revisions will come later.

Inference

This model was tested with:

  • vLLM 0.19.0

Recommended serving settings:

  • num_experts_per_tok=12
  • --reasoning-parser openai_gptoss
  • --tool-call-parser openai
  • --enable-auto-tool-choice
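
The settings above can be assembled into a single launch command. The following is a minimal sketch, not a confirmed launch recipe. It assumes the repository id LLMWildling/gpt-oss-140b-ren-2, and it assumes num_experts_per_tok is passed through vLLM's --hf-overrides option, which this card does not specify. If the shipped config.json already sets the value, the override is redundant.

```python
import json
import shlex

# Assumption: repository id; replace with a local path or mirror if needed.
MODEL_ID = "LLMWildling/gpt-oss-140b-ren-2"


def build_serve_command(model_id: str = MODEL_ID, experts_per_tok: int = 12) -> list[str]:
    """Return an argv list for `vllm serve` using the recommended settings."""
    return [
        "vllm", "serve", model_id,
        "--reasoning-parser", "openai_gptoss",
        "--tool-call-parser", "openai",
        "--enable-auto-tool-choice",
        # Assumption: the card does not say how num_experts_per_tok is supplied;
        # --hf-overrides is one way to override a Hugging Face config field.
        "--hf-overrides", json.dumps({"num_experts_per_tok": experts_per_tok}),
    ]


if __name__ == "__main__":
    # Print a shell-quoted command that can be pasted into a terminal.
    print(shlex.join(build_serve_command()))
```

Building the command as an argv list keeps the JSON override correctly quoted when it is passed to a shell or to subprocess.run.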

The original base setup routes each token to 4 active experts (top-k=4). Ren-2 is intended to run with 12 active experts per token (top-k=12).

Rough active-equivalent compute:

  • original top-k=4: about 5.7B active-equivalent params
  • Ren-2 top-k=12: about 12.9B active-equivalent params

These are approximate active-equivalent numbers, not total parameter counts.
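
The two figures above imply a per-expert cost if active-equivalent compute is assumed to grow linearly with top-k, that is, active(k) = shared + k * per_expert. That linear model is an assumption made here for illustration. The card does not state how the numbers were derived.

```python
# Figures quoted in this card, in billions of active-equivalent parameters.
ACTIVE_AT_K = {4: 5.7, 12: 12.9}


def fit_linear(points: dict[int, float]) -> tuple[float, float]:
    """Fit active(k) = shared + k * per_expert through exactly two points."""
    (k1, a1), (k2, a2) = sorted(points.items())
    per_expert = (a2 - a1) / (k2 - k1)
    shared = a1 - k1 * per_expert
    return shared, per_expert


shared, per_expert = fit_linear(ACTIVE_AT_K)
ratio = ACTIVE_AT_K[12] / ACTIVE_AT_K[4]
print(f"per active expert: ~{per_expert:.2f}B, shared: ~{shared:.2f}B, ratio: ~{ratio:.2f}x")
```

Under that assumption, each additional active expert costs about 0.9B active-equivalent parameters on top of about 2.1B that is always active, and top-k=12 costs about 2.26 times the top-k=4 figure.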

In internal agentic-task traces at top-k=12, roughly half of active routing traffic ran through the added 20B expansion. New experts accounted for about 48.6% of active expert selections and about 46.2% of routing mass in those traces. These figures are workload-dependent.
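
The two percentages measure different things: the selection share counts how often a new expert appears among the top-k picks, while the routing-mass share weights each pick by its router weight. A minimal sketch of both metrics follows. The trace format (a list of (expert_id, weight) pairs per token) and the num_base_experts boundary are hypothetical, chosen for illustration. The card does not describe the actual trace tooling or how many experts were added.

```python
from typing import Iterable

# Hypothetical trace format: one list of (expert_id, router_weight) per token.
Token = list[tuple[int, float]]


def new_expert_shares(tokens: Iterable[Token], num_base_experts: int) -> tuple[float, float]:
    """Return (selection_share, mass_share) for experts with id >= num_base_experts."""
    picks = new_picks = 0
    mass = new_mass = 0.0
    for token in tokens:
        for expert_id, weight in token:
            picks += 1
            mass += weight
            if expert_id >= num_base_experts:
                new_picks += 1
                new_mass += weight
    if picks == 0 or mass == 0.0:
        raise ValueError("empty trace")
    return new_picks / picks, new_mass / mass


# Toy data only, not measurements from Ren-2: ids 0-3 are "base", 4-5 are "new".
toy = [
    [(0, 0.5), (4, 0.3), (5, 0.2)],
    [(1, 0.6), (2, 0.3), (4, 0.1)],
]
selection_share, mass_share = new_expert_shares(toy, num_base_experts=4)
print(f"selections: {selection_share:.1%}, routing mass: {mass_share:.1%}")
```

When the selection share exceeds the mass share, as with the 48.6% and 46.2% figures quoted above, new experts received slightly lower router weights on average than base experts when they were selected.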

This release is intended to run directly from the baked model shards. No extra router merge step is required at inference time.
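
A quick sanity check on a downloaded copy is to read config.json and confirm the recommended value is already present. This sketch assumes the setting is stored under the key num_experts_per_tok in config.json, as in Hugging Face configs for GPT-OSS. The card does not confirm what the shipped file contains, and any directory path passed in is a placeholder.

```python
import json
from pathlib import Path


def read_experts_per_tok(model_dir: str):
    """Return num_experts_per_tok from config.json, or None if the key is absent."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config.get("num_experts_per_tok")


def check_model_dir(model_dir: str, expected: int = 12) -> bool:
    """True when the local config already carries the recommended top-k."""
    return read_experts_per_tok(model_dir) == expected
```

If check_model_dir returns False for a local copy, the value would need to be supplied at serving time instead.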

What It Is Good At

  • coding
  • agentic coding
  • SWE-style assistant behavior
  • practical tool-using workflows

Ren-2 is intended to be usable for production-style coding and agentic workflows, including terminal coding agents, SWE assistants, and tool-using automation setups.

Feedback

Useful feedback includes:

  • coding quality
  • tool use quality
  • long-context behavior
  • inference stability
  • preferred smaller sizes / VRAM targets

If you want smaller custom models, reach out with the hardware target and the kind of feedback you can provide.

A custom model can be a different size or architecture, as long as the feedback loop is useful.

Included Files

  • config.json
  • generation_config.json
  • tokenizer.json
  • tokenizer_config.json
  • chat_template.jinja
  • model.safetensors.index.json
  • model-*.safetensors
  • README.md

License

The license metadata is currently a placeholder (license: other). Replace it with the actual license you want to publish under after confirming compatibility with the base model and your added weights.
