gpt-oss-200b-goblin

gpt-oss-200b-goblin is an agentic coding model derived from GPT-OSS 120B.

Goblin expands the GPT-OSS 120B base with additional specialist MoE capacity for coding-agent workflows, repository work, SWE-style tasks, tool-using automation, and math-assisted reasoning.

This release continues the Goomba line with more SWE and sequential-agentic specialist capacity. It was trained on just two GPUs.

Overview

  • Base model: openai/gpt-oss-120b
  • Approx total parameters: 201B
  • Approx active parameters: 16.5B per token at top-k=16
  • Total expert rows: 224
  • Added specialist experts: 96
  • Format: MXFP4
  • Out-of-box active experts: top-k=16
  • Intended use: agentic coding, SWE-style workflows, repository exploration, tool-using automation, and math-assisted coding
  • Status: research preview

Recommended vLLM

This model was primarily tested with vLLM using the GPT-OSS reasoning parser and OpenAI tool-call parser.

vllm serve /path/to/model \
  --served-model-name vllm/doobee \
  --tensor-parallel-size 2 \
  --max-model-len 60000 \
  --gpu-memory-utilization 0.88 \
  --enforce-eager \
  --trust-remote-code \
  --reasoning-parser openai_gptoss \
  --tool-call-parser openai \
  --enable-auto-tool-choice
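
Once the server is running, a quick request against the OpenAI-compatible endpoint confirms the served model name that agent frontends should target. This is a minimal sketch assuming vLLM's default bind address of http://localhost:8000 and the openai Python package; adjust both for your deployment.

# Minimal sanity check against the server started above (assumes vLLM's
# default bind of http://localhost:8000 and the `openai` Python package).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# The listed id should match --served-model-name from the launch command.
print([m.id for m in client.models.list()])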

Recommended parameters:

  • num_experts_per_tok=16 is already set in config.json
  • tensor-parallel-size=2
  • max-model-len=60000
  • gpu-memory-utilization=0.88
  • reasoning-parser=openai_gptoss
  • tool-call-parser=openai
  • enable-auto-tool-choice

The config ships with both num_experts_per_tok=16 and experts_per_token=16, so runtimes that respect the model config should use top-k 16 automatically. If your runtime overrides or ignores those fields, pass this explicitly:

--hf-overrides '{"num_experts_per_tok": 16}'
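
As a quick pre-flight check, you can read the shipped config.json and confirm the routed-expert count before deciding whether the override is needed. A minimal sketch, assuming the same /path/to/model placeholder as the serve command above:

# Quick check of the shipped config before serving (path is a placeholder).
import json

with open("/path/to/model/config.json") as f:
    cfg = json.load(f)

# Either field should report 16; if not, pass --hf-overrides as shown above.
print(cfg.get("num_experts_per_tok"), cfg.get("experts_per_token"))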

Tool Calling

Goblin was primarily tested as an agentic coding model. Basic OpenAI-compatible tool calling is expected to work best with the vLLM GPT-OSS reasoning parser and the OpenAI tool-call parser enabled; a minimal client sketch follows the temperature recommendations below.

Suggested temperatures:

  • 0.3 for steady coding-agent work
  • 0.5 for broader agentic exploration

Recommended range: 0.3-0.5.
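
A minimal client sketch of what this looks like in practice, assuming the server launched above at vLLM's default http://localhost:8000 and the openai Python package; the read_file tool and its schema are hypothetical stand-ins for whatever tools your agent exposes:

# Illustrative tool-calling request against the vLLM endpoint above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical example tool; replace with your agent's real tool schema.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the repository and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Repository-relative file path."}
            },
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="vllm/doobee",  # must match --served-model-name
    messages=[{"role": "user", "content": "Open README.md and summarize the build steps."}],
    tools=tools,
    tool_choice="auto",
    temperature=0.3,  # steady coding-agent setting from the range above
)

# With --enable-auto-tool-choice, tool calls arrive as structured tool_calls entries.
print(resp.choices[0].message.tool_calls)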

For repository exploration tasks, use an agent prompt that asks the model to inspect subdirectories, identify entry points, and summarize the project structure rather than stopping after a single directory listing.
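
One possible shape for such a prompt, shown only as an illustrative starting point rather than a tested recommendation, using the same OpenAI-style message format as the example above:

# Illustrative system prompt for repository exploration; the wording is a
# placeholder, not a benchmarked recommendation.
EXPLORE_PROMPT = (
    "You are a coding agent working inside a repository. "
    "Do not stop after a single directory listing: walk into subdirectories, "
    "identify entry points (main modules, CLI scripts, build files), and "
    "finish with a short summary of the project structure."
)

messages = [
    {"role": "system", "content": EXPLORE_PROMPT},
    {"role": "user", "content": "Explore this repository and describe how it is organized."},
]
# Pass `messages` to client.chat.completions.create as in the tool-calling example.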

License

The card currently ships with the placeholder license: other metadata. Replace it with the actual license you want to publish under after confirming compatibility with the base model and the added weights.
