---
library_name: transformers
base_model: Qwen/Qwen3.5-9B
tags:
- qwen3.5
- code
- agent
- sft
- omnicoder
- tesslate
license: apache-2.0
language:
- en
pipeline_tag: text-generation
model-index:
- name: OmniCoder-9B
  results:
  - task:
      type: text-generation
    dataset:
      name: AIME 2025
      type: custom
    metrics:
    - name: pass@5
      type: accuracy
      value: 90.0
  - task:
      type: text-generation
    dataset:
      name: GPQA Diamond
      type: custom
    metrics:
    - name: pass@1
      type: accuracy
      value: 83.8
    - name: pass@3
      type: accuracy
      value: 86.4
  - task:
      type: text-generation
    dataset:
      name: Terminal-Bench 2.0
      type: custom
    metrics:
    - name: Pass Rate
      type: accuracy
      value: 28.1
---
<div align="center">
<img src="omnicoder-banner.png" alt="OmniCoder" width="720">

# OmniCoder-9B

### A 9B coding agent fine-tuned on 425K agentic trajectories.

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) · [Base Model: Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) · [GGUF: OmniCoder-9B-GGUF](https://huggingface.co/Tesslate/OmniCoder-9B-GGUF)

**Update (3/12/26):** [Install For Your Coding Agents](https://tesslate.com/install#omnicoder)

[Get Started](#quickstart) | [Benchmarks](#benchmarks) | [GGUF Downloads](https://huggingface.co/Tesslate/OmniCoder-9B-GGUF)

---

</div>
## Overview

**OmniCoder-9B** is a 9-billion-parameter coding agent model built by [Tesslate](https://tesslate.com), fine-tuned on top of [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B)'s hybrid architecture (Gated Delta Networks interleaved with standard attention). It was trained on **425,000+ curated agentic coding trajectories** spanning real-world software engineering tasks, tool use, terminal operations, and multi-step reasoning.

The training data was built primarily from **Claude Opus 4.6 agentic and coding reasoning traces**, targeting scaffolding patterns from Claude Code, OpenCode, Codex, and Droid. The dataset also includes successful trajectories from models such as Claude Opus 4.6, GPT-5.4, GPT-5.3-Codex, and Gemini 3.1 Pro.

The model shows strong agentic behavior: it recovers from errors, reads files before writing to them, responds to LSP diagnostics, and applies targeted edit diffs instead of full-file rewrites. These patterns were learned directly from the real-world agent trajectories it was trained on.

### Key Features

- **Trained on Frontier Agent Traces**: Built from Claude Opus 4.6, GPT-5.3-Codex, GPT-5.4, and Gemini 3.1 Pro agentic coding trajectories across Claude Code, OpenCode, Codex, and Droid scaffolding
- **Hybrid Architecture**: Inherits Qwen3.5's Gated Delta Networks interleaved with standard attention for efficient long-context processing
- **262K Native Context**: Full 262,144-token context window, extensible to 1M+
- **Error Recovery**: Learns read-before-write patterns, responds to LSP diagnostics, and applies minimal edit diffs instead of full rewrites
- **Thinking Mode**: Supports `<think>...</think>` reasoning chains for complex problem decomposition
- **Apache 2.0**: Fully open weights, no restrictions
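The thinking mode mentioned above emits reasoning inside `<think>...</think>` tags before the final answer. A minimal sketch for separating the reasoning trace from the visible reply (the tag format is taken from the feature list; the exact output framing may vary with the chat template, and `split_think` is a hypothetical helper name):

```python
import re

def split_think(output: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer).

    Assumes at most one <think>...</think> block at the start of the
    output, as described by the Thinking Mode feature above.
    """
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", output, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", output.strip()

reasoning, answer = split_think(
    "<think>The user wants LCS. Use DP over both strings.</think>"
    "Here is a dynamic-programming solution."
)
```

This is handy when logging trajectories: store the reasoning separately and show only the answer to the user.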
---

## Benchmarks

<div align="center">

| Benchmark | **OmniCoder-9B** | Qwen3.5-9B | Qwen3-Next-80B | GPT-OSS-120B | GPT-OSS-20B | GLM-4.7-Flash | GLM 4.7 | Claude Haiku 4.5 |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **AIME 2025** (pass@5) | 90 | | | | 91.7 | 91.6 | | |
| **GPQA Diamond** (pass@1) | **83.8** | 81.7 | 77.2 | 80.1 | 71.5 | | | 73 |
| **GPQA Diamond** (pass@3) | **86.4** | | | | | | | |
| **Terminal-Bench 2.0** | **23.6** | 14.6 | | | | | 33.4 | 27 |

</div>

- **GPQA Diamond pass@1: 83.8%** (166/198), +2.1 points over the Qwen3.5-9B base model (81.7). At pass@3: **86.4%** (171/198).
- **AIME 2025 pass@5: 90%** (27/30).
- **Terminal-Bench 2.0: 23.6%** (21/89), +9.0 points (a 61% relative improvement) over the Qwen3.5-9B base model (14.6%, 13/89).
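A pass@k figure is the probability that at least one of k sampled attempts solves a problem. This card does not state how its pass@k values were computed; a common approach, when n samples per problem are available and c of them pass, is the unbiased estimator from the HumanEval methodology:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n-c, k) / C(n, k), i.e. one
    minus the probability that k samples drawn without replacement
    from n total (c of them passing) are all failures."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 samples per problem, 3 passing -> estimated pass@5
estimate = pass_at_k(10, 3, 5)
```

With n = k (one batch of exactly k samples) this reduces to "did any sample pass", which may be how the 27/30 AIME figure was obtained.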
---

## Quickstart

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tesslate/OmniCoder-9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to find the longest common subsequence of two strings."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p/top_k to take effect
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95, top_k=20)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
### vLLM

```bash
vllm serve Tesslate/OmniCoder-9B --tensor-parallel-size 1 --max-model-len 65536
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
response = client.chat.completions.create(
    model="Tesslate/OmniCoder-9B",
    messages=[{"role": "user", "content": "Explain the difference between a mutex and a semaphore."}],
    temperature=0.6,
)
print(response.choices[0].message.content)
```

### llama.cpp (GGUF)

```bash
llama-cli --hf-repo Tesslate/OmniCoder-9B-GGUF --hf-file omnicoder-9b-q4_k_m.gguf -p "Your prompt" -c 8192
```

All quantizations: [Tesslate/OmniCoder-9B-GGUF](https://huggingface.co/Tesslate/OmniCoder-9B-GGUF)

---
## Training Details

| | |
|:---|:---|
| **Base Model** | [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) |
| **Method** | LoRA SFT (r=64, alpha=32) |
| **Dataset** | 425K agentic trajectories from 5 sources |
| **Packing** | Sample packing with 99.35% efficiency |
| **Hardware** | 4x NVIDIA H200 (DDP) |
| **Framework** | Axolotl |
| **Precision** | bf16 |
| **Optimizer** | AdamW (lr=2e-4, cosine schedule) |
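With r=64 and alpha=32, the LoRA update is scaled by alpha/r = 0.5 when applied. A minimal numpy sketch of how such an adapter modifies a frozen weight (illustrative only; the actual training used Axolotl, and the layer shapes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 512, 512, 64, 32
W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-init

scaling = alpha / r  # = 0.5 for r=64, alpha=32

def lora_forward(x):
    # base path plus low-rank update, scaled by alpha/r
    return W @ x + scaling * (B @ (A @ x))

# with B zero-initialized, the adapter starts as an exact no-op
x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)
```

Only A and B (2 * r * d ≈ 0.5M parameters per 512x512 layer here, versus 262K for the full matrix at real model widths far more favorable) are trained; the base weight stays frozen.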
---

## Architecture

OmniCoder inherits Qwen3.5-9B's hybrid architecture:

- **Gated Delta Networks**: Linear attention layers interleaved with standard attention for efficient long-range dependencies
- **VLM Backbone**: Built on `Qwen3_5ForConditionalGeneration`
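As rough intuition for the gated delta rule behind those linear-attention layers (a heavily simplified sketch of the recurrence described in the DeltaNet literature, not the model's actual implementation): each step decays a matrix-valued state with a gate, erases the old association along the current key, and writes the new key-value pair.

```python
import numpy as np

def gated_delta_step(S, k, v, alpha, beta):
    """One step of a (simplified) gated delta rule.

    S     : (d_v, d_k) matrix-valued state
    k, v  : key (d_k,) and value (d_v,) vectors, k assumed unit-norm
    alpha : gate in [0, 1] decaying the whole state
    beta  : write strength in [0, 1]
    """
    S = alpha * S                        # gated decay of the state
    S = S - beta * np.outer(S @ k, k)    # delta-rule erase along k
    S = S + beta * np.outer(v, k)        # write the new association
    return S

def readout(S, q):
    return S @ q  # output is a linear read of the state

d_k = 4
S = np.zeros((d_k, d_k))
k = np.eye(d_k)[0]                       # unit key
v = np.array([1.0, 2.0, 3.0, 4.0])
S = gated_delta_step(S, k, v, alpha=1.0, beta=1.0)
# querying with the same key retrieves the stored value exactly
assert np.allclose(readout(S, k), v)
```

Because the state is a fixed-size matrix rather than a growing KV cache, cost per token is constant, which is what makes interleaving these layers with standard attention attractive at 262K context.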
---

## Recommended Sampling Parameters

| Parameter | Value |
|:---|:---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Presence Penalty | 0.0 |

For agentic / tool-calling tasks, consider a lower temperature (0.2-0.4) for more deterministic behavior.
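To make the table concrete, here is a small numpy sketch of how temperature, top-k, and top-p (nucleus) filtering combine before sampling (an illustration of the standard procedure, not the serving stack's code):

```python
import numpy as np

def filter_logits(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return sampling probabilities after temperature, top-k, top-p."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # top-k: keep only the k most likely tokens
    if top_k < probs.size:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)

    # top-p: keep the smallest prefix of tokens whose mass reaches top_p
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order] / probs.sum())
    keep = order[: np.searchsorted(cum, top_p) + 1]
    mask = np.zeros_like(probs)
    mask[keep] = probs[keep]
    return mask / mask.sum()

p = filter_logits([2.0, 1.0, 0.1, -3.0], temperature=0.6, top_k=2, top_p=0.95)
assert np.count_nonzero(p) <= 2 and np.isclose(p.sum(), 1.0)
```

Lowering temperature sharpens the distribution before the cutoffs, which is why 0.2-0.4 behaves more deterministically for tool calls.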
---

## Limitations

- Performance on non-English tasks has not been extensively evaluated
- Tool-calling format is flexible but works best with the scaffolding patterns seen in training

---

## Acknowledgments

Special thanks to the [Axolotl](https://github.com/axolotl-ai-cloud/axolotl) team and the discussion in [axolotl#3453](https://github.com/axolotl-ai-cloud/axolotl/issues/3453) for helping get Qwen3.5 packing support working.

---

## Citation

```bibtex
@misc{omnicoder2025,
  title={OmniCoder-9B: A Frontier Open Coding Agent},
  author={Tesslate},
  year={2025},
  url={https://huggingface.co/Tesslate/OmniCoder-9B}
}
```

---

<div align="center">

**Built by [Tesslate](https://tesslate.com)**

</div>