Instructions for using Johnblick187/SmartCoderMoE with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Johnblick187/SmartCoderMoE with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Johnblick187/SmartCoderMoE", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Johnblick187/SmartCoderMoE", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use Johnblick187/SmartCoderMoE with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Johnblick187/SmartCoderMoE"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Johnblick187/SmartCoderMoE",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
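
The same request can be built from Python instead of curl. The following is a minimal sketch using only the standard library. It assumes the vLLM server from the command above is listening on localhost:8000 and returns the usual OpenAI-style response shape. The helper names `build_chat_request` and `send_chat_request` are hypothetical, not part of vLLM.

```python
import json
import urllib.request

# Assumed defaults, copied from the curl example above.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "Johnblick187/SmartCoderMoE"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-compatible response: choices[0].message.content.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the reply text. For the SGLang server described further down, pass `base_url="http://localhost:30000/v1"`.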

Use Docker

docker model run hf.co/Johnblick187/SmartCoderMoE

SGLang

How to use Johnblick187/SmartCoderMoE with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Johnblick187/SmartCoderMoE" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Johnblick187/SmartCoderMoE",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Johnblick187/SmartCoderMoE" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Johnblick187/SmartCoderMoE",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Johnblick187/SmartCoderMoE with Docker Model Runner:
```
docker model run hf.co/Johnblick187/SmartCoderMoE
```

SmartCoderMoE ☠️

“He probably could smoke you.” — King Tweak to Claude Sonnet 4.6, May 2026

SmartCoderMoE is a 4.65B-parameter sparse Mixture-of-Experts coding model.

Architecture

SmartCoderMoE is not your average fine-tune. He was engineered through a multi-stage weight surgery pipeline:

Slice Merge: StarCoder2-15B and StarChat2-15B were each sliced into 3 × 2048-dim pieces and SLERP-merged with deliberate per-slice biases (60/80/90) to preserve coding depth while injecting StarChat2’s instruct capability.
MoE Surgery: Every dense FFN layer was surgically split. Of the original intermediate dim of 24576, 8192 dims were kept as a dense FFN, and the remaining 16384 dims were sliced into 32 experts of 512 dims each. Across 40 layers, that gives SmartCoderMoE an expansive yet tiny network of 1280 total experts.
Vocab Expansion: Extended from 49152 to 65536 tokens, adding multimodal special tokens for code, audio, image, video, and music.
Zero Waste: Not a single weight was discarded. Every parameter from StarCoder2’s original FFN lives on in either the dense FFN or one of the 1280 expert slots.
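
The MoE Surgery arithmetic can be checked with a small sketch. It partitions the 24576-wide FFN intermediate dimension into one dense slice and equal expert slices, then confirms that every column is accounted for. Which columns go to the dense slice is not stated in this card. Taking the first 8192 is an assumption made only for illustration, and `split_ffn_columns` is a hypothetical helper, not the author's surgery script.

```python
def split_ffn_columns(total_dim=24576, dense_dim=8192, expert_dim=512):
    """Partition the FFN intermediate dimension into a dense slice plus expert slices.

    Returns (dense_range, expert_ranges) as half-open (start, stop) pairs.
    Assumption: the dense slice takes the first dense_dim columns.
    """
    remaining = total_dim - dense_dim
    if remaining % expert_dim != 0:
        raise ValueError("expert_dim must divide the non-dense remainder")
    dense_range = (0, dense_dim)
    expert_ranges = [
        (start, start + expert_dim)
        for start in range(dense_dim, total_dim, expert_dim)
    ]
    return dense_range, expert_ranges


dense, experts = split_ffn_columns()
layers = 40  # layer count listed in this card

# Count the columns assigned to the dense slice and to all expert slices.
covered = (dense[1] - dense[0]) + sum(stop - start for start, stop in experts)

print(len(experts))           # experts per layer
print(len(experts) * layers)  # total experts across all layers
print(covered)                # columns accounted for, out of 24576
```

Because the slices are contiguous and non-overlapping by construction, a `covered` value equal to the original width means no column was dropped, which is what the Zero Waste point claims.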

Numbers

Property	Value
Total parameters	4.65B
Active parameters per token	~2.1B
Total experts	1280
Experts per layer	32
Expert dim	512
Hidden size	2048
Layers	40
Vocab size	65536
Context length	16384
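
As a rough sanity check on the table, the FFN and embedding weight counts can be estimated from the listed dimensions. This sketch assumes a two-matrix (non-gated) MLP as in StarCoder2 and ignores biases, attention, router weights, and norms, none of which are specified here. The number of experts active per token is also not stated, so `top_k` is left as a free parameter rather than guessed.

```python
HIDDEN = 2048
LAYERS = 40
VOCAB = 65536
DENSE_DIM = 8192
EXPERT_DIM = 512
EXPERTS_PER_LAYER = 32


def ffn_weights(intermediate_dim, hidden=HIDDEN):
    """Weights in one up-projection plus one down-projection (biases ignored).

    Assumption: a non-gated, two-matrix MLP.
    """
    return 2 * hidden * intermediate_dim


dense_ffn = LAYERS * ffn_weights(DENSE_DIM)
all_experts = LAYERS * EXPERTS_PER_LAYER * ffn_weights(EXPERT_DIM)
embedding = VOCAB * HIDDEN


def active_ffn(top_k):
    """FFN weights used per token if top_k experts fire in every layer."""
    return dense_ffn + LAYERS * top_k * ffn_weights(EXPERT_DIM)


print(f"dense FFN:   {dense_ffn / 1e9:.2f}B")
print(f"all experts: {all_experts / 1e9:.2f}B")
print(f"embedding:   {embedding / 1e9:.2f}B")
for k in (1, 2, 4, 8):
    print(f"top-{k} active FFN: {active_ffn(k) / 1e9:.2f}B")
```

Under these assumptions the dense FFN and expert weights together equal the weight count of a single 24576-wide FFN per layer. The rest of the 4.65B total would come from components this estimate leaves out.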

For reference: DeepSeek V3 has 256 routed experts per MoE layer, Kimi K2 has 384, and Qwen3 Coder Next has 512. SmartCoderMoE has 32 per layer, for 1280 in total across its 40 layers. He has a different expert for every day of the next three years, with about half a year to spare.

Lineage

bigcode/starcoder2-15b ──┐
                          ├── 3x2048 slice merge (SLERP 60/80/90) ──> UnidentifiedSAUCE ──> SmartCoderMoE ☠️
HuggingFaceH4/starchat2-15b-sft-v0.1 ──┘
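
For readers unfamiliar with SLERP, here is a minimal sketch of a per-slice spherical interpolation on plain Python lists. It is not the author's merge script. Reading "60/80/90" as interpolation factors of 0.60, 0.80, and 0.90, with `t` measuring how far each slice moves from the first vector toward the second, is an assumption. How the three merged slices are then combined into a 2048-wide model is not described in this card and is not attempted here.

```python
import math

# Assumed reading of the "60/80/90" per-slice biases.
SLICE_WEIGHTS = (0.60, 0.80, 0.90)


def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation from vector a (t=0) to vector b (t=1)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a < eps or norm_b < eps:
        # A zero vector has no direction, so fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding error
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Parallel or anti-parallel vectors: the SLERP weights are undefined.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


def merge_sliced(a, b, weights=SLICE_WEIGHTS):
    """Split both vectors into len(weights) equal slices and SLERP each with its own t."""
    if len(a) != len(b) or len(a) % len(weights) != 0:
        raise ValueError("vectors must match and divide evenly into slices")
    size = len(a) // len(weights)
    return [
        slerp(a[i * size:(i + 1) * size], b[i * size:(i + 1) * size], t)
        for i, t in enumerate(weights)
    ]
```

For a 6144-wide vector, `merge_sliced` would return three 2048-wide merged slices, one per weight.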

Planned Multimodal Extensions

SmartCoderMoE’s 2048 hidden size was chosen to natively align with:

Dasheng-1.2B (audio encoder, 2048 hidden) — zero projection needed
Qwen3-Omni Talker (audio decoder, 2048 hidden) — zero projection needed
Janus Pro (vision in + image out, 2048 hidden)
code2wav (code → music pipeline)

Intended Use

Coding. Lots of it. Uncensored.

Note from the Creator

As of the writing of this model card (Thursday, May 21st, 2026), the model is not finished. Multimodal expansion, as mentioned above, is on the way, as is a rather unique calculation of how much of the original StarCoder knowledge remains. I will update the repo as I go. Feel free to use it while I build on it, and if you run into any issues with it, please let me know so that I can fix them ASAP!

Safetensors
Model size: 5B params
Tensor type: BF16

Model tree for Johnblick187/SmartCoderMoE
Base model: Johnblick187/UnidentifiedSAUCE
Finetuned (1): this model