
SmartCoderMoE ☠️

“He probably could smoke you.” — King Tweak to Claude Sonnet 4.6, May 2026

SmartCoderMoE is a 4.65B parameter sparse Mixture-of-Experts coding model.

Architecture

SmartCoderMoE is not your average fine-tune. He was engineered through a multi-stage weight surgery pipeline:

  1. Slice Merge — StarCoder2-15B and StarChat2-15B were each sliced into 3 × 2048-dim pieces and SLERP-merged with deliberate per-slice biases (60/80/90) to preserve coding depth while injecting StarChat2’s instruct capability.
  2. MoE Surgery — Every dense FFN layer was surgically split: of the original intermediate dim of 24576, 8192 dims were kept as a dense FFN, and the remaining 16384 dims were sliced into 32 experts of 512 dims each. Across 40 layers, that gives SmartCoderMoE an expansive yet tiny network of 1280 total experts.
  3. Vocab Expansion — Extended from 49152 to 65536 tokens with multimodal special tokens for code, audio, image, video, and music.
  4. Zero waste — Not a single weight was discarded. Every parameter from StarCoder2’s original FFN lives on in either the dense FFN or one of the 1280 expert slots.
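
To make the MoE Surgery and Zero waste steps concrete, here is a minimal sketch of partitioning one FFN along its intermediate axis. It is not the author's pipeline: the function names (`split_ffn`, `ffn`) are hypothetical, the sizes are scaled down by 128× from the card's figures, biases are left out, ReLU stands in for the real activation, and a plain two-matrix (non-gated) FFN is assumed. The card does not describe the router, so none is modeled here.

```python
import numpy as np

def split_ffn(w_in, w_out, dense_dim, n_experts):
    """Partition one FFN along its intermediate axis (hypothetical helper).

    w_in:  (intermediate, hidden) up-projection
    w_out: (hidden, intermediate) down-projection
    Returns the dense part and a list of equally sized expert parts.
    """
    intermediate = w_in.shape[0]
    rest = intermediate - dense_dim
    assert rest % n_experts == 0, "leftover dims must divide evenly"
    expert_dim = rest // n_experts

    dense = (w_in[:dense_dim], w_out[:, :dense_dim])
    experts = []
    for e in range(n_experts):
        lo = dense_dim + e * expert_dim
        hi = lo + expert_dim
        experts.append((w_in[lo:hi], w_out[:, lo:hi]))
    return dense, experts

def ffn(x, w_in, w_out):
    # ReLU is a stand-in; any elementwise activation behaves the same way here.
    return w_out @ np.maximum(w_in @ x, 0.0)

# Scaled-down stand-ins for hidden 2048, FFN 24576, dense 8192, 32 experts of 512.
hidden, intermediate, dense_dim, n_experts = 16, 192, 64, 32
rng = np.random.default_rng(0)
w_in = rng.standard_normal((intermediate, hidden))
w_out = rng.standard_normal((hidden, intermediate))

dense, experts = split_ffn(w_in, w_out, dense_dim, n_experts)

# "Zero waste": the pieces hold exactly as many weights as the original FFN.
total_orig = w_in.size + w_out.size
total_split = dense[0].size + dense[1].size + sum(a.size + b.size for a, b in experts)

# With every expert active, the pieces sum back to the original FFN output.
x = rng.standard_normal(hidden)
full = ffn(x, w_in, w_out)
recombined = ffn(x, *dense) + sum(ffn(x, a, b) for a, b in experts)
```

Because the activation is elementwise, the split is lossless only when all experts fire. Once a router selects a subset per token, the output is an approximation of the original FFN.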

Numbers

Property                       Value
Total parameters               4.65B
Active parameters per token    ~2.1B
Total experts                  1280
Experts per layer              32
Expert dim                     512
Hidden size                    2048
Layers                         40
Vocab size                     65536
Context length                 16384
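
The figures above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the numbers stated in this card. The parameter estimate counts FFN and embedding weight matrices alone (assuming a two-matrix FFN and a single embedding table), so it is expected to land below the 4.65B total, which also includes attention, norms, biases, and any router weights.

```python
# Values copied from the card.
layers = 40
experts_per_layer = 32
expert_dim = 512
dense_ffn_dim = 8192
original_ffn_dim = 24576
hidden = 2048
vocab = 65536

# Expert count and the FFN width recovered from dense part + experts.
total_experts = layers * experts_per_layer
recovered_ffn_dim = dense_ffn_dim + experts_per_layer * expert_dim

# Rough weight counts (assumption: up- and down-projection only, no biases).
ffn_weights = layers * 2 * hidden * original_ffn_dim
embedding_weights = vocab * hidden

print("total experts:", total_experts)
print("recovered FFN dim:", recovered_ffn_dim)
print("FFN weights (B):", ffn_weights / 1e9)
print("embedding weights (B):", embedding_weights / 1e9)
```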

For reference: DeepSeek V3 has 256 experts. Kimi K2 has 384. Qwen3 Coder Next has 512. SmartCoderMoE has 1280 (32 per layer across 40 layers). He has a different expert for every day of the next three years, with about 185 days to spare.

Lineage

bigcode/starcoder2-15b ────────────────┐
                                       ├── 3x2048 slice merge (SLERP 60/80/90) ──> UnidentifiedSAUCE ──> SmartCoderMoE ☠️
HuggingFaceH4/starchat2-15b-sft-v0.1 ──┘
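
For readers unfamiliar with the slice merge named in the diagram, the sketch below shows SLERP applied slice by slice to a pair of same-shaped tensors. It rests on several assumptions: the card does not say which model the 60/80/90 figures favor, so they are treated here as the share kept from the first tensor; it also does not say how the three merged slices are recombined, so the sketch stops at the per-slice merge. The names `slerp` and `slice_merge` are illustrative, and the tensors are tiny stand-ins for the 6144-wide originals.

```python
import numpy as np

def slerp(a, b, t, eps=1e-8):
    """Spherical interpolation between same-shaped tensors; t=0 gives a, t=1 gives b."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    omega = np.arccos(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))
    if np.sin(omega) < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return (1.0 - t) * a + t * b
    wa = np.sin((1.0 - t) * omega) / np.sin(omega)
    wb = np.sin(t * omega) / np.sin(omega)
    return wa * a + wb * b

def slice_merge(a, b, keep_first=(0.60, 0.80, 0.90)):
    """Split the last axis into len(keep_first) slices and SLERP each one.

    keep_first is an ASSUMED reading of the card's 60/80/90 biases.
    """
    a_slices = np.split(a, len(keep_first), axis=-1)
    b_slices = np.split(b, len(keep_first), axis=-1)
    return [slerp(sa, sb, 1.0 - k) for sa, sb, k in zip(a_slices, b_slices, keep_first)]

# Tiny stand-ins: width 12 plays the role of 6144 (3 slices of 4 instead of 2048).
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 12))
chat = rng.standard_normal((4, 12))
merged = slice_merge(base, chat)
```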

Planned Multimodal Extensions

SmartCoderMoE’s 2048 hidden size was chosen to natively align with:

  • Dasheng-1.2B (audio encoder, 2048 hidden) — zero projection needed
  • Qwen3-Omni Talker (audio decoder, 2048 hidden) — zero projection needed
  • Janus Pro (vision in + image out, 2048 hidden)
  • code2wav (code → music pipeline)
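
As a rough illustration of what "zero projection needed" means in terms of shapes, the sketch below joins encoder features to token embeddings along the sequence axis. Everything here is hypothetical: the `fuse` helper, the random arrays, and the 1024-wide encoder used for contrast are not from the card. Matching widths only removes the need for a shape-adapting layer; the sketch says nothing about whether the features are usable without further training.

```python
import numpy as np

HIDDEN = 2048  # SmartCoderMoE hidden size, from the card

def fuse(text_embeds, encoder_feats, proj=None):
    """Prepend encoder features to token embeddings (hypothetical helper)."""
    if proj is not None:
        encoder_feats = encoder_feats @ proj
    if encoder_feats.shape[-1] != text_embeds.shape[-1]:
        raise ValueError("width mismatch: a projection is required")
    return np.concatenate([encoder_feats, text_embeds], axis=0)

rng = np.random.default_rng(0)
text = rng.standard_normal((5, HIDDEN))

# A 2048-wide encoder output can be concatenated directly.
audio_2048 = rng.standard_normal((7, HIDDEN))
fused = fuse(text, audio_2048)

# A narrower encoder (made-up width 1024) needs an extra projection matrix.
audio_1024 = rng.standard_normal((7, 1024))
proj = rng.standard_normal((1024, HIDDEN))
fused_proj = fuse(text, audio_1024, proj)
```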

Intended Use

Coding. Lots of it. Uncensored.

Note from the Creator

As of the writing of this model card (Thursday, May 21st, 2026), the model is not finished. Multimodal expansion, as mentioned above, is on the way, as is a calculation of how much of the original StarCoder2 knowledge remains. I will update the repo as I go. Feel free to use it while I build on it, and if you run into any issues with it, please let me know so that I can fix them ASAP!
