Instructions to use Johnblick187/SmartCoderMoE with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Johnblick187/SmartCoderMoE with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Johnblick187/SmartCoderMoE", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Johnblick187/SmartCoderMoE", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Johnblick187/SmartCoderMoE with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Johnblick187/SmartCoderMoE"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Johnblick187/SmartCoderMoE",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/Johnblick187/SmartCoderMoE
- SGLang
How to use Johnblick187/SmartCoderMoE with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Johnblick187/SmartCoderMoE" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Johnblick187/SmartCoderMoE",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Johnblick187/SmartCoderMoE" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Johnblick187/SmartCoderMoE",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Docker Model Runner
How to use Johnblick187/SmartCoderMoE with Docker Model Runner:
docker model run hf.co/Johnblick187/SmartCoderMoE
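The curl calls above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].message.content` shape.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, user_message, timeout=60):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body; needs a running server.
    """
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the vLLM server running, `chat("http://localhost:8000", "Johnblick187/SmartCoderMoE", "What is the capital of France?")` should return the reply text. For the SGLang server, swap in port 30000.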
SmartCoderMoE ☠️
“He probably could smoke you.” — King Tweak to Claude Sonnet 4.6, May 2026
SmartCoderMoE is a 4.65B-parameter sparse Mixture-of-Experts coding model.
Architecture
SmartCoderMoE is not your average fine-tune. He was engineered through a multi-stage weight surgery pipeline:
- Slice Merge: StarCoder2-15B and StarChat2-15B were each sliced into 3 × 2048-dim pieces and SLERP-merged with deliberate per-slice biases (60/80/90) to preserve coding depth while injecting StarChat2’s instruction-following capability.
- MoE Surgery: Every dense FFN layer was surgically split. Of the original intermediate dim of 24576, 8192 dims were kept as a dense FFN and the remaining 16384 dims were sliced into 32 experts of 512 dims each. Across 40 layers, that gives SmartCoderMoE an expansive yet tiny network of 1280 total experts.
- Vocab Expansion: Extended from 49152 to 65536 tokens with multimodal special tokens for code, audio, image, video, and music.
- Zero Waste: Not a single weight was discarded. Every parameter from StarCoder2’s original FFN lives on in either the dense FFN or one of the 1280 expert slots.
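The split arithmetic in the MoE Surgery and Zero Waste steps can be checked with a small sketch. It partitions the 24576 intermediate units of one FFN layer into a dense block of 8192 plus 32 experts of 512, then confirms every unit lands in exactly one place. The contiguous slicing order and the name `split_ffn_indices` are assumptions for illustration; the card does not say how units were assigned to experts.

```python
# Hypothetical sketch: partition one FFN layer's intermediate units into a
# dense block plus equally sized experts. Contiguous slicing is an assumption.

ORIG_DIM = 24576     # original FFN intermediate dim (from the card)
DENSE_DIM = 8192     # units kept as the dense FFN
NUM_EXPERTS = 32     # experts per layer
EXPERT_DIM = 512     # units per expert
NUM_LAYERS = 40


def split_ffn_indices(orig_dim, dense_dim, num_experts, expert_dim):
    """Return (dense_indices, expert_index_ranges) covering range(orig_dim)."""
    if dense_dim + num_experts * expert_dim != orig_dim:
        raise ValueError("dense block and experts must account for every unit")
    dense = range(0, dense_dim)
    experts = [
        range(dense_dim + e * expert_dim, dense_dim + (e + 1) * expert_dim)
        for e in range(num_experts)
    ]
    return dense, experts


dense, experts = split_ffn_indices(ORIG_DIM, DENSE_DIM, NUM_EXPERTS, EXPERT_DIM)

# "Zero waste": every original unit appears exactly once after the split.
covered = list(dense) + [i for expert in experts for i in expert]
assert sorted(covered) == list(range(ORIG_DIM))

total_experts = NUM_EXPERTS * NUM_LAYERS
print(f"dense units: {len(dense)}, experts per layer: {len(experts)}, "
      f"total experts: {total_experts}")
```

The check passes only because 8192 + 32 × 512 equals 24576 exactly, which is the sense in which no FFN unit is left over.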
Numbers
| Property | Value |
|---|---|
| Total parameters | 4.65B |
| Active parameters per token | ~2.1B |
| Total experts | 1280 |
| Experts per layer | 32 |
| Expert dim | 512 |
| Hidden size | 2048 |
| Layers | 40 |
| Vocab size | 65536 |
| Context length | 16384 |
For reference: DeepSeek V3 has 256 experts. Kimi K2 has 384. Qwen3 Coder Next has 512. SmartCoderMoE has 1280 in total (32 per layer across 40 layers). He has a different expert for every day of the next three years, with about half a year to spare.
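The table's figures can be cross-checked with some rough arithmetic. This sketch counts only the FFN-derived and embedding weights. It assumes a two-matrix MLP per layer (up-projection and down-projection, as in StarCoder2), ignores biases, and counts the embedding matrix once. None of this breakdown is stated in the card.

```python
# Back-of-the-envelope weight count from the table's figures.
# Assumptions (not from the card): two-matrix MLP, biases ignored,
# embedding matrix counted once.

HIDDEN = 2048
LAYERS = 40
VOCAB = 65536
DENSE_DIM = 8192
EXPERTS_PER_LAYER = 32
EXPERT_DIM = 512

# Dense block plus all experts together hold the original 24576 units.
ffn_units_per_layer = DENSE_DIM + EXPERTS_PER_LAYER * EXPERT_DIM

# Each unit has one weight vector in and one out, both of size HIDDEN.
ffn_weights = 2 * HIDDEN * ffn_units_per_layer * LAYERS
embedding_weights = VOCAB * HIDDEN

print(f"FFN units per layer: {ffn_units_per_layer}")
print(f"FFN weights:         {ffn_weights / 1e9:.2f}B")
print(f"Embedding weights:   {embedding_weights / 1e9:.2f}B")
```

Under these assumptions the FFN and embedding weights come to roughly 4.16B. The rest of the 4.65B total would then be attention, routers, norms, and biases, which the card does not break down.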
Lineage
bigcode/starcoder2-15b ───────────────┐
                                      ├── 3x2048 slice merge (SLERP 60/80/90) ──> UnidentifiedSAUCE ──> SmartCoderMoE ☠️
HuggingFaceH4/starchat2-15b-sft-v0.1 ─┘
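For readers unfamiliar with SLERP (spherical linear interpolation), here is a minimal sketch of the standard formula on plain Python lists. The card does not say how the 60/80/90 biases map to interpolation weights, which parent they favor, or how the three slices end up as a 2048-wide model. Reading them as t = 0.6, 0.8, 0.9 toward the second vector is purely an assumption, and the toy vectors stand in for real weight slices.

```python
import math


def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between vectors a and b.

    t = 0 returns a, t = 1 returns b. Falls back to plain linear
    interpolation when either vector is near zero or the two are nearly
    parallel or antiparallel, where the spherical formula is unstable.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a < eps or norm_b < eps:
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


# Toy stand-ins for one slice from each parent model.
slice_a = [1.0, 0.0]
slice_b = [0.0, 1.0]

# Hypothetical reading of the per-slice biases as interpolation weights.
for t in (0.6, 0.8, 0.9):
    merged = slerp(t, slice_a, slice_b)
    print(t, [round(x, 4) for x in merged])
```

Unlike a straight weighted average, SLERP follows the arc between the two vectors, so for unit-length inputs the result stays unit length at every t.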
Planned Multimodal Extensions
SmartCoderMoE’s 2048 hidden size was chosen to natively align with:
- Dasheng-1.2B (audio encoder, 2048 hidden) — zero projection needed
- Qwen3-Omni Talker (audio decoder, 2048 hidden) — zero projection needed
- Janus Pro (vision in + image out, 2048 hidden)
- code2wav (code → music pipeline)
Intended Use
Coding. Lots of it. Uncensored.
Note from the Creator
As of this writing (Thursday, May 21st, 2026), the model is not finished. Multimodal expansion, as mentioned above, is on the way, as is a unique calculation of how much of the original StarCoder2 knowledge remains. I will update the repo as I go. Feel free to use it while I build on it, and if you run into any issues with it, please let me know so that I can fix them ASAP!
Model tree for Johnblick187/SmartCoderMoE
Base model
Johnblick187/UnidentifiedSAUCE