Merged Qwen Fusion (MoE Hybrid)
Overview
This model is a custom Mixture-of-Experts (MoE) language model merged from three Qwen-based source models:
- Carnice Qwen 3.6 MoE (35B A3B) [HF: samuelcardillo] — base architecture
- Qwen 3.5 Opus High Reasoning (40B) [HF: DavidAU] — dense reasoning anchor
- Qwen Coder Next (40B A3B) [HF: Johnblick187] — coding specialization
The goal of this merge is to combine the following in a single hybrid model:
- strong reasoning
- coding ability
- the efficiency of sparse MoE routing (only a few experts are active per token)
Merge Method
The model was constructed in three stages:
1. Layer-wise Weighted Fusion
- Early layers: higher weight on the reasoning anchor
- Mid layers: balanced blend
- Deep layers: bias toward the coder model
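The layer-wise schedule can be sketched as follows. This is a minimal illustration, not the script used for this merge: the function names, the split into thirds, and the mixing weights are all hypothetical, since the values actually used are not published. It also assumes the fused tensors have the same shape in all three source models, which does not hold for every tensor when a dense model is mixed with MoE models.

```python
from typing import Dict, List

# Hypothetical mixing weights (base, reasoning anchor, coder); each set sums to 1.
# The weights actually used for this merge are not published.
EARLY = {"base": 0.3, "reasoning": 0.5, "coder": 0.2}  # favor the reasoning anchor
MID = {"base": 0.34, "reasoning": 0.33, "coder": 0.33}  # roughly balanced
DEEP = {"base": 0.3, "reasoning": 0.2, "coder": 0.5}  # favor the coder model


def layer_weights(layer_idx: int, num_layers: int) -> Dict[str, float]:
    """Choose mixing weights from the layer's relative depth, split into thirds."""
    depth = layer_idx / max(num_layers - 1, 1)  # 0.0 = first layer, 1.0 = last
    if depth < 1 / 3:
        return EARLY
    if depth < 2 / 3:
        return MID
    return DEEP


def fuse_layer(tensors: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of one flattened tensor taken from each source model."""
    names = list(weights)
    length = len(tensors[names[0]])
    if any(len(tensors[name]) != length for name in names):
        raise ValueError("fused tensors must have the same shape in every source model")
    return [sum(w * tensors[name][i] for name, w in weights.items()) for i in range(length)]


# Example: layer 0 of 40 is an early layer, so the reasoning anchor dominates.
fused = fuse_layer(
    {"base": [1.0, 0.0], "reasoning": [0.0, 1.0], "coder": [1.0, 1.0]},
    layer_weights(0, 40),
)
```

A step schedule keeps the sketch short; a smooth interpolation of the weights across depth would be an equally plausible reading of "early / mid / deep".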
2. Expert Fusion (MoE)
- 82 experts were fused, guided by the cosine similarity between corresponding experts
- Similar experts → blended
- Dissimilar experts → replaced (coder-biased)
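Blending only similar experts avoids the destructive averaging that occurs when unrelated experts are averaged together, and helps preserve each expert's specialization.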
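A minimal sketch of similarity-gated expert fusion, assuming each expert is flattened into a single vector and that experts are paired by index across the base and coder models. `SIM_THRESHOLD`, `BLEND_ALPHA`, and the function names are hypothetical; the threshold and blend ratio used for this merge are not documented.

```python
import math
from typing import List

Vector = List[float]

SIM_THRESHOLD = 0.5  # hypothetical cutoff between "similar" and "dissimilar"
BLEND_ALPHA = 0.5  # hypothetical share of the base expert when blending


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two flattened expert weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuse_expert(base: Vector, coder: Vector) -> Vector:
    """Blend similar experts; replace dissimilar ones with the coder expert."""
    if len(base) != len(coder):
        raise ValueError("paired experts must have the same shape")
    if cosine_similarity(base, coder) >= SIM_THRESHOLD:
        return [BLEND_ALPHA * b + (1 - BLEND_ALPHA) * c for b, c in zip(base, coder)]
    # Dissimilar: averaging would blur both experts, so keep the coder expert.
    return list(coder)


def fuse_experts(base_experts: List[Vector], coder_experts: List[Vector]) -> List[Vector]:
    """Fuse experts pairwise; expert i in one model is paired with expert i in the other."""
    if len(base_experts) != len(coder_experts):
        raise ValueError("both models must have the same number of experts")
    return [fuse_expert(b, c) for b, c in zip(base_experts, coder_experts)]
```

Index-based pairing is the simplest assumption; if the two models order their experts differently, a matching step (for example, pairing each expert with its most similar counterpart) would be needed before fusion.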
3. Streaming Merge Pipeline
- Tensor-level streaming (no full model load)
- Safetensors-based merging
- Sharded output files to handle the model's large size
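The streaming idea can be illustrated with generators: each tensor is loaded from every source, merged, and appended to the current output shard, so memory use scales with one tensor rather than the whole model. In the real pipeline the loaders would read from safetensors files (the `safetensors` library's `safe_open` can read one tensor at a time); here they are hypothetical stand-in callables, and the 2-bytes-per-value size assumes bf16/fp16 storage.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Tensor = List[float]
Loader = Callable[[str], Tensor]  # hypothetical: returns ONE tensor by name

BYTES_PER_VALUE = 2  # assumption: bf16/fp16 storage


def stream_merge(
    names: Iterable[str],
    loaders: Dict[str, Loader],
    weights: Dict[str, float],
) -> Iterator[Tuple[str, Tensor]]:
    """Yield merged tensors one at a time instead of loading whole models."""
    for name in names:
        # Only the current tensor from each source model is held in memory.
        sources = {model: load(name) for model, load in loaders.items()}
        length = len(next(iter(sources.values())))
        if any(len(t) != length for t in sources.values()):
            raise ValueError(f"shape mismatch for tensor {name!r}")
        merged = [sum(weights[m] * t[i] for m, t in sources.items()) for i in range(length)]
        yield name, merged


def shard(stream: Iterable[Tuple[str, Tensor]], max_shard_bytes: int) -> Iterator[Dict[str, Tensor]]:
    """Group merged tensors into shards that stay under a byte budget where possible."""
    current: Dict[str, Tensor] = {}
    size = 0
    for name, tensor in stream:
        nbytes = len(tensor) * BYTES_PER_VALUE
        if current and size + nbytes > max_shard_bytes:
            yield current  # a real pipeline would write this shard to disk here
            current, size = {}, 0
        current[name] = tensor
        size += nbytes
    if current:
        yield current
```

Because both functions are generators, nothing is merged until a shard is requested, and a tensor larger than the budget simply gets a shard of its own.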
Post-processing
After merging, the following post-processing step was tested:
Refusal Ablation (experimental; not saved or committed)
- Applied across most transformer layers
- 19k+ weight matrices modified
- Failed on some attention projections due to an architecture mismatch
⚠️ This step was experimental and is not included in this version; an ablated model may be less stable.
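The card does not say how the refusal ablation was implemented. A common approach is directional ablation by weight orthogonalization: given a unit refusal direction r in the residual stream, each matrix W that writes into the residual stream is replaced by (I - r r^T) W, so its outputs no longer have a component along r. The sketch below assumes that approach and uses hypothetical function names. It also shows one way an architecture mismatch can surface: a projection whose output dimension does not match r cannot be orthogonalized. Whether this caused the attention projection failures listed above is an assumption.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]  # rows = output (residual-stream) dim, cols = input dim
Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale the direction to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("refusal direction must be non-zero")
    return [x / norm for x in v]


def orthogonalize(weight: Matrix, direction: Vector) -> Optional[Matrix]:
    """Return (I - r r^T) W, or None when W's output dim does not match r."""
    if len(weight) != len(direction):
        return None  # architecture mismatch: this projection cannot be ablated
    r = normalize(direction)
    rows, cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the refusal direction.
    coeffs = [sum(r[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    # Subtract that component so no output of W has a projection onto r.
    return [[weight[i][j] - r[i] * coeffs[j] for j in range(cols)] for i in range(rows)]
```

Returning `None` rather than raising lets a batch job skip incompatible matrices and keep going, which matches the "partial" failure described in this section.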
Intended Use
This model is designed for:
- reasoning tasks
- coding assistance
- general text generation
- experimentation with MoE fusion
Limitations
Not an official architecture: this is an experimental hybrid.
Some tooling may not fully support it because of the custom Qwen MoE layers.
May produce:
- reasoning-style outputs (<think> blocks)
- inconsistent formatting
- occasional instability
If refusal ablation is applied, it may degrade attention behavior.
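Where the `<think>` blocks are unwanted, they can be stripped from the decoded text after generation. A minimal sketch, assuming the tags appear literally as `<think>` and `</think>`; the handling of a lone closing tag is an assumption meant to cover chat templates that place the opening tag in the prompt rather than in the model's output.

```python
import re

# Non-greedy match so multiple reasoning blocks are removed separately.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks from a generation."""
    cleaned = THINK_BLOCK.sub("", text)
    # If the opening tag came from the chat template, only a closing tag
    # appears in the output; keep what follows the last one.
    if "</think>" in cleaned:
        cleaned = cleaned.rsplit("</think>", 1)[1]
    return cleaned.strip()
```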