Synapse-3B

Small models that think together. And learn.

Synapse-3B is a merged specialist model created by TITAN Synapse — an open-source Rust inference engine that runs a swarm of tiny specialist models that collaborate and learn continuously on your GPU.

This model combines 4 specialist LoRA adapters (math, code, general, coordinator) trained on curated datasets, then merged into a single model using TIES merging (Trim, Elect Sign, Merge) for minimal interference between specializations.

Key Features

  • 4 specialist domains merged into one model without catastrophic forgetting
  • TIES merging — trims small deltas, elects signs by majority vote, merges only agreeing directions
  • Based on Qwen2.5-3B-Instruct — strong Apache 2.0 base with multilingual support
  • Part of the Synapse ecosystem — designed for the brain-inspired Synapse Architecture (Mamba + xLSTM + Sparse MoE + Fast Weights)

How This Model Was Made

Base Model: Qwen/Qwen2.5-3B-Instruct (Apache 2.0)
     |
     +---> QLoRA (rank 64) ---> Math Specialist (GSM8K + OpenWebMath + Orca-Math, 50k samples)
     +---> QLoRA (rank 64) ---> Code Specialist (CodeAlpaca + Evol-Instruct + Python-18k, 50k samples)
     +---> QLoRA (rank 64) ---> General Specialist (SlimOrca + Alpaca-Cleaned, 50k samples)
     +---> QLoRA (rank 32) ---> Coordinator (Synthetic routing, 5k samples)
     |
     +---> TIES Merge (trim 80%, sign election, agreement merge)
     |
     = Synapse-3B

Specialist Details

Specialist  | Datasets                                           | Samples | LoRA Rank | Focus
------------|----------------------------------------------------|---------|-----------|------
Math        | GSM8K, OpenWebMath, Orca-Math                      | 50,000  | 64        | Mathematical reasoning, step-by-step problem solving
Code        | CodeAlpaca-20k, Evol-Instruct-Code-80k, Python-18k | 50,000  | 64        | Code generation, debugging, Python expertise
General     | SlimOrca, Alpaca-Cleaned                           | 50,000  | 64        | General knowledge, instruction following, reasoning
Coordinator | Synthetic routing examples                         | 5,000   | 32        | Task analysis, specialist routing, swarm coordination

Merge Method: TIES

TIES (Trim, Elect Sign, Merge) is used to combine adapters with minimal interference:

  1. Trim — Remove small-magnitude deltas (keep only the top 20% of deltas by magnitude in each parameter tensor)
  2. Elect Sign — For each parameter, take a majority vote on the sign direction across all specialists
  3. Merge — Only average deltas that agree with the elected sign

This produces cleaner merges than simple averaging, preserving each specialist's strengths.
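The three steps can be sketched in a toy NumPy version. This is an illustration of the procedure as described above, not the actual merge script; note that the original TIES paper elects signs by total magnitude, whereas the majority-vote variant described here is implemented literally.

```python
import numpy as np

def ties_merge(deltas, keep_ratio=0.2):
    """Toy TIES merge over per-specialist weight deltas (trim, elect sign, merge)."""
    trimmed = []
    for d in deltas:
        # 1. Trim: keep only the top `keep_ratio` of entries by magnitude (trim 80%).
        k = max(1, int(np.ceil(keep_ratio * d.size)))
        threshold = np.sort(np.abs(d).ravel())[-k]
        trimmed.append(np.where(np.abs(d) >= threshold, d, 0.0))
    stacked = np.stack(trimmed)

    # 2. Elect sign: per-entry majority vote over the surviving deltas.
    elected = np.sign(np.sign(stacked).sum(axis=0))

    # 3. Merge: average only the deltas whose sign agrees with the elected sign.
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return (stacked * agree).sum(axis=0) / counts
```

Entries where the specialists disagree (or were trimmed away) contribute nothing, which is what keeps the merged deltas from interfering with each other.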

Usage

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("djtony707/synapse-3b", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("djtony707/synapse-3b")

messages = [{"role": "user", "content": "Solve: If a train travels 120km in 2 hours, what is its speed in m/s?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With TITAN Synapse Engine (Rust, local inference)

# Install
curl -sSL https://raw.githubusercontent.com/Djtony707/titan-synapse/main/install.sh | bash

# Pull and run
synapse pull synapse-3b
synapse up

# OpenAI-compatible API on localhost:6900
curl http://localhost:6900/v1/chat/completions \
  -d '{"model":"synapse-3b","messages":[{"role":"user","content":"Hello!"}]}'

The Synapse Architecture (v1.0 Target)

Synapse-3B is the foundation for the Synapse Architecture — a brain-inspired modular model that replaces monolithic transformers:

                    THALAMUS (Mamba Router, O(n))
                         |
          +--------------+--------------+
          |              |              |
     xLSTM Lang    Sparse MoE     Fast-Weight
      Module       Expert Pool      Memory
      O(n)        top-k of 8+     Learn during
     syntax,      specialists     inference,
     grammar      activate        no backprop

  • No O(n^2) attention — Mamba (state-space) + xLSTM (recurrent)
  • Sparse activation — only 2-3 of 8+ modules fire per token
  • Fast-weight memory — learn new facts in ONE forward pass
  • Full observability — every routing decision is transparent, no black box
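The fast-weight memory idea above can be sketched as a Hebbian outer-product update: associations are written into a weight matrix during the forward pass, with no gradient step. This toy NumPy class is illustrative only; the class name, decay rule, and update are assumptions, not the Synapse implementation.

```python
import numpy as np

class FastWeightMemory:
    """Toy associative fast-weight memory: writes key->value pairs at
    inference time via a Hebbian outer-product update; no backprop."""

    def __init__(self, dim, decay=0.95):
        self.W = np.zeros((dim, dim))
        self.decay = decay

    def write(self, key, value):
        # One forward pass of "learning": decay old memory, add the new association.
        self.W = self.decay * self.W + np.outer(value, key)

    def read(self, key):
        # Retrieval is a single matrix-vector product.
        return self.W @ key
```

With (near-)orthogonal keys, a value written once is retrievable immediately, which is the "learn new facts in ONE forward pass" property.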

Training Details

  • Hardware: NVIDIA RTX 5090 (32GB VRAM)
  • Training framework: QLoRA via TRL SFTTrainer
  • Quantization: 4-bit NF4 (for training efficiency)
  • Learning rate: 2e-4 with cosine scheduler
  • Epochs: 3 per specialist
  • Batch size: 2 (gradient accumulation 8, effective batch 16)
  • Max sequence length: 2048 tokens
  • Training time: ~2 hours per specialist on RTX 5090
  • Merge method: TIES (trim ratio 0.8)
  • Created: March 21, 2026
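For reference, the hyperparameters above can be collected into one place, including the effective-batch arithmetic (2 per device x 8 accumulation steps = 16). The dictionary keys are illustrative names, not the exact training-script arguments.

```python
# Hypothetical summary of the specialist training setup described above.
qlora_config = {
    "lora_rank": 64,               # 32 for the coordinator specialist
    "quantization": "4-bit NF4",   # via bitsandbytes, for training efficiency
    "learning_rate": 2e-4,
    "lr_scheduler": "cosine",
    "epochs": 3,
    "per_device_batch_size": 2,
    "gradient_accumulation": 8,
    "max_seq_len": 2048,
}

effective_batch = qlora_config["per_device_batch_size"] * qlora_config["gradient_accumulation"]
print(effective_batch)  # 16
```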

Limitations

  • This is a 3B parameter model — it won't match 70B+ models on complex reasoning
  • Trained on English-focused datasets; multilingual performance inherited from Qwen base
  • The coordinator specialist is trained on synthetic routing data; real-world routing improves with use
  • Best used as part of the TITAN Synapse swarm (multiple specialists collaborating)

Citation

@misc{synapse3b2026,
  title={Synapse-3B: A Merged Specialist Model for the TITAN Synapse Engine},
  author={Tony Elliott},
  year={2026},
  url={https://huggingface.co/djtony707/synapse-3b},
  note={Created with TITAN Synapse — https://github.com/Djtony707/titan-synapse}
}

License

Apache 2.0 — use it for anything.

Built by Tony Elliott with TITAN Synapse.
