synth-gpt-110m
A 110M-parameter GPT-style language model pretrained on the PleIAs/SYNTH synthetic reasoning dataset.
Note: This is a base pretrained model only. It has not been instruction-tuned or aligned.
Model Details
- Architecture: GPT (decoder-only transformer)
- Parameters: 109.7M
- Context Length: 1024 tokens
- Vocabulary Size: 32,256 (T5 tokenizer plus special tokens; see the tokenizer sketch after this list)
- Training Data: 30B tokens from PleIAs/SYNTH
- Precision: bfloat16
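The Usage section below imports a repo-local `tokenizer.t5()` helper that returns the tokenizer together with its BOS and EOS ids. A minimal sketch of what such a helper could look like, assuming the Hugging Face `tokenizers` build of `t5-base` plus one added BOS token; the exact special tokens and any vocabulary padding are assumptions, not taken from the repo:

```python
from tokenizers import Tokenizer

def t5():
    # Hypothetical stand-in for the repo's tokenizer.t5(); the added <bos>
    # token is an assumption, only the T5 base vocabulary is from this card.
    tok = Tokenizer.from_pretrained("t5-base")
    tok.add_special_tokens(["<bos>"])
    bos_id = tok.token_to_id("<bos>")
    eos_id = tok.token_to_id("</s>")  # T5's end-of-sequence token
    return tok, bos_id, eos_id
```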
Architecture Config
| Parameter | Value |
|---|---|
| Layers | 12 |
| Heads | 12 |
| Embedding Dim | 768 |
| Head Dim | 64 |
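As a sanity check, the figures in the table roughly reproduce the reported 109.7M parameters, assuming tied input/output embeddings, a 4x MLP expansion, and ignoring biases, norms, and any positional-embedding parameters:

```python
# Back-of-the-envelope parameter count from the config above (assumptions:
# tied embeddings, 4x MLP expansion; biases, norms and positional
# parameters ignored).
vocab, d_model, n_layers = 32_256, 768, 12
embedding = vocab * d_model            # ~24.8M, shared with the output head
attention = 4 * d_model * d_model      # Q, K, V and output projections
mlp = 2 * d_model * (4 * d_model)      # up- and down-projections
total = embedding + n_layers * (attention + mlp)
print(f"{total / 1e6:.1f}M")           # ~109.7M
```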
Usage
import json
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_model
from model import GPT, GPTConfig
from tokenizer import t5

# Load tokenizer
tok, bos_id, eos_id = t5()

# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"
config_path = hf_hub_download("ethanthoma/synth-gpt-110m", "config.json")
model_path = hf_hub_download("ethanthoma/synth-gpt-110m", "model.safetensors")

with open(config_path) as f:
    config = GPTConfig(**json.load(f))

model = GPT(config)
load_model(model, model_path, device=device)
model.to(device)
model.eval()

# Generate
def generate(prompt, max_tokens=100, temperature=0.8):
    tokens = tok.encode(prompt).ids
    idx = torch.tensor([tokens], device=device)
    with torch.no_grad():
        for _ in range(max_tokens):
            # Crop to the 1024-token context window and sample from the last position
            logits, _ = model(idx[:, -1024:])
            logits = logits[:, -1, :] / temperature
            probs = torch.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)
            idx = torch.cat([idx, next_token], dim=1)
            if next_token.item() == eos_id:
                break
    return tok.decode(idx[0].tolist())

print(generate("What is 2 + 2?"))
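The loop above samples from the full softmax distribution at the chosen temperature. A common variant is top-k sampling, which can cut down on the repetitive or nonsensical outputs noted under Limitations. A minimal drop-in replacement for the two sampling lines, where `k = 50` is an arbitrary choice rather than anything used by the repo:

```python
# Replace the `probs` / `next_token` lines above with top-k sampling.
k = 50  # arbitrary cutoff, tune as needed
topk_logits, topk_idx = torch.topk(logits, k, dim=-1)
probs = torch.softmax(topk_logits, dim=-1)
next_token = topk_idx.gather(-1, torch.multinomial(probs, num_samples=1))
```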
Training
- Optimizer: Muon (MomentUm Orthogonalized by Newton-Schulz)
- Learning Rate: 0.02 with warmup and cosine decay (see the schedule sketch after this list)
- Batch Size: 512
- Hardware: NVIDIA H100
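The card only states "warmup and cosine decay" for the learning rate. A minimal sketch of that schedule shape, where only the 0.02 peak comes from this card; the warmup length, total step count, and floor are placeholder assumptions (the step count is roughly 30B tokens / (512 sequences x 1024 tokens)):

```python
import math

def lr_schedule(step, max_lr=0.02, warmup_steps=1_000, total_steps=57_000, min_lr=0.0):
    # Linear warmup to max_lr, then cosine decay to min_lr. Only max_lr is
    # taken from this card; the other values are placeholders.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```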
Limitations
- Pretrained only - does not follow instructions or engage in dialogue
- Trained on synthetic reasoning data, so outputs tend to follow that dataset's format
- May produce incorrect, nonsensical, or repetitive outputs
- Not suitable for production use without further training
License
MIT