synth-gpt-110m

A 110M parameter GPT-style language model pretrained on the PleIAs/SYNTH synthetic reasoning dataset.

Note: This is a base pretrained model only. It has not been instruction-tuned or aligned.

Model Details

  • Architecture: GPT (decoder-only transformer)
  • Parameters: 109.7M
  • Context Length: 1024 tokens
  • Vocabulary Size: 32,256 (T5 tokenizer + special tokens)
  • Training Data: 30B tokens from PleIAs/SYNTH
  • Precision: bfloat16

Architecture Config

Parameter       Value
Layers          12
Heads           12
Embedding Dim   768
Head Dim        64
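
The table above roughly accounts for the 109.7M parameter count. A minimal back-of-the-envelope check, assuming tied input/output embeddings, learned positional embeddings, and a GPT-2-style block with a 4x MLP (assumptions about model.py, not stated in the card):

# Rough parameter count from the config table (biases and LayerNorms ignored).
vocab, n_ctx, d, n_layer = 32_256, 1024, 768, 12

embeddings = vocab * d + n_ctx * d   # token + positional embeddings
attention  = 4 * d * d               # Q, K, V, and output projections
mlp        = 2 * 4 * d * d           # up- and down-projections of the 4x MLP
total      = embeddings + n_layer * (attention + mlp)

print(f"{total / 1e6:.1f}M parameters")  # ~110M, in line with the card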

Usage

import json
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_model

from model import GPT, GPTConfig
from tokenizer import t5

# Load tokenizer
tok, bos_id, eos_id = t5()

# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"
config_path = hf_hub_download("ethanthoma/synth-gpt-110m", "config.json")
model_path = hf_hub_download("ethanthoma/synth-gpt-110m", "model.safetensors")

with open(config_path) as f:
    config = GPTConfig(**json.load(f))

model = GPT(config)
load_model(model, model_path, device=device)
model.to(device)  # load_model fills the weights; move the module to the target device
model.eval()

# Generate
def generate(prompt, max_tokens=100, temperature=0.8):
    tokens = tok.encode(prompt).ids
    idx = torch.tensor([tokens], device=device)

    with torch.no_grad():
        for _ in range(max_tokens):
            logits, _ = model(idx[:, -1024:])  # crop to the 1024-token context window
            logits = logits[:, -1, :] / temperature
            probs = torch.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)
            idx = torch.cat([idx, next_token], dim=1)
            if next_token.item() == eos_id:
                break

    return tok.decode(idx[0].tolist())

print(generate("What is 2 + 2?"))
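
With a small base model, pure temperature sampling can wander into repetition; restricting each step to the top-k most likely tokens is a common mitigation. A hedged drop-in replacement for the sampling step above (k=50 is an arbitrary choice, not part of the original setup):

# Sample only from the k most likely tokens at this step.
def sample_top_k(logits, k=50, temperature=0.8):
    logits = logits / temperature
    topk_vals, topk_idx = torch.topk(logits, k, dim=-1)
    probs = torch.softmax(topk_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return topk_idx.gather(-1, choice)

Inside the generation loop, next_token = sample_top_k(logits[:, -1, :], temperature=temperature) replaces the temperature, softmax, and multinomial lines.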

Training

  • Optimizer: Muon (MomentUm Orthogonalized by Newton-Schulz); see the sketch after this list
  • Learning Rate: 0.02 with warmup and cosine decay
  • Batch Size: 512
  • Hardware: NVIDIA H100
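
Muon replaces the raw momentum update for 2D weight matrices with an approximately orthogonalized version, computed by a few Newton-Schulz iterations. A minimal sketch of that orthogonalization step, using the coefficients from the reference Muon implementation (the exact setup used for this model is not specified in the card):

import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    # Quintic Newton-Schulz iteration that pushes the singular values of G
    # toward 1, i.e. approximately orthogonalizes the momentum matrix.
    a, b, c = 3.4445, -4.7750, 2.0315    # coefficients from the reference Muon code
    X = G.float()
    X = X / (X.norm() + eps)             # scale so the spectral norm is <= 1
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T                          # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    if transposed:
        X = X.T
    return X.to(G.dtype)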

Limitations

  • Pretrained only: does not follow instructions or engage in dialogue
  • Trained on synthetic reasoning data, so outputs tend to follow that format
  • May produce incorrect, nonsensical, or repetitive outputs
  • Not suitable for production use without further training

License

MIT
