synth-gpt-110m
A 110M-parameter GPT-style language model pretrained on the PleIAs/SYNTH synthetic reasoning dataset.
Note: This is a base pretrained model only. It has not been instruction-tuned or aligned.
Model Details
- Architecture: GPT (decoder-only transformer)
- Parameters: 109.7M
- Context Length: 1024 tokens
- Vocabulary Size: 32,256 (T5 tokenizer plus special tokens; see the tokenizer sketch after this list)
- Training Data: 30B tokens from PleIAs/SYNTH
- Precision: bfloat16
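The Usage section below imports a repo-local `tokenizer.t5()` helper that returns the tokenizer together with its BOS and EOS ids. A minimal sketch of what such a helper could look like, assuming the Hugging Face `tokenizers` build of `t5-base` plus one added BOS token; the exact special tokens and any vocabulary padding are assumptions, not taken from the repo:

```python
from tokenizers import Tokenizer

def t5():
    # Hypothetical stand-in for the repo's tokenizer.t5(); the added <bos>
    # token is an assumption, only the T5 base vocabulary is from this card.
    tok = Tokenizer.from_pretrained("t5-base")
    tok.add_special_tokens(["<bos>"])
    bos_id = tok.token_to_id("<bos>")
    eos_id = tok.token_to_id("</s>")  # T5's end-of-sequence token
    return tok, bos_id, eos_id
```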
Architecture Config
| Parameter | Value |
|---|---|
| Layers | 12 |
| Heads | 12 |
| Embedding Dim | 768 |
| Head Dim | 64 |
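As a sanity check, the figures in the table roughly reproduce the reported 109.7M parameters, assuming tied input/output embeddings, a 4x MLP expansion, and ignoring biases, norms, and any positional-embedding parameters:

```python
# Back-of-the-envelope parameter count from the config above (assumptions:
# tied embeddings, 4x MLP expansion; biases, norms and positional
# parameters ignored).
vocab, d_model, n_layers = 32_256, 768, 12
embedding = vocab * d_model            # ~24.8M, shared with the output head
attention = 4 * d_model * d_model      # Q, K, V and output projections
mlp = 2 * d_model * (4 * d_model)      # up- and down-projections
total = embedding + n_layers * (attention + mlp)
print(f"{total / 1e6:.1f}M")           # ~109.7M
```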
Usage
import json
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_model
from model import GPT, GPTConfig
from tokenizer import t5

# Load tokenizer
tok, bos_id, eos_id = t5()

# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"
config_path = hf_hub_download("ethanthoma/synth-gpt-110m", "config.json")
model_path = hf_hub_download("ethanthoma/synth-gpt-110m", "model.safetensors")

with open(config_path) as f:
    config = GPTConfig(**json.load(f))

model = GPT(config)
load_model(model, model_path, device=device)
model.to(device)
model.eval()

# Generate
def generate(prompt, max_tokens=100, temperature=0.8):
    tokens = tok.encode(prompt).ids
    idx = torch.tensor([tokens], device=device)
    with torch.no_grad():
        for _ in range(max_tokens):
            # Crop to the 1024-token context window and sample from the last position
            logits, _ = model(idx[:, -1024:])
            logits = logits[:, -1, :] / temperature
            probs = torch.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)
            idx = torch.cat([idx, next_token], dim=1)
            if next_token.item() == eos_id:
                break
    return tok.decode(idx[0].tolist())

print(generate("What is 2 + 2?"))
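The loop above samples from the full softmax distribution at the chosen temperature. A common variant is top-k sampling, which can cut down on the repetitive or nonsensical outputs noted under Limitations. A minimal drop-in replacement for the two sampling lines, where `k = 50` is an arbitrary choice rather than anything used by the repo:

```python
# Replace the `probs` / `next_token` lines above with top-k sampling.
k = 50  # arbitrary cutoff, tune as needed
topk_logits, topk_idx = torch.topk(logits, k, dim=-1)
probs = torch.softmax(topk_logits, dim=-1)
next_token = topk_idx.gather(-1, torch.multinomial(probs, num_samples=1))
```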
Training
- Optimizer: Muon (MomentUm Orthogonalized by Newton-Schulz)
- Learning Rate: 0.02 with warmup and cosine decay (see the schedule sketch after this list)
- Batch Size: 512
- Hardware: NVIDIA H100
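The card only states "warmup and cosine decay" for the learning rate. A minimal sketch of that schedule shape, where only the 0.02 peak comes from this card; the warmup length, total step count, and floor are placeholder assumptions (the step count is roughly 30B tokens / (512 sequences x 1024 tokens)):

```python
import math

def lr_schedule(step, max_lr=0.02, warmup_steps=1_000, total_steps=57_000, min_lr=0.0):
    # Linear warmup to max_lr, then cosine decay to min_lr. Only max_lr is
    # taken from this card; the other values are placeholders.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```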
Limitations
- Pretrained only - does not follow instructions or engage in dialogue
- Trained on synthetic reasoning data, so outputs tend to follow that dataset's format
- May produce incorrect, nonsensical, or repetitive outputs
- Not suitable for production use without further training
License
MIT