modded-GPT-1

A small, modern rebuild of GPT-1 trained on WikiText-103.

[Figure: plain-English comparison infographic]

Checkpoint

  • File: wikitext103-50m_final.pt
  • Parameters: 41.6M
  • Training: 20,000 steps on WikiText-103 (Muon optimizer, torch.compile)
  • Validation loss: 3.2998
  • Validation perplexity: 27.11
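The two validation numbers agree with each other: perplexity is the exponential of the mean per-token cross-entropy loss. The quick check below assumes the reported loss is a natural-log (nats) average per token.

```python
import math

# Mean per-token validation cross-entropy from the checkpoint stats (assumed to be in nats).
val_loss = 3.2998

# With a natural-log loss, perplexity is simply exp(loss).
val_ppl = math.exp(val_loss)

print(f"{val_ppl:.2f}")  # prints 27.11
```

Assuming the loss is averaged over tokens from the bundled tokenizer, this is a per-token perplexity. It is not directly comparable to word-level WikiText-103 perplexities reported elsewhere.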

Local Benchmark Snapshot

These are local validation scores from the repo's GLUE fine-tuning script. They are compared against the test-set numbers reported in the GPT-1 paper. Because the evaluation splits differ, treat this as a rough practical comparison, not a leaderboard-equivalent reproduction.

Task     GPT-1 paper (test)   This checkpoint (local val)   Delta (checkpoint minus paper)
RTE      56.0                 62.1                          +6.1
MRPC     82.3                 82.1                          -0.2
STS-B    82.0                 79.7                          -2.3
SST-2    91.3                 86.9                          -4.4
CoLA     45.4                 11.8                          -33.6
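CoLA is the outlier in the table. GLUE scores CoLA with the Matthews correlation coefficient (MCC) rather than accuracy, and the GPT-1 paper's 45.4 is an MCC score. If the repo's script reports the same metric, then 11.8 is much closer to chance level (0) than to GPT-1. The sketch below computes binary MCC. The helper name `matthews_corrcoef` and the labels are illustrative only and do not come from this repo.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Binary Matthews correlation coefficient for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any row/column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical labels: 7 "acceptable" (1) and 3 "unacceptable" (0) sentences.
y_true = [1] * 7 + [0] * 3
always_acceptable = [1] * 10  # a degenerate classifier that ignores its input

accuracy = sum(t == p for t, p in zip(y_true, always_acceptable)) / len(y_true)
print(accuracy, matthews_corrcoef(y_true, always_acceptable))  # prints 0.7 0.0
```

The degenerate classifier gets 70% accuracy but an MCC of 0. This is why a low CoLA MCC can sit alongside reasonable-looking accuracy.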

Loading

This is a custom PyTorch checkpoint. Load it with the model.py file included in this model repo or in the GitHub repository.

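import torch
from model import GPT  # model.py from this model repo or the GitHub repository

# Load on CPU; weights_only=True restricts unpickling to tensors and plain containers.
ckpt = torch.load("wikitext103-50m_final.pt", map_location="cpu", weights_only=True)
# Rebuild the architecture from the saved config, then load the trained weights.
model = GPT(ckpt["config"])
model.load_state_dict(ckpt["model"])
# Switch to evaluation mode (e.g. disables dropout) for inference.
model.eval()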

The tokenizer is included as tokenizer.json.
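tokenizer.json is the file name used by the Hugging Face `tokenizers` library. Assuming the bundled file is in that format, it can be loaded directly with `Tokenizer.from_file`. The helpers `load_tokenizer`, `encode` and `decode` below are hypothetical conveniences, not part of this repo.

```python
from tokenizers import Tokenizer

def load_tokenizer(path="tokenizer.json"):
    """Load a serialized Hugging Face `tokenizers` tokenizer from disk."""
    return Tokenizer.from_file(path)

def encode(tokenizer, text):
    """Return the list of token ids for `text`."""
    return tokenizer.encode(text).ids

def decode(tokenizer, ids):
    """Map a list of token ids back to a string."""
    return tokenizer.decode(ids)
```

For example, `ids = encode(load_tokenizer(), "The quick brown fox")` gives a Python list of ids. Wrap it in a tensor before passing it to the model according to model.py. As a sanity check, `load_tokenizer().get_vocab_size()` should match the vocabulary size in `ckpt["config"]`.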

Notes

  • This model is much smaller than GPT-1: 41.6M params vs ~117M.
  • It was trained on WikiText-103, not BooksCorpus.
  • CoLA (grammatical acceptability) is the clear weak spot: 11.8 vs. 45.4 for GPT-1.
  • The Hugging Face token used for upload should be rotated after publishing.
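The roughly 117M figure for GPT-1 follows from its published configuration: 12 layers, hidden size 768, feed-forward size 3072, 512 positions, a 40,478-token vocabulary and tied input/output embeddings. The sketch below counts parameters for a GPT-1-style decoder with learned position embeddings and biases on every linear layer. The function `gpt1_style_param_count` is hypothetical, and it does not describe this repo's architecture, whose config is not listed here.

```python
def gpt1_style_param_count(n_layer, d_model, d_ff, vocab_size, n_ctx):
    """Count parameters of a GPT-1-style decoder with tied input/output embeddings."""
    embeddings = vocab_size * d_model + n_ctx * d_model  # token + learned position tables
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)  # fused QKV + output projection
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)  # up and down projections
    layer_norms = 2 * 2 * d_model  # two LayerNorms per block, each with weight and bias
    return embeddings + n_layer * (attn + mlp + layer_norms)

gpt1 = gpt1_style_param_count(n_layer=12, d_model=768, d_ff=3072, vocab_size=40478, n_ctx=512)
print(f"{gpt1 / 1e6:.1f}M")  # prints 116.5M
```

To check the 41.6M figure for this checkpoint, count the loaded model's parameters directly with `sum(p.numel() for p in model.parameters())`. If the output head is tied to the token embedding, `parameters()` yields the shared tensor only once.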