# Fourth GPT

A tiny (344K parameter) character-level GPT trained for casual conversation.

## Model Details

| Property | Value |
|---|---|
| Parameters | 344,256 |
| Architecture | Decoder-only Transformer |
| Layers | 3 |
| Embedding Dim | 96 |
| Attention Heads | 6 |
| Context Window | 64 characters |
| Vocabulary | 29 (a-z, space, pipe, BOS) |
| Tokenization | Character-level |
| Framework | PyTorch |
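
The vocabulary is small enough to write out in full. A minimal sketch of the character tokenizer (the index assignments and helper names here are my assumption; the repo's actual ordering may differ):

```python
# 29 tokens: a-z, space, pipe (turn separator), and a BOS marker.
CHARS = list("abcdefghijklmnopqrstuvwxyz") + [" ", "|"]
BOS = "<bos>"
vocab = [BOS] + CHARS  # 29 tokens total
stoi = {ch: i for i, ch in enumerate(vocab)}
itos = {i: ch for ch, i in stoi.items()}

def encode(text):
    # Prepend BOS, then map each character to its index.
    return [stoi[BOS]] + [stoi[c] for c in text]

def decode(ids):
    # Drop BOS on the way back out.
    return "".join(itos[i] for i in ids if itos[i] != BOS)
```

Round-tripping works for any string over the 28 printable characters, e.g. `decode(encode("hello")) == "hello"`.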

## Architecture

- 3 Transformer blocks with RMS normalization
- Multi-head causal self-attention (6 heads, 16 dims each)
- MLP with ReLU activation (4x expansion)
- Learned positional embeddings
- No weight tying between the embedding and output layers
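
One block of the architecture above can be sketched in PyTorch using the card's numbers (dim 96, 6 heads, 4x MLP, RMS norm, 64-token causal mask). This is my reconstruction, not the repo's code; `FourthModel`'s internals may differ:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, learned scale."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

class Block(nn.Module):
    """One pre-norm Transformer block with the card's dimensions."""
    def __init__(self, dim=96, heads=6, ctx=64):
        super().__init__()
        self.norm1 = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = RMSNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim)
        )
        # Causal mask: True marks positions attention may NOT look at.
        self.register_buffer(
            "mask", torch.triu(torch.ones(ctx, ctx, dtype=torch.bool), 1)
        )

    def forward(self, x):
        t = x.size(1)
        h = self.norm1(x)
        a, _ = self.attn(h, h, h, attn_mask=self.mask[:t, :t], need_weights=False)
        x = x + a
        return x + self.mlp(self.norm2(x))
```

Stacking three of these on top of token and learned positional embeddings, plus an output projection, lands close to the 344K parameter count.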

## Training

- Data: ~3,500 conversational prompt-response pairs
- Format: `prompt|response`, with `|` as the turn separator
- Optimizer: Adam with linear LR decay
- Learning rate: 1e-3
- Steps: 18,000
- Batch size: 16
- Hardware: Apple M1 GPU via MLX (weights converted to PyTorch for serving)
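
The linear decay above can be sketched as a step-dependent learning rate. I assume decay from 1e-3 at step 0 to zero at step 18,000; the repo's exact schedule (e.g. a nonzero floor) may differ:

```python
def lr_at(step, base_lr=1e-3, total_steps=18_000):
    # Linear decay from base_lr at step 0 to 0 at total_steps
    # (assumed endpoint; the actual schedule may decay to a floor instead).
    return base_lr * max(0.0, 1.0 - step / total_steps)
```

Halfway through training (step 9,000) this gives 5e-4.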

## Usage

```python
import torch
from model import FourthModel

model = FourthModel()
model.load()
response = model.generate("hello")
print(response)  # "hi there friend"
```

## API

An OpenAI-compatible API is available as a Hugging Face Space:

```shell
curl https://ajaxdavis-fourth-gpt-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"fourth-gpt","messages":[{"role":"user","content":"hello"}]}'
```
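
The same call from Python, using only the standard library. The endpoint and request body come from the curl example; the response parsing assumes the standard OpenAI chat-completions schema (`choices[0].message.content`):

```python
import json
from urllib import request

API_URL = "https://ajaxdavis-fourth-gpt-api.hf.space/v1/chat/completions"

def build_payload(content):
    # Request body in the OpenAI chat-completions format shown above.
    return {"model": "fourth-gpt",
            "messages": [{"role": "user", "content": content}]}

def chat(content, url=API_URL):
    req = request.Request(
        url,
        data=json.dumps(build_payload(content)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # Assumes the response follows the OpenAI schema.
        return json.load(resp)["choices"][0]["message"]["content"]
```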

## Limitations

- Character-level tokenization limits the vocabulary to lowercase English letters and spaces
- The 64-character context window constrains response length
- At this size the model mostly memorizes its training data rather than generalizing broadly
- Works best on prompt patterns seen in training (greetings, jokes, wisdom, recommendations)
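
Given these constraints, callers may want to normalize input before sending it to the model. A minimal sketch (the `sanitize` helper is mine, not part of the repo):

```python
import re

def sanitize(text, max_len=64):
    # Keep only characters in the model's vocabulary: a-z and space.
    # The pipe is reserved as the turn separator, so it is dropped too.
    text = re.sub(r"[^a-z ]", "", text.lower())
    return text[:max_len]
```

For example, `sanitize("Hello, World!")` yields `"hello world"`, and anything longer than the context window is truncated to 64 characters.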

## License

MIT
