# Fourth GPT

A tiny (344K parameter) character-level GPT trained for casual conversation.

## Model Details

| Property | Value |
|---|---|
| Parameters | 344,256 |
| Architecture | Decoder-only Transformer |
| Layers | 3 |
| Embedding Dim | 96 |
| Attention Heads | 6 |
| Context Window | 64 characters |
| Vocabulary | 29 (a-z, space, pipe, BOS) |
| Tokenization | Character-level |
| Framework | PyTorch |
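
The vocabulary is small enough to write out in full. A minimal sketch of the character tokenizer (the index assignments and helper names here are my assumption; the repo's actual ordering may differ):

```python
# 29 tokens: a-z, space, pipe (turn separator), and a BOS marker.
CHARS = list("abcdefghijklmnopqrstuvwxyz") + [" ", "|"]
BOS = "<bos>"
vocab = [BOS] + CHARS  # 29 tokens total
stoi = {ch: i for i, ch in enumerate(vocab)}
itos = {i: ch for ch, i in stoi.items()}

def encode(text):
    # Prepend BOS, then map each character to its index.
    return [stoi[BOS]] + [stoi[c] for c in text]

def decode(ids):
    # Drop BOS on the way back out.
    return "".join(itos[i] for i in ids if itos[i] != BOS)
```

Round-tripping works for any string over the 28 printable characters, e.g. `decode(encode("hello")) == "hello"`.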

## Architecture

- 3 Transformer blocks with RMS normalization
- Multi-head causal self-attention (6 heads, 16 dims each)
- MLP with ReLU activation (4x expansion)
- Learned positional embeddings
- No weight tying between the embedding and output layers
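
One block of the architecture above can be sketched in PyTorch using the card's numbers (dim 96, 6 heads, 4x MLP, RMS norm, 64-token causal mask). This is my reconstruction, not the repo's code; `FourthModel`'s internals may differ:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, learned scale."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

class Block(nn.Module):
    """One pre-norm Transformer block with the card's dimensions."""
    def __init__(self, dim=96, heads=6, ctx=64):
        super().__init__()
        self.norm1 = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = RMSNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim)
        )
        # Causal mask: True marks positions attention may NOT look at.
        self.register_buffer(
            "mask", torch.triu(torch.ones(ctx, ctx, dtype=torch.bool), 1)
        )

    def forward(self, x):
        t = x.size(1)
        h = self.norm1(x)
        a, _ = self.attn(h, h, h, attn_mask=self.mask[:t, :t], need_weights=False)
        x = x + a
        return x + self.mlp(self.norm2(x))
```

Stacking three of these on top of token and learned positional embeddings, plus an output projection, lands close to the 344K parameter count.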

## Training

- Data: ~3,500 conversational prompt-response pairs
- Format: `prompt|response`, with `|` as the turn separator
- Optimizer: Adam with linear LR decay
- Learning rate: 1e-3
- Steps: 18,000
- Batch size: 16
- Hardware: Apple M1 GPU via MLX (weights converted to PyTorch for serving)
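
The linear decay above can be sketched as a step-dependent learning rate. I assume decay from 1e-3 at step 0 to zero at step 18,000; the repo's exact schedule (e.g. a nonzero floor) may differ:

```python
def lr_at(step, base_lr=1e-3, total_steps=18_000):
    # Linear decay from base_lr at step 0 to 0 at total_steps
    # (assumed endpoint; the actual schedule may decay to a floor instead).
    return base_lr * max(0.0, 1.0 - step / total_steps)
```

Halfway through training (step 9,000) this gives 5e-4.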

## Usage

```python
import torch
from model import FourthModel

model = FourthModel()
model.load()
response = model.generate("hello")
print(response)  # "hi there friend"
```

## API

An OpenAI-compatible API is available as a Hugging Face Space:

```shell
curl https://ajaxdavis-fourth-gpt-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"fourth-gpt","messages":[{"role":"user","content":"hello"}]}'
```
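
The same call from Python, using only the standard library. The endpoint and request body come from the curl example; the response parsing assumes the standard OpenAI chat-completions schema (`choices[0].message.content`):

```python
import json
from urllib import request

API_URL = "https://ajaxdavis-fourth-gpt-api.hf.space/v1/chat/completions"

def build_payload(content):
    # Request body in the OpenAI chat-completions format shown above.
    return {"model": "fourth-gpt",
            "messages": [{"role": "user", "content": content}]}

def chat(content, url=API_URL):
    req = request.Request(
        url,
        data=json.dumps(build_payload(content)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # Assumes the response follows the OpenAI schema.
        return json.load(resp)["choices"][0]["message"]["content"]
```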

## Limitations

- Character-level tokenization limits the vocabulary to lowercase English letters and spaces
- The 64-character context window constrains response length
- At this size the model mostly memorizes its training data rather than generalizing broadly
- Works best on prompt patterns seen in training (greetings, jokes, wisdom, recommendations)
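
Given these constraints, callers may want to normalize input before sending it to the model. A minimal sketch (the `sanitize` helper is mine, not part of the repo):

```python
import re

def sanitize(text, max_len=64):
    # Keep only characters in the model's vocabulary: a-z and space.
    # The pipe is reserved as the turn separator, so it is dropped too.
    text = re.sub(r"[^a-z ]", "", text.lower())
    return text[:max_len]
```

For example, `sanitize("Hello, World!")` yields `"hello world"`, and anything longer than the context window is truncated to 64 characters.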

## License

MIT
