# Fourth GPT
A tiny (344K parameter) character-level GPT trained for casual conversation.
## Model Details
| Property | Value |
|---|---|
| Parameters | 344,256 |
| Architecture | Decoder-only Transformer |
| Layers | 3 |
| Embedding Dim | 96 |
| Attention Heads | 6 |
| Context Window | 64 characters |
| Vocabulary | 29 (a-z, space, pipe, BOS) |
| Tokenization | Character-level |
| Framework | PyTorch |
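The 29-symbol character vocabulary above (a-z, space, pipe, BOS) can be sketched as a minimal tokenizer. This is a hypothetical reconstruction; the actual id assignment used by the model is not documented here and may differ.

```python
# Hypothetical character-level tokenizer for the 29-symbol vocabulary
# (a-z, space, '|', BOS). The exact id ordering is an assumption.
CHARS = "abcdefghijklmnopqrstuvwxyz |"  # 28 printable symbols
BOS = 28                                # one id reserved for BOS -> 29 total
STOI = {ch: i for i, ch in enumerate(CHARS)}
ITOS = {i: ch for ch, i in STOI.items()}

def encode(text: str) -> list[int]:
    """Map a lowercase string to token ids, prepending BOS."""
    return [BOS] + [STOI[ch] for ch in text]

def decode(ids: list[int]) -> str:
    """Map token ids back to text, skipping BOS."""
    return "".join(ITOS[i] for i in ids if i != BOS)
```

Character-level encoding means every character is one token, so the 64-token context window holds at most 64 characters of conversation.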
## Architecture
- 3 Transformer blocks with RMS normalization
- Multi-head causal self-attention (6 heads, 16-dim each)
- MLP with ReLU activation (4x expansion)
- Learned positional embeddings
- Weight tying not used
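The components listed above can be sketched in PyTorch. This is an illustrative reconstruction from the table and bullet points, not the actual `FourthModel` source; names like `Block` and `TinyGPT` are hypothetical, and the exact parameter count depends on bias choices not specified here.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMS normalization with a learned scale (no mean-centering)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class Block(nn.Module):
    """One Transformer block: RMSNorm -> causal self-attention -> RMSNorm -> ReLU MLP."""
    def __init__(self, dim: int = 96, heads: int = 6):
        super().__init__()
        self.norm1 = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = RMSNorm(dim)
        self.mlp = nn.Sequential(  # 4x expansion, as stated above
            nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        n = x.size(1)
        # Boolean mask: True entries are disallowed -> strictly causal attention.
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h = self.norm1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        return x + self.mlp(self.norm2(x))

class TinyGPT(nn.Module):
    """Decoder-only Transformer with the dimensions from the table above."""
    def __init__(self, vocab: int = 29, ctx: int = 64, dim: int = 96, layers: int = 3):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(ctx, dim)  # learned positional embeddings
        self.blocks = nn.ModuleList(Block(dim) for _ in range(layers))
        self.norm = RMSNorm(dim)
        self.head = nn.Linear(dim, vocab, bias=False)  # no weight tying

    def forward(self, idx):
        x = self.tok(idx) + self.pos(torch.arange(idx.size(1)))
        for block in self.blocks:
            x = block(x)
        return self.head(self.norm(x))  # logits over the 29-symbol vocabulary
```

With these dimensions the sketch lands within a few thousand parameters of the reported 344,256; the small discrepancy comes from unspecified bias terms.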
## Training
- Data: ~3,500 conversational prompt-response pairs
- Format: `prompt|response` with `|` as turn separator
- Optimizer: Adam with linear LR decay
- Learning Rate: 1e-3
- Steps: 18,000
- Batch Size: 16
- Hardware: Apple M1 GPU via MLX (converted to PyTorch for serving)
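The linear LR decay above can be written as a simple schedule. This sketch assumes the rate decays from 1e-3 to 0 over the 18,000 steps; the actual end value and warmup (if any) are not documented here.

```python
# Assumed linear decay: 1e-3 at step 0, reaching 0 at step 18,000.
BASE_LR = 1e-3
TOTAL_STEPS = 18_000

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under linear decay."""
    return BASE_LR * max(0.0, 1.0 - step / TOTAL_STEPS)
```

In PyTorch the same schedule can be attached to Adam with `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda s: max(0.0, 1 - s / 18_000))`.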
## Usage
```python
import torch
from model import FourthModel

model = FourthModel()
model.load()
response = model.generate("hello")
print(response)  # "hi there friend"
```
## API
An OpenAI-compatible API is available as a Hugging Face Space:
```bash
curl https://ajaxdavis-fourth-gpt-api.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"fourth-gpt","messages":[{"role":"user","content":"hello"}]}'
```
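The same request can be made from Python with the standard library. The payload mirrors the curl example above; the response parsing assumes the Space follows the standard OpenAI chat-completions schema (`choices[0].message.content`), which is an assumption of this sketch.

```python
import json
from urllib.request import Request, urlopen

URL = "https://ajaxdavis-fourth-gpt-api.hf.space/v1/chat/completions"

def build_payload(user_text: str) -> dict:
    """Chat-completions payload matching the curl example."""
    return {
        "model": "fourth-gpt",
        "messages": [{"role": "user", "content": user_text}],
    }

def chat(user_text: str) -> str:
    """POST a message and return the assistant reply (assumes OpenAI schema)."""
    req = Request(
        URL,
        data=json.dumps(build_payload(user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```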
## Limitations
- Character-level tokenization limits the vocabulary to lowercase English letters, space, and the turn separator
- 64-character context window constrains response length
- At 344K parameters the model largely memorizes its training data rather than generalizing broadly
- Best on seen prompt patterns (greetings, jokes, wisdom, recommendations)
## License
MIT