nekGPT
nekGPT is a small character-voice model derived from GPT-2 and fine-tuned on authored X posts from @nekstoer.
This release intentionally uses the best validation checkpoint from training rather than the final checkpoint. Validation loss was lowest at epoch 2, then degraded with additional training:
- epoch 1: 0.7919
- epoch 2: 0.7898 <- selected checkpoint
- epoch 3: 0.8061
- epoch 4: 0.8148
- epoch 5: 0.8266
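The selection rule is simple: keep the per-epoch checkpoint with the lowest validation loss rather than the last one. A minimal sketch of that rule, using the loss values listed above:

```python
# Checkpoint selection by minimum validation loss.
# The values are the per-epoch validation losses reported above.
val_losses = {1: 0.7919, 2: 0.7898, 3: 0.8061, 4: 0.8148, 5: 0.8266}

# Pick the epoch whose checkpoint had the lowest validation loss.
best_epoch = min(val_losses, key=val_losses.get)
print(best_epoch)  # epoch 2 is the selected checkpoint
```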
Description
The goal was not generic assistant quality. The goal was to force a small model to latch onto a narrow posting voice strongly enough to survive open-ended prompts better than a larger, more stubborn base model.
The result is more voice-dominant than the larger Qwen experiments, but also less stable and less coherent over long conversations. It works best as a compact style model and experimental chat toy, not as a reliable assistant.
Methodology
Corpus
- Source account: @nekstoer
- Collection method: GetXAPI scrape of authored posts only
- Reposts: excluded
- Replies: included
- Quotes: included
- Raw authored posts collected: 1526
Dataset shaping
The original scrape was converted into higher-signal supervised examples:
- standalone posts: `write a tweet` -> post
- replies: `reply to this post from @user: ...` -> reply
- quotes: `quote tweet this post: ...` -> quote tweet
To make the voice distribution more pronounced, a subset of standout standalone posts was repeated in the training split.
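The shaping step above can be sketched as follows. This is an illustrative reconstruction, not the repo's actual preprocessing script: the field names (`kind`, `text`, `parent_author`, `parent_text`) and the helper functions are assumptions; only the three prompt templates and the standout-duplication idea come from the description above.

```python
def shape_example(post: dict) -> dict:
    """Turn one scraped post into a prompt/completion training pair.

    Field names here are hypothetical, not the repo's actual schema.
    """
    if post["kind"] == "reply":
        prompt = f"reply to this post from @{post['parent_author']}: {post['parent_text']}"
    elif post["kind"] == "quote":
        prompt = f"quote tweet this post: {post['parent_text']}"
    else:  # standalone post
        prompt = "write a tweet"
    return {"prompt": prompt, "completion": post["text"]}


def build_train_split(posts: list[dict], standouts: set[str]) -> list[dict]:
    """Shape all posts, then repeat standout standalone posts
    to make the voice distribution more pronounced."""
    examples = [shape_example(p) for p in posts]
    examples += [
        shape_example(p)
        for p in posts
        if p["kind"] == "post" and p["text"] in standouts
    ]
    return examples
```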
Effective dataset sizes for the GPT-2 run:
- train: 1786
- valid: 20
- test: 20
Training
- base model: openai-community/gpt2
- hardware: Apple Silicon MPS
- max length: 256
- epochs: 5
- learning rate: 5e-5
- batch size: 4
- gradient accumulation: 1
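For reference, the hyperparameters above map onto a standard Hugging Face `TrainingArguments` setup roughly as sketched below. The actual training script is not published, so everything beyond the listed hyperparameters (output path, evaluation and save cadence, best-model selection flags) is an assumption; the max length of 256 would be applied at tokenization time, not here.

```python
from transformers import TrainingArguments

# Hedged sketch of a config matching the run described above.
args = TrainingArguments(
    output_dir="nekgpt-gpt2",          # assumed output path
    num_train_epochs=5,
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    eval_strategy="epoch",             # `evaluation_strategy` on older releases
    save_strategy="epoch",             # keep a checkpoint per epoch
    load_best_model_at_end=True,       # select the best-validation checkpoint
    metric_for_best_model="eval_loss",
)
```

Saving per epoch plus `load_best_model_at_end` is what makes publishing the epoch-2 checkpoint possible without rerunning anything.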
Although the full run completed, this repo publishes the epoch-2 checkpoint because it had the best validation loss.
Known behavior
Strengths:
- strongly picks up local tone and phrasing
- responds less like a default assistant than the larger Qwen run
Weaknesses:
- drifts or collapses in longer multi-turn chats
- can become incoherent
- can overfit to fragments, usernames, and recurring motifs from the corpus
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "potteryrage/nekGPT"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "User: hi nek\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=48,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
License
Apache-2.0