nekGPT

nekGPT is a small character-voice model fine-tuned from GPT-2 on authored X posts from @nekstoer.

This release intentionally uses the best validation checkpoint from training rather than the final checkpoint. Validation loss was lowest at epoch 2, then degraded with additional training:

  • epoch 1: 0.7919
  • epoch 2: 0.7898 <- selected checkpoint
  • epoch 3: 0.8061
  • epoch 4: 0.8148
  • epoch 5: 0.8266
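
The selection rule above is simply "take the epoch with the lowest validation loss." A minimal sketch, using the loss values reported in this card:

```python
# Per-epoch validation losses reported above for the nekGPT run.
val_losses = {1: 0.7919, 2: 0.7898, 3: 0.8061, 4: 0.8148, 5: 0.8266}

# Select the checkpoint with the lowest validation loss.
best_epoch = min(val_losses, key=val_losses.get)
print(best_epoch, val_losses[best_epoch])  # epoch 2, loss 0.7898
```

Note that the curve is already rising by epoch 3, so the final (epoch-5) checkpoint is the worst of the five by this metric.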

Description

The goal was not generic assistant quality. The goal was to force a small model to latch onto a narrow posting voice strongly enough to survive open-ended prompts better than a larger, more stubborn base model.

The result is more voice-dominant than the larger Qwen experiments, but also less stable and less coherent over long conversations. It works best as a compact style model and experimental chat toy, not as a reliable assistant.

Methodology

Corpus

  • Source account: @nekstoer
  • Collection method: GetXAPI scrape of authored posts only
  • Reposts: excluded
  • Replies: included
  • Quotes: included
  • Raw authored posts collected: 1526

Dataset shaping

The original scrape was converted into higher-signal supervised examples:

  • standalone posts: write a tweet -> post
  • replies: reply to this post from @user: ... -> reply
  • quotes: quote tweet this post: ... -> quote tweet

To make the voice distribution more pronounced, a subset of standout standalone posts was repeated in the training split.
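
The shaping steps above can be sketched as a small mapping function plus a duplication pass. The field names (`kind`, `text`, `context`, `user`, `id`) and the repeat count are assumptions for illustration, not the card's actual schema:

```python
def to_example(post):
    """Map a scraped post dict to an instruction-style (prompt, completion) pair.
    Field names here are hypothetical, not the card's actual schema."""
    if post["kind"] == "reply":
        return (f"reply to this post from @{post['user']}: {post['context']}", post["text"])
    if post["kind"] == "quote":
        return (f"quote tweet this post: {post['context']}", post["text"])
    # standalone post
    return ("write a tweet", post["text"])

def build_train_split(posts, standout_ids, repeats=3):
    """Build the train split, repeating standout standalone posts to
    upweight them in the voice distribution (repeat count is a guess)."""
    examples = [to_example(p) for p in posts]
    for p in posts:
        if p["kind"] == "post" and p["id"] in standout_ids:
            examples.extend([to_example(p)] * (repeats - 1))
    return examples
```

Repetition like this is a blunt but effective way to shift a tiny corpus's distribution toward its strongest examples, at the cost of faster overfitting to those exact fragments.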

Effective dataset sizes for the GPT-2 run:

  • train: 1786
  • valid: 20
  • test: 20

Training

  • base model: openai-community/gpt2
  • hardware: Apple Silicon MPS
  • max length: 256
  • epochs: 5
  • learning rate: 5e-5
  • batch size: 4
  • gradient accumulation: 1

Although the full run completed, this repo publishes the epoch-2 checkpoint because it had the best validation loss.
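
As a rough sketch, the hyperparameters above map onto a Hugging Face `TrainingArguments` configuration like the following. This assumes the run used the `Trainer` API, which the card does not state; treat it as an illustration of the reported settings, not the actual training script:

```python
from transformers import TrainingArguments

# Config mirroring the hyperparameters listed above.
# The max length of 256 is applied at tokenization time, not here.
args = TrainingArguments(
    output_dir="nekgpt-gpt2",
    num_train_epochs=5,
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,       # keeps the best-eval-loss checkpoint,
    metric_for_best_model="eval_loss", # which is how the epoch-2 weights
    greater_is_better=False,           # end up being the published ones
)
```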

Known behavior

Strengths:

  • strongly picks up local tone and phrasing
  • responds less like a default assistant than the larger Qwen run

Weaknesses:

  • drifts or collapses in longer multi-turn chats
  • can become incoherent
  • can overfit to fragments, usernames, and recurring motifs from the corpus

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "potteryrage/nekGPT"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Simple chat-style prompt
prompt = "User: hi nek\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=48,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)
# skip_special_tokens=False leaves special tokens (e.g. EOS) visible in the output
print(tokenizer.decode(outputs[0], skip_special_tokens=False))

License

Apache-2.0
