nekGPT

nekGPT is a small character-voice model fine-tuned from GPT-2 on authored X posts from @nekstoer.

This release intentionally uses the best validation checkpoint from training rather than the final checkpoint. Validation loss was lowest at epoch 2, then degraded with additional training:

  • epoch 1: 0.7919
  • epoch 2: 0.7898 <- selected checkpoint
  • epoch 3: 0.8061
  • epoch 4: 0.8148
  • epoch 5: 0.8266
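
The selection rule above is simply "take the epoch with the lowest validation loss." A minimal sketch, using the loss values reported in this card:

```python
# Per-epoch validation losses reported above for the nekGPT run.
val_losses = {1: 0.7919, 2: 0.7898, 3: 0.8061, 4: 0.8148, 5: 0.8266}

# Select the checkpoint with the lowest validation loss.
best_epoch = min(val_losses, key=val_losses.get)
print(best_epoch, val_losses[best_epoch])  # epoch 2, loss 0.7898
```

Note that the curve is already rising by epoch 3, so the final (epoch-5) checkpoint is the worst of the five by this metric.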

Description

The goal was not generic assistant quality. The goal was to force a small model to latch onto a narrow posting voice strongly enough to survive open-ended prompts better than a larger, more stubborn base model.

The result is more voice-dominant than the larger Qwen experiments, but also less stable and less coherent over long conversations. It works best as a compact style model and experimental chat toy, not as a reliable assistant.

Methodology

Corpus

  • Source account: @nekstoer
  • Collection method: GetXAPI scrape of authored posts only
  • Reposts: excluded
  • Replies: included
  • Quotes: included
  • Raw authored posts collected: 1526

Dataset shaping

The original scrape was converted into higher-signal supervised examples:

  • standalone posts: write a tweet -> post
  • replies: reply to this post from @user: ... -> reply
  • quotes: quote tweet this post: ... -> quote tweet

To make the voice distribution more pronounced, a subset of standout standalone posts was repeated in the training split.
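
The shaping steps above can be sketched as a small mapping function plus a duplication pass. The field names (`kind`, `text`, `context`, `user`, `id`) and the repeat count are assumptions for illustration, not the card's actual schema:

```python
def to_example(post):
    """Map a scraped post dict to an instruction-style (prompt, completion) pair.
    Field names here are hypothetical, not the card's actual schema."""
    if post["kind"] == "reply":
        return (f"reply to this post from @{post['user']}: {post['context']}", post["text"])
    if post["kind"] == "quote":
        return (f"quote tweet this post: {post['context']}", post["text"])
    # standalone post
    return ("write a tweet", post["text"])

def build_train_split(posts, standout_ids, repeats=3):
    """Build the train split, repeating standout standalone posts to
    upweight them in the voice distribution (repeat count is a guess)."""
    examples = [to_example(p) for p in posts]
    for p in posts:
        if p["kind"] == "post" and p["id"] in standout_ids:
            examples.extend([to_example(p)] * (repeats - 1))
    return examples
```

Repetition like this is a blunt but effective way to shift a tiny corpus's distribution toward its strongest examples, at the cost of faster overfitting to those exact fragments.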

Effective dataset sizes for the GPT-2 run:

  • train: 1786
  • valid: 20
  • test: 20

Training

  • base model: openai-community/gpt2
  • hardware: Apple Silicon MPS
  • max length: 256
  • epochs: 5
  • learning rate: 5e-5
  • batch size: 4
  • gradient accumulation: 1

Although the full run completed, this repo publishes the epoch-2 checkpoint because it had the best validation loss.
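
As a rough sketch, the hyperparameters above map onto a Hugging Face `TrainingArguments` configuration like the following. This assumes the run used the `Trainer` API, which the card does not state; treat it as an illustration of the reported settings, not the actual training script:

```python
from transformers import TrainingArguments

# Config mirroring the hyperparameters listed above.
# The max length of 256 is applied at tokenization time, not here.
args = TrainingArguments(
    output_dir="nekgpt-gpt2",
    num_train_epochs=5,
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,       # keeps the best-eval-loss checkpoint,
    metric_for_best_model="eval_loss", # which is how the epoch-2 weights
    greater_is_better=False,           # end up being the published ones
)
```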

Known behavior

Strengths:

  • strongly picks up local tone and phrasing
  • responds less like a default assistant than the larger Qwen run

Weaknesses:

  • drifts or collapses in longer multi-turn chats
  • can become incoherent
  • can overfit to fragments, usernames, and recurring motifs from the corpus

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "potteryrage/nekGPT"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Simple chat-style prompt
prompt = "User: hi nek\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=48,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)
# skip_special_tokens=False leaves special tokens (e.g. EOS) visible in the output
print(tokenizer.decode(outputs[0], skip_special_tokens=False))

License

Apache-2.0
