GPT-2.4: Advanced Fine-Tuned GPT-2

GPT-2.4 represents the most advanced iteration in this series of fine-tuned GPT-2 models. It is built upon the GPT-2.3-High architecture, incorporating significant improvements in training volume and sequence handling.

Technical Specifications

  • Base Model: GPT-2 (Small)
  • Architecture: Causal Language Modeling with expanded positional embeddings.
  • Context Window: 2048 tokens (upgraded from the default 1024; a resizing sketch follows this list).
  • Training Data: Wikitext-2-raw-v1 (25% subset - a 25% increase over GPT-2.3-High).
  • Epochs: 3
  • Optimizer: AdamW with a learning rate of 2e-5 and weight decay of 0.01.
  • Precision: Mixed precision (FP16) used during training on an NVIDIA GPU.
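
The 2048-token window implies that GPT-2 Small's learned positional embedding (1024 rows) was enlarged before fine-tuning. The sketch below shows one plausible way to do this and maps the listed hyperparameters onto Hugging Face TrainingArguments; it is not the actual training script, and the copy-initialization of the new positions is an assumption.

import torch
from transformers import GPT2LMHeadModel, TrainingArguments

model = GPT2LMHeadModel.from_pretrained("gpt2")

# Expand the learned positional embedding (wpe) from 1024 to 2048 rows.
old_wpe = model.transformer.wpe.weight.data              # shape: (1024, 768)
new_wpe = torch.nn.Embedding(2048, old_wpe.size(1))
new_wpe.weight.data[:1024] = old_wpe                     # keep the original positions
new_wpe.weight.data[1024:] = old_wpe                     # assumption: initialize the new rows by copying
model.transformer.wpe = new_wpe
model.config.n_positions = 2048

# Hyperparameters as listed above, expressed as Trainer arguments.
args = TrainingArguments(
    output_dir="./gpt2.4-train",
    num_train_epochs=3,
    learning_rate=2e-5,
    weight_decay=0.01,
    fp16=True,  # mixed precision on an NVIDIA GPU
)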

Performance Metrics (Perplexity)

Perplexity (PPL) measures how well the probability distribution predicted by the model matches the actual distribution of the words in the evaluation data. Lower is better.

  • Training Subset Perplexity: 2.27
  • Official Unseen Test Set Perplexity: 4.04
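
Concretely, perplexity is the exponential of the mean token-level cross-entropy loss, so the two figures above correspond to mean losses of roughly 0.82 and 1.40 nats per token (the loss values below are back-calculated from the reported perplexities, not separately reported numbers):

import math

# Perplexity = exp(mean cross-entropy loss, in nats per token)
print(math.exp(0.8197))   # ~2.27  (training subset)
print(math.exp(1.3962))   # ~4.04  (unseen test set)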

Key Features

  1. Extended Context: Supports twice the standard sequence length of GPT-2, making it suitable for long-form creative writing and document analysis.
  2. Healing Weights: Fine-tuned specifically to resolve weight-mismatch issues arising from manual configuration changes; a quick loading check follows this list.
  3. Optimized Inference: Coherent output achieved through the repetition and n-gram penalties used in the generation example below.
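
Because the checkpoint's 2048-position embedding does not match the stock GPT-2 configuration, both examples below pass ignore_mismatched_sizes=True when loading. As a quick sanity check after loading (the expected values follow from the specifications above, assuming GPT-2 Small's 768-dimensional hidden size):

from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("BikoRiko/GPT-2.4", ignore_mismatched_sizes=True)
print(model.config.n_positions)             # should report 2048 per the specifications above
print(model.transformer.wpe.weight.shape)   # should report torch.Size([2048, 768])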

Usage Example: Text Generation

Use this code to generate text with the model from the Hugging Face Hub:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

repo_id = "BikoRiko/GPT-2.4"
# ignore_mismatched_sizes=True is critical for the 2048-token configuration
model = GPT2LMHeadModel.from_pretrained(repo_id, ignore_mismatched_sizes=True)
tokenizer = GPT2Tokenizer.from_pretrained(repo_id)

input_text = "Artificial intelligence will shape the future by"
inputs = tokenizer(input_text, return_tensors='pt')

outputs = model.generate(
    **inputs,
    max_length=150,
    do_sample=True,
    top_p=0.95,
    repetition_penalty=1.2,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.eos_token_id  # GPT-2 defines no pad token; this silences the generation warning
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
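
Alternatively, the model and tokenizer loaded above can be wrapped in a text-generation pipeline; the sampling settings below simply mirror the ones used in the previous example and are not the only valid choice.

from transformers import pipeline

# Reuses the `model` and `tokenizer` objects created in the example above.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

result = generator(
    "Artificial intelligence will shape the future by",
    max_length=150,
    do_sample=True,
    top_p=0.95,
    repetition_penalty=1.2,
    no_repeat_ngram_size=3,
)
print(result[0]["generated_text"])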

Usage Example: Verifying Evaluation Score (PPL)

Use this code to verify the model's perplexity (PPL) on a dataset split:

import math
from datasets import load_dataset
from transformers import Trainer, TrainingArguments, GPT2LMHeadModel, GPT2Tokenizer

repo_id = "BikoRiko/GPT-2.4"
model = GPT2LMHeadModel.from_pretrained(repo_id, ignore_mismatched_sizes=True)
tokenizer = GPT2Tokenizer.from_pretrained(repo_id)
tokenizer.pad_token = tokenizer.eos_token

# Load test split
test_data = load_dataset('wikitext', 'wikitext-2-raw-v1', split='test')

def tokenize(batch):
    res = tokenizer(batch['text'], truncation=True, padding='max_length', max_length=128)
    # For causal language modeling, the labels are the input ids themselves.
    res['labels'] = res['input_ids'].copy()
    return res

tokenized_test = test_data.map(tokenize, batched=True)

trainer = Trainer(model=model, args=TrainingArguments(output_dir='./tmp_eval'))
eval_results = trainer.evaluate(tokenized_test)
ppl = math.exp(eval_results['eval_loss'])

print(f"Verified Test Perplexity: {ppl:.2f}")