Tags: Text Generation · Transformers · Safetensors · gpt2 · causal-lm · fine-tuned · chatbot · text-generation-inference
Instructions for using faizack/gpt2-chat-ft with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use faizack/gpt2-chat-ft with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="faizack/gpt2-chat-ft")
```

```python
# Load the model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("faizack/gpt2-chat-ft")
model = AutoModelForCausalLM.from_pretrained("faizack/gpt2-chat-ft")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use faizack/gpt2-chat-ft with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "faizack/gpt2-chat-ft"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "faizack/gpt2-chat-ft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
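Since the server exposes an OpenAI-compatible API, the same completions endpoint can also be called from Python. A minimal sketch using only the standard library; the port and payload mirror the curl call above, and `build_payload`/`complete` are illustrative helper names, not part of vLLM:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default vLLM port; adjust if you changed it


def build_payload(prompt, max_tokens=512, temperature=0.5):
    """Build the JSON body for the OpenAI-compatible /v1/completions route."""
    return {
        "model": "faizack/gpt2-chat-ft",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt):
    """POST the prompt to the running server and return the first completion."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Requires the vLLM server from above to be running:
# text = complete("Once upon a time,")
```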
- SGLang
How to use faizack/gpt2-chat-ft with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "faizack/gpt2-chat-ft" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "faizack/gpt2-chat-ft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "faizack/gpt2-chat-ft" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "faizack/gpt2-chat-ft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use faizack/gpt2-chat-ft with Docker Model Runner:
```shell
docker model run hf.co/faizack/gpt2-chat-ft
```
Model Card for GPT2-Chat (Fine-tuned)
This is a fine-tuned version of GPT-2 adapted for chat-style generation.
It was trained on conversational data to make GPT-2 behave more like ChatGPT, producing more interactive, coherent, and context-aware responses.
Model Details
Model Description
- Developed by: Faijan Khan
- Shared by: faizack
- Model type: Causal Language Model (decoder-only transformer)
- Language(s): English
- License: MIT (or same as GPT-2)
- Finetuned from: gpt2
Model Sources
- Repository: https://huggingface.co/faizack/gpt2-chat-ft
- Paper (original GPT-2): Language Models are Unsupervised Multitask Learners
Uses
Direct Use
- Conversational AI experiments
- Chatbot prototyping
- Educational or research purposes
Downstream Use
- Further fine-tuning for domain-specific dialogue (e.g., customer support, tutoring, storytelling).
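For further fine-tuning, each conversational example must first be flattened into a single training string. A minimal sketch, assuming a simple `User:`/`Bot:` template (the actual template used to train this model is not documented, so match your own data's format); `<|endoftext|>` is GPT-2's end-of-text token:

```python
def format_example(prompt, response, eos_token="<|endoftext|>"):
    """Join one prompt -> response pair into a single training string.

    The "User:"/"Bot:" markers are illustrative, not the card's documented
    template.
    """
    return f"User: {prompt}\nBot: {response}{eos_token}"


def build_corpus(pairs):
    """Turn a list of (prompt, response) tuples into training texts."""
    return [format_example(p, r) for p, r in pairs]


pairs = [
    ("Hi there!", "Hello! How can I help you today?"),
    ("What's GPT-2?", "GPT-2 is a decoder-only transformer language model."),
]
corpus = build_corpus(pairs)
```

The resulting strings can then be tokenized and fed to a standard causal-LM training loop.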
Out-of-Scope Use
- Not intended for production use without additional safety layers.
- Not suitable for sensitive domains like medical, legal, or financial advice.
Bias, Risks, and Limitations
- May generate biased, offensive, or factually incorrect responses (inherited from GPT-2).
- Not aligned via RLHF (unlike ChatGPT), so safety guardrails are minimal.
Recommendations
- Use with human oversight.
- Add filtering, moderation, or reinforcement learning with human feedback (RLHF) if deploying in production.
How to Get Started with the Model
```python
from transformers import pipeline

chatbot = pipeline("text-generation", model="faizack/gpt2-chat-ft")

prompt = "Hello, how are you?"
response = chatbot(prompt, max_new_tokens=100, do_sample=True, temperature=0.7)
print(response[0]["generated_text"])
```
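The snippet above is single-turn. For multi-turn chat with a causal LM, prior turns are usually concatenated into one prompt string; a sketch, assuming a `User:`/`Bot:` template (GPT-2 has no built-in chat format, so this should mirror whatever convention the training data used):

```python
def build_prompt(history, user_message):
    """Concatenate prior turns plus the new user message into one prompt.

    history: list of (user_turn, bot_turn) tuples.
    """
    lines = []
    for user_turn, bot_turn in history:
        lines.append(f"User: {user_turn}")
        lines.append(f"Bot: {bot_turn}")
    lines.append(f"User: {user_message}")
    lines.append("Bot:")  # trailing cue so the model answers as the bot
    return "\n".join(lines)


history = [("Hello, how are you?", "I'm doing well, thanks!")]
prompt = build_prompt(history, "What can you do?")
# The prompt can then be fed to the pipeline exactly as above:
# response = chatbot(prompt, max_new_tokens=100, do_sample=True, temperature=0.7)
```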
Training Details
Training Data
- Fine-tuned on conversational datasets (prompt → response pairs).
Training Procedure
- Base model: gpt2
- Objective: Causal LM (next-token prediction).
- Mixed precision: fp16 training.
- Optimizer: AdamW.
Training Hyperparameters
- Learning rate: 5e-5
- Batch size: 4
- Epochs: 3
- Warmup steps: 500
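These hyperparameters map directly onto a standard `transformers` training setup. A sketch collecting them under their `TrainingArguments` names (the values come from the list above; the actual training script is not published):

```python
# Hyperparameters from the card, keyed by transformers TrainingArguments names.
hparams = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 4,
    "num_train_epochs": 3,
    "warmup_steps": 500,
    "fp16": True,  # mixed-precision training, per the card
}

# With transformers installed, these would feed straight into a Trainer run:
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="gpt2-chat-ft", **hparams)
```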
Evaluation
Metrics
- Perplexity (PPL) for fluency.
- Manual qualitative evaluation for coherence.
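Perplexity is the exponential of the average per-token negative log-likelihood, so lower values mean the model finds the text more predictable. A minimal sketch of the computation (the NLL values are illustrative, not measured results):

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Illustrative per-token NLLs; lower average NLL -> lower perplexity.
nlls = [2.3, 1.9, 2.1, 2.0]
ppl = perplexity(nlls)
```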
Results
- Lower perplexity on conversational prompts compared to base GPT-2.
- Produces more context-aware and fluent chat responses.
Environmental Impact
- Hardware Type: NVIDIA A100 (40GB)
- Training time: ~2 hours
- Cloud Provider: Vast.ai (example)
- Carbon Emitted: Estimated <10 kg CO2eq
Technical Specifications
Model Architecture
- Transformer decoder-only (117M parameters).
- Context length: 1024 tokens.
Compute Infrastructure
- Hardware: 1x NVIDIA A100
- Software: PyTorch, Hugging Face Transformers, Accelerate.
Citation
If you use this model, please cite GPT-2 and this fine-tuned version:
BibTeX:
```bibtex
@misc{faizack2025gpt2chat,
  author       = {Faijan Khan},
  title        = {GPT2-Chat Fine-tuned Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/faizack/gpt2-chat-ft}}
}
```