Tags: Text Generation · Transformers · Safetensors · gpt2 · causal-lm · fine-tuned · chatbot · text-generation-inference
Instructions for using faizack/gpt2-chat-ft with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use faizack/gpt2-chat-ft with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="faizack/gpt2-chat-ft")
```

```python
# Load the model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("faizack/gpt2-chat-ft")
model = AutoModelForCausalLM.from_pretrained("faizack/gpt2-chat-ft")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use faizack/gpt2-chat-ft with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "faizack/gpt2-chat-ft"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "faizack/gpt2-chat-ft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
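Since the server exposes an OpenAI-compatible API, the same completions endpoint can also be called from Python. A minimal sketch using only the standard library; the port and payload mirror the curl call above, and `build_payload`/`complete` are illustrative helper names, not part of vLLM:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default vLLM port; adjust if you changed it


def build_payload(prompt, max_tokens=512, temperature=0.5):
    """Build the JSON body for the OpenAI-compatible /v1/completions route."""
    return {
        "model": "faizack/gpt2-chat-ft",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt):
    """POST the prompt to the running server and return the first completion."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Requires the vLLM server from above to be running:
# text = complete("Once upon a time,")
```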
- SGLang
How to use faizack/gpt2-chat-ft with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "faizack/gpt2-chat-ft" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "faizack/gpt2-chat-ft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "faizack/gpt2-chat-ft" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "faizack/gpt2-chat-ft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use faizack/gpt2-chat-ft with Docker Model Runner:
```shell
docker model run hf.co/faizack/gpt2-chat-ft
```
Model Card for GPT2-Chat (Fine-tuned)
This is a fine-tuned version of GPT-2 adapted for chat-style generation.
It was trained on conversational data to make GPT-2 behave more like ChatGPT, producing more interactive, coherent, and context-aware responses.
Model Details
Model Description
- Developed by: Faijan Khan
- Shared by: faizack
- Model type: Causal Language Model (decoder-only transformer)
- Language(s): English
- License: MIT (or same as GPT-2)
- Finetuned from: gpt2
Model Sources
- Repository: https://huggingface.co/faizack/gpt2-chat-ft
- Paper (original GPT-2): Language Models are Unsupervised Multitask Learners
Uses
Direct Use
- Conversational AI experiments
- Chatbot prototyping
- Educational or research purposes
Downstream Use
- Further fine-tuning for domain-specific dialogue (e.g., customer support, tutoring, storytelling).
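For further fine-tuning, each conversational example must first be flattened into a single training string. A minimal sketch, assuming a simple `User:`/`Bot:` template (the actual template used to train this model is not documented, so match your own data's format); `<|endoftext|>` is GPT-2's end-of-text token:

```python
def format_example(prompt, response, eos_token="<|endoftext|>"):
    """Join one prompt -> response pair into a single training string.

    The "User:"/"Bot:" markers are illustrative, not the card's documented
    template.
    """
    return f"User: {prompt}\nBot: {response}{eos_token}"


def build_corpus(pairs):
    """Turn a list of (prompt, response) tuples into training texts."""
    return [format_example(p, r) for p, r in pairs]


pairs = [
    ("Hi there!", "Hello! How can I help you today?"),
    ("What's GPT-2?", "GPT-2 is a decoder-only transformer language model."),
]
corpus = build_corpus(pairs)
```

The resulting strings can then be tokenized and fed to a standard causal-LM training loop.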
Out-of-Scope Use
- Not intended for production use without additional safety layers.
- Not suitable for sensitive domains like medical, legal, or financial advice.
Bias, Risks, and Limitations
- May generate biased, offensive, or factually incorrect responses (inherited from GPT-2).
- Not aligned via RLHF (unlike ChatGPT), so safety guardrails are minimal.
Recommendations
- Use with human oversight.
- Add filtering, moderation, or reinforcement learning with human feedback (RLHF) if deploying in production.
How to Get Started with the Model
```python
from transformers import pipeline

chatbot = pipeline("text-generation", model="faizack/gpt2-chat-ft")

prompt = "Hello, how are you?"
response = chatbot(prompt, max_new_tokens=100, do_sample=True, temperature=0.7)
print(response[0]["generated_text"])
```
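The snippet above is single-turn. For multi-turn chat with a causal LM, prior turns are usually concatenated into one prompt string; a sketch, assuming a `User:`/`Bot:` template (GPT-2 has no built-in chat format, so this should mirror whatever convention the training data used):

```python
def build_prompt(history, user_message):
    """Concatenate prior turns plus the new user message into one prompt.

    history: list of (user_turn, bot_turn) tuples.
    """
    lines = []
    for user_turn, bot_turn in history:
        lines.append(f"User: {user_turn}")
        lines.append(f"Bot: {bot_turn}")
    lines.append(f"User: {user_message}")
    lines.append("Bot:")  # trailing cue so the model answers as the bot
    return "\n".join(lines)


history = [("Hello, how are you?", "I'm doing well, thanks!")]
prompt = build_prompt(history, "What can you do?")
# The prompt can then be fed to the pipeline exactly as above:
# response = chatbot(prompt, max_new_tokens=100, do_sample=True, temperature=0.7)
```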
Training Details
Training Data
- Fine-tuned on conversational datasets (prompt → response pairs).
Training Procedure
- Base model: gpt2
- Objective: Causal LM (next-token prediction).
- Mixed precision: fp16 training.
- Optimizer: AdamW.
Training Hyperparameters
- Learning rate: 5e-5
- Batch size: 4
- Epochs: 3
- Warmup steps: 500
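These hyperparameters map directly onto a standard `transformers` training setup. A sketch collecting them under their `TrainingArguments` names (the values come from the list above; the actual training script is not published):

```python
# Hyperparameters from the card, keyed by transformers TrainingArguments names.
hparams = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 4,
    "num_train_epochs": 3,
    "warmup_steps": 500,
    "fp16": True,  # mixed-precision training, per the card
}

# With transformers installed, these would feed straight into a Trainer run:
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="gpt2-chat-ft", **hparams)
```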
Evaluation
Metrics
- Perplexity (PPL) for fluency.
- Manual qualitative evaluation for coherence.
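Perplexity is the exponential of the average per-token negative log-likelihood, so lower values mean the model finds the text more predictable. A minimal sketch of the computation (the NLL values are illustrative, not measured results):

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Illustrative per-token NLLs; lower average NLL -> lower perplexity.
nlls = [2.3, 1.9, 2.1, 2.0]
ppl = perplexity(nlls)
```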
Results
- Lower perplexity on conversational prompts compared to base GPT-2.
- Produces more context-aware and fluent chat responses.
Environmental Impact
- Hardware Type: NVIDIA A100 (40GB)
- Training time: ~2 hours
- Cloud Provider: Vast.ai (example)
- Carbon Emitted: Estimated <10 kg CO2eq
Technical Specifications
Model Architecture
- Transformer decoder-only (117M parameters).
- Context length: 1024 tokens.
Compute Infrastructure
- Hardware: 1x NVIDIA A100
- Software: PyTorch, Hugging Face Transformers, Accelerate.
Citation
If you use this model, please cite GPT-2 and this fine-tuned version:
BibTeX:
```bibtex
@misc{faizack2025gpt2chat,
  author       = {Faijan Khan},
  title        = {GPT2-Chat Fine-tuned Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/faizack/gpt2-chat-ft}}
}
```