tinystories-gpt-small

This is a custom GPT model pre-trained from scratch on the TinyStories dataset. It demonstrates foundational language modeling capabilities and can be used for text generation.

Model Details

  • Architecture: Custom GPT (a minimal config sketch follows this list)
    • n_layer: 8
    • n_head: 8
    • n_embd: 512
    • block_size: 1024
    • vocab_size: 50257
    • dropout: 0.1
  • Pre-training Dataset: TinyStories (a synthetic dataset of short, simple stories designed to teach language models basic reasoning and coherence).
  • Purpose: This model is a base language model. It has learned to predict the next token in a sequence based on the patterns found in the TinyStories dataset. It is suitable for demonstrating basic generative text capabilities and serves as a foundation for further fine-tuning on specific downstream tasks (e.g., question answering, chatbot).
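
For reference, here is a minimal sketch of what a nanoGPT-style GPTConfig dataclass might look like. This is an assumption based on the hyperparameters listed above; the authoritative definition lives in this repository's model.py.

from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Hypothetical sketch; field names assumed from the list above
    vocab_size: int = 50257  # GPT-2 BPE vocabulary size
    block_size: int = 1024   # maximum context length
    n_layer: int = 8         # number of transformer blocks
    n_head: int = 8          # attention heads per block
    n_embd: int = 512        # embedding / hidden dimension
    dropout: float = 0.1     # dropout probability
    bias: bool = True        # bias terms in Linear and LayerNorm layers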

How to Use (Inference)

import torch
import tiktoken
from model import GPT, GPTConfig # Assuming model.py is available or its classes are defined

# 1. Define the model configuration (must match the trained model's config.json)
# Load these values from config.json if available, or define them manually as below
config = GPTConfig(
    vocab_size=50257,
    block_size=1024,
    n_layer=8,
    n_head=8,
    n_embd=512,
    dropout=0.1,
    bias=True
)

# 2. Initialize the model and load weights
model = GPT(config)
state_dict = torch.load("pytorch_model.bin", map_location='cpu') # Replace with path to downloaded model
model.load_state_dict(state_dict)
model.eval() # Set to evaluation mode
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

# 3. Load the tiktoken tokenizer
tokenizer = tiktoken.get_encoding("gpt2")
EOT_TOKEN_ID = tokenizer.eot_token  # id of the <|endoftext|> token (50256 for the "gpt2" encoding)

# 4. Prepare your prompt for text generation
prompt_text = "Once upon a time there was a pumpkin."

# Encode the prompt (allowed_special='all' permits special tokens such as <|endoftext|> in the prompt)
input_ids = tokenizer.encode(prompt_text, allowed_special='all')
input_ids_tensor = torch.tensor([input_ids], dtype=torch.long).to(device)

# 5. Generate text
# Adjust max_new_tokens, temperature, top_k as needed
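# temperature rescales the logits before softmax: values below 1.0 make the
# output more conservative, values above 1.0 make it more varied.
# top_k restricts sampling to the k most likely tokens at each step.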
with torch.no_grad():  # inference only; no gradient tracking needed
    generated_output_ids = model.generate(
        idx=input_ids_tensor,
        max_new_tokens=100,  # number of new tokens to generate beyond the prompt
        temperature=0.7,
        top_k=50
    )

# Decode the generated text (excluding the prompt part)
generated_text_ids = generated_output_ids[0, len(input_ids):].tolist()
generated_text = tokenizer.decode(generated_text_ids)

# Clean up any leftover EOT tokens from generation
generated_text = generated_text.replace(tokenizer.decode([EOT_TOKEN_ID]), "").strip()

print(f"Generated Text: {generated_text}")

Limitations and Bias

  • This model is a relatively small GPT (50.95M parameters; the arithmetic behind this figure is sketched after this list), and its generative capabilities are limited by both its size and the simplicity of the TinyStories dataset.
  • It is a base language model and has not been instruction-tuned or fine-tuned for specific tasks like complex question answering or dialogue. Therefore, its responses may be incoherent or non-factual for out-of-distribution prompts.
  • Like all language models, it may generate biased or incorrect information based on its training data.
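
The 50.95M parameter figure can be reproduced from the hyperparameters in Model Details. A quick sanity check, assuming tied token-embedding/output-head weights and the nanoGPT convention of excluding position embeddings from the reported count:

n_embd, n_layer, vocab_size, block_size = 512, 8, 50257, 1024

wte = vocab_size * n_embd                    # token embedding (tied with the output head)
wpe = block_size * n_embd                    # position embedding
per_block = (
    2 * 2 * n_embd                           # two LayerNorms (weight + bias each)
    + (n_embd * 3 * n_embd + 3 * n_embd)     # attention qkv projection
    + (n_embd * n_embd + n_embd)             # attention output projection
    + (n_embd * 4 * n_embd + 4 * n_embd)     # MLP up-projection
    + (4 * n_embd * n_embd + n_embd)         # MLP down-projection
)
total = wte + wpe + n_layer * per_block + 2 * n_embd  # + final LayerNorm
print(f"{(total - wpe) / 1e6:.2f}M")  # -> 50.95M (position embeddings excluded)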

License

Apache 2.0
