SLM 500M Common Corpus English

This is a custom GPT-style PyTorch language model.

Dataset: PleIAs/common_corpus
Filter: language == English
Tokenizer: tiktoken GPT-2 encoding
Parameters: approximately 505.2M

Files:

  • config.json
  • pytorch_model.bin
  • training_state.pt

This checkpoint cannot be loaded with Transformers AutoModel. Load it with the custom GPT and GPTConfig classes defined in the notebook.
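
A minimal loading sketch follows. It rests on several assumptions not confirmed by this card: that the keys in config.json match the constructor arguments of GPTConfig, that GPT takes the config object as its only constructor argument, and that pytorch_model.bin holds a plain state_dict. The helper names build_config and load_model are hypothetical and do not come from the notebook.

```python
import json
from dataclasses import fields, is_dataclass
from pathlib import Path


def build_config(config_cls, config_path):
    """Instantiate config_cls from a JSON file.

    If config_cls is a dataclass, keys it does not declare are dropped so
    that extra metadata in config.json does not break the constructor.
    """
    raw = json.loads(Path(config_path).read_text())
    if is_dataclass(config_cls):
        allowed = {f.name for f in fields(config_cls) if f.init}
        raw = {k: v for k, v in raw.items() if k in allowed}
    return config_cls(**raw)


def load_model(model_dir, model_cls, config_cls, device="cpu"):
    """Rebuild the model from config.json and pytorch_model.bin.

    model_cls and config_cls are the GPT and GPTConfig classes from the
    notebook; they are passed in because this sketch does not define them.
    """
    import torch  # imported lazily so build_config works without PyTorch

    model_dir = Path(model_dir)
    config = build_config(config_cls, model_dir / "config.json")
    model = model_cls(config)  # assumption: GPT(config) is the constructor

    # Assumption: the file stores a plain state_dict (parameter name -> tensor).
    state = torch.load(
        model_dir / "pytorch_model.bin",
        map_location=device,
        weights_only=True,
    )
    model.load_state_dict(state)
    return model.to(device).eval()
```

With the notebook's classes in scope, a call would look like load_model("path/to/downloaded/files", GPT, GPTConfig). If load_state_dict reports missing or unexpected keys, the checkpoint layout likely differs from the assumptions above and the notebook's own loading code should take precedence.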
