fancyzhx's Collections
Audio Datasets
Robotic Datasets
Video Datasets
Image Datasets
Text Datasets

Text Datasets

updated Jun 20, 2025
  • TxT360: Trillion Extracted Text

    Running • 131

    Explore and analyze the TxT360 dataset for LLM pre-training


  • CASIA-LM/ChineseWebText2.0

    Viewer • Updated Dec 2, 2024 • 2k • 2.01k • 27

  • HPLT/HPLT2.0_cleaned

    Viewer • Updated Nov 13, 2025 • 9.03B • 32.9k • 36

  • TrevorDohm/Pile_Tokenized

    Viewer • Updated Feb 20, 2024 • 134M • 1.21k