News Collection: News
  stanford-oval/ccnews • Updated Aug 31, 2024 • 893M • 15.9k • 36
  danidanou/Bloomberg_Financial_News • Updated Jun 18, 2024 • 447k • 130 • 14
  danidanou/Reuters_Financial_News • Updated Jun 18, 2024 • 105k • 71 • 7
Gaeilge Collection: Dataset
  ReliableAI/Irish-Text-Collection • Updated Jun 13, 2024 • 2.26M • 14 • 3
  ReliableAI/irish_aya_evaluation_suite • Updated Oct 17, 2024 • 400 • 4
  ReliableAI/Irish-English-Parallel-Collection • Updated Jul 16, 2024 • 65.5k • 17 • 1
  ReliableAI/irish_fineweb_edu • Updated Dec 16, 2024 • 488k • 774 • 1
Galahad
  TerenceLau/galahad-classifier-300m • Text Classification • 0.3B • Updated Nov 28, 2025 • 4
  TerenceLau/galahad-classifier-0.6B • Text Classification • 0.6B • Updated Nov 29, 2025 • 3 • 1
  TerenceLau/Galahad-collection • Updated Nov 27, 2025 • 18.8M • 4
LLM Dataset Collection: Data
  HuggingFaceH4/self_instruct • Updated Mar 27, 2023 • 82.6k • 53 • 10
  yizhongw/self_instruct • Updated Mar 7, 2023 • 197k • 416 • 193
  openbmb/RLAIF-V-Dataset • Updated Oct 14, 2025 • 1.41k • 215
  Muennighoff/natural-instructions • Updated Dec 23, 2022 • 7.15M • 6.15k • 84
Multilingual VLM: Foundation models for multilingual VLMs
  M-CLIP/XLM-Roberta-Large-Vit-L-14 • Updated Sep 15, 2022 • 15.6k • 15
  M-CLIP/LABSE-Vit-L-14 • Updated Jun 2, 2022 • 91 • 3
  visheratin/nllb-clip-large • Updated Oct 11, 2023 • 5 • 1
  visheratin/nllb-clip-large-siglip • Zero-Shot Image Classification • Updated Mar 2, 2025 • 1.36k • 9
Reward and Safeguard
  nvidia/HelpSteer • Updated Dec 18, 2024 • 37.1k • 1.61k • 250
  nvidia/HelpSteer2 • Updated Dec 18, 2024 • 21.4k • 5.05k • 452
  IAAR-Shanghai/VAR • Updated Nov 14, 2025 • 30 • 1
  IAAR-Shanghai/KAF-Dataset • Updated Mar 26, 2025 • 49.9k • 62 • 6
Document Layout Dataset
  OleehyO/latex-formulas-80M • Updated Aug 22, 2025 • 78.2M • 29.6k • 25
  lamm-mit/OleehyO-latex-formulas • Updated Jun 1, 2024 • 552k • 59 • 1
  liminghao1630/DocBank • Updated Aug 25, 2025 • 671 • 19
  ibm-granite/granite-docling-258M • Image-Text-to-Text • 0.3B • Updated Sep 23, 2025 • 105k • 1.2k