Can Unconditional Language Models Recover Arbitrary Sentences? Paper • 1907.04944 • Published Jul 10, 2019
Discovering Useful Sentence Representations from Large Pretrained Language Models Paper • 2008.09049 • Published Aug 20, 2020
Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search Paper • 2203.08436 • Published Mar 16, 2022
Extracting Latent Steering Vectors from Pretrained Language Models Paper • 2205.05124 • Published May 10, 2022
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code Paper • 2206.11249 • Published Jun 22, 2022
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics Paper • 2102.01672 • Published Feb 2, 2021
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets Paper • 2103.12028 • Published Mar 22, 2021
A Survey of Deep Learning Approaches for OCR and Document Understanding Paper • 2011.13534 • Published Nov 27, 2020
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research Paper • 2402.00159 • Published Jan 31, 2024
Data Governance in the Age of Large-Scale Data-Driven Language Technology Paper • 2206.03216 • Published May 4, 2022