Estimating Knowledge in Large Language Models Without Generating a Single Token Paper • 2406.12673 • Published Jun 18, 2024 • 9
ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization Paper • 2505.02819 • Published Feb 19 • 26
Share Your Attention: Transformer Weight Sharing via Matrix-based Dictionary Learning Paper • 2508.04581 • Published Aug 6, 2025 • 6
Article Sparse Mixture of Experts Language Model from Scratch: Extending makeMoE with Expert Capacity AviSoori1x • Mar 18, 2024 • 14
DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities Paper • 2410.07722 • Published Oct 10, 2024 • 15
Article SmolLM3: smol, multilingual, long-context reasoner eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 773
Article Training and Finetuning Sparse Embedding Models with Sentence Transformers tomaarsen, arthurbresnu • Jul 1, 2025 • 138
Article Train 400x faster Static Embedding Models with Sentence Transformers tomaarsen • Jan 15, 2025 • 229
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards Paper • 2505.24760 • Published May 30, 2025 • 74
Article Training and Finetuning Reranker Models with Sentence Transformers tomaarsen • Mar 26, 2025 • 193
Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval Paper • 2505.16967 • Published May 22, 2025 • 24
BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model Paper • 1902.04094 • Published Feb 11, 2019 • 1
Article Unlocking Longer Generation with Key-Value Cache Quantization RaushanTurganbay • May 16, 2024 • 56
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 38