🔍 Interpretability & Analysis of LMs • Collection • Outstanding research in LM interpretability and evaluation, summarized • 135 items • Updated Dec 18, 2025
GIM: Improved Interpretability for Large Language Models • Paper • 2505.17630 • Published May 23, 2025
The Eiffel Tower Llama • Explore the Eiffel Tower Llama experiment with open-source models
Sparse Auto-Encoders (SAEs) for Mechanistic Interpretability • Collection • A compilation of sparse auto-encoders trained on large language models • 37 items • Updated Dec 16, 2025
👤 Implicit Personalization in Language Models • Collection • Works on detecting, attributing and controlling implicit personalization in language models • 20 items • Updated Dec 14, 2025