Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine Paper • 2510.21614 • Published Oct 24 • 22
AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models Paper • 2511.14295 • Published 20 days ago • 71
Multimodal Safety Evaluation in Generative Agent Social Simulations Paper • 2510.07709 • Published Oct 9 • 13
Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation Paper • 2509.21989 • Published Sep 26 • 22
Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale Paper • 2509.14008 • Published Sep 17 • 88
Hala Collection • A series of lightweight Arabic language models (instruction following + translation) and an Arabic instruction dataset • 8 items • Updated Sep 18 • 7
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic Paper • 2509.01363 • Published Sep 1 • 58
Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection Paper • 2508.20766 • Published Aug 28 • 14
Train Long, Think Short: Curriculum Learning for Efficient Reasoning Paper • 2508.08940 • Published Aug 12 • 27
MatchDiffusion: Training-free Generation of Match-cuts Paper • 2411.18677 • Published Nov 27, 2024 • 1
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Paper • 2506.09250 • Published Jun 10 • 27
MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs Paper • 2505.19800 • Published May 26 • 2
An Embarrassingly Simple Defense Against LLM Abliteration Attacks Paper • 2505.19056 • Published May 25 • 6
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think Paper • 2504.20708 • Published Apr 29 • 23
Towards Data-Efficient Pretraining for Atomic Property Prediction Paper • 2502.11085 • Published Feb 16 • 3
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28 • 31