OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization Paper • 2605.21226 • Published 6 days ago • 9
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published 12 days ago • 143
Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 19 days ago • 229
LLM-OS-Models/gemma-4-E4B-it-Terminal-SFT-Native-Liquid-1Epoch Text Generation • 8B • Updated 12 days ago • 4.1k • 3
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published 23 days ago • 163
The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models Paper • 2604.04155 • Published Apr 5 • 12
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 630
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published Mar 27 • 365
openai/whisper-large-v3-turbo Automatic Speech Recognition • 0.8B • Updated Oct 4, 2024 • 7.91M • • 3.03k
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 342