Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 72
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios Paper • 2511.18050 • Published 16 days ago • 37
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs Paper • 2510.13795 • Published Oct 15 • 56
Article MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era Jan 15 • 48
Revisiting Multimodal Positional Encoding in Vision-Language Models Paper • 2510.23095 • Published Oct 27 • 20
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30 • 116
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale Paper • 2510.14979 • Published Oct 16 • 65
SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models Paper • 2510.12784 • Published Oct 14 • 19