Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 7 days ago • 119
Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance Paper • 2605.06535 • Published 7 days ago • 2
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents Paper • 2605.05185 • Published 8 days ago • 96
Heterogeneous Scientific Foundation Model Collaboration Paper • 2604.27351 • Published 14 days ago • 212
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published Apr 9 • 289
Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation Paper • 2604.02289 • Published Apr 2 • 13
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? Paper • 2603.03241 • Published Mar 3 • 87