Context Forcing: Consistent Autoregressive Video Generation with Long Context Paper • 2602.06028 • Published Feb 5 • 36
Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation Paper • 2602.02214 • Published Feb 2 • 24
UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers Paper • 2512.04504 • Published Dec 4, 2025 • 18
UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers Paper • 2511.20123 • Published Nov 25, 2025 • 18
VQ-VA World: Towards High-Quality Visual Question-Visual Answering Paper • 2511.20573 • Published Nov 25, 2025 • 7
LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation Paper • 2510.22946 • Published Oct 27, 2025 • 18
ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding Paper • 2506.01853 • Published Jun 2, 2025 • 32
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers Paper • 2502.15894 • Published Feb 21, 2025 • 20