- OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
  Paper • 2506.07977 • Published • 40
- Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
  Paper • 2506.07986 • Published • 19
- STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
  Paper • 2506.06276 • Published • 26
- Aligning Latent Spaces with Flow Priors
  Paper • 2506.05240 • Published • 27

Collections
Collections including paper arxiv:2605.23902

- WorldKV: Efficient World Memory with World Retrieval and Compression
  Paper • 2605.22718 • Published • 41
- DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning
  Paper • 2605.25604 • Published • 132
- Macaron-A2UI: A Model for Generative UI in Personal Agents
  Paper • 2605.24830 • Published • 78
- Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking
  Paper • 2601.04720 • Published • 59

- Test-Time Scaling with Reflective Generative Model
  Paper • 2507.01951 • Published • 108
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
  Paper • 2502.05171 • Published • 156
- Autoregressive Diffusion Models
  Paper • 2110.02037 • Published
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
  Paper • 2502.09509 • Published • 9

- PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion
  Paper • 2605.23902 • Published • 43
- minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models
  Paper • 2605.30263 • Published • 49
- From Pixels to Words -- Towards Native One-Vision Models at Scale
  Paper • 2605.28820 • Published • 68
- open-thoughts/AgentTrove
  Viewer • Updated • 1.7M • 12k • 174

- Code as Agent Harness
  Paper • 2605.18747 • Published • 210
- SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
  Paper • 2605.12500 • Published • 191
- From Context to Skills: Can Language Models Learn from Context Skillfully?
  Paper • 2604.27660 • Published • 166
- PhysBrain 1.0 Technical Report
  Paper • 2605.15298 • Published • 143