Collections

Discover the best community collections!

Collections including paper arxiv:2605.23902

Image Generation

OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation

Paper • 2506.07977 • Published Jun 9, 2025 • 40
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Paper • 2506.07986 • Published Jun 9, 2025 • 19
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Paper • 2506.06276 • Published Jun 6, 2025 • 26
Aligning Latent Spaces with Flow Priors

Paper • 2506.05240 • Published Jun 5, 2025 • 27

WorldKV: Efficient World Memory with World Retrieval and Compression

Paper • 2605.22718 • Published 10 days ago • 41
DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning

Paper • 2605.25604 • Published 6 days ago • 132
Macaron-A2UI: A Model for Generative UI in Personal Agents

Paper • 2605.24830 • Published 7 days ago • 78
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

Paper • 2601.04720 • Published Jan 8 • 59

Test-Time Scaling with Reflective Generative Model

Paper • 2507.01951 • Published Jul 2, 2025 • 108
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7, 2025 • 156
Autoregressive Diffusion Models

Paper • 2110.02037 • Published Oct 5, 2021
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13, 2025 • 9

about 24 hours ago

PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

Paper • 2605.23902 • Published 9 days ago • 43
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models

Paper • 2605.30263 • Published 3 days ago • 49
From Pixels to Words -- Towards Native One-Vision Models at Scale

Paper • 2605.28820 • Published 4 days ago • 68
open-thoughts/AgentTrove

Viewer • Updated 24 days ago • 1.7M • 12k • 174

about 6 hours ago

Code as Agent Harness

Paper • 2605.18747 • Published 13 days ago • 210
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Paper • 2605.12500 • Published 19 days ago • 191
From Context to Skills: Can Language Models Learn from Context Skillfully?

Paper • 2604.27660 • Published 28 days ago • 166
PhysBrain 1.0 Technical Report

Paper • 2605.15298 • Published 17 days ago • 143

