Collections
Discover the best community collections!
Collections including paper arxiv:2505.19297

- Alchemist: Turning Public Text-to-Image Data into Generative Gold
  Paper • 2505.19297 • Published • 84
- yandex/alchemist
  Viewer • Updated • 3.35k • 182 • 48
- yandex/stable-diffusion-3.5-large-alchemist
  Text-to-Image • Updated • 11 • 9
- yandex/stable-diffusion-3.5-medium-alchemist
  Text-to-Image • Updated • 18 • 6

- CoRAG: Collaborative Retrieval-Augmented Generation
  Paper • 2504.01883 • Published • 9
- VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
  Paper • 2504.08837 • Published • 43
- Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
  Paper • 2504.10068 • Published • 30
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
  Paper • 2504.10481 • Published • 85

- MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
  Paper • 2412.14475 • Published • 55
- How to Synthesize Text Data without Model Collapse?
  Paper • 2412.14689 • Published • 52
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 46
- WavePulse: Real-time Content Analytics of Radio Livestreams
  Paper • 2412.17998 • Published • 11

- The Leaderboard Illusion
  Paper • 2504.20879 • Published • 72
- Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
  Paper • 2505.09343 • Published • 73
- LLMs for Engineering: Teaching Models to Design High Powered Rockets
  Paper • 2504.19394 • Published • 14
- Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions
  Paper • 2504.19056 • Published • 18

- One-Minute Video Generation with Test-Time Training
  Paper • 2504.05298 • Published • 110
- MoCha: Towards Movie-Grade Talking Character Synthesis
  Paper • 2503.23307 • Published • 138
- Towards Understanding Camera Motions in Any Video
  Paper • 2504.15376 • Published • 157
- Antidistillation Sampling
  Paper • 2504.13146 • Published • 59

- VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control
  Paper • 2412.20800 • Published • 11
- Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
  Paper • 2501.06751 • Published • 32
- Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
  Paper • 2501.09732 • Published • 71
- Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
  Paper • 2501.09755 • Published • 36

- Animate-X: Universal Character Image Animation with Enhanced Motion Representation
  Paper • 2410.10306 • Published • 56
- ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
  Paper • 2411.05003 • Published • 71
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
  Paper • 2411.04709 • Published • 26
- IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
  Paper • 2410.07171 • Published • 43