UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward
Abstract
UMO, a Unified Multi-identity Optimization framework, enhances identity consistency and reduces confusion in multi-reference image customization using reinforcement learning on diffusion models.
Recent advances in image customization show broad application prospects thanks to stronger customization capabilities. However, because humans are particularly sensitive to faces, preserving consistent identity while avoiding identity confusion across multiple reference images remains a significant challenge, limiting the identity scalability of customization models. To address this, we present UMO, a Unified Multi-identity Optimization framework designed to maintain high-fidelity identity preservation and alleviate identity confusion at scale. Under a "multi-to-multi matching" paradigm, UMO reformulates multi-identity generation as a global assignment optimization problem and generally improves multi-identity consistency for existing image customization methods through reinforcement learning on diffusion models. To facilitate training, we build a scalable customization dataset with multi-reference images, consisting of both synthesized and real parts. We additionally propose a new metric to measure identity confusion. Extensive experiments demonstrate that UMO not only significantly improves identity consistency but also reduces identity confusion across several image customization methods, setting a new state of the art among open-source methods in identity preservation. Code and model: https://github.com/bytedance/UMO
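The abstract frames multi-identity matching as a global assignment problem over reference and generated faces. The sketch below is a rough, hypothetical illustration of how such a matching reward could be computed, not the authors' released implementation: it assumes unit-normalized face embeddings (e.g., from an ArcFace-style encoder), solves the assignment with the Hungarian algorithm, and the `confusion_score` helper is an assumed, illustrative stand-in rather than the paper's metric.

```python
# Hypothetical sketch of a "multi-to-multi matching" identity reward.
# Assumes ref_embs / gen_embs are unit-normalized face embeddings; NOT the
# authors' released code or the paper's exact metric.
import numpy as np
from scipy.optimize import linear_sum_assignment


def matching_reward(ref_embs: np.ndarray, gen_embs: np.ndarray) -> float:
    """Global-assignment identity reward.

    ref_embs: (R, D) embeddings of the reference identities.
    gen_embs: (G, D) embeddings of faces detected in the generated image.
    Returns the mean cosine similarity over the optimal one-to-one matching
    (rectangular matrices are fine: min(R, G) pairs are matched).
    """
    sim = ref_embs @ gen_embs.T                      # (R, G) cosine similarities
    # Hungarian algorithm maximizes total similarity over a one-to-one matching.
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return float(sim[rows, cols].mean())


def confusion_score(ref_embs: np.ndarray, gen_embs: np.ndarray) -> float:
    """Illustrative identity-confusion indicator (an assumption, not the
    paper's metric): how much each matched generated face resembles a
    *wrong* reference relative to its matched one. Lower is better."""
    sim = ref_embs @ gen_embs.T
    rows, cols = linear_sum_assignment(sim, maximize=True)
    matched = sim[rows, cols]                        # similarity to matched refs
    off = sim[:, cols].copy()                        # restrict to matched faces
    off[rows, np.arange(len(rows))] = -np.inf        # mask out the matched ref
    distractor = off.max(axis=0)                     # best wrong-reference match
    return float(np.clip(distractor - matched, 0, None).mean())
```

In an RL fine-tuning loop, a scalar like `matching_reward` would score each generated sample and drive policy-gradient or reward-weighted updates of the diffusion model; the global matching is what lets the reward scale to many identities without fixing a reference-to-face correspondence in advance.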
Community
We announce UMO, a unified multi-identity optimization framework and the latest addition to the UXO family. UMO can freely combine one or many identities with arbitrary subjects in arbitrary scenarios, delivering outputs with high subject and identity consistency. In line with our past practice, we will open-source the full project, including inference scripts, model weights, and training code, to advance research and empower the open-source community.
- Code: https://github.com/bytedance/UMO
- Project page: https://bytedance.github.io/UMO/
- Hugging Face Space (UNO): https://huggingface.co/spaces/bytedance-research/UMO_UNO
- Hugging Face Space (OmniGen2): https://huggingface.co/spaces/bytedance-research/UMO_OmniGen2
- Model checkpoint: https://huggingface.co/bytedance-research/UMO
Related papers recommended by the Semantic Scholar API:
- PositionIC: Unified Position and Identity Consistency for Image Customization (2025)
- MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement (2025)
- Show and Polish: Reference-Guided Identity Preservation in Face Video Restoration (2025)
- FocusDPO: Dynamic Preference Optimization for Multi-Subject Personalized Image Generation via Adaptive Focus (2025)
- StableAnimator++: Overcoming Pose Misalignment and Face Distortion for Human Image Animation (2025)
- Robust ID-Specific Face Restoration via Alignment Learning (2025)
- Gen-AFFECT: Generation of Avatar Fine-grained Facial Expressions with Consistent identiTy (2025)
