
# UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward



> Yufeng Cheng, Wenxu Wu, Shaojin Wu, Mengqi Huang, Fei Ding, Qian He
>
> UXO Team, Intelligent Creation Lab, Bytedance

## 🔥 News

- 2025.09.15 🔥 The [ComfyUI workflows](comfyui/) of UMO are released. We provide several workflow examples with [UMO-UNO](comfyui/UNO) and [UMO-OmniGen2](comfyui/OmniGen2).


- 2025.09.09 🔥 The demos of UMO are released: [UMO-UNO](https://huggingface.co/spaces/bytedance-research/UMO_UNO) & [UMO-OmniGen2](https://huggingface.co/spaces/bytedance-research/UMO_OmniGen2)
- 2025.09.09 🔥 The [paper](https://arxiv.org/abs/2509.06818) of UMO is released.
- 2025.09.08 🔥 The [models](https://huggingface.co/bytedance-research/UMO) of UMO based on UNO and OmniGen2 are released. The released version of UMO is more stable than the one reported in our paper.
- 2025.09.08 🔥 The [project page](https://bytedance.github.io/UMO/) of UMO is created.
- 2025.09.08 🔥 The inference and evaluation [code](https://github.com/bytedance/UMO) of UMO is released.

## 📖 Introduction

Recent advancements in image customization exhibit a wide range of application prospects due to stronger customization capabilities. However, since humans are especially sensitive to faces, a significant challenge remains in preserving consistent identity while avoiding identity confusion across multi-reference images, which limits the identity scalability of customization models. To address this, we present *UMO*, a **U**nified **M**ulti-identity **O**ptimization framework, designed to maintain high-fidelity identity preservation and alleviate identity confusion at scale. Through a "multi-to-multi matching" paradigm, UMO reformulates multi-identity generation as a global assignment optimization problem, and generally improves multi-identity consistency for existing image customization methods via reinforcement learning on diffusion models. To facilitate the training of UMO, we develop a scalable customization dataset with multi-reference images, consisting of both synthesized and real parts. Additionally, we propose a new metric to measure identity confusion. Extensive experiments demonstrate that UMO not only improves identity consistency significantly but also reduces identity confusion across several image customization methods, setting a new state of the art among open-source methods along the dimension of identity preservation.
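To make the "multi-to-multi matching" paradigm concrete, here is a minimal sketch (our illustration, not the paper's implementation) of scoring a generated image by optimally assigning its detected faces to the reference identities. The embeddings are assumed to come from any off-the-shelf face-recognition model:

```python
# A minimal sketch of the "multi-to-multi matching" idea: score a generated
# image by optimally assigning each detected face to a reference identity.
# Illustrative only; the embeddings are assumed to be L2-normalized outputs
# of any face-recognition model.
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_reward(ref_embs: np.ndarray, gen_embs: np.ndarray) -> float:
    """ref_embs: (R, D) reference identity embeddings.
    gen_embs: (G, D) embeddings of faces detected in the generated image."""
    sim = ref_embs @ gen_embs.T               # (R, G) cosine similarities
    rows, cols = linear_sum_assignment(-sim)  # maximize total matched similarity
    return float(sim[rows, cols].mean())      # reward: mean similarity along the assignment

# Toy usage: two references, two generated faces in swapped order.
refs = np.eye(2, 4)                 # two orthogonal "identities"
gens = np.eye(2, 4)[::-1]           # same identities, reversed order
print(matching_reward(refs, gens))  # 1.0: the assignment resolves the permutation
```

Because the assignment is global, permuting the faces in the generated image does not change the reward of this sketch, which is the property that discourages identity confusion.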

## ⚡️ Quick Start

### 🔧 Requirements and Installation

```bash
# 1. Clone the repo with submodules: UNO & OmniGen2
git clone --recurse-submodules git@github.com:bytedance/UMO.git
cd UMO
```

#### UMO requirements based on UNO

```bash
# 2.1 (Optional, but recommended) Create a clean virtual Python 3.11 environment
python3 -m venv venv/UMO_UNO
source venv/UMO_UNO/bin/activate

# 3.1 Install the UNO submodule requirements, following:
# https://github.com/bytedance/UNO?tab=readme-ov-file#-requirements-and-installation

# 4.1 Install UMO requirements
pip install -r requirements.txt
```

#### UMO requirements based on OmniGen2

```bash
# 2.2 (Optional, but recommended) Create a clean virtual Python 3.11 environment
python3 -m venv venv/UMO_OmniGen2
source venv/UMO_OmniGen2/bin/activate

# 3.2 Install the OmniGen2 submodule requirements, following:
# https://github.com/VectorSpaceLab/OmniGen2?tab=readme-ov-file#%EF%B8%8F-environment-setup

# 4.2 Install UMO requirements
pip install -r requirements.txt
```

#### UMO checkpoints download

```bash
# pip install huggingface_hub hf-transfer
export HF_HUB_ENABLE_HF_TRANSFER=1 # use hf_transfer to speed up downloads
# export HF_ENDPOINT=https://hf-mirror.com # use a mirror if necessary

repo_name="bytedance-research/UMO"
local_dir="models/"$repo_name
huggingface-cli download --resume-download $repo_name --local-dir $local_dir
```
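If you prefer to script the download, the same checkpoints can be fetched with the `huggingface_hub` Python API (a small convenience sketch; the CLI above is the documented path):

```python
# Download the UMO checkpoints from Python instead of the CLI.
# The target directory mirrors the models/bytedance-research/UMO layout
# expected by the demo and inference commands below.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bytedance-research/UMO",
    local_dir="models/bytedance-research/UMO",
)
```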

### 🌟 Gradio Demo

```bash
# UMO (based on UNO)
python3 demo/UNO/app.py --lora_path models/bytedance-research/UMO/UMO_UNO.safetensors

# UMO (based on OmniGen2)
python3 demo/OmniGen2/app.py --lora_path models/bytedance-research/UMO/UMO_OmniGen2.safetensors
```

### ⚙️ ComfyUI Workflow

#### UMO (based on UNO)

Since ComfyUI supports [USO](https://docs.comfy.org/tutorials/flux/flux-1-uso), we derive the UMO (based on UNO) workflow by removing the nodes related to SigLIP style features and extending it to multiple references. We provide several [example images](comfyui/UNO); download an image and drag it into ComfyUI to load the workflow.

**Example with Single Identity**

[Reference Image](assets/examples/OmniGen2/6/ref_2.webp)

**Example with Multi-Identity**

[Reference Image 1](assets/examples/OmniGen2/6/ref_2.webp), [Reference Image 2](https://github.com/VectorSpaceLab/OmniGen2/blob/main/example_images/000050281.jpg)

#### UMO (based on OmniGen2)

Since ComfyUI supports [OmniGen2](https://docs.comfy.org/tutorials/image/omnigen/omnigen2), we only add a node to load the UMO LoRA. First, convert the UMO LoRA checkpoint to ComfyUI format:

```bash
python3 comfyui/OmniGen2/convert_ckpt.py
```

Then download the [example images](comfyui/OmniGen2) and drag them into ComfyUI to load the workflow.

**Example with Single Identity**

[Reference Image](assets/examples/OmniGen2/5/ref_2.png)

**Example with Multi-Identity**

[Reference Image 1](assets/examples/OmniGen2/6/ref_2.webp), [Reference Image 2](assets/examples/OmniGen2/6/ref_1.webp)
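For the curious: a checkpoint conversion like `convert_ckpt.py` above typically amounts to loading the safetensors file, remapping tensor key names to the prefix ComfyUI expects, and saving the result. A minimal sketch with an illustrative, hypothetical prefix; the actual mapping lives in `comfyui/OmniGen2/convert_ckpt.py`:

```python
# Rough sketch of a LoRA-format conversion: load, rename keys, save.
# The prefix below is hypothetical and for illustration only; use the
# provided comfyui/OmniGen2/convert_ckpt.py for the real mapping.
from safetensors.torch import load_file, save_file

state = load_file("models/bytedance-research/UMO/UMO_OmniGen2.safetensors")
converted = {f"diffusion_model.{k}": v for k, v in state.items()}  # hypothetical prefix
save_file(converted, "models/loras/UMO_OmniGen2_comfyui.safetensors")
```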

### ✍️ Inference

#### UMO (based on UNO) inference on XVerseBench

```bash
# single subject
accelerate launch eval/UNO/inference_xversebench.py \
    --eval_json_path projects/XVerse/eval/tools/XVerseBench_single.json \
    --num_images_per_prompt 4 \
    --width 768 \
    --height 768 \
    --save_path output/XVerseBench/single/UMO_UNO \
    --lora_path models/bytedance-research/UMO/UMO_UNO.safetensors

# multi subject
accelerate launch eval/UNO/inference_xversebench.py \
    --eval_json_path projects/XVerse/eval/tools/XVerseBench_multi.json \
    --num_images_per_prompt 4 \
    --width 768 \
    --height 768 \
    --save_path output/XVerseBench/multi/UMO_UNO \
    --lora_path models/bytedance-research/UMO/UMO_UNO.safetensors
```

#### UMO (based on UNO) inference on OmniContext

```bash
accelerate launch eval/UNO/inference_omnicontext.py \
    --eval_json_path OmniGen2/OmniContext \
    --width 768 \
    --height 768 \
    --save_path output/OmniContext/UMO_UNO \
    --lora_path models/bytedance-research/UMO/UMO_UNO.safetensors
```

#### UMO (based on OmniGen2) inference on XVerseBench

```bash
# single subject
accelerate launch -m eval.OmniGen2.inference_xversebench \
    --model_path OmniGen2/OmniGen2 \
    --model_name UMO_OmniGen2 \
    --test_data projects/XVerse/eval/tools/XVerseBench_single.json \
    --result_dir output/XVerseBench/single \
    --num_images_per_prompt 4 \
    --disable_align_res \
    --lora_path models/bytedance-research/UMO/UMO_OmniGen2.safetensors

# multi subject
accelerate launch -m eval.OmniGen2.inference_xversebench \
    --model_path OmniGen2/OmniGen2 \
    --model_name UMO_OmniGen2 \
    --test_data projects/XVerse/eval/tools/XVerseBench_multi.json \
    --result_dir output/XVerseBench/multi \
    --num_images_per_prompt 4 \
    --disable_align_res \
    --lora_path models/bytedance-research/UMO/UMO_OmniGen2.safetensors
```

#### UMO (based on OmniGen2) inference on OmniContext

```bash
accelerate launch -m eval.OmniGen2.inference_omnicontext \
    --model_path OmniGen2/OmniGen2 \
    --model_name UMO_OmniGen2 \
    --test_data OmniGen2/OmniContext \
    --result_dir output/OmniContext \
    --num_images_per_prompt 1 \
    --disable_align_res \
    --lora_path models/bytedance-research/UMO/UMO_OmniGen2.safetensors
```

### 🔍 Evaluation

#### Evaluation on XVerseBench

To evaluate on XVerseBench, first set up the dependencies and models as described in [XVerse](https://github.com/bytedance/XVerse?tab=readme-ov-file#requirements-and-installation). Then run the script:

```bash
# UMO (based on UNO) single subject
bash scripts/eval_xversebench.sh single output/XVerseBench/single/UMO_UNO
# UMO (based on UNO) multi subject
bash scripts/eval_xversebench.sh multi output/XVerseBench/multi/UMO_UNO
# UMO (based on OmniGen2) single subject
bash scripts/eval_xversebench.sh single output/XVerseBench/single/UMO_OmniGen2
# UMO (based on OmniGen2) multi subject
bash scripts/eval_xversebench.sh multi output/XVerseBench/multi/UMO_OmniGen2
```

#### Evaluation on OmniContext

For the original metrics (*i.e.*, PF, SC, Overall) in OmniContext, follow [OmniContext](https://github.com/VectorSpaceLab/OmniGen2/tree/main/omnicontext#step3-evaluation). For the ID-Sim and ID-Conf metrics, run the script:

```bash
# UMO (based on UNO)
bash scripts/eval_id_omnicontext.sh UMO_UNO
# UMO (based on OmniGen2)
bash scripts/eval_id_omnicontext.sh UMO_OmniGen2
```
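For intuition about what these identity metrics measure (the precise definitions are in the paper; below is our own simplified illustration): ID-Sim rewards each generated face matching its assigned reference identity, while identity confusion shows up as high similarity to the *wrong* references:

```python
# Illustrative only: the paper defines ID-Sim and ID-Conf precisely;
# this sketch conveys the flavor under our own simplifying assumptions.
import numpy as np

def id_metrics(sim: np.ndarray, assign: list[int]) -> tuple[float, float]:
    """sim: (G, R) cosine similarity between generated faces and references.
    assign: for each generated face, the index of its matched reference."""
    matched = np.array([sim[i, j] for i, j in enumerate(assign)])   # right identity
    wrong = np.array([np.delete(sim[i], j).max()                    # best wrong identity
                      for i, j in enumerate(assign)])
    return float(matched.mean()), float(wrong.mean())

sim = np.array([[0.8, 0.3], [0.2, 0.7]])  # two generated faces vs. two references
print(id_metrics(sim, [0, 1]))            # high identity similarity (0.75), low confusion (0.25)
```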
### 📌 Tips and Notes

Please note that UNO gives unstable results on parts of OmniContext because the prompt format differs from that of its training data ([UNO-1M](https://huggingface.co/datasets/bytedance-research/UNO-1M)), which leads to a similar issue for UMO built on it. To get better results with these two models, we recommend using descriptive prompts instead of instruction-style ones, and resolutions of 768–1024 instead of 512.

## 📄 Disclaimer

We open-source this project for academic research. The vast majority of images used in this project are either generated or licensed. If you have any concerns, please contact us, and we will promptly remove any inappropriate content. Our code is released under the Apache 2.0 License.

This research aims to advance the field of generative AI. Users are free to create images using this tool, provided they comply with local laws and exercise responsible usage. The developers are not liable for any misuse of the tool by users.

## 🚀 Updates

To foster research and the open-source community, we plan to open-source the entire project, encompassing training, inference, weights, etc. Thank you for your patience and support! 🌟

- [x] Release project page
- [x] Release model on huggingface
- [x] Release huggingface demo
- [ ] Release training code

## Citation

If UMO is helpful, please help to ⭐ the repo. If you find this project useful for your research, please consider citing our paper:

```bibtex
@article{cheng2025umo,
  title={UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward},
  author={Cheng, Yufeng and Wu, Wenxu and Wu, Shaojin and Huang, Mengqi and Ding, Fei and He, Qian},
  journal={arXiv preprint arXiv:2509.06818},
  year={2025}
}
```