# UnifiedReward
## UnifiedReward Team Works
### Benchmarks
> [**UniREditBench: A Unified Reasoning-based Image Editing Benchmark**](https://maplebb.github.io/UniREditBench/): We propose **UniREditBench**, a unified reasoning-based image editing benchmark, and further construct **UniREdit-Data-100K**, a large-scale synthetic dataset with high-quality CoT annotations, and develop **UniREdit-Bagel** by fine-tuning Bagel on this dataset.
>
> [Dataset: UniREdit-Data-100K](https://huggingface.co/datasets/maplebb/UniREdit-Data-100K) · [Model: UniREdit-Bagel](https://huggingface.co/maplebb/UniREdit-Bagel)
> [**UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation**](https://codegoat24.github.io/UniGenBench): We propose **UniGenBench++**, a unified semantic benchmark for T2I generation. It supports both **short and long prompts in Chinese and English**, featuring a **streamlined evaluation pipeline** and a robust **offline evaluation model**.
>
> [Dataset: UniGenBench-Eval-Images](https://huggingface.co/datasets/CodeGoat24/UniGenBench-Eval-Images) · [Model: UniGenBench-EvalModel-qwen-72b-v1](https://huggingface.co/CodeGoat24/UniGenBench-EvalModel-qwen-72b-v1)
> Leaderboards: [English](https://huggingface.co/spaces/CodeGoat24/UniGenBench_Leaderboard) · [Chinese](https://huggingface.co/spaces/CodeGoat24/UniGenBench_Leaderboard_Chinese) · [English (long prompts)](https://huggingface.co/spaces/CodeGoat24/UniGenBench_Leaderboard_English_Long) · [Chinese (long prompts)](https://huggingface.co/spaces/CodeGoat24/UniGenBench_Leaderboard_Chinese_Long)
### Models
> [**Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning**](https://codegoat24.github.io/UnifiedReward/Pref-GRPO): We propose **Pref-GRPO**, the first **preference reward-based GRPO method** for stable T2I reinforcement learning, and **UniGenBench**, a **unified T2I generation benchmark** for fine-grained semantic consistency evaluation.
>
> [Leaderboard: UniGenBench](https://huggingface.co/spaces/CodeGoat24/UniGenBench_Leaderboard)
> [**NeurIPS 2025**] [**Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning**](https://codegoat24.github.io/UnifiedReward/think): We propose **UnifiedReward-Think**, the first unified multimodal CoT reward model.
>
> [Models: UnifiedReward-Think-qwen3vl-8b](https://huggingface.co/CodeGoat24/UnifiedReward-Think-qwen3vl-8b) · [UnifiedReward-Think-qwen-7b](https://huggingface.co/CodeGoat24/UnifiedReward-Think-qwen-7b) · [UnifiedReward-Think-7b](https://huggingface.co/CodeGoat24/UnifiedReward-Think-7b)
> [**Unified Reward Model for Multimodal Understanding and Generation**](https://codegoat24.github.io/UnifiedReward/): We release **UnifiedReward**, **the first unified reward model for multimodal understanding and generation assessment**, enabling both pairwise ranking and pointwise scoring.
### ✨ **Awesome Works using UnifiedReward**
- 😊 Meta, [Transition Matching: Scalable and Flexible Generative Modeling](https://arxiv.org/pdf/2506.23589).
- 😊 NVIDIA, Stanford, Tsinghua, [DiffusionNFT: Online Diffusion Reinforcement with Forward Process](https://arxiv.org/pdf/2509.16117). [![[code]](https://img.shields.io/github/stars/NVlabs/DiffusionNFT)](https://github.com/NVlabs/DiffusionNFT)
- 😊 Apple, Fudan, [UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning](https://arxiv.org/pdf/2511.14760).
- 😊 University of California, USTC, PKU, BIGAI, [MILR: Improving Multimodal Image Generation via Test-time Latent Reasoning](https://arxiv.org/pdf/2509.22761).
- 😊 Kuaishou, Tsinghua, CUHK, [Flow-GRPO: Training Flow Matching Models via Online RL](https://github.com/yifan123/flow_grpo). [![[code]](https://img.shields.io/github/stars/yifan123/flow_grpo)](https://github.com/yifan123/flow_grpo)
- 😊 Tencent Hunyuan, [MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE](https://arxiv.org/pdf/2507.21802). [![[code]](https://img.shields.io/github/stars/Tencent-Hunyuan/MixGRPO)](https://github.com/Tencent-Hunyuan/MixGRPO)
- 😊 Kling Team, CUHK MMLab, NJU, [VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning](https://arxiv.org/pdf/2510.10518). [![[code]](https://img.shields.io/github/stars/qunzhongwang/vr-thinker)](https://github.com/qunzhongwang/vr-thinker)
- 😊 CUHK MMLab, [Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO](https://arxiv.org/pdf/2505.17017). [![[code]](https://img.shields.io/github/stars/ZiyuGuo99/Image-Generation-CoT)](https://github.com/ZiyuGuo99/Image-Generation-CoT)
| Method | HPS | ImageReward | UnifiedReward |
|------------|-----------|-----------|-----------|
| Janus-Pro + DPO | 77.3 | 77.7 | **80.0** |
| Janus-Pro + GRPO | 79.2 | 79.3 | **81.0** |
| Janus-Pro + Best-of-4 | 82.1 | 82.4 | **84.5** |
- 😊 Tencent Hunyuan X, [X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again](https://arxiv.org/pdf/2507.22058). [![[code]](https://img.shields.io/github/stars/X-Omni-Team/X-Omni)](https://github.com/X-Omni-Team/X-Omni)
## 🔥 News
[2025/11/17] 🔥🔥🔥 We release **UnifiedReward-Think-qwen3vl**-[[2b](https://huggingface.co/CodeGoat24/UnifiedReward-Think-qwen3vl-2b)/[4b](https://huggingface.co/CodeGoat24/UnifiedReward-Think-qwen3vl-4b)/[8b](https://huggingface.co/CodeGoat24/UnifiedReward-Think-qwen3vl-8b)/[32b](https://huggingface.co/CodeGoat24/UnifiedReward-Think-qwen3vl-32b)]. The inference code is provided [here](https://github.com/CodeGoat24/UnifiedReward/tree/main/UnifiedReward-Think/inference_qwen/UnifiedReward-Think-qwen3vl-inference).
[2025/11/11] 🔥🔥🔥 We release **UnifiedReward-2.0-qwen3vl**-[[2b](https://huggingface.co/CodeGoat24/UnifiedReward-2.0-qwen3vl-2b)/[4b](https://huggingface.co/CodeGoat24/UnifiedReward-2.0-qwen3vl-4b)/[8b](https://huggingface.co/CodeGoat24/UnifiedReward-2.0-qwen3vl-8b)/[32b](https://huggingface.co/CodeGoat24/UnifiedReward-2.0-qwen3vl-32b)] and **UnifiedReward-Edit-qwen3vl**-[[2b](https://huggingface.co/CodeGoat24/UnifiedReward-Edit-qwen3vl-2b)/[4b](https://huggingface.co/CodeGoat24/UnifiedReward-Edit-qwen3vl-4b)/[8b](https://huggingface.co/CodeGoat24/UnifiedReward-Edit-qwen3vl-8b)/[32b](https://huggingface.co/CodeGoat24/UnifiedReward-Edit-qwen3vl-32b)]!!!
[2025/10/23] 🔥 We release **UnifiedReward-Edit**-qwen-[[3b](https://huggingface.co/CodeGoat24/UnifiedReward-Edit-qwen-3b)/[7b](https://huggingface.co/CodeGoat24/UnifiedReward-Edit-qwen-7b)/[32b](https://huggingface.co/CodeGoat24/UnifiedReward-Edit-qwen-32b)/[72b](https://huggingface.co/CodeGoat24/UnifiedReward-Edit-qwen-72b)], a unified reward model for **both Text-to-Image and Image-to-Image generation**, trained on approximately 700K unified image generation and editing reward samples!!
For image-editing reward tasks, our models support:
>1. Pairwise Rank — directly judge which of two edited images is better.
>
>2. Pairwise Score — assign a separate score to each image in a pair.
>
>3. Pointwise Score — rate a single image on two axes: instruction-following and overall image quality.
🚀 The image-editing reward inference code is available in the [`UnifiedReward-Edit/`](https://github.com/CodeGoat24/UnifiedReward/tree/main/UnifiedReward-Edit) directory (a minimal request sketch follows below), while the T2I inference code is unchanged from previous models. The editing training data is preprocessed from [EditScore](https://huggingface.co/datasets/EditScore/EditScore-Reward-Data) and [EditReward](https://huggingface.co/datasets/TIGER-Lab/EditReward-Data) and will be released soon. We sincerely appreciate all contributors!!
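As a minimal illustration of the pairwise-rank mode, the sketch below assembles a chat-style message containing the source image and two edited candidates. The file paths and prompt wording here are illustrative assumptions, not the repo's exact prompts; see the scripts in `UnifiedReward-Edit/` for the reference versions.
```python
# Hypothetical pairwise-rank request for image editing.
# Paths and prompt wording are placeholders; adapt freely, since the model
# is not constrained to a fixed prompt style.
source, edit_a, edit_b = "source.jpg", "edit_a.jpg", "edit_b.jpg"

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": source},  # original image
        {"type": "image", "image": edit_a},  # edited candidate 1
        {"type": "image", "image": edit_b},  # edited candidate 2
        {"type": "text", "text": (
            "Editing instruction: make the sky look like a sunset.\n"
            "Image 1 is the original; Images 2 and 3 are two edited results. "
            "Which edited image follows the instruction better while "
            "preserving the rest of the scene? Answer 'Image 2' or 'Image 3'."
        )},
    ],
}]
# Feed `messages` to the usual Qwen-VL chat pipeline
# (see the inference sketch in the Inference section below).
```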
[2025/9/25] 🔥 We release **UnifiedReward-2.0**-qwen-[[3b](https://huggingface.co/CodeGoat24/UnifiedReward-2.0-qwen-3b)/[7b](https://huggingface.co/CodeGoat24/UnifiedReward-2.0-qwen-7b)/[32b](https://huggingface.co/CodeGoat24/UnifiedReward-2.0-qwen-32b)/[72b](https://huggingface.co/CodeGoat24/UnifiedReward-2.0-qwen-72b)].
This version introduces several new capabilities:
>1. **Pairwise scoring** for image and video generation assessment on the **_Alignment_**, **_Coherence_**, and **_Style_** dimensions.
>
>2. **Pointwise scoring** for image and video generation assessment on the **_Alignment_**, **_Coherence/Physics_**, and **_Style_** dimensions.

The corresponding inference code is available in the [`inference_qwen/UnifiedReward-2.0-inference`](https://github.com/CodeGoat24/UnifiedReward/tree/main/inference_qwen/UnifiedReward-2.0-inference) directory. The newly added training data has been released [here](https://huggingface.co/datasets/CodeGoat24/UnifiedReward-2.0-T2X-score-data) 😊.
😊 We are actively gathering feedback from the community to improve our models. **We welcome your input and encourage you to stay updated through our repository**!!
## Unified Reward Model for Multimodal Understanding and Generation
[UnifiedReward-2.0 models](https://huggingface.co/collections/CodeGoat24/unifiedreward-20-models-68b7c99ab70ff81184c70270) · [UnifiedReward-Edit models](https://huggingface.co/collections/CodeGoat24/unifiedreward-edit-models) · [UnifiedReward-1.0 models](https://huggingface.co/collections/CodeGoat24/unifiedreward-10-models-67c3008148c3a380d15ac63a) · [Training data](https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede)
😊 We appreciate the [mradermacher](https://huggingface.co/mradermacher) team for providing the [GGUF](https://huggingface.co/collections/CodeGoat24/unifiedreward-models-gguf-683fe14b5e2b8422049f45ca) version of our models, and the [Tencent Hunyuan](https://hunyuan.tencent.com/) team for providing the evaluation results on several T2I models using [UnifiedReward-qwen-7b](https://huggingface.co/CodeGoat24/UnifiedReward-qwen-7b)!! The evaluation was conducted on 400 prompts sourced from [here](https://artificialanalysis.ai/text-to-image/arena?tab=arena).
**Evaluation results on several T2I models:**
| Model | Alignment | Coherence | Style |
|---------------------|------------------|-----------------------|------------------|
| Flux-pro-ultra | 3.6453 | 3.8193 | 3.4971 |
| Imagen-4.0 | 3.6792 | 3.8049 | 3.4756 |
| Recraft-v3 | 3.6611 | 3.8409 | **3.5158** |
| OpenAI-GPT-image-1 | 3.6890 | **3.8448** | 3.4960 |
| Imagen-3.0 | 3.6733 | 3.8027 | 3.4674 |
| Seedream-3.0 | **3.6927** | 3.8218 | 3.4887 |
## 🔥🔥🔥 [NeurIPS 2025] **UnifiedReward-Think**
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

[UnifiedReward-Think-qwen3vl-8b](https://huggingface.co/CodeGoat24/UnifiedReward-Think-qwen3vl-8b) · [UnifiedReward-Think-qwen-7b](https://huggingface.co/CodeGoat24/UnifiedReward-Think-qwen-7b) · [UnifiedReward-Think-7b](https://huggingface.co/CodeGoat24/UnifiedReward-Think-7b)
We release **UnifiedReward-Think** -- **the first unified multimodal CoT reward model**, capable of multi-dimensional, step-by-step long-chain reasoning for both visual understanding and generation reward tasks.
Please refer to the [README.md](https://github.com/CodeGoat24/UnifiedReward/tree/main/UnifiedReward-Think) for training and inference details.
## 🏁 Compared with Current Reward Models
| Reward Model | Method | Image Generation | Image Understanding | Video Generation | Video Understanding | CoT Reasoning |
| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
| [PickScore](https://github.com/yuvalkirstain/PickScore) | Point | √ | | | | |
| [HPS](https://github.com/tgxs002/HPSv2) | Point | √ | | | | |
| [ImageReward](https://github.com/THUDM/ImageReward) | Point | √ | | | | |
| [LLaVA-Critic](https://huggingface.co/lmms-lab/llava-critic-7b) | Pair/Point | | √ | | | |
| [IXC-2.5-Reward](https://github.com/InternLM/InternLM-XComposer) | Pair/Point | | √ | | √ | |
| [VideoScore](https://github.com/TIGER-AI-Lab/VideoScore) | Point | | | √ | | |
| [LiFT](https://github.com/CodeGoat24/LiFT) | Point | | | √ | | |
| [VisionReward](https://github.com/THUDM/VisionReward) | Point | √ | | √ | | |
| [VideoReward](https://github.com/KwaiVGI/VideoAlign) | Point | | | √ | | |
| **UnifiedReward** (Ours) | Pair/Point | √ | √ | √ | √ | |
| **UnifiedReward-Think** (Ours) | Pair/Point | √ | √ | √ | √ | √ |
## 🔧 Environment Set Up
1. Clone this repository and navigate to the UnifiedReward folder:
```bash
git clone https://github.com/CodeGoat24/UnifiedReward.git
cd UnifiedReward
```
2. Install the inference package:
```bash
conda create -n unifiedreward python=3.10 -y
conda activate unifiedreward
pip install --upgrade pip
pip install -e ".[train]"
pip install flash_attn==2.5.8 --no-build-isolation
```
## 🚀 Inference
For Qwen2.5-VL based UnifiedReward models, first install the inference dependencies:
```bash
pip install git+https://github.com/huggingface/transformers accelerate "qwen-vl-utils[decord]==0.0.8"
```
We provide reference pair-rank and point-score inference code for each task in the `./inference` and `./inference_qwen` directories:
```
inference
├── image_generation
│   ├── pair_rank_image_generation.py
│   └── point_score_image_generation.py
├── video_understanding
│   ├── pair_rank_video_understanding.py
│   └── point_score_video_understanding.py
└── ...
```
Note that our model is not constrained to a fixed input prompt style; you can flexibly adjust the inputs to your requirements.
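For reference, here is a minimal point-score sketch for a Qwen2.5-VL based checkpoint using the standard `transformers`/`qwen-vl-utils` chat pipeline. The model ID is taken from the releases above; the image path and prompt wording are illustrative assumptions, and the scripts in `./inference_qwen` remain the authoritative versions.
```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "CodeGoat24/UnifiedReward-2.0-qwen-7b"  # any Qwen2.5-VL based checkpoint
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Illustrative prompt; the model is not tied to one fixed prompt style.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "path/to/generated_image.jpg"},
        {"type": "text", "text": (
            "You are given a text prompt and one generated image.\n"
            "Prompt: a red bicycle leaning against a brick wall.\n"
            "Rate how well the image matches the prompt on a 1-5 scale "
            "and briefly justify the score."
        )},
    ],
}]

# Standard Qwen-VL preprocessing: chat template + vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```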
### 1. vLLM Inference
We provide vLLM inference code for UnifiedReward-qwen in the `vllm_qwen` directory.
1. Install vLLM
```bash
pip install "vllm>=0.11.0"
pip install qwen-vl-utils==0.0.14
```
2. Deploy vLLM Server
```bash
bash vllm_qwen/vllm_server.sh
```
3. Inference Request to vLLM Server
```bash
python vllm_qwen/vllm_inference.py
```
### 2. SGLang Inference
We provide SGLang inference code for UnifiedReward-llava in the `sglang_llava` directory.
1. Install SGLang
```bash
pip install "sglang[all]"
```
2. Deploy SGLang Server
```bash
bash sglang_llava/sglang_server.sh
```
3. Inference Request to SGLang Server
```bash
python sglang_llava/sglang_inference.py
```
## 💻 Training UnifiedReward
### 1. Training based on Qwen2.5-VL-Instruct (Recommended)
We use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) to train the SFT model.
1. Clone the [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) repository and install the dependencies.
```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
```
Follow this [README](https://github.com/hiyouga/LLaMA-Factory/blob/main/data/README.md) (see the [multimodal image dataset demo](https://github.com/hiyouga/LLaMA-Factory/blob/main/data/mllm_demo.json)) to prepare our released [datasets](https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede); a registration sketch is shown below.
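A minimal sketch, assuming LLaMA-Factory's ShareGPT-style multimodal format: a prepared dataset would be registered in `data/dataset_info.json` roughly as follows (the entry name and file name here are placeholders; follow the README above for the exact conventions).
```json
{
  "unifiedreward_sft": {
    "file_name": "unifiedreward_sft.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "images": "images"
    }
  }
}
```
The `dataset` field in the training YAML below would then reference this entry name.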
2. Run the following command to train the SFT model.
```bash
llamafactory-cli train examples/train_full/qwen2_5vl_full_sft.yaml
```
### 2. Training based on LLaVA-Onevision
#### 2.1 Unified Preference Training Dataset Preparation
Please download our constructed unified preference dataset from [Huggingface](https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede) and put it in `./dataset/`.
```
dataset
├── EvalMuse
│   ├── pairwise
│   ├── pointwise
│   └── ...
├── HPD
├── LiFT-HRA
├── LLaVA-Critic
│   ├── pairwise
│   ├── pointwise
│   └── ...
├── OIP
├── ShareGPTVideo
│   ├── pairwise
│   ├── pointwise
│   └── ...
├── VideoDPO
├── VideoFeedback
└── train_data.yaml
```
#### 2.2 Training based on LLaVA-Onevision
```bash
bash train.sh
```
## ✨ Direct Preference Optimization
### 🎨 Image and Video Understanding DPO
#### 1. Construct Preference Data
The input data for preference construction should follow this structure:
```json
[
    {
        "prompt": "",
        "image": ""
    },
    ...
]
```
Then run:
```bash
# image understanding
cd preference_data_construction/image_understanding
python infer+sift.py  # fill in 'image_folder' and 'data_path' in this file

# video understanding
cd preference_data_construction/video_understanding
python infer+sift.py  # fill in 'image_folder' and 'data_path' in this file
```
#### 2. Training
The training data in `data.json` should follow this structure:
```json
[
    {
        "id": "",
        "image": "",
        "prompt": "",
        "chosen": "",
        "rejected": ""
    },
    ...
]
```
Then start training:
```bash
# image understanding
bash dpo_image_understand_ov7b.sh
# video understanding
bash dpo_video_understand_llava_video_7b.sh
```
### 🖼️ Image Generation DPO
#### 0. Prepare Environments
```bash
cd DiffusionDPO
conda create -n diffdpo python=3.10 -y
conda activate diffdpo
pip install -r requirements.txt
```
#### 1. Construct Preference Data
**Image generation.** The input data for preference construction should follow this structure:
```json
[
    {
        "prompt": ""
    },
    ...
]
```
Then run:
```bash
python data_generation.py  # fill in 'data_path' in this file
```
**Preference pair data construction**
```bash
python sift_dpo_data.py
```
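Conceptually, this sifting step turns reward scores into DPO pairs: for each prompt, the candidate images are scored with UnifiedReward, and the highest- and lowest-scoring ones become the chosen/rejected pair. Below is a minimal sketch of that selection logic; the function names and input format are assumptions for illustration, and `sift_dpo_data.py` is the actual implementation.
```python
from typing import Callable

def build_dpo_pairs(
    samples: list[dict],                    # [{"prompt": str, "images": [str, ...]}, ...]
    score_fn: Callable[[str, str], float],  # (prompt, image_path) -> reward score
) -> list[dict]:
    """Pick the best/worst image per prompt as a (chosen, rejected) pair."""
    pairs = []
    for sample in samples:
        # Sort candidates by their UnifiedReward score, ascending.
        scored = sorted(
            sample["images"],
            key=lambda img: score_fn(sample["prompt"], img),
        )
        if len(scored) < 2:
            continue  # need at least two candidates to form a pair
        pairs.append({
            "caption": sample["prompt"],
            "jpg_0": scored[-1],  # highest reward -> chosen
            "jpg_1": scored[0],   # lowest reward -> rejected
            "label_0": 1,         # jpg_0 is the preferred image
        })
    return pairs
```
The output matches the `data.json` training format shown in the next step.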
#### 2. Training
The training data in `data.json` should follow this structure:
```json
[
    {
        "id": "",
        "caption": "",
        "jpg_0": "",   // chosen image path
        "jpg_1": "",   // rejected image path
        "label_0": 1   // 1 means jpg_0 is the preferred image
    },
    ...
]
```
Then start training:
```bash
bash launchers/turbo_dpo.sh
```
### 🎬 Video Generation DPO
#### 0. Prepare Environments
```bash
cd VideoDPO
conda create -n videodpo python=3.10 -y
conda activate videodpo
pip install -r requirements.txt
```
Run the following commands to download the VideoCrafter2 checkpoint:
```bash
mkdir -p checkpoints/vc2
wget -P checkpoints/vc2 https://huggingface.co/VideoCrafter/VideoCrafter2/resolve/main/model.ckpt
```
Please download our constructed T2V-Turbo model and its reference model from [Huggingface](https://huggingface.co/CodeGoat24/T2V-Turbo) and put them in `./checkpoints/t2v-turbo`.
#### 1. Construct Preference Data
**Video generation.** The input data for preference construction should follow this structure:
```json
[
    {
        "prompt": ""
    },
    ...
]
```
Then run:
```bash
bash data_generation.sh  # fill in '--prompts_file' in this script
```
**Preference pair data construction**
```bash
python sift_dpo_data.py
```
#### 2. Training
The training data in `data.json` should follow this structure:
```json
[
    {
        "id": "",
        "caption": "",
        "chosen": "",    // chosen video path
        "rejected": ""   // rejected video path
    },
    ...
]
```
Then start training:
```bash
bash run.sh
```
## 🚀 Evaluation
We provide evaluation code for several benchmarks in the `./benchmark_evaluation` directory.
### Reward Model
We provide evaluation code for [GenAI-Bench-Video](https://github.com/TIGER-AI-Lab/GenAI-Bench), [GenAI-Bench-Image](https://github.com/TIGER-AI-Lab/GenAI-Bench), [VideoGen-RewardBench](https://huggingface.co/datasets/KwaiVGI/VideoGen-RewardBench) and [VL-RewardBench](https://huggingface.co/datasets/MMInstruction/VL-RewardBench) benchmarks.
### Video Understanding
We provide evaluation code for the [MSRVTT](https://github.com/xudejing/video-question-answering), [MSVD](https://github.com/xudejing/video-question-answering), and [TGIF](https://github.com/YunseokJANG/tgif-qa) benchmarks, and use the [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) toolkit with 64 input frames to evaluate the LongVideoBench, MLVU, and Video-MME benchmarks.
### Image Understanding
We use the [LMMs-Eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) toolkit to evaluate the LLaVABench, WildVision, LLaVABench-Wilder, LiveBench, and MMHal benchmarks.
### Image Generation
We use the image reward models [PickScore](https://github.com/yuvalkirstain/PickScore), [HPS](https://github.com/tgxs002/HPSv2), and [ImageReward](https://github.com/THUDM/ImageReward) for quality assessment.
### Video Generation
[VBench](https://github.com/Vchitect/VBench) is used for video generation assessment.
## 📧 Contact
If you have any comments or questions, please open a new issue or feel free to contact [Yibin Wang](https://codegoat24.github.io).
## 🤗 Acknowledgments
In this work, the reward model and the image/video understanding DPO code are based on [LLaVA-Next](https://github.com/LLaVA-VL/LLaVA-NeXT), while the image and video generation DPO code is based on [DiffusionDPO](https://github.com/SalesforceAIResearch/DiffusionDPO) and [VideoDPO](https://github.com/CIntellifusion/VideoDPO).
We also utilize [LMMs-Eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) and [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) toolkits for evaluation.
Thanks to all the contributors!
## ⭐ Citation
```bibtex
@article{unifiedreward-think,
title={Unified multimodal chain-of-thought reward model through reinforcement fine-tuning},
author={Wang, Yibin and Li, Zhimin and Zang, Yuhang and Wang, Chunyu and Lu, Qinglin and Jin, Cheng and Wang, Jiaqi},
journal={arXiv preprint arXiv:2505.03318},
year={2025}
}
```
```bibtex
@article{unifiedreward,
title={Unified reward model for multimodal understanding and generation},
author={Wang, Yibin and Zang, Yuhang and Li, Hao and Jin, Cheng and Wang, Jiaqi},
journal={arXiv preprint arXiv:2503.05236},
year={2025}
}
```