# EvoQuality
**Repository Path**: ByteDance/EvoQuality
## Basic Information
- **Project Name**: EvoQuality
- **Description**: Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking (ICLR 2026)
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-06-12
- **Last Updated**: 2026-06-13
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# ICLR 2026: EvoQuality
### Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking
[Paper (arXiv)](https://arxiv.org/pdf/2509.25787)
[OpenReview](https://openreview.net/forum?id=INOi0YqI8p)
[Model (Hugging Face)](https://huggingface.co/ByteDance/EvoQuality)
***
## Abstract
**EvoQuality** is a self-supervised framework that enables a Vision-Language Model (VLM) to autonomously refine its image quality perception capability via pairwise majority voting and **GRPO**-based iterative evolution, without requiring any ground-truth labels.
Figure 1: Overview of the EvoQuality Framework
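As a rough illustration of the pairwise majority voting named in the abstract, the sketch below aggregates several sampled answers for one image pair into a pseudo-label. The function name `majority_vote`, the `"A"`/`"B"` answer encoding, and the tie handling are assumptions made for illustration; they are not taken from the repository's implementation.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def majority_vote(votes: Sequence[str]) -> Tuple[Optional[str], float]:
    """Return (winner, agreement) for one image pair.

    `votes` holds one answer per sampled response, e.g. "A" if the first
    image was judged better and "B" otherwise (hypothetical encoding).
    A tie yields winner=None so the pair can be skipped.
    """
    if not votes:
        return None, 0.0
    counts = Counter(votes).most_common()
    top_label, top_count = counts[0]
    agreement = top_count / len(votes)
    # A tie between the two most frequent answers gives no pseudo-label.
    if len(counts) > 1 and counts[1][1] == top_count:
        return None, agreement
    return top_label, agreement


if __name__ == "__main__":
    sampled = ["A", "A", "B", "A", "B", "A", "A", "B"]
    print(majority_vote(sampled))
```

The agreement ratio is returned alongside the winner so that low-consensus pairs can be filtered out if desired; whether EvoQuality applies such a filter is not stated here.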
***
## Model
The pre-trained EvoQuality model weights are publicly available on Hugging Face:
- 🤗 **[ByteDance/EvoQuality](https://huggingface.co/ByteDance/EvoQuality)**
You can download the model directly via `git lfs` or the `huggingface_hub` Python API:
```bash
git lfs install
git clone https://huggingface.co/ByteDance/EvoQuality
```
```python
from huggingface_hub import snapshot_download
snapshot_download(repo_id="ByteDance/EvoQuality", local_dir="./EvoQuality")
```
***
## Installation
### 1. Clone the repository
```bash
git clone https://github.com/your-organization/evoquality.git
cd evoquality/verl
```
### 2. Create a virtual environment
```bash
conda create -n evoquality python=3.9 -y
conda activate evoquality
```
### 3. Install dependencies
We provide a setup script:
```bash
bash setup.sh
```
***
## Quick Start
### 1. Prepare Data
Place your dataset in the `data/` directory. The expected structure is:
```
data/
└── YourDataset/
    ├── train.parquet
    ├── test.parquet
    └── data.csv
```
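Before launching training, it can help to confirm that a dataset folder matches the layout above. The following is a minimal sketch using only the standard library; the helper name `missing_dataset_files` is hypothetical, and the check looks only at file presence, not at the contents or schema of the files.

```python
from pathlib import Path
from typing import List

# File names taken from the expected structure shown above.
EXPECTED_FILES = ("train.parquet", "test.parquet", "data.csv")


def missing_dataset_files(data_root: str, dataset: str) -> List[str]:
    """List the expected files that are absent from <data_root>/<dataset>."""
    dataset_dir = Path(data_root) / dataset
    return [name for name in EXPECTED_FILES if not (dataset_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files("./data", "YourDataset")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```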
### 2. Training
To train EvoQuality:
```bash
cd examples/ttrl/Qwen-Pair/
# Set environment variables or modify the script directly
export DATA_LOCAL_DIR="./data"
export BACKBONE_PATH="./model/Qwen2.5-VL-7B-Instruct"
export WANDB_PROJECT="EvoQuality"
bash train.sh
```
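The abstract describes training as GRPO-based. For readers unfamiliar with GRPO, the sketch below shows its central step as commonly described: rewards for a group of responses to the same prompt are normalized against the group's own mean and standard deviation. The function name, the use of the population standard deviation, and the `eps` value are assumptions; how EvoQuality derives the rewards themselves is not covered here, and the training code under `verl/` may differ in detail.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    response is credited only for being better than its siblings.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards: 1.0 if a response agreed with the voted label.
    print(group_relative_advantages([1.0, 0.0, 1.0, 1.0]))
```

When every response in a group receives the same reward, all advantages are zero, so such a group contributes no learning signal.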
### 3. Merge Checkpoint
After training, merge the FSDP checkpoint into Hugging Face format:
```bash
cd ../../.. # Go back to verl/ directory
python3 -m verl.model_merger merge \
--backend fsdp \
--local_dir /path/to/your/checkpoint/global_step_XXX/actor \
--target_dir /path/to/merged/hf/model
```
Example:
```bash
python3 -m verl.model_merger merge \
--backend fsdp \
--local_dir ./ckpts/EvoQuality/0318/Experiment-grpo-000000/global_step_300/actor \
--target_dir ./merged_hf_model/Qwen25-Finetuned-Pair-300
```
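If a run directory holds several `global_step_*` checkpoints, the merge command can be assembled programmatically. The sketch below picks the highest step that contains an `actor` folder and builds the same argument list as the command above. The helper names `latest_actor_dir` and `merge_command` are hypothetical, and the directory layout is assumed from the example path.

```python
import re
from pathlib import Path
from typing import List, Optional

_STEP_DIR = re.compile(r"global_step_(\d+)$")


def latest_actor_dir(run_dir: str) -> Optional[Path]:
    """Return <run_dir>/global_step_<N>/actor for the largest N, if any."""
    steps = []
    for child in Path(run_dir).iterdir():
        match = _STEP_DIR.match(child.name)
        if match and (child / "actor").is_dir():
            steps.append((int(match.group(1)), child / "actor"))
    if not steps:
        return None
    return max(steps, key=lambda item: item[0])[1]


def merge_command(actor_dir: Path, target_dir: str) -> List[str]:
    """Build the merge invocation shown above as an argument list."""
    return [
        "python3", "-m", "verl.model_merger", "merge",
        "--backend", "fsdp",
        "--local_dir", str(actor_dir),
        "--target_dir", target_dir,
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` from the `verl/` directory.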
### 4. Inference
#### Batch Inference for Single Image Quality Assessment
```bash
cd scripts/
python batch_image_inference.py \
--root_checkpoint /path/to/merged/hf/model \
--root_data /path/to/image/directory \
--data /path/to/data.csv \
--save_result_root ./results
```
#### Batch Inference for Pairwise Comparison with Voting
```bash
python batch_image_pair_inference_voting.py \
--root_checkpoint /path/to/merged/hf/model \
--root_data /path/to/image/directory \
--data /path/to/pair_data.csv \
--save_result_root ./results \
--n_prompt 64
```
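One way to turn voted pairwise outcomes into a per-image ranking is to average each image's share of votes across all the pairs it appears in. The sketch below does this with the standard library. The record layout `(image_a, image_b, votes_for_a, total_votes)` and the function name `rank_by_win_rate` are assumptions; the actual output format of `batch_image_pair_inference_voting.py` is not documented here and may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (image_a, image_b, votes_for_a, total_votes)
PairResult = Tuple[str, str, int, int]


def rank_by_win_rate(results: Iterable[PairResult]) -> List[Tuple[str, float]]:
    """Rank images by the fraction of votes they won, best first."""
    wins: Dict[str, float] = defaultdict(float)
    total: Dict[str, int] = defaultdict(int)
    for image_a, image_b, votes_a, n_votes in results:
        if n_votes <= 0:
            continue  # skip pairs with no usable votes
        wins[image_a] += votes_a
        wins[image_b] += n_votes - votes_a
        total[image_a] += n_votes
        total[image_b] += n_votes
    scores = [(name, wins[name] / total[name]) for name in total]
    # Sort by descending win rate; break ties by name for determinism.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    demo = [("a.png", "b.png", 48, 64), ("b.png", "c.png", 40, 64), ("a.png", "c.png", 60, 64)]
    for name, rate in rank_by_win_rate(demo):
        print(name, round(rate, 3))
```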
***
## Project Structure
```
EvoQuality/
├── verl/
│   ├── examples/
│   │   └── ttrl/
│   │       └── Qwen-Pair/
│   │           └── train.sh                          # Training script
│   ├── scripts/
│   │   ├── batch_image_inference.py                  # Single image inference
│   │   └── batch_image_pair_inference_voting.py      # Pairwise inference
│   ├── verl/
│   │   ├── workers/
│   │   │   └── reward_manager/
│   │   │       └── ttrl_pair.py                      # Pairwise reward manager
│   │   └── utils/
│   │       └── reward_score/
│   │           └── ttrl/                             # TTRL metrics and utils
│   ├── setup.sh                                      # Main setup script
│   └── setup.py                                      # Package setup
└── README.md
```
***
## Citation
If you find EvoQuality useful for your research, please cite our paper:
```bibtex
@article{wen2025selfevolving,
title={Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking},
author={Wen, Wen and Zhi, Tianwu and Fan, Kanglong and Li, Yang and Peng, Xinge and Zhang, Yabin and Liao, Yiting and Li, Junlin and Zhang, Li},
journal={arXiv preprint arXiv:2509.25787},
year={2025}
}
```
***
## License Agreement
The models in this repository are licensed under the Apache 2.0 License. We claim no rights over your generated content, granting you the freedom to use it while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, please refer to the full text of the [license](https://www.apache.org/licenses/LICENSE-2.0).
***
## Acknowledgements
We would like to thank the contributors to the [verl](https://github.com/verl-project/verl), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), [transformers](https://github.com/huggingface/transformers), [vLLM](https://github.com/vllm-project/vllm), [PEFT](https://github.com/huggingface/peft), and [Hugging Face](https://huggingface.co) repositories for their open research.