# t2i-HPSv2
**Repository Path**: py-service/t2i-hpsv2
## Basic Information
- **Project Name**: t2i-HPSv2
- **Description**: https://github.com/tgxs002/HPSv2.git
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-03-04
- **Last Updated**: 2026-03-04
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README

# HPS v2: Benchmarking Text-to-Image Generative Models
[PyPI](https://pypi.org/project/hpsv2/) | [arXiv](https://arxiv.org/abs/2306.09341) | [Hugging Face Demo](https://huggingface.co/spaces/xswu/HPSv2) | [License: Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0.html)
This is the official repository for the paper: [Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis](https://arxiv.org/abs/2306.09341).
## Updates
* [09/02/2024] We released HPS v2.1 model trained on higher quality datasets, and the training set of HPD v2.0. Happy new year!
* [08/02/2023] We released the [PyPI package](https://pypi.org/project/hpsv2/). You can learn how to use it from the [Quick start section](#quick-start).
* [08/02/2023] Updated [test.json](https://huggingface.co/datasets/zhwang/HPDv2/blob/main/test.json) to include raw annotation by each annotator.
* [07/29/2023] We included `SDXL Refiner 0.9` model in the benchmark.
* [07/29/2023] We released [the benchmark and HPD v2 test data](https://huggingface.co/datasets/zhwang/HPDv2). HPD v2 train data will be released soon.
* [07/27/2023] We included `SDXL Base 0.9` model in the benchmark.
* [07/26/2023] We updated our [compressed checkpoint](https://huggingface.co/spaces/xswu/HPSv2/resolve/main/HPS_v2_compressed.pt).
* [07/19/2023] Live demo is available at 🤗 [Hugging Face](https://huggingface.co/spaces/xswu/HPSv2).
* [07/18/2023] We released our [test data](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155172150_link_cuhk_edu_hk/EVnjOngvDO1MhIp7hVr8GXgBmxVDcSk7s9Xuu9srO4YLbA?e=8PqYud).
## Overview

**Human Preference Dataset v2 (HPD v2)**: a large-scale, well-annotated dataset of human preference choices (798k choices over 430k images) on images generated by text-to-image generative models.
**Human Preference Score v2 (HPS v2)**: a preference prediction model trained on HPD v2. HPS v2 can be used to compare images generated with the same prompt. We also provide a fair, stable, and easy-to-use set of evaluation prompts for text-to-image generative models.
## The HPS benchmark
The HPS benchmark evaluates models' capability of generating images in four styles: *Animation*, *Concept-art*, *Painting*, and *Photo*.
### HPS v2 benchmark
| Model | Animation | Concept-art | Painting | Photo | Averaged |
| ---------------------| --------- | ----------- | -------- | -------- | -------- |
| Dreamlike Photoreal 2.0 | 28.24 | 27.60 | 27.59 | 27.99 | 27.86 |
| SDXL Refiner 0.9 | 28.45 | 27.66 | 27.67 | 27.46 | 27.80 |
| Realistic Vision | 28.22 | 27.53 | 27.56 | 27.75 | 27.77 |
| SDXL Base 0.9 | 28.42 | 27.63 | 27.60 | 27.29 | 27.73 |
| Deliberate | 28.13 | 27.46 | 27.45 | 27.62 | 27.67 |
| ChilloutMix | 27.92 | 27.29 | 27.32 | 27.61 | 27.54 |
| MajicMix Realistic | 27.88 | 27.19 | 27.22 | 27.64 | 27.48 |
| Openjourney | 27.85 | 27.18 | 27.25 | 27.53 | 27.45 |
| DeepFloyd-XL | 27.64 | 26.83 | 26.86 | 27.75 | 27.27 |
| Epic Diffusion | 27.57 | 26.96 | 27.03 | 27.49 | 27.26 |
| Stable Diffusion v2.0 | 27.48 | 26.89 | 26.86 | 27.46 | 27.17 |
| Stable Diffusion v1.4 | 27.26 | 26.61 | 26.66 | 27.27 | 26.95 |
| DALL·E 2 | 27.34 | 26.54 | 26.68 | 27.24 | 26.95 |
| Versatile Diffusion | 26.59 | 26.28 | 26.43 | 27.05 | 26.59 |
| CogView2 | 26.50 | 26.59 | 26.33 | 26.44 | 26.47 |
| VQGAN + CLIP | 26.44 | 26.53 | 26.47 | 26.12 | 26.39 |
| DALL·E mini | 26.10 | 25.56 | 25.56 | 26.12 | 25.83 |
| Latent Diffusion | 25.73 | 25.15 | 25.25 | 26.97 | 25.78 |
| FuseDream | 25.26 | 25.15 | 25.13 | 25.57 | 25.28 |
| VQ-Diffusion | 24.97 | 24.70 | 25.01 | 25.71 | 25.10 |
| LAFITE | 24.63 | 24.38 | 24.43 | 25.81 | 24.81 |
| GLIDE | 23.34 | 23.08 | 23.27 | 24.50 | 23.55 |
### HPS v2.1 benchmark
| Model | Animation | Concept-art | Painting | Photo | Averaged |
| ---------------------| --------- | ----------- | -------- | -------- | -------- |
| SDXL Refiner 0.9 | 33.26 | 32.07 | 31.63 | 28.38 | 31.34 |
| SDXL Base 0.9 | 32.84 | 31.36 | 30.86 | 27.48 | 30.63 |
| Deliberate | 31.46 | 30.48 | 30.17 | 28.83 | 30.23 |
| Realistic Vision | 31.01 | 29.95 | 30.00 | 28.61 | 29.89 |
| Dreamlike Photoreal 2.0 | 30.87 | 29.75 | 29.46 | 28.85 | 29.73 |
| MajicMix Realistic | 29.67 | 28.50 | 28.44 | 28.02 | 28.66 |
| ChilloutMix | 29.46 | 28.46 | 28.35 | 27.63 | 28.47 |
| Openjourney | 28.37 | 27.38 | 27.53 | 26.66 | 27.48 |
| DeepFloyd-XL | 27.71 | 26.07 | 25.79 | 27.96 | 26.88 |
| Epic Diffusion | 27.07 | 26.14 | 26.17 | 26.43 | 26.45 |
| Stable Diffusion v2.0 | 27.09 | 26.02 | 25.68 | 26.73 | 26.38 |
| Stable Diffusion v1.4 | 26.03 | 24.87 | 24.80 | 25.70 | 25.35 |
| DALL·E 2 | 26.38 | 24.51 | 24.93 | 25.55 | 25.34 |
| Versatile Diffusion | 23.69 | 23.39 | 24.02 | 24.64 | 23.93 |
| CogView2 | 23.64 | 24.86 | 23.40 | 22.68 | 23.64 |
| VQGAN + CLIP | 22.55 | 23.76 | 23.41 | 21.51 | 22.81 |
| DALL·E mini | 21.54 | 20.50 | 20.32 | 21.72 | 21.02 |
| Latent Diffusion | 20.63 | 19.65 | 19.79 | 21.26 | 20.34 |
| FuseDream | 19.16 | 19.37 | 19.07 | 20.07 | 19.42 |
| VQ-Diffusion | 18.44 | 18.31 | 19.24 | 20.62 | 19.15 |
| LAFITE | 17.79 | 17.55 | 17.61 | 20.88 | 18.46 |
| GLIDE | 13.90 | 13.50 | 13.94 | 16.72 | 14.51 |
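In both tables, higher is better, and the **Averaged** column is (up to rounding of the underlying scores) the arithmetic mean of the four style scores. A quick check against the first row of the v2 table:

```python
# Dreamlike Photoreal 2.0, HPS v2 table
style_scores = [28.24, 27.60, 27.59, 27.99]  # Animation, Concept-art, Painting, Photo
mean = sum(style_scores) / len(style_scores)
print(mean)  # 27.855 (modulo float noise), which rounds to the reported 27.86
```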
## Quick Start
### Installation
```shell
# Method 1: install from PyPI
pip install hpsv2

# Method 2: install from source
git clone https://github.com/tgxs002/HPSv2.git
cd HPSv2
pip install -e .

# Optional: images for reproducing our benchmark will be downloaded here
# (default: ~/.cache/hpsv2/)
export HPS_ROOT=/your/cache/path
```
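A quick sanity check that the package is importable (standard Python, nothing HPSv2-specific):

```python
import hpsv2  # raises ImportError if installation failed

print(hpsv2.__file__)  # location of the installed package
```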
After installation, we show how to:
- [Compare images using HPS v2](#image-comparison).
- [Reproduce our benchmark](#benchmark-reproduction).
- [Evaluate your own model using HPS v2](#custom-evaluation).
- [Evaluate our preference model](#preference-model-evaluation).
We also provide [command line interfaces](#command-line-interface) for debugging purposes.
### Image Comparison
You can score and compare several images generated by the same prompt by running the following code:
```python
import hpsv2
# imgs_path can be a list of image paths (all generated by the same prompt),
# a single image path as a string,
# or an image of PIL.Image.Image type
result = hpsv2.score(imgs_path, '<prompt>', hps_version="v2.1")
```
**Note**: Comparison is only meaningful for images generated by the **same prompt**. You can also pass `"v2.0"` to `hps_version` to use the earlier HPS v2.0 model. Scores cannot be directly compared between v2.0 and v2.1.
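As a usage example, here is a sketch that ranks several candidates for one prompt and keeps the best one. The prompt and file names are hypothetical, and it assumes `hpsv2.score` returns one score per image when given a list of paths, matching the comments above:

```python
import hpsv2

prompt = "a watercolor painting of a lighthouse at dawn"  # hypothetical prompt
candidates = ["candidate_0.jpg", "candidate_1.jpg", "candidate_2.jpg"]

# One score per candidate image (all generated from the same prompt)
scores = hpsv2.score(candidates, prompt, hps_version="v2.1")

# Higher HPS score = more preferred; pick the best candidate
best_score, best_image = max(zip(scores, candidates))
print(best_image, best_score)
```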
### Benchmark Reproduction
We also provide [images](https://huggingface.co/datasets/zhwang/HPDv2/tree/main/benchmark/benchmark_imgs) generated by the models in our [benchmark](#the-hps-benchmark) used for evaluation. You can download the data and evaluate the models by running the following code.
```python
import hpsv2
print(hpsv2.get_available_models())  # models whose benchmark images are available for download
hpsv2.evaluate_benchmark('<model_name>')  # <model_name> is one of the names printed above
```
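Using only the two calls above, a sketch that sweeps every model with downloadable benchmark images might look like this:

```python
import hpsv2

# Assumes get_available_models() returns an iterable of model-name strings,
# as the print() call above suggests
for model_name in hpsv2.get_available_models():
    hpsv2.evaluate_benchmark(model_name)
```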
### Custom Evaluation
To evaluate your own text-to-image generative model, first prepare images for evaluation based on the [benchmark prompts](https://huggingface.co/datasets/zhwang/HPDv2/tree/main/benchmark) we provide by running the following code:
```python
import os
import hpsv2
# Get benchmark prompts ('all' selects every style)
all_prompts = hpsv2.benchmark_prompts('all')
```
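From there, a minimal generation loop might look like the sketch below. It assumes `benchmark_prompts('all')` returns a mapping from style name to prompt list; `TextToImageModel` is a stand-in for whatever model you are evaluating and `output_dir` is wherever you choose to store the images, so neither name comes from the hpsv2 API:

```python
import os

from PIL import Image

import hpsv2

def TextToImageModel(prompt: str) -> Image.Image:
    # Placeholder: replace with your own text-to-image model.
    return Image.new("RGB", (512, 512))

all_prompts = hpsv2.benchmark_prompts('all')  # prompts grouped by style
output_dir = "benchmark_imgs"  # hypothetical output folder

# Generate one image per prompt, saved as <output_dir>/<style>/00000.jpg, ...
for style, prompts in all_prompts.items():
    os.makedirs(os.path.join(output_dir, style), exist_ok=True)
    for idx, prompt in enumerate(prompts):
        image = TextToImageModel(prompt)
        image.save(os.path.join(output_dir, style, f"{idx:05d}.jpg"))
```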