# Kris_Bench

**Repository Path**: stfocus/Kris_Bench

## Basic Information

- **Project Name**: Kris_Bench
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-17
- **Last Updated**: 2025-09-17

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# KRIS-Bench Evaluation

This directory contains automated evaluation scripts for KRIS-Bench.

## Directory Structure

- `metrics_common.py`: Evaluation script for tasks without a knowledge-plausibility component.
- `metrics_view_change.py`: Evaluation script for the Viewpoint Change task, which has a ground-truth image.
- `metrics_knowldge.py`: Evaluation script for tasks with a knowledge-plausibility component.
- `metrics_multi_element.py`: Evaluation script for the multi-element composition task.
- `metrics_temporal_prediction.py`: Evaluation script for temporal prediction tasks.
- `results/`: Directory for editing and evaluation results.
- `KRIS_Bench/`: Benchmark dataset directory; should contain `annotation.json` and the original images for each category.

## Requirements

- Python 3.8+

Set your OpenAI API key as an environment variable before running:

```bash
export OPENAI_API_KEY=your_openai_api_key
```

## Evaluation Metrics

- Visual Consistency
- Visual Quality
- Instruction Following
- Knowledge Plausibility

## Usage

First, download the benchmark. We have uploaded the full benchmark to the [Hugging Face repository](https://huggingface.co/datasets/Liang0223/KRIS_Bench). For convenience, we also keep a copy in this repository; you can find the dataset in [KRIS_Bench](./KRIS_Bench). (A scripted download sketch is given at the end of this README.)

Run the main evaluation script from the command line:

```bash
python metrics_common.py --models doubao gpt gemini
```

Arguments:

- `--models`: List of model names to evaluate

The script automatically iterates over the specified models and categories, calls GPT-4o for automated evaluation, and saves results to `results/{model}/{category}/metrics.json`.

## Output Format

Each category will have a `metrics.json` file with the following structure (a small averaging sketch appears at the end of this README):

```json
{
  "1": {
    "instruction": "...",
    "explain": "...",
    "consistency_score": 5,
    "consistency_reasoning": "...",
    "instruction_score": 5,
    "instruction_reasoning": "...",
    "quality_score": 4,
    "quality_reasoning": "..."
  },
  ...
}
```

## Notes

- Ensure that `KRIS_Bench/{category}/annotation.json` and the original images exist.
- Ensure that model-generated images are present in `results/{model}/{category}/` and named `{image_id}.jpg` (see the layout-check sketch below).
- OpenAI API usage is subject to rate limits and costs. Adjust `max_workers` and the batch size as needed.
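## Example Snippets

For a scripted download, here is a minimal sketch using the `huggingface_hub` library. The repo id comes from the link in the Usage section; the target directory name matches the `KRIS_Bench/` layout expected by the scripts:

```python
# Sketch: download the benchmark from Hugging Face into ./KRIS_Bench.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Liang0223/KRIS_Bench",
    repo_type="dataset",     # dataset repo, not a model repo
    local_dir="KRIS_Bench",  # matches the KRIS_Bench/ directory the scripts expect
)
```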
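Before starting a (potentially costly) evaluation run, it can help to verify the directory layout from the Notes section. A hypothetical sanity-check sketch; it assumes `annotation.json` is keyed by image id (as the `metrics.json` output suggests), and the model/category names in the usage line are examples only:

```python
# Sketch: verify each annotated image id has a generated image in results/.
# Assumes annotation.json maps image ids to per-image annotations; adjust
# if the actual schema differs.
import json
from pathlib import Path

def check_layout(model: str, category: str) -> None:
    ann_path = Path("KRIS_Bench") / category / "annotation.json"
    annotations = json.loads(ann_path.read_text())
    results_dir = Path("results") / model / category
    missing = [i for i in annotations if not (results_dir / f"{i}.jpg").exists()]
    if missing:
        print(f"{model}/{category}: missing {len(missing)} images, e.g. {missing[:5]}")
    else:
        print(f"{model}/{category}: all {len(annotations)} images present")

check_layout("gpt", "temporal_prediction")  # hypothetical category name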
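To summarize a finished run, a sketch that averages the per-image scores in a single `metrics.json` file. Field names follow the Output Format section above; the example path is hypothetical, and fields absent for a given task (e.g., knowledge scores from the knowledge-plausibility scripts) are simply skipped:

```python
# Sketch: average the per-image scores in one metrics.json file.
import json
from pathlib import Path

def average_scores(metrics_path: str) -> dict:
    data = json.loads(Path(metrics_path).read_text())
    fields = ("consistency_score", "instruction_score", "quality_score")
    averages = {}
    for f in fields:
        values = [v[f] for v in data.values() if f in v]
        if values:  # skip fields this task's script does not emit
            averages[f] = sum(values) / len(values)
    return averages

print(average_scores("results/gpt/temporal_prediction/metrics.json"))
```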