# VELM (Vision Expert + Language Model): A Framework for Anomaly Classification (AC)

This repository provides the source code for the paper "Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models":

### [arXiv](https://arxiv.org/abs/2505.02626)

![teaser](figures/framework.jpg)

## Overview

The project supports multiple LLM backends, including GPT-4o, GPT-4o-mini, and Qwen2-VL. VELM enables anomaly classification by leveraging the visual understanding capabilities of multimodal LLMs. The system can:

- Process images of industrial objects
- Detect and localize anomalies using a vision expert
- Generate red contour lines based on the anomaly localization
- Classify different types of anomalies
- Distinguish between negligible anomalies and critical defects
- Evaluate model performance using various metrics

## Repository Structure

```
VELM/
├── configs/                 # Configuration files
│   ├── prompts/             # Preprocessed prompts for different datasets
│   ├── predictions/         # Model predictions
│   ├── evaluations/         # Evaluation results
│   ├── mvtec_ad_des.json    # MVTec-AD dataset descriptions
│   ├── mvtec_ac_des.json    # MVTec-AC dataset descriptions
│   └── visa_ac_des.json     # VisA-AC dataset descriptions
├── datasets/                # Datasets (download links are provided)
│   ├── mvtec_ad             # MVTec-AD dataset
│   ├── mvtec_ac             # MVTec-AC dataset
│   └── visa_ac              # VisA-AC dataset
├── utils.py                 # Common utility functions
├── ddad_reorganizer.py      # Match the output of DDAD to the expected directory structure
├── create_contour.py        # Draw contour lines on the query image based on detected anomalies
├── generate_prompts.py      # Preprocess prompts
├── run_llm_hm.py            # Run LLM models with heatmap visualization
├── eval.py                  # Evaluate model predictions
└── anomaly_vs_defect.py     # Evaluate negligible anomaly vs. critical defect classification
```

## Installation

1. Clone the repository:

```bash
git clone https://github.com/Sassanmtr/VELM.git
cd VELM
```

2. Install dependencies:

```bash
conda create --name velm_env python=3.9
conda activate velm_env
pip install -r requirements.txt
```

3. (For experiments with GPT) Set up environment variables. Create a `.env` file in the root directory with your API keys:

```
OPENAI_API_KEY=your_openai_api_key
```

## Datasets

The framework supports the following datasets:

- **MVTec-AD**: A dataset for unsupervised anomaly detection
- **MVTec-AC**: A dataset for anomaly classification
- **VisA-AC**: A dataset for anomaly classification

### Download and Setup Instructions

- [MVTec-AD](https://www.mvtec.com/company/research/datasets/mvtec-ad): Download and place in the `datasets/mvtec_ad` folder.
- [MVTec-AC](https://drive.google.com/drive/folders/1R_rZgZbHEF9byic84zdlWezECtmUk4na?usp=sharing): Download and place in the `datasets/mvtec_ac` folder.
  **Note:** MVTec-AC uses the same training set as MVTec-AD. You can copy the train folder from `mvtec_ad` to `mvtec_ac` if needed.
- [VisA-AC](https://drive.google.com/drive/folders/1cpF_yJD0cOIQoyx1egf1V4sGfvMpLTLn?usp=sharing): Download and place in the `datasets/visa_ac` folder.
  **Note:** VisA-AC uses the same training set as VisA. If VisA is already downloaded, you can reuse its train folder.
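Step 3 of the installation above relies on a `.env` file being read into the process environment at runtime. As a minimal, stdlib-only sketch of what such a loader does (libraries like `python-dotenv` offer the same behavior; the repo's actual loading code may differ), `load_env` here is a hypothetical helper, not part of this repository:

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: read KEY=VALUE lines into os.environ.
    Hypothetical sketch; mirrors what python-dotenv's load_dotenv does."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines, comments, and lines without '='.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault keeps any value already set in the real environment.
            os.environ.setdefault(key.strip(), value.strip())

# Example: create a sample .env and load it.
with open(".env", "w") as fh:
    fh.write("OPENAI_API_KEY=your_openai_api_key\n")
load_env()
print(os.environ["OPENAI_API_KEY"])
```

Keeping the key in `.env` (and out of version control) avoids committing credentials alongside the code.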
## Usage

### Create Contour Images

Generate red contour lines from heatmaps to overlay on test images:

```bash
python create_contour.py --dataset mvtec_ac --image_size 448
```

Options:
- `--config`: Path to YAML configuration file (default: `configs/contour_config.yaml`)
- `--dataset`: Dataset to use (mvtec_ad, mvtec_ac, visa_ac); overrides the config
- `--image_size`: Resize input images and heatmaps to this size (default: 448)

**Note:** To use heatmaps generated by other methods, update the `heatmap_dir` path in the configuration file accordingly.

### Generating Prompts

Generate prompts for the LLM to perform anomaly classification:

```bash
python generate_prompts.py --dataset mvtec_ac --text_type conditioned --ddad_format True
```

Options:
- `--dataset`: Dataset to use (mvtec_ad, mvtec_ac, visa_ac)
- `--text_type`: Type of text to generate (`raw`: reference and query images; `conditioned`: reference, contour, and query images)
- `--ddad_format`: Whether to use the DDAD format (True/False)

### Running LLM Models

Run the LLM models for anomaly classification with heatmap visualization:

```bash
# Using GPT-4o (default)
python run_llm.py --model gpt --dataset mvtec_ad --heatmap_mode contour

# Using GPT-4o-mini
python run_llm.py --model gpt --gpt_model gpt-4o-mini --dataset mvtec_ad --heatmap_mode contour

# Using Qwen2-VL
python run_llm.py --model qwen --dataset mvtec_ac --heatmap_mode contour
```

Options:
- `--model`: Model to use (gpt, qwen)
- `--gpt_model`: GPT model to use (gpt-4o, gpt-4o-mini); only applicable if `model=gpt`
- `--dataset`: Dataset to use (mvtec_ad, mvtec_ac, visa_ac)
- `--heatmap_mode`: Heatmap visualization mode (contour, none)
- `--image_size`: Size to resize images to (default: 448)
- `--num_ref`: Number of reference images to use (default: 1)

### Evaluating Model Performance

Evaluate model predictions:

```bash
python eval.py --dataset mvtec_ad --model gpt-4o --heatmap_mode contour
```

Options:
- `--dataset`: Dataset to evaluate (mvtec_ad, mvtec_ac, visa_ac)
- `--model`: Model type used for predictions (gpt-4o, gpt-4o-mini, qwen)
- `--heatmap_mode`: Heatmap visualization mode (contour, none)
- `--output`: Path to save evaluation results (optional)
- `--verbose`: Enable verbose logging

### Anomaly vs. Defect Classification

Evaluate model performance in distinguishing between critical defects and negligible anomalies:

```bash
python anomaly_vs_defect.py --dataset mvtec_ad --model gpt-4o --heatmap_mode contour
```

Options:
- `--dataset`: Dataset to evaluate (mvtec_ad, mvtec_ac, visa_ac)
- `--model`: Model type used for predictions (gpt-4o, gpt-4o-mini, qwen)
- `--heatmap_mode`: Heatmap visualization mode (contour, none)
- `--seeds`: Number of random seeds to use for evaluation (default: 5)
- `--output`: Path to save evaluation results (default: `anom_def_results.json`)
- `--verbose`: Enable verbose logging

## Results

Evaluation results are saved in JSON format and include:

- Accuracy per object category
- Standard deviation of accuracy
- Overall accuracy metrics
- Confusion matrices

## License

This project is licensed under the MIT License; see the LICENSE file for details.

## Citation

```
@article{mokhtar2025detect,
  title={Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models},
  author={Mokhtar, Sassan and Mousakhan, Arian and Galesso, Silvio and Tayyub, Jawad and Brox, Thomas},
  journal={arXiv preprint arXiv:2505.02626},
  year={2025}
}
```

## Feedback

For any feedback or inquiries, please contact sassan.mtr@gmail.com.
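The per-category accuracy and standard deviation listed under Results can be sketched with a small stdlib-only example. The record layout below (`category`, predicted label, true label, grouped per seed) is a hypothetical illustration, not the actual schema of the files in `configs/predictions/`:

```python
import json
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical predictions from two evaluation seeds:
# each tuple is (category, predicted_label, true_label).
runs = [
    [("bottle", "crack", "crack"), ("bottle", "hole", "crack"), ("cable", "cut", "cut")],
    [("bottle", "crack", "crack"), ("bottle", "crack", "crack"), ("cable", "bent", "cut")],
]

# Collect one accuracy value per category per seed.
per_category = defaultdict(list)
for run in runs:
    correct, total = defaultdict(int), defaultdict(int)
    for cat, pred, true in run:
        total[cat] += 1
        correct[cat] += int(pred == true)
    for cat in total:
        per_category[cat].append(correct[cat] / total[cat])

# Mean accuracy and across-seed standard deviation per category.
results = {
    cat: {"accuracy": mean(accs), "std": stdev(accs) if len(accs) > 1 else 0.0}
    for cat, accs in per_category.items()
}
results["overall"] = mean(a for accs in per_category.values() for a in accs)
print(json.dumps(results, indent=2))
```

Averaging across seeds and reporting the standard deviation, as `anomaly_vs_defect.py`'s `--seeds` option suggests, separates genuine accuracy differences from run-to-run variance.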