# YOLO-UniOW

**Repository Path**: summry/YOLO-UniOW

## Basic Information

- **Project Name**: YOLO-UniOW
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-06
- **Last Updated**: 2025-03-06

## Categories & Tags

**Categories**: Uncategorized

**Tags**: Yolo

## README

# YOLO-UniOW: Efficient Universal Open-World Object Detection

The official implementation of **YOLO-UniOW** [[`arxiv`](https://arxiv.org/abs/2412.20645)].

![yolo-uniow](./assets/yolo-uniow.jpg)

## Zero-shot Performance on LVIS Dataset

YOLO-UniOW-S/M/L have been pre-trained from scratch and evaluated on `LVIS minival`. The pre-trained weights can be downloaded from the links below.

| Model | #Params | AP<sup>mini</sup> | AP<sub>r</sub> | AP<sub>c</sub> | AP<sub>f</sub> | FPS (V100) |
| :---: | :-----: | :---------------: | :------------: | :------------: | :------------: | :--------: |
| [YOLO-UniOW-S](https://huggingface.co/leonnil/yolo-uniow/resolve/main/yolo_uniow_s_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.pth) | 7.5M | 26.2 | 24.1 | 24.9 | 27.7 | 98.3 |
| [YOLO-UniOW-M](https://huggingface.co/leonnil/yolo-uniow/resolve/main/yolo_uniow_m_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.pth) | 16.2M | 31.8 | 26.0 | 30.5 | 34.0 | 86.2 |
| [YOLO-UniOW-L](https://huggingface.co/leonnil/yolo-uniow/resolve/main/yolo_uniow_l_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.pth) | 29.4M | 34.6 | 30.0 | 33.6 | 36.3 | 64.8 |

## Experiment Setup

### Data Preparation

For preparing open-vocabulary and open-world data, please refer to [docs/data](./docs/data.md).

### Installation

Our model is built with **CUDA 11.8** and **PyTorch 2.1.2**. To set up the environment, refer to the [PyTorch official documentation](https://pytorch.org/get-started/locally/) for installation guidance. For detailed instructions on installing `mmcv`, please see [docs/installation](./docs/installation.md).

```bash
conda create -n yolouniow python=3.9
conda activate yolouniow
pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu118
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.1/index.html
pip install -r requirements.txt
pip install -e .
```
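After installation, a quick sanity check can confirm that the pinned versions resolved correctly and that CUDA is visible to PyTorch. This is a minimal sketch; the expected version strings simply mirror the pins above.

```python
# Quick environment sanity check for the versions pinned above.
import torch
import mmcv

print(torch.__version__)          # expected: 2.1.2+cu118
print(torch.cuda.is_available())  # expected: True on a CUDA 11.8 machine
print(mmcv.__version__)           # expected: 2.1.0
```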
### Training & Evaluation

For open-vocabulary model training and evaluation, please refer to `run_ovod.sh`:

```bash
# Train Open-Vocabulary Model
./tools/dist_train.sh configs/pretrain/yolo_uniow_s_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.py 8 --amp

# Evaluate Open-Vocabulary Model
./tools/dist_test.sh configs/pretrain/yolo_uniow_s_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.py \
    pretrained/yolo_uniow_s_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.pth 8
```

For open-world model training and evaluation, please follow the steps provided in `run_owod.sh`. Ensure that the model is trained before proceeding with the evaluation. We provide our fine-tuned wildcard features, [object_tuned_s](https://huggingface.co/leonnil/yolo-uniow/resolve/main/object_tuned_s.npy) and [object_tuned_m](https://huggingface.co/leonnil/yolo-uniow/resolve/main/object_tuned_m.npy), obtained from steps 2 and 3, so they can be used directly.

```bash
# 1. Extract text/wildcard features from the pretrained model
python tools/owod_scripts/extract_text_feats.py --config $CONFIG --ckpt $CHECKPOINT --save_path $EMBEDS_PATH

# 2. Fine-tune wildcard features
./tools/dist_train.sh $OBJ_CONFIG 8 --amp

# 3. Extract fine-tuned wildcard features
python tools/owod_scripts/extract_text_feats.py --config $OBJ_CONFIG --save_path $EMBEDS_PATH --extract_tuned

# 4. Train all OWOD tasks
python tools/owod_scripts/train_owod_tasks.py MOWODB $OW_CONFIG $CHECKPOINT

# 5. Evaluate all OWOD tasks
python tools/owod_scripts/test_owod_tasks.py MOWODB $OW_CONFIG --save
```

To train and evaluate on specific datasets and tasks, use the commands below:

```bash
# Train OWOD task
DATASET=$DATASET TASK=$TASK THRESHOLD=$THRESHOLD SAVE=$SAVE \
./tools/dist_train_owod.sh $CONFIG 8 --amp

# Evaluate OWOD task
DATASET=$DATASET TASK=$TASK THRESHOLD=$THRESHOLD SAVE=$SAVE \
./tools/dist_test.sh $CONFIG $CHECKPOINT 8
```

## Acknowledgement

This project builds upon [YOLO-World](https://github.com/AILab-CVC/YOLO-World), [YOLOv10](https://github.com/Trami1995/YOLOv10), [FOMO](https://github.com/orrzohar/FOMO), and [OVOW](https://github.com/343gltysprk/ovow/). We sincerely thank the authors for their excellent implementations!

## Citation

If our code or models help your work, please cite our paper and YOLOv10:

```bibtex
@article{liu2024yolouniow,
  title={YOLO-UniOW: Efficient Universal Open-World Object Detection},
  author={Liu, Lihao and Feng, Juexiao and Chen, Hui and Wang, Ao and Song, Lin and Han, Jungong and Ding, Guiguang},
  journal={arXiv preprint arXiv:2412.20645},
  year={2024}
}

@article{wang2024yolov10,
  title={YOLOv10: Real-Time End-to-End Object Detection},
  author={Wang, Ao and Chen, Hui and Liu, Lihao and Chen, Kai and Lin, Zijia and Han, Jungong and Ding, Guiguang},
  journal={arXiv preprint arXiv:2405.14458},
  year={2024}
}
```
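As a closing note, the sketch below is one quick way to confirm that downloaded artifacts are intact before launching the pipelines above. The file names come from the links and commands in this README; the internal layout of the checkpoint dict is printed rather than assumed, since it depends on how the repo saved it.

```python
# Verify downloaded artifacts load correctly before wiring them into the configs.
import numpy as np
import torch

# Pre-trained detector weights (download link in the table above).
ckpt = torch.load(
    "pretrained/yolo_uniow_s_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.pth",
    map_location="cpu",
)
if isinstance(ckpt, dict):
    # mmengine-style checkpoints typically nest weights under 'state_dict',
    # but we print the top-level keys rather than assume a layout.
    print("checkpoint keys:", list(ckpt.keys()))

# Fine-tuned wildcard features (object_tuned_s.npy / object_tuned_m.npy above).
feats = np.load("object_tuned_s.npy")
print("wildcard feature shape:", feats.shape, "dtype:", feats.dtype)
```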