# weclip+mamba

**Repository Path**: furoc/weclip-mamba

## Basic Information

- **Project Name**: weclip+mamba
- **Description**: 12345678910
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2025-09-04
- **Last Updated**: 2026-01-01

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## Frozen CLIP-DINO: A Strong Backbone for Weakly Supervised Semantic Segmentation (TPAMI 2025)

Code of the TPAMI 2025 paper: Frozen CLIP-DINO: A Strong Backbone for Weakly Supervised Semantic Segmentation.

[[Paper]](https://ieeexplore.ieee.org/document/10891864) [[Project]](https://github.com/zbf1991/WeCLIP)

This project heavily relies on [[AFA]](https://github.com/rulixiang/afa) and [[CLIP-ES]](https://github.com/linyq2117/CLIP-ES). Many thanks for their great work!

## Preparations

### VOC dataset

#### 1. Download

```bash
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar
```

#### 2. Download the augmented annotations

The augmented annotations are from the [SBD dataset](http://home.bharathh.info/pubs/codes/SBD/download.html). Here is a download link for the augmented annotations on [DropBox](https://www.dropbox.com/s/oeu149j8qtbs1x0/SegmentationClassAug.zip?dl=0). After downloading `SegmentationClassAug.zip`, unzip it and move it to `VOCdevkit/VOC2012`. The directory structure should then be:

```bash
VOCdevkit/
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    ├── SegmentationClassAug
    └── SegmentationObject
```

### COCO dataset

#### 1. Download

```bash
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
```

After unzipping the downloaded files, for convenience, I recommend organizing them in VOC style:

```bash
MSCOCO/
├── JPEGImages
│   ├── train
│   └── val
└── SegmentationClass
    ├── train
    └── val
```

#### 2. Generating VOC style segmentation labels for COCO

To generate VOC-style segmentation labels for the COCO dataset, you can use the scripts provided in this [repo](https://github.com/alicranck/coco2voc), or simply download the generated masks from [Google Drive](https://drive.google.com/file/d/1pRE9SEYkZKVg0Rgz2pi9tg48j7GlinPV/view).

### Create and activate conda environment

```bash
conda create --name py38 python=3.8
conda activate py38
pip install -r requirments.txt
```

### Download Pre-trained CLIP ViT-B/16 Weights

Download the pre-trained CLIP ViT-B/16 weights from the official [link](https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt). Then, move this model to `pretrained/`.

### Modify the config

Three parameters need to be modified based on your paths:

(1) root_dir: `your/path/VOCdevkit/VOC2012` or `your/path/MSCOCO`

(2) name_list_dir: `your/path/WeCLIP+/datasets/voc` or `your/path/WeCLIP+/datasets/coco`

(3) clip_pretrain_path: `your/path/WeCLIP+/pretrained/ViT-B-16.pt`

For VOC, modify them in `configs/voc_attn_reg.yaml`. For COCO, modify them in `configs/coco_attn_reg.yaml`.

The default DINO backbone is ViT-S; to use a different DINO backbone, uncomment the corresponding part in `configs/voc_attn_reg.yaml` and `configs/coco_attn_reg.yaml`.
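Before launching a run, it can be handy to confirm that the three paths above actually resolve on your machine. The script below is a minimal, hypothetical helper (it is not part of the repository, and `check_config_paths.py` is a name chosen here for illustration); it assumes PyYAML is installed in the `py38` environment and that the keys `root_dir`, `name_list_dir`, and `clip_pretrain_path` appear somewhere in the chosen YAML file, whatever its exact nesting.

```python
# check_config_paths.py -- hypothetical helper, not shipped with WeCLIP+.
# Walks a YAML config and reports whether the three path entries exist on disk.
# Assumes PyYAML is available in the py38 environment.
import os
import sys
import yaml

KEYS = ("root_dir", "name_list_dir", "clip_pretrain_path")


def find_keys(node, found):
    """Recursively collect values for the keys of interest from nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in KEYS:
                found[key] = value
            find_keys(value, found)
    elif isinstance(node, list):
        for item in node:
            find_keys(item, found)


if __name__ == "__main__":
    cfg_path = sys.argv[1] if len(sys.argv) > 1 else "configs/voc_attn_reg.yaml"
    with open(cfg_path) as f:
        cfg = yaml.safe_load(f)

    found = {}
    find_keys(cfg, found)
    for key in KEYS:
        value = found.get(key)
        status = "OK" if value and os.path.exists(str(value)) else "MISSING"
        print(f"{key:20s} -> {value}  [{status}]")
```

Run it as, e.g., `python check_config_paths.py configs/voc_attn_reg.yaml`; any `MISSING` entry points to a path that still needs editing.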
### Train

To start training, just run the following code:

```bash
# train on voc
python scripts/dist_clip_voc.py --config your/path/WeCLIP+/configs/voc_attn_reg.yaml
# train on coco
python scripts/dist_clip_coco.py --config your/path/WeCLIP+/configs/coco_attn_reg.yaml
```

### Inference

To run inference, first modify the inference model path `--model_path` in `test_msc_flip_voc` or `test_msc_flip_coco`. Then, run the following code:

```bash
# inference on voc
python test_msc_flip_voc.py --model_path your/inference/model/path/WeCLIP_model_iter_30000.pth
# inference on coco
python test_msc_flip_coco.py --model_path your/inference/model/path/WeCLIP_model_iter_80000.pth
```

## Citation

Please kindly cite our paper if you find it helpful in your work.

```bibtex
@ARTICLE{10891864,
  author={Zhang, Bingfeng and Yu, Siyue and Xiao, Jimin and Wei, Yunchao and Zhao, Yao},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={Frozen CLIP-DINO: a Strong Backbone for Weakly Supervised Semantic Segmentation},
  year={2025},
  volume={},
  number={},
  pages={1-17},
  keywords={Training;Semantics;Decoding;Semantic segmentation;Feature extraction;Dogs;Transformers;Costs;Pipelines;Hands;Weakly supervised;semantic segmentation;CLIP;DINO},
  doi={10.1109/TPAMI.2025.3543191}}
```

## Acknowledgement

Many thanks to AFA: [[Paper]](https://arxiv.org/abs/2203.02664) [[Project]](https://rulixiang.github.io/afa)

```bibtex
@inproceedings{ru2022learning,
  title     = {Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers},
  author    = {Lixiang Ru and Yibing Zhan and Baosheng Yu and Bo Du},
  booktitle = {CVPR},
  year      = {2022},
}
```

Many thanks to CLIP-ES: [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/html/Lin_CLIP_Is_Also_an_Efficient_Segmenter_A_Text-Driven_Approach_for_CVPR_2023_paper.html) [[Project]](https://github.com/linyq2117/CLIP-ES)

```bibtex
@InProceedings{Lin_2023_CVPR,
  author    = {Lin, Yuqi and Chen, Minghao and Wang, Wenxiao and Wu, Boxi and Li, Ke and Lin, Binbin and Liu, Haifeng and He, Xiaofei},
  title     = {CLIP Is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2023},
  pages     = {15305-15314}
}
```