# Frustratingly Easy Test-Time Adaptation of Vision-Language Models

Code for the article "Frustratingly Easy Test-Time Adaptation of Vision-Language Models", arXiv, May 2024.

[![arXiv](https://img.shields.io/badge/arXiv-2405.18330-b31b1b.svg)](https://arxiv.org/abs/2405.18330)

Authors: [Matteo Farina](https://scholar.google.com/citations?user=SxQwDD8AAAAJ&hl=it&authuser=1), [Gianni Franchi](https://scholar.google.com/citations?hl=it&authuser=1&user=ZCW6-psAAAAJ), [Giovanni Iacca](https://scholar.google.com/citations?hl=it&authuser=1&user=qSw6YfcAAAAJ), [Massimiliano Mancini](https://scholar.google.com/citations?hl=it&authuser=1&user=bqTPA8kAAAAJ), [Elisa Ricci](https://scholar.google.com/citations?user=xf1T870AAAAJ&hl=it&authuser=1).

## Installation

### Dependencies

We provide both pip requirements and a conda environment to install the dependencies of this repository; feel free to choose whichever better suits your needs. The code was tested with **python 3.11.9**.

Install the pip requirements:
```
pip install -r requirements.txt
```

Install with conda:
```
conda env create -f environment.yaml
```

### Models

The only model weights you need to download are MaPLe's pretrained initializations. For your convenience, we provide a script to download them automatically. Simply run:
```
./scripts/download_maple.sh
```

You should now have a `weights` folder with the three MaPLe ImageNet pretrainings provided by the authors (`weights/maple_seed1.pth`, `weights/maple_seed2.pth` and `weights/maple_seed3.pth`). Please check that everything is in place. Should you have any problems, please download the weights from [this link](https://drive.google.com/drive/folders/18ISKsjc18e19Ov2nXOuH228FYBtgUa1O?usp=drive_link) and rename them accordingly.

### Datasets

We strongly suggest you create a `datasets` folder under the root of this repository and store all datasets there.

#### Natural Distribution Shifts

For robustness to natural distribution shifts, we consider ImageNet-1k and four variants:

1. [ImageNet-A](https://github.com/hendrycks/natural-adv-examples).
2. [ImageNet-v2](https://github.com/modestyachts/ImageNetV2) (we use the validation set of the `MatchedFrequency` version).
3. [ImageNet-Sketch](https://github.com/HaohanWang/ImageNet-Sketch).
4. [ImageNet-R](https://github.com/hendrycks/imagenet-r).

For all datasets, simply download and extract them into the `./datasets` folder. You should have the following structure:
```
./datasets/
| imagenet/
| | train/
| | | # class folders
| | val/
| | | # class folders
| imagenet-a/
| | # class folders
| imagenet-r/
| | # class folders
| imagenet-sketch/
| | # class folders
| imagenetv2-matched-frequency-format-val/
| | # class folders (0 to 999)
```
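If you want to verify that the folders above are in place before running anything, a quick sanity check along these lines can help. This is only a hypothetical helper (it is not part of the repository); the paths simply mirror the tree above.

```python
# Hypothetical sanity check (not part of this repository): verify that the
# ImageNet-variant folders listed above exist and contain class subfolders.
from pathlib import Path

ROOT = Path("./datasets")
EXPECTED = [
    "imagenet/train",
    "imagenet/val",
    "imagenet-a",
    "imagenet-r",
    "imagenet-sketch",
    "imagenetv2-matched-frequency-format-val",
]

for rel in EXPECTED:
    path = ROOT / rel
    # Count immediate subdirectories (the class folders).
    n_classes = sum(1 for p in path.iterdir() if p.is_dir()) if path.is_dir() else 0
    status = "OK" if n_classes > 0 else "MISSING"
    print(f"{status:>7}  {path}  ({n_classes} class folders)")
```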
#### Fine-grained Classification

For fine-grained classification, we adopt the same splits as [Zhou *et al.*](https://arxiv.org/abs/2109.01134). Please refer to [this page](https://github.com/KaiyangZhou/CoOp/blob/main/DATASETS.md#how-to-install-datasets) for the installation of all datasets and the JSON files for the splits.

Once everything is downloaded, please organize it as follows:
```
./datasets/
| caltech-101/
| | images/
| | | # class folders
| | split_zhou_Caltech101.json
| dtd/
| | images/
| | | # class folders
| | split_zhou_DescribableTextures.json
| fgvc_aircraft/
| | images/
| | | # list of images
| | # a bunch of txt files
| flower102/
| | jpg/
| | | # list of images
| | split_zhou_OxfordFlowers.json
| food101/
| | images/
| | | # class folders
| | split_zhou_Food101.json
| oxford_pets/
| | images/
| | | # list of images
| | split_zhou_OxfordPets.json
| sun397/
| | images/
| | | # lettered folders ('a', 'b', 'c', etc.)
| | split_zhou_SUN397.json
| ucf101/
| | images/
| | | # class folders
| | split_zhou_UCF101.json
```

**IMPORTANT**: At the time of developing this work, the official Stanford Cars website was unreachable. Please download the images from [this Kaggle page](https://www.kaggle.com/datasets/jessicali9530/stanford-cars-dataset) and the annotations from [this Drive link](https://drive.google.com/drive/folders/13QnEkFQ8nhzf3jxo0RKX7UQAjrtAnYpR?usp=drive_link). You should organize the files as follows:
```
./datasets/
| stanford_cars/
| | images/
| | | train/
| | | | # list of images
| | | test/
| | | | # list of images
| | annots/
| | | labels.csv
| | | metadata.csv
| | | split_coop.csv
```

## Run

The entry point for this repository is `run.py`. Please execute `python run.py --help` for an overview of the available arguments. We provide different bash files in `scripts` to run different versions of `Zero`:

1. `zero.sh` runs vanilla `Zero`;
2. `zero_rlcf.sh` runs the `Zero` variant with a smaller CLIP-ViT-B-16 and a larger CLIP-ViT-L-14.

Note that the `--templates` flag activates the ensemble of textual templates (`+Ensemble` in Tables 1 and 2 of the article); an illustrative sketch of prompt ensembling is given at the end of this README. The `--maple` flag uses a MaPLe pretraining (only available with CLIP-ViT-B-16).

## Citation

If you find this work useful, please consider citing:
```
@article{farina2024frustratingly,
  title={Frustratingly Easy Test-Time Adaptation of Vision-Language Models},
  author={Farina, Matteo and Franchi, Gianni and Iacca, Giovanni and Mancini, Massimiliano and Ricci, Elisa},
  journal={arXiv preprint arXiv:2405.18330},
  year={2024}
}
```

## Acknowledgements

Parts of this repository are based on the [TPT](https://github.com/azshue/TPT), [RLCF](https://github.com/mzhaoshuai/RLCF), [MaPLe](https://github.com/muzairkhattak/multimodal-prompt-learning) and [CoOp](https://github.com/KaiyangZhou/CoOp) repositories. Huge thanks to all the authors!

## Contacts

Please do not hesitate to file an issue or to contact me at `m.farina@unitn.it`. I'll do my best to help!
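## Appendix: Prompt Ensembling Sketch

As referenced in the Run section, the `--templates` flag ensembles multiple textual templates per class. The snippet below is only a rough, standalone sketch of what prompt ensembling with a zero-shot CLIP classifier looks like, using the public `openai/CLIP` package; it is **not** the code invoked by `run.py`, and the class names, templates and image path are placeholders.

```python
# Illustrative sketch only: prompt ensembling with a zero-shot CLIP classifier.
# Assumes the public openai/CLIP package (pip install git+https://github.com/openai/CLIP.git).
# This is NOT the implementation used by run.py; class names, templates and the
# image path below are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

classnames = ["golden retriever", "tabby cat"]         # placeholder classes
templates = ["a photo of a {}.", "a sketch of a {}."]  # placeholder templates

with torch.no_grad():
    # One classifier weight per class: encode every template, average, re-normalize.
    weights = []
    for name in classnames:
        tokens = clip.tokenize([t.format(name) for t in templates]).to(device)
        emb = model.encode_text(tokens)
        emb = emb / emb.norm(dim=-1, keepdim=True)  # normalize each template embedding
        emb = emb.mean(dim=0)                       # average over templates
        weights.append(emb / emb.norm())            # re-normalize the ensembled embedding
    text_weights = torch.stack(weights)             # (num_classes, embed_dim)

    # Classify a single (placeholder) image against the ensembled text embeddings.
    image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
    img_emb = model.encode_image(image)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_emb @ text_weights.T).softmax(dim=-1)

print(dict(zip(classnames, probs.squeeze(0).tolist())))
```

This only conveys the general idea of template ensembling; refer to `python run.py --help` and the scripts in `scripts/` for how the flag is actually used in this repository.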