# ReTA

**Repository Path**: xiaowenza/ReTA

## Basic Information

- **Project Name**: ReTA
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-11
- **Last Updated**: 2025-09-18

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ReTA

🎉 Our paper **"Advancing Reliable Test-Time Adaptation of Vision-Language Models under Visual Variations"** has been accepted at **ACM MM 2025**!

🔗 **Link**: [https://arxiv.org/abs/2507.09500](https://arxiv.org/abs/2507.09500)

### 📚 Overview

*Figure: ReTA pipeline overview.*

### 🛠️ Environment

```bash
conda create -n reta python=3.8 -y
conda activate reta
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt
```

### 🗂️ Datasets

Please follow [DATASETS.md](https://github.com/kdiAAA/TDA/blob/main/docs/DATASETS.md) to prepare the datasets and annotations. Our dataset organization is as follows:

```
TTA_Data/
├─ imagenet/
│  ├─ classnames.txt
│  └─ images/
│     └─ val/
├─ imagenet-adversarial/
│  ├─ classnames.txt
│  └─ imagenet-a/
├─ imagenet-rendition/
│  ├─ classnames.txt
│  └─ imagenet-r/
├─ imagenet-sketch/
│  ├─ classnames.txt
│  └─ images/
├─ imagenetv2/
│  ├─ classnames.txt
│  └─ imagenetv2-matched-frequency-format-val/
├─ caltech-101/
│  ├─ split_zhou_Caltech101.json
│  └─ 101_ObjectCategories/
├─ dtd/
│  ├─ split_zhou_DescribableTextures.json
│  ├─ images/
│  └─ labels/
├─ eurosat/
│  ├─ split_zhou_EuroSAT.json
│  └─ 2750/
├─ fgvc_aircraft/
│  ├─ images/
│  └─ images_variant_test.txt
├─ food-101/
│  ├─ images/
│  └─ split_zhou_Food101.json
├─ oxford_flowers/
│  ├─ jpg/
│  ├─ imagelabels.mat
│  ├─ cat_to_name.json
│  └─ split_zhou_OxfordFlowers.json
├─ oxford_pets/
│  ├─ split_zhou_OxfordPets.json
│  ├─ images/
│  └─ annotations/
├─ stanford_cars/
│  ├─ cars_test/
│  └─ split_zhou_StanfordCars.json
├─ sun397/
│  ├─ SUN397/
│  └─ split_zhou_SUN397.json
└─ ucf101/
   ├─ UCF-101-midframes/
   └─ split_zhou_UCF101.json
```

### 🚀 Quick Start

To reproduce the results reported in **Table 1** and **Table 2** of our paper, run the following commands:

```bash
# OOD benchmark (ViT-B/16)
bash ./scripts/run_ood_benchmark_vit.sh

# OOD benchmark (RN50)
bash ./scripts/run_ood_benchmark_rn50.sh

# Cross-domain benchmark (ViT-B/16)
bash ./scripts/run_cd_benchmark_vit.sh

# Cross-domain benchmark (RN50)
bash ./scripts/run_cd_benchmark_rn50.sh
```

Hyperparameter configurations are in `configs/`. In these config files, `align` corresponds to $\lambda_2$ and `train_w` corresponds to $\lambda_1$ in Eq. (18). For other hyperparameters, please refer to [TDA](https://github.com/kdiAAA/TDA) and [DPE](https://github.com/zhangce01/DPE-CLIP/tree/main). An optional sanity-check sketch covering the environment, the dataset layout, and these config keys is provided at the end of this README.

### 🤗 Acknowledgements

Our work benefited from the public code and dataset instructions of [TPT](https://github.com/azshue/TPT), [CoOp](https://github.com/KaiyangZhou/CoOp), [TDA](https://github.com/kdiAAA/TDA), and [DPE](https://github.com/zhangce01/DPE-CLIP/tree/main). Thanks to the authors for open-sourcing their implementations and providing clear setup guides.

### 📌 Citation

If you find this code helpful, kindly consider citing:

```
@article{liang2025advancing,
  title={Advancing Reliable Test-Time Adaptation of Vision-Language Models under Visual Variations},
  author={Liang, Yiwen and Chen, Hui and Xiong, Yizhe and Zhou, Zihan and Lyu, Mengyao and Lin, Zijia and Niu, Shuaicheng and Zhao, Sicheng and Han, Jungong and Ding, Guiguang},
  journal={arXiv preprint arXiv:2507.09500},
  year={2025}
}
```
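### 🔍 Optional Sanity Checks

Before launching the benchmark scripts, it can help to confirm that the environment and data are in place. The snippet below is a minimal sketch, not part of the released scripts: it assumes the `reta` environment is active, that the dataset root is `TTA_Data/` laid out as shown above (the `DATA_ROOT` variable is introduced here purely for illustration), and that the `align` and `train_w` keys appear literally in the files under `configs/`.

```bash
# Optional pre-flight checks (illustrative sketch; adjust DATA_ROOT to your setup).
DATA_ROOT=./TTA_Data

# 1) Verify the PyTorch install and CUDA visibility.
python -c "import torch; print('torch', torch.__version__, '| CUDA available:', torch.cuda.is_available())"

# 2) Spot-check a few entries from the dataset layout above.
for entry in imagenet/classnames.txt imagenet-sketch/classnames.txt caltech-101/101_ObjectCategories; do
    if [ -e "${DATA_ROOT}/${entry}" ]; then
        echo "OK       ${entry}"
    else
        echo "MISSING  ${entry}"
    fi
done

# 3) Locate the hyperparameters discussed above (align = lambda_2, train_w = lambda_1).
grep -rn -E "align|train_w" configs/
```

If any entry is reported as `MISSING`, revisit the [DATASETS.md](https://github.com/kdiAAA/TDA/blob/main/docs/DATASETS.md) instructions before running the benchmarks.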