# CARZero

**Repository Path**: wvelong/CARZero

## Basic Information

- **Project Name**: CARZero
- **Description**: CARZero-soft-label
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-04-20
- **Last Updated**: 2024-10-08

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification

CARZero (**C**ross-attention **A**lignment for **R**adiology **Zero**-Shot Classification) is a multimodal representation learning framework for medical image recognition with minimal labeling effort. It uses cross-attention to align image and text modalities, enabling label-efficient learning and accurate zero-shot classification in the medical imaging domain.

>**[CARZero Manuscript](http://arxiv.org/abs/2402.17417)** \
> Haoran Lai, Qingsong Yao, Zihang Jiang, Rongsheng Wang, Zhiyang He, Xiaodong Tao, S. Kevin Zhou
> University of Science and Technology of China
> IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
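The cross-attention alignment at the heart of CARZero pairs text-side queries with image patch features. As a minimal, generic sketch (not CARZero's exact module; all shapes and names are illustrative), scaled dot-product cross-attention looks like this:

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query (e.g. a text token
    embedding) attends over all keys (e.g. image patch embeddings) and
    returns a weighted sum of the values.

    queries: (n_q, d), keys: (n_k, d), values: (n_k, d_v)
    returns: (n_q, d_v)
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)            # (n_q, n_k)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values                            # (n_q, d_v)
```

When all keys are identical, the attention weights are uniform and each output row is simply the mean of the values; in general, queries that align with a key pull its value forward.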
## Approach

![CARZero](CARZero.png)

## Dataset Overview

### Training Data

- **MIMIC-CXR**: [PhysioNet](https://physionet.org/content/mimic-cxr/1.0.0/) - Contains 377,110 images from 227,835 radiographic studies of 65,379 patients.

### Inference Data

- **Open-I**: [Open-I's FAQ page](https://openi.nlm.nih.gov/faq) - Features 3,851 reports and 7,470 chest X-ray images, annotated for 18 multi-label diseases.
- **PadChest**: [BIMCV](https://bimcv.cipf.es/bimcv-projects/padchest/) - Includes 160,868 chest X-ray images of 192 different diseases, with 27% manually annotated.
- **ChestXray14**: [NIH's Box storage](https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/37178474737) - Comprises 112,120 images with 14 disease labels from 30,805 patients.
- **CheXpert**: [Stanford ML Group](https://stanfordmlgroup.github.io/competitions/chexpert/) - Consists of 224,316 chest X-rays from 65,240 patients, with a consensus-annotated test set of 500 patients.
- **ChestXDet10**: [GitHub](https://github.com/Deepwise-AILab/ChestX-Det10-Dataset) - A subset of NIH ChestXray14, containing 3,543 images with box-level annotations for 10 diseases.

### Data Pre-processing

The filename lists are saved in the `Dataset` folder, which can be downloaded [here](https://drive.google.com/drive/folders/1Oubkx6ZQqmK5bTwVXhReHDyhz3Ms1vzF?usp=drive_link). Please update the PATH in each filename list to your image storage location.

## Pretraining Model

For the image encoder, ViT-B/16 is used, pre-trained with [MAE](https://github.com/RL4M/MRM-pytorch) and [M3AE](https://github.com/zhjohnchan/M3AE). Our pre-trained models are available for download [here](https://drive.google.com/file/d/1QJvtatLuIlYqi-V1DjgHnACM2kq2C-ET/view?usp=sharing); place them in the `./pretrain_model` folder.
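Updating the PATH in the filename lists can be scripted. The helper below is a hedged sketch: it assumes one image path per line and that you know the prefix used in the released lists; the file name and both prefixes are placeholders to adapt to your setup.

```python
from pathlib import Path

def rewrite_prefix(lines, old_prefix, new_prefix):
    """Swap a leading path prefix on each entry; entries that do not start
    with old_prefix are left untouched."""
    return [new_prefix + line[len(old_prefix):] if line.startswith(old_prefix)
            else line
            for line in lines]

def rewrite_list_file(list_path, old_prefix, new_prefix):
    """Rewrite one filename list in place (e.g. a file from the Dataset folder)."""
    p = Path(list_path)
    lines = p.read_text().splitlines()
    p.write_text("\n".join(rewrite_prefix(lines, old_prefix, new_prefix)) + "\n")

# Example (hypothetical prefixes and file name):
# rewrite_list_file("Dataset/train_filenames.txt",
#                   "/data/mimic-cxr", "/your/image/storage")
```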
For text encoding, BioBERT is fine-tuned on MIMIC and PadChest reports and is available through Hugging Face:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Laihaoran/BioClinicalMPBERT")
model = AutoModel.from_pretrained("Laihaoran/BioClinicalMPBERT")
```

## Getting Started

Start by [installing PyTorch 1.12.1](https://pytorch.org/get-started/locally/) with the right CUDA version, then clone this repository and install the dependencies.

```bash
$ conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
$ git clone git@github.com:laihaoran/CARZero.git
$ conda env create -f environment.yml
```

### Zero-shot classification for multi-label datasets (Open-I, PadChest, ChestXray14, CheXpert, ChestXDet10)

To test the performance of our model, download the trained [CARZero](https://drive.google.com/file/d/1kYF-k5otW5DHwz1En5d_ScV3zu2E27Ch/view?usp=sharing) checkpoint and place it in the `./pretrain_model` folder, then run the test script:

```bash
sh test.sh
```

## Training

Example configurations for pretraining can be found in `./configs`. All training is done using the `run.py` script. Run the training script:

```bash
sh train.sh
```

**Note:** During training the batch size increases sequentially from 64 to 128 and finally to 256, so a GPU with 80 GB of memory is advisable to accommodate the largest setting.
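To make the zero-shot evaluation above concrete, here is a minimal sketch of generic zero-shot scoring: softmax over cosine similarities between one image embedding and a bank of class-prompt embeddings. This is only an illustration of the idea; CARZero's actual head uses cross-attention alignment, and multi-label evaluation typically scores each disease independently rather than with a single softmax.

```python
import numpy as np

def zero_shot_probs(image_emb, prompt_embs):
    """Softmax over cosine similarities between one image embedding (d,)
    and a bank of class-prompt embeddings (n_classes, d).

    Generic zero-shot scoring sketch, not CARZero's cross-attention head.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    sims = txt @ img                      # cosine similarity per class
    e = np.exp(sims - sims.max())         # numerically stable softmax
    return e / e.sum()
```

In practice the image embedding would come from the ViT-B/16 encoder and the prompt embeddings from the fine-tuned BioBERT, with one prompt per candidate disease.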
### Citation

```
@article{lai2024carzero,
  title={CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification},
  author={Lai, Haoran and Yao, Qingsong and Jiang, Zihang and Wang, Rongsheng and He, Zhiyang and Tao, Xiaodong and Zhou, S Kevin},
  journal={arXiv preprint arXiv:2402.17417},
  year={2024}
}
```

#### Acknowledgements

Our sincere thanks to the contributors of [MAE](https://github.com/RL4M/MRM-pytorch), [M3AE](https://github.com/zhjohnchan/M3AE), and [BERT](https://github.com/dhlee347/pytorchic-bert) for their foundational work, which greatly facilitated the development of CARZero.