# GreenMIM

This is the official PyTorch implementation of the paper [Green Hierarchical Vision Transformer for Masked Image Modeling](https://arxiv.org/abs/2205.13515).

*Figure: Group Attention Scheme.*
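To make the figure concrete: group attention partitions the visible (unmasked) tokens into equal-size groups and computes self-attention within each group only, so no compute is spent on masked patches. Below is a minimal PyTorch sketch of that idea; the class name and the naive sequential partitioning are illustrative assumptions, and the repository's actual grouping of unevenly-sized local windows is more involved.

```python
import torch
import torch.nn as nn

class GroupAttention(nn.Module):
    """Illustrative sketch: self-attention restricted to fixed-size groups
    of visible tokens. The naive sequential grouping here is an assumption,
    not the repository's exact partitioning algorithm."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x_visible, group_size):
        # x_visible: (B, L, D) tokens that survived random masking.
        B, L, D = x_visible.shape
        assert L % group_size == 0, "pad or repartition so groups are equal-size"
        g = L // group_size
        # Fold each group into the batch dim and attend within groups only:
        # cost is O(g * group_size^2) instead of O(L^2) for full attention.
        x = x_visible.reshape(B * g, group_size, D)
        x, _ = self.attn(x, x, x)
        return x.reshape(B, L, D)
```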

*Figure: Method Overview.*
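At a high level, the method follows the MAE recipe (on which this code is based) with a hierarchical encoder: mask most patches, encode only the visible ones, and reconstruct the pixels of the masked patches with a lightweight decoder, computing the loss only on masked positions. The sketch below outlines one such training step; the `encoder`/`decoder` interfaces and the 75% mask ratio are assumptions for illustration, not the repository's exact modules.

```python
import torch

def mim_pretrain_step(encoder, decoder, imgs_as_patches, mask_ratio=0.75):
    """One masked-image-modeling step (MAE-style outline; the module
    interfaces here are illustrative assumptions)."""
    B, N, D = imgs_as_patches.shape  # (batch, patches, pixels per patch)
    len_keep = int(N * (1 - mask_ratio))

    # 1) Randomly select the visible patches.
    noise = torch.rand(B, N, device=imgs_as_patches.device)
    ids_shuffle = noise.argsort(dim=1)
    ids_restore = ids_shuffle.argsort(dim=1)
    ids_keep = ids_shuffle[:, :len_keep]
    visible = torch.gather(imgs_as_patches, 1,
                           ids_keep.unsqueeze(-1).expand(-1, -1, D))

    # 2) Encode only the visible patches -- this is where the "green"
    #    savings come from: the encoder never sees masked tokens.
    latent = encoder(visible)

    # 3) Decode back to the full sequence and predict masked pixels.
    pred = decoder(latent, ids_restore)  # (B, N, D)

    # 4) Loss only on the masked positions.
    mask = torch.ones(B, N, device=pred.device)
    mask[:, :len_keep] = 0                     # 0 = kept, 1 = masked
    mask = torch.gather(mask, 1, ids_restore)  # unshuffle to original order
    loss = ((pred - imgs_as_patches) ** 2).mean(dim=-1)
    return (loss * mask).sum() / mask.sum()
```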

## Citation

If you find our work interesting or use our code/models, please cite:

```bibtex
@article{huang2022green,
  title={Green Hierarchical Vision Transformer for Masked Image Modeling},
  author={Huang, Lang and You, Shan and Zheng, Mingkai and Wang, Fei and Qian, Chen and Yamasaki, Toshihiko},
  journal={arXiv preprint arXiv:2205.13515},
  year={2022}
}
```

## Catalogs

- [x] Pre-trained checkpoints
- [x] Pre-training code
- [x] Fine-tuning code

## Pre-trained Models
| | Swin-Base (Window 7x7) | Swin-Base (Window 14x14) | Swin-Large (Window 14x14) |
| :--- | :---: | :---: | :---: |
| Pre-trained checkpoint | Download | Download | Download |
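To load a downloaded checkpoint outside the provided scripts, a generic PyTorch sketch could look like the following. The filename and the `'model'` key are assumptions (inspect the actual file), and the `timm` Swin-B definition is just one compatible backbone:

```python
import timm
import torch

# Build a Swin-B backbone (window 7, 224px) without a classification head.
model = timm.create_model("swin_base_patch4_window7_224",
                          pretrained=False, num_classes=0)

# Hypothetical filename; use the path of the checkpoint you downloaded.
ckpt = torch.load("greenmim_swin_base_win7.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # weights are often nested under 'model'

# strict=False because pre-training checkpoints usually lack the head
# and may use slightly different parameter names.
msg = model.load_state_dict(state_dict, strict=False)
print("missing:", msg.missing_keys)
print("unexpected:", msg.unexpected_keys)
```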
## Pre-training

The pre-training scripts are in the `scripts/` folder. Scripts whose names start with `run` are for non-slurm users; the others are for slurm users.

#### For Non-Slurm Users

To train a Swin-B on a single node with 8 GPUs:

```bash
PORT=23456 NPROC=8 bash scripts/run_mae_swin_base.sh
```

#### For Slurm Users

To train a Swin-B on a single node with 8 GPUs:

```bash
bash scripts/srun_mae_swin_base.sh [Partition] [NUM_GPUS]
```

More detailed instructions will be available soon.

## Fine-tuning on ImageNet-1K

| Model | #Params | Pre-train Resolution | Fine-tune Resolution | Config | Acc@1 (%) |
| :---- | ------- | -------------------- | -------------------- | ------ | --------- |
| Swin-B (Window 7x7) | 88M | 224x224 | 224x224 | [Config](ft_configs/greenmim_finetune_swin_base_img224_win7.yaml) | 83.7 |
| Swin-L (Window 14x14) | 197M | 224x224 | 224x224 | [Config](ft_configs/greenmim_finetune_swin_large_img224_win14.yaml) | 85.1 |

Currently, we directly use the code of [SimMIM](https://github.com/microsoft/SimMIM) for fine-tuning; please follow [their instructions](https://github.com/microsoft/SimMIM#fine-tuning-pre-trained-models) to use the configs. Note that, due to limited computing resources, we fine-tune with a batch size of 1024 (128 x 8) for Swin-B and 768 (48 x 16) for Swin-L.

## Acknowledgement

This code is based on the implementations of [MAE](https://github.com/facebookresearch/mae), [SimMIM](https://github.com/microsoft/SimMIM), [BEiT](https://github.com/microsoft/unilm/tree/master/beit), [SwinTransformer](https://github.com/microsoft/Swin-Transformer), and [DeiT](https://github.com/facebookresearch/deit).

## License

This project is under the CC-BY-NC 4.0 license. See [LICENSE](LICENSE) for details.