# MAE-code

**Repository Path**: UBCDingXin/MAE-code

## Basic Information

- **Project Name**: MAE-code
- **Description**: Pytorch implementation of Masked Auto-Encoder
- **Primary Language**: Unknown
- **License**: GPL-3.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2023-06-14
- **Last Updated**: 2023-06-14

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Masked Auto-Encoder (MAE)

Pytorch implementation of Masked Auto-Encoder:

* Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick. [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377v1). arXiv 2021.
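For context, MAE's core idea is to randomly mask a large fraction of image patches, encode only the visible patches, and reconstruct the masked ones, with the loss computed only on the masked patches. The snippet below is a minimal, self-contained sketch of that random masking step; the function name and tensor shapes are illustrative assumptions and do not mirror the API of `model.py` in this repository.

```python
import torch


def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """Illustrative sketch: keep a random subset of patch tokens, drop the rest.

    patches: (batch, num_patches, dim) tensor of embedded image patches.
    Returns the visible patches plus the kept/masked indices.
    """
    b, n, d = patches.shape
    n_keep = int(n * (1 - mask_ratio))
    noise = torch.rand(b, n, device=patches.device)  # one random score per patch
    shuffle = noise.argsort(dim=1)                   # patches with the lowest scores are kept
    keep_idx, masked_idx = shuffle[:, :n_keep], shuffle[:, n_keep:]
    visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))
    return visible, keep_idx, masked_idx
```

The encoder then processes only `visible`, a lightweight decoder fills the masked positions with a shared mask token, and the reconstruction loss is the mean squared error over the masked patches.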
## Usage

1. Clone the repository to your local machine.

    ```
    > git clone https://github.com/liujiyuan13/MAE-code.git MAE-code
    ```

2. Install the required packages.

    ```
    > cd MAE-code
    > pip install -r requirements.txt
    ```

3. Prepare the datasets.

    - For *Cifar10*, *Cifar100* and *STL*, skip this step; the data will be downloaded automatically.
    - For *ImageNet1K*, [download](https://www.image-net.org/download) and unzip the train (val) set into `./data/ImageNet1K/train(val)`.

4. Set the parameters.

    - All parameters are kept in the `default_args()` function of `main_mae.py` and `main_eval.py`.

5. Run the code.

    ```
    > python main_mae.py    # train MAE encoder
    > python main_eval.py   # evaluate MAE encoder
    ```

6. Visualize the output.

    ```
    > tensorboard --logdir=./log --port 8888
    ```

## Detail

### Project structure

```
...
+ ckpt              # checkpoints
+ data              # data folder
+ img               # images for README.md
+ log               # log files
  .gitignore
  lars.py           # LARS optimizer
  main_eval.py      # main file for evaluation
  main_mae.py       # main file for MAE training
  model.py          # model definitions of MAE and EvalNet
  README.md
  util.py           # helper functions
  vit.py            # definition of the vision transformer
```

### Encoder setting

In the paper, *ViT-Base*, *ViT-Large* and *ViT-Huge* are used. You can switch between them by simply changing the parameters in `default_args()`. Details can be found [here](https://openreview.net/forum?id=YicbFdNTTy) and are listed in the following table.

| Name  | Layer Num. | Hidden Size | MLP Size    | Head Num. |
|:-----:|:----------:|:-----------:|:-----------:|:---------:|
| Arg   | vit_depth  | vit_dim     | vit_mlp_dim | vit_heads |
| ViT-B | 12         | 768         | 3072        | 12        |
| ViT-L | 24         | 1024        | 4096        | 16        |
| ViT-H | 32         | 1280        | 5120        | 16        |

### Evaluation setting

I implement the four network training strategies considered in the paper:

- **pre-training** trains the MAE encoder and is done in `main_mae.py`.
- **linear probing** evaluates the MAE encoder. During training, the MAE encoder is fixed.
    + `args.n_partial = 0`
- **partial fine-tuning** evaluates the MAE encoder. During training, the MAE encoder is partially fixed.
    + `args.n_partial = 0.5` --> fine-tune the MLP sub-block with the transformer fixed
    + `1 <= args.n_partial <= args.vit_depth - 1` --> fine-tune the MLP sub-block and the last layers of the transformer
- **end-to-end fine-tuning** evaluates the MAE encoder. During training, the MAE encoder is fully trainable.
    + `args.n_partial = args.vit_depth`

Note that the last three strategies are done in `main_eval.py`, where the parameter `args.n_partial` is located; a sketch of how this setting can map onto freezing encoder blocks is given at the end of this README.

I follow the parameter settings in the paper's appendix. Note that **partial fine-tuning** and **end-to-end fine-tuning** use the same setting. However, I replace `RandAug(9, 0.5)` with `RandomResizedCrop` and leave the `mixup`, `cutmix` and `drop path` techniques for a future implementation.

## Result

Reproducing the experiments takes a long time, and I am unfortunately busy these days. If you obtain some results and are willing to contribute, please reach me via email. Thanks!

By the way, **I have run the code from start to end.** **It works!** So don't worry about implementation errors. If you find any, please raise an issue or email me.

## Licence

This repository is under [GPL V3](https://github.com/liujiyuan13/MAE-code/blob/main/LICENSE).

## About

Thanks to the projects [*vit-pytorch*](https://github.com/lucidrains/vit-pytorch), [*pytorch-lars*](https://github.com/JosephChenHub/pytorch-lars) and [*DeepLearningExamples*](https://github.com/NVIDIA/DeepLearningExamples), whose code contributed a lot to this repository!

Homepage:
Email:
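## Appendix: `args.n_partial` freezing sketch

Below is a hypothetical sketch of how the `args.n_partial` values described under *Evaluation setting* could translate into freezing and unfreezing encoder parameters. The names `set_trainable`, `blocks` and `mlp_head` are illustrative assumptions, not the API of `model.py` or `main_eval.py`; the actual logic lives in `main_eval.py` and may differ in detail.

```python
import torch.nn as nn


def set_trainable(encoder: nn.Module, blocks: nn.ModuleList, mlp_head: nn.Module,
                  n_partial: float, vit_depth: int) -> None:
    """Illustrative sketch: freeze/unfreeze parameters from an n_partial-style setting."""
    assert 0 <= n_partial <= vit_depth
    # n_partial = 0: linear probing, the whole encoder stays frozen.
    for p in encoder.parameters():
        p.requires_grad = False
    # n_partial >= 0.5: the MLP sub-block (classification head) becomes trainable.
    if n_partial >= 0.5:
        for p in mlp_head.parameters():
            p.requires_grad = True
    # 1 <= n_partial <= vit_depth: also unfreeze the last int(n_partial) transformer
    # blocks; n_partial == vit_depth recovers end-to-end fine-tuning.
    for block in blocks[len(blocks) - int(n_partial):]:
        for p in block.parameters():
            p.requires_grad = True
```

For example, with ViT-B (`vit_depth = 12`), `n_partial = 4` would unfreeze the MLP head plus the last four transformer blocks, matching the partial fine-tuning description above.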