# mae
**Repository Path**: chenyang918/mae
## Basic Information
- **Project Name**: mae
- **Description**: PyTorch implementation of MAE (https://arxiv.org/abs/2111.06377)
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 1
- **Created**: 2022-01-13
- **Last Updated**: 2024-10-15
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
## Masked Autoencoders: A PyTorch Implementation
This is a PyTorch/GPU re-implementation of the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377):
```
@Article{MaskedAutoencoders2021,
  author  = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
  journal = {arXiv:2111.06377},
  title   = {Masked Autoencoders Are Scalable Vision Learners},
  year    = {2021},
}
```
* The original implementation was in TensorFlow+TPU. This re-implementation is in PyTorch+GPU.
* This repo is a modification on the [DeiT repo](https://github.com/facebookresearch/deit). Installation and preparation follow that repo.
* This repo is based on [`timm==0.3.2`](https://github.com/rwightman/pytorch-image-models), for which a [fix](https://github.com/rwightman/pytorch-image-models/issues/420#issuecomment-776459842) is needed to work with PyTorch 1.8.1+.
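The linked fix addresses the `torch._six.container_abcs` import that newer PyTorch releases removed. A sketch of the commonly applied patch to `timm/models/layers/helpers.py` (verify against your installed `timm` and PyTorch versions):

```python
# timm/models/layers/helpers.py (timm==0.3.2)
# Sketch of the version-guarded import fix referenced above:
# torch._six.container_abcs no longer exists in PyTorch 1.8.1+.
import torch

TORCH_MAJOR = int(torch.__version__.split('.')[0])
TORCH_MINOR = int(torch.__version__.split('.')[1])

if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs
else:
    import collections.abc as container_abcs
```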
### Catalog
- [x] Visualization demo
- [x] Pre-trained checkpoints + fine-tuning code
- [x] Pre-training code
### Visualization demo
Run our interactive visualization demo using the [Colab notebook](https://colab.research.google.com/github/facebookresearch/mae/blob/main/demo/mae_visualize.ipynb) (no GPU needed).
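If you prefer to run locally, the notebook boils down to loading a visualization checkpoint and running one masked forward pass. A condensed sketch (assumes `models_mae.py` from this repo; the checkpoint path and random input are illustrative, and the 0.75 mask ratio follows the paper):

```python
import torch
import models_mae  # from this repository

# Build the model and load a converted checkpoint (path is illustrative).
model = models_mae.mae_vit_large_patch16()
ckpt = torch.load('mae_visualize_vit_large.pth', map_location='cpu')
model.load_state_dict(ckpt['model'], strict=False)
model.eval()

# One forward pass on a normalized (1, 3, 224, 224) image tensor.
img = torch.randn(1, 3, 224, 224)  # stand-in for a real preprocessed image
with torch.no_grad():
    loss, pred, mask = model(img, mask_ratio=0.75)
recon = model.unpatchify(pred)  # (1, 3, 224, 224) pixel reconstruction
```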
### Fine-tuning with pre-trained checkpoints
Pre-trained checkpoints used in the paper, converted from TF/TPU to PT/GPU, are provided for download.
The fine-tuning instruction is in [FINETUNE.md](FINETUNE.md).
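The essential step in fine-tuning is loading the pre-trained MAE encoder weights into the classification ViT before training. A simplified sketch (assumes `models_vit.py` from this repo; the checkpoint path is illustrative, and `main_finetune.py` additionally interpolates position embeddings and applies the full training recipe):

```python
import torch
import models_vit  # from this repository

# Build the fine-tuning ViT (classification head included).
model = models_vit.vit_base_patch16(num_classes=1000, global_pool=True)

# Load pre-trained MAE encoder weights; decoder weights and the new
# classification head simply don't match, hence strict=False.
ckpt = torch.load('mae_pretrain_vit_base.pth', map_location='cpu')
msg = model.load_state_dict(ckpt['model'], strict=False)
print(msg.missing_keys)  # expect the freshly initialized head keys here
```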
By fine-tuning these pre-trained models, we rank #1 in these classification tasks (detailed in the paper):
|  | ViT-B | ViT-L | ViT-H | ViT-H448 | prev best |
| --- | --- | --- | --- | --- | --- |
| ImageNet-1K (no external data) | 83.6 | 85.9 | 86.9 | 87.8 | 87.1 |

The following are evaluations of the same model weights (fine-tuned on the original ImageNet-1K):

|  | ViT-B | ViT-L | ViT-H | ViT-H448 | prev best |
| --- | --- | --- | --- | --- | --- |
| ImageNet-Corruption (error rate) | 51.7 | 41.8 | 33.8 | 36.8 | 42.5 |
| ImageNet-Adversarial | 35.9 | 57.1 | 68.2 | 76.7 | 35.8 |
| ImageNet-Rendition | 48.3 | 59.9 | 64.4 | 66.5 | 48.7 |
| ImageNet-Sketch | 34.5 | 45.3 | 49.6 | 50.9 | 36.0 |

The following are transfer learning results, obtained by fine-tuning the pre-trained MAE on the target dataset:

|  | ViT-B | ViT-L | ViT-H | ViT-H448 | prev best |
| --- | --- | --- | --- | --- | --- |
| iNaturalist 2017 | 70.5 | 75.7 | 79.3 | 83.4 | 75.4 |
| iNaturalist 2018 | 75.4 | 80.1 | 83.0 | 86.8 | 81.2 |
| iNaturalist 2019 | 80.5 | 83.4 | 85.7 | 88.3 | 84.1 |
| Places205 | 63.9 | 65.8 | 65.9 | 66.8 | 66.0 |
| Places365 | 57.9 | 59.4 | 59.8 | 60.3 | 58.0 |
### Pre-training
The pre-training instruction is in [PRETRAIN.md](PRETRAIN.md).
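For orientation, pre-training masks a large fraction of image patches and reconstructs the missing pixels, with the loss computed only on masked patches. A simplified sketch of that objective (illustrative, not the repo's exact code, which also supports per-patch pixel normalization):

```python
import torch

def mae_style_loss(pred, target_patches, mask):
    """Mean-squared error over masked patches only.

    pred, target_patches: (N, L, patch_dim) predicted / target patch pixels
    mask: (N, L) with 1 for masked (removed) patches, 0 for visible ones
    """
    loss = (pred - target_patches) ** 2
    loss = loss.mean(dim=-1)                 # per-patch loss, shape (N, L)
    return (loss * mask).sum() / mask.sum()  # average over masked patches
```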
### License
This project is under the CC-BY-NC 4.0 license. See [LICENSE](LICENSE) for details.