# MindSpore Vision Transformer

**Repository Path**: tianyu__zhou/mindspore-vision-transformer

## Basic Information

- **Project Name**: MindSpore Vision Transformer
- **Description**: No description available
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 5
- **Forks**: 0
- **Created**: 2021-11-23
- **Last Updated**: 2022-07-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

#### Model Architecture

![Model architecture](pics/vit_arch.PNG)

**AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE**
https://arxiv.org/pdf/2010.11929.pdf

#### Environment

- MindSpore 1.3: https://www.mindspore.cn/tutorials/en/r1.3/index.html
- Training image (version 21.0.2-{ubuntu18.04, centos7.6}): https://ascendhub.huawei.com/#/detail/ascend-mindspore

#### Benchmark

| Dataset | No. NPUs | Batch Size | Embedding Size | No. Encoders | No. Params | Training FPS |
|---|---|---|---|---|---|---|
| Cifar10 | 1 | 128 | 128 | 2 | 206,858 | 139 |
| Cifar10 | 1 | 128 | 768 | 12 | 42,605,578 | 9 |

#### Progress

- Model architecture
- Training data pre-processing
- Loss and optimizer definition
- Single-NPU training

#### TODO

- Learning rate scheduler
- Multi-NPU training
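
#### Appendix: Patch Embedding Sketch

The ViT paper cited above treats an image as a sequence of flattened patch tokens, which are then linearly projected to the embedding size (128 or 768 in the benchmark table). The repository's own implementation is not shown here; the following is a minimal NumPy sketch of that patch-embedding step, with an illustrative patch size of 4 for CIFAR-10's 32x32x3 images (the actual patch size used by this project is an assumption):

```python
import numpy as np

def patchify(images, patch_size=4):
    """Split a batch of images (N, H, W, C) into flattened patches.

    For CIFAR-10 (32x32x3) with patch_size=4 this yields 64 patches
    of length 4*4*3 = 48 per image.
    """
    n, h, w, c = images.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image dims must be divisible by patch size"
    # (N, H/p, p, W/p, p, C) -> (N, H/p, W/p, p, p, C) -> (N, num_patches, p*p*C)
    x = images.reshape(n, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(n, (h // p) * (w // p), p * p * c)

# Linear projection of flattened patches to the embedding dimension.
# Weights are random here; in the model they are learned parameters.
rng = np.random.default_rng(0)
imgs = rng.random((2, 32, 32, 3)).astype(np.float32)
patches = patchify(imgs)                        # shape (2, 64, 48)
w_embed = rng.random((48, 128)).astype(np.float32)
tokens = patches @ w_embed                      # shape (2, 64, 128)
```

The resulting token sequence (plus a class token and position embeddings, as in the paper) is what the transformer encoders consume.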