# PTQ4ViT

Post-Training Quantization Framework for Vision Transformers.

We use the twin uniform quantization method to reduce the quantization error on the post-softmax and post-GELU activation values, and a Hessian-guided metric to evaluate candidate scaling factors, which improves calibration accuracy at a small cost. The quantized vision transformers (ViT, DeiT, and Swin) achieve near-lossless prediction accuracy (less than a 0.5% drop at 8-bit quantization) on the ImageNet classification task. Please read the [paper](https://arxiv.org/abs/2111.12293) for details.

## Install

### Requirement

- python>=3.5
- pytorch>=1.5
- matplotlib
- pandas
- timm

### Datasets

To run the example tests, place your ImageNet2012 dataset at `/datasets/imagenet`. We use `ViTImageNetLoaderGenerator` in `utils/datasets.py` to initialize our DataLoader. If your ImageNet dataset is stored elsewhere, pass its root as an argument when instantiating a `ViTImageNetLoaderGenerator`, as in the sketch below.
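A minimal sketch of pointing the loader at a custom dataset root. The argument order and the `test_loader`/`calib_loader` method names below are assumptions for illustration only; check `utils/datasets.py` for the actual constructor signature.

```python
# Hypothetical usage sketch -- argument order and method names are
# assumptions; verify against utils/datasets.py before use.
from utils.datasets import ViTImageNetLoaderGenerator

g = ViTImageNetLoaderGenerator(
    "/path/to/your/imagenet",  # dataset root, instead of /datasets/imagenet
    "imagenet",                # dataset name (assumed parameter)
    32,                        # train batch size (assumed)
    32,                        # test batch size (assumed)
    16,                        # number of DataLoader workers (assumed)
)

test_loader = g.test_loader()          # full validation loader (assumed method)
calib_loader = g.calib_loader(num=32)  # small calibration subset (assumed method)
```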
## Usage

### 1. Run example quantization

To test all models with BasePTQ/PTQ4ViT, run

```bash
python example/test_all.py
```

To run ablation testing, run

```bash
python example/test_ablation.py
```

You can run the testing scripts with multiple GPUs. For example, calling

```bash
python example/test_all.py --multigpu --n_gpu 6
```

will use 6 GPUs to run the test.

### 2. Download quantized model checkpoints (Coming soon)

## Results

All numbers are top-1 accuracy (%) on ImageNet.

### Results of BasePTQ

| Model        | Original | W8A8   | W6A6   |
|:------------:|:--------:|:------:|:------:|
| ViT-S/224/32 | 75.99    | 73.61  | 60.144 |
| ViT-S/224    | 81.39    | 80.468 | 70.244 |
| ViT-B/224    | 84.54    | 83.896 | 75.668 |
| ViT-B/384    | 86.00    | 85.352 | 46.886 |
| DeiT-S/224   | 79.80    | 77.654 | 72.268 |
| DeiT-B/224   | 81.80    | 80.946 | 78.786 |
| DeiT-B/384   | 83.11    | 82.33  | 68.442 |
| Swin-T/224   | 81.39    | 80.962 | 78.456 |
| Swin-S/224   | 83.23    | 82.758 | 81.742 |
| Swin-B/224   | 85.27    | 84.792 | 83.354 |
| Swin-B/384   | 86.44    | 86.168 | 85.226 |

### Results of PTQ4ViT

| Model        | Original | W8A8   | W6A6   |
|:------------:|:--------:|:------:|:------:|
| ViT-S/224/32 | 75.99    | 75.582 | 71.908 |
| ViT-S/224    | 81.39    | 81.002 | 78.63  |
| ViT-B/224    | 84.54    | 84.25  | 81.65  |
| ViT-B/384    | 86.00    | 85.828 | 83.348 |
| DeiT-S/224   | 79.80    | 79.474 | 76.282 |
| DeiT-B/224   | 81.80    | 81.482 | 80.25  |
| DeiT-B/384   | 83.11    | 82.974 | 81.55  |
| Swin-T/224   | 81.39    | 81.246 | 80.47  |
| Swin-S/224   | 83.23    | 83.106 | 82.38  |
| Swin-B/224   | 85.27    | 85.146 | 84.012 |
| Swin-B/384   | 86.44    | 86.394 | 85.388 |

### Results of Ablation

- ViT-S/224 (original top-1 accuracy 81.39%)

| Hessian Guided | Softmax Twin | GELU Twin | W8A8  | W6A6  |
|:--------------:|:------------:|:---------:|:-----:|:-----:|
|                |              |           | 80.47 | 70.24 |
| ✓              |              |           | 80.93 | 77.20 |
| ✓              | ✓            |           | 81.11 | 78.57 |
| ✓              |              | ✓         | 80.84 | 76.93 |
|                | ✓            | ✓         | 79.25 | 74.07 |
| ✓              | ✓            | ✓         | 81.00 | 78.63 |

- ViT-B/224 (original top-1 accuracy 84.54%)

| Hessian Guided | Softmax Twin | GELU Twin | W8A8  | W6A6  |
|:--------------:|:------------:|:---------:|:-----:|:-----:|
|                |              |           | 83.90 | 75.67 |
| ✓              |              |           | 83.97 | 79.90 |
| ✓              | ✓            |           | 84.07 | 80.76 |
| ✓              |              | ✓         | 84.10 | 80.82 |
|                | ✓            | ✓         | 83.40 | 78.86 |
| ✓              | ✓            | ✓         | 84.25 | 81.65 |

- ViT-B/384 (original top-1 accuracy 86.00%)

| Hessian Guided | Softmax Twin | GELU Twin | W8A8  | W6A6  |
|:--------------:|:------------:|:---------:|:-----:|:-----:|
|                |              |           | 85.35 | 46.89 |
| ✓              |              |           | 85.42 | 79.99 |
| ✓              | ✓            |           | 85.67 | 82.01 |
| ✓              |              | ✓         | 85.60 | 82.21 |
|                | ✓            | ✓         | 84.35 | 80.86 |
| ✓              | ✓            | ✓         | 85.89 | 83.19 |

## Citation

```
@article{PTQ4ViT_cvpr2022,
  title={PTQ4ViT: Post-Training Quantization Framework for Vision Transformers},
  author={Zhihang Yuan and Chenhao Xue and Yiqi Chen and Qiang Wu and Guangyu Sun},
  journal={arXiv preprint arXiv:2111.12293},
  year={2022},
}
```