
# Applying Post Training Quantization

Translator: unseeme

`Linux` `Model Optimization` `Expert`

View Source On Gitee

## Concept

Post training quantization refers to performing weights quantization or full quantization on a pre-trained model. It can reduce model size while also speeding up inference, and the process does not require retraining. A small amount of calibration data is needed for quantizing the activations.

### Weights Quantization

Only the weights of the model are quantized, which reduces only the model size; float32 operations are still performed during inference. The lower the number of quantization bits, the greater the model compression rate, but the accuracy loss usually also becomes larger.
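
To make the trade-off concrete, the following NumPy sketch (an illustration only, not the MindSpore implementation; the weight tensor and bit width are assumptions) shows symmetric per-tensor weights quantization: float32 weights are stored as int8 integers and dequantized back to float32 at inference time, so compute stays in float32 while the stored weights shrink to roughly a quarter of their original size.

```python
import numpy as np

def quantize_weights(weights: np.ndarray, num_bits: int = 8):
    """Symmetric per-tensor weights quantization (illustrative only)."""
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 127 for 8 bits
    scale = np.abs(weights).max() / qmax           # map the largest magnitude to qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_weights(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover float32 weights for inference; compute still runs in float32."""
    return q.astype(np.float32) * scale

# Illustrative weight tensor of a pre-trained layer (assumption).
w = np.random.randn(64, 128).astype(np.float32)
q, scale = quantize_weights(w, num_bits=8)
w_hat = dequantize_weights(q, scale)
print("max abs error:", np.abs(w - w_hat).max())   # error grows as num_bits shrinks
```

Storing `q` and `scale` instead of `w` is what shrinks the model; the dequantization step before compute is why inference speed is unchanged for weights-only quantization.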

### Full Quantization

Both the weights and the activations of the model are quantized, and integer operations are performed during inference. This reduces the model size, speeds up model inference, and lowers power consumption. For scenarios that require faster inference and lower power consumption, use post training full quantization. To calculate the quantization parameters of the activations, the user needs to provide a calibration dataset.
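
The sketch below (again a NumPy illustration, not the MindSpore tool itself; the layer, weights, and calibration batches are assumptions) shows why a calibration dataset is needed: the activation range observed on calibration data determines the scale and zero point that let the layer's outputs be represented as integers at inference time.

```python
import numpy as np

def calibrate_activation_range(weight, calibration_batches):
    """Collect the min/max of a layer's activations over a small calibration dataset."""
    lo, hi = np.inf, -np.inf
    for batch in calibration_batches:
        act = np.maximum(batch @ weight, 0.0)      # example layer: matmul + ReLU
        lo, hi = min(lo, act.min()), max(hi, act.max())
    return lo, hi

def make_affine_params(lo, hi, num_bits=8):
    """Asymmetric (affine) quantization parameters for uint8 activations."""
    qmax = 2 ** num_bits - 1                       # 255 for 8 bits
    scale = (hi - lo) / qmax
    zero_point = int(round(-lo / scale))
    return scale, zero_point

# Illustrative pre-trained weights and calibration batches (assumptions).
W = np.random.randn(16, 16).astype(np.float32)
calib = [np.random.randn(8, 16).astype(np.float32) for _ in range(10)]

scale, zp = make_affine_params(*calibrate_activation_range(W, calib))
act = np.maximum(calib[0] @ W, 0.0)
act_q = np.clip(np.round(act / scale) + zp, 0, 255).astype(np.uint8)  # integer activations
```

Because the scale and zero point are derived from data the layer actually sees, a small but representative calibration dataset is enough; no labels and no training are required.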

## Post Training Quantization Tools

Choose the corresponding post training quantization tool according to the hardware platform on which the model will be deployed for inference.

| Post Training Quantization Tool | Quantization Method Supported | Inference Hardware Platform Supported | Quantization Model Deployment |
| --- | --- | --- | --- |
| MindSpore Post Training Quantization Tool | Weights Quantization, Full Quantization | CPU | Inference on edge device |
| Ascend Model Compression Tool | Full Quantization | Ascend 310 AI Processor | Inference on Ascend 310 AI Processor |