Translator: unseeme
Linux
Model Optimization
Expert
Post training quantization refers to performing weights quantization or full quantization on a pre-trained model. It can reduce the model size and also speed up inference. The process does not require training; only a small amount of calibration data is needed for activation quantization.
Weights quantization quantizes only the weights of the model, which reduces the model size only; float32 operations are still performed during inference. The lower the number of quantization bits, the greater the model compression rate, but the accuracy loss usually becomes larger as well.
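The following is a minimal sketch of the weights quantization idea (symmetric per-tensor int8 quantization, illustrated with NumPy rather than any specific tool's API): weights are stored as int8 to shrink the model, then dequantized back to float32 before the float32 inference operations.

```python
import numpy as np

def quantize_weights(weights, num_bits=8):
    """Symmetric per-tensor weight quantization: float32 -> int8."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    scale = np.max(np.abs(weights)) / qmax  # one scale for the whole tensor
    q_weights = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q_weights, scale

def dequantize_weights(q_weights, scale):
    """Recover approximate float32 weights before float32 inference."""
    return q_weights.astype(np.float32) * scale

# Example: a conv kernel stored as int8 (about 4x smaller than float32),
# then dequantized back to float32 at inference time.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
q, s = quantize_weights(w, num_bits=8)
w_hat = dequantize_weights(q, s)
print("max abs quantization error:", np.max(np.abs(w - w_hat)))
```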
Full quantization quantizes both the weights and the activations of the model, and int operations are performed during inference. It reduces the model size, speeds up model inference, and lowers power consumption. For scenarios that require faster inference and lower power consumption, use post training full quantization. To compute the quantization parameters of the activations, the user needs to provide a calibration dataset.
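As an illustration of how a calibration dataset is used to derive activation quantization parameters, the sketch below computes an asymmetric (scale, zero point) pair from the observed min/max of a few calibration batches. It is a simplified, assumed example of the min/max calibration method, not the actual workflow of the tools listed below.

```python
import numpy as np

def calibrate_activation(calib_batches, num_bits=8):
    """Derive asymmetric quantization parameters (scale, zero_point)
    for one activation tensor from calibration data (min/max method)."""
    a_min = min(np.min(b) for b in calib_batches)
    a_max = max(np.max(b) for b in calib_batches)
    a_min, a_max = min(a_min, 0.0), max(a_max, 0.0)  # range must cover zero
    qmin, qmax = 0, 2 ** num_bits - 1                # uint8 range
    scale = (a_max - a_min) / (qmax - qmin)
    zero_point = int(round(qmin - a_min / scale))
    return scale, zero_point

# Example: activation statistics collected from a small calibration dataset.
calib = [np.random.rand(1, 16, 8, 8).astype(np.float32) for _ in range(10)]
scale, zp = calibrate_activation(calib)
print("scale:", scale, "zero point:", zp)
```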
Choose the corresponding post training quantization tool according to the hardware platform on which the model will be deployed for inference.
| Post Training Quantization Tool | Quantization Method Supported | Inference Hardware Platform Supported | Quantized Model Deployment |
| --- | --- | --- | --- |
| MindSpore Post Training Quantization Tool | Weights Quantization, Full Quantization | CPU | Inference on edge devices |
| Ascend Model Compression Tool | Full Quantization | Ascend 310 AI Processor | Inference on Ascend 310 AI Processor |