Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) is an open-source Python library that runs on Intel CPUs and GPUs and delivers unified interfaces across multiple deep learning frameworks for popular network compression technologies such as quantization, pruning, and knowledge distillation. The tool supports automatic, accuracy-driven tuning strategies that help users quickly find the best quantized model. It also implements several weight pruning algorithms that generate pruned models meeting a predefined sparsity goal, and it supports knowledge distillation to transfer knowledge from a teacher model to a student model.
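Accuracy-driven tuning can be pictured as a search over quantization configurations. The sketch below is a simplified, hypothetical illustration, not Neural Compressor's actual tuning strategy: `Candidate`, `pick_best`, the 1% relative tolerance, and all numbers are assumptions. It keeps the candidates whose accuracy stays within the tolerance of the FP32 baseline and returns the fastest of them.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    """One hypothetical quantization configuration and its measured results."""
    name: str
    accuracy: float    # accuracy of the model quantized with this configuration
    throughput: float  # samples/sec


def pick_best(fp32_accuracy: float, candidates: List[Candidate],
              relative_tolerance: float = 0.01) -> Optional[Candidate]:
    """Return the fastest candidate whose relative accuracy loss stays within tolerance.

    Simplified: all candidates are assumed to be evaluated already; a real
    tuner would generate and evaluate configurations on the fly.
    """
    floor = fp32_accuracy * (1 - relative_tolerance)
    acceptable = [c for c in candidates if c.accuracy >= floor]
    return max(acceptable, key=lambda c: c.throughput, default=None)


# Made-up numbers, for illustration only.
candidates = [
    Candidate("all-int8", 0.7480, 1200.0),
    Candidate("int8-with-fp32-first-layer", 0.7570, 1100.0),
    Candidate("mostly-fp32", 0.7600, 600.0),
]
best = pick_best(0.7613, candidates)
print(best.name if best else "no candidate meets the accuracy goal")
```

Here the all-INT8 candidate is rejected because it falls more than 1% (relative) below the FP32 accuracy, so the faster of the two remaining candidates is chosen.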
Note: GPU support is under development.
Visit the Intel® Neural Compressor online document website at: https://intel.github.io/neural-compressor.
Intel® Neural Compressor provides an infrastructure and workflow that help increase performance and speed up deployment across architectures.
Supported deep learning frameworks are: TensorFlow, PyTorch, MXNet, and ONNX Runtime.
Note: Intel Optimized TensorFlow 2.5.0 requires setting the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 before running Neural Compressor quantization or deploying the quantized model.
Note: Starting with official TensorFlow 2.6.0, oneDNN support has been upstreamed. Download the official TensorFlow 2.6.0 binary for the CPU device and set the environment variable TF_ENABLE_ONEDNN_OPTS=1 before running the quantization process or deploying the quantized model.
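The two notes above can be applied from Python, as long as it happens before TensorFlow is imported. The helper `configure_tensorflow_env` is hypothetical, and treating every official release from 2.6 onward the same way is an assumption based on the wording "starting with 2.6.0".

```python
import os


def configure_tensorflow_env(flavor: str, version: str) -> dict:
    """Hypothetical helper: export the variable the notes above call for.

    flavor is "intel" (Intel Optimized TensorFlow) or "official".
    Call it before importing TensorFlow so the setting is in place when
    TensorFlow loads.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    changes = {}
    if flavor == "intel" and version == "2.5.0":
        changes["TF_ENABLE_MKL_NATIVE_FORMAT"] = "0"
    elif flavor == "official" and (major, minor) >= (2, 6):
        changes["TF_ENABLE_ONEDNN_OPTS"] = "1"
    os.environ.update(changes)
    return changes


print(configure_tensorflow_env("official", "2.6.0"))
```

Returning the applied changes makes it easy to log which variable was set for a given run.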
Select the installation based on your operating system.
You can install Neural Compressor in one of three ways: install just the library from a binary package, build it from source, or get the Intel-optimized frameworks together with the library by installing the Intel® oneAPI AI Analytics Toolkit.
Prerequisites
The following prerequisites and requirements must be satisfied for a successful installation:
Python version: 3.6, 3.7, 3.8, or 3.9
C++ compiler: 7.2.1 or above
CMake: 3.12 or above
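A script along these lines can check the prerequisites before a source build. It is a sketch under assumptions: the C++ compiler is taken to be GCC's `g++`, the helper names are hypothetical, and the version bounds are copied from the list above.

```python
import re
import shutil
import subprocess
import sys


def parse_version(text: str) -> tuple:
    """Extract the first version number in text, e.g. 'cmake version 3.18.4' -> (3, 18, 4)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def tool_version(*command: str) -> tuple:
    """Run a version command if the tool is on PATH; return () otherwise."""
    if shutil.which(command[0]) is None:
        return ()
    result = subprocess.run(list(command), stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, universal_newlines=True)
    return parse_version(result.stdout)


def check_prerequisites() -> dict:
    return {
        "python": (3, 6) <= sys.version_info[:2] <= (3, 9),
        # Assumes the C++ compiler is GCC's g++.
        "g++": tool_version("g++", "-dumpfullversion", "-dumpversion") >= (7, 2, 1),
        "cmake": tool_version("cmake", "--version") >= (3, 12),
    }


for name, ok in check_prerequisites().items():
    print("{:<7} {}".format(name, "OK" if ok else "missing or too old"))
```

Comparing tuples of integers avoids the classic string-comparison pitfall where "3.9" would sort after "3.12".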
Common build issues
Issue 1: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
Solution: reinstall pycocotools with "pip install pycocotools --no-cache-dir"
Issue 2: ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Solution: install opencv with apt install or yum install
# install stable version from pip
pip install neural-compressor
# install nightly version from pip
pip install -i https://test.pypi.org/simple/ neural-compressor
# install stable version from conda
conda install neural-compressor -c conda-forge -c intel
# install from source
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt
python setup.py install
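After any of the installation options, a quick import check confirms that the package loads. This is a sketch: `check_install` is a hypothetical helper, and reading a `__version__` attribute is an assumption, so it falls back to "unknown" if the attribute is absent.

```python
import importlib
import importlib.util


def check_install(module_name: str = "neural_compressor") -> str:
    """Report whether a package is importable and which version it reports."""
    if importlib.util.find_spec(module_name) is None:
        return "not installed"
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # e.g. the ImportError/ValueError build issues listed above
        return "installed, but import failed: {}".format(exc)
    return "version {}".format(getattr(module, "__version__", "unknown"))


print(check_install())
```

Catching the exception instead of letting it propagate keeps the error message (such as the missing libGL.so.1) visible in one line of output.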
The Intel® Neural Compressor library is released as part of the Intel® oneAPI AI Analytics Toolkit (AI Kit). The AI Kit provides a consolidated package of Intel's latest deep learning and machine learning optimizations, all in one place for ease of development. Along with Neural Compressor, the AI Kit includes Intel-optimized versions of deep learning frameworks (such as TensorFlow and PyTorch) and high-performing Python libraries to streamline end-to-end data science and AI workflows on Intel architectures.
The AI Kit is distributed through many common channels, including Intel's website, YUM, APT, Anaconda, and more. Select and download the AI Kit distribution package that is best suited for you and follow the Get Started Guide for post-installation instructions.
Download AI Kit | AI Kit Get Started Guide |
---|---|
Prerequisites
The following prerequisites and requirements must be satisfied for a successful installation:
Python version: 3.6, 3.7, 3.8, or 3.9
Download and install Anaconda.
Create a virtual environment named nc with Anaconda:
# Python 3.7 is used here as an example; you can also choose Python 3.6, 3.8, or 3.9.
conda create -n nc python=3.7
conda activate nc
Installation options
# install stable version from pip
pip install neural-compressor
# install nightly version from pip
pip install -i https://test.pypi.org/simple/ neural-compressor
# install from conda
conda install neural-compressor -c conda-forge -c intel
# install from source
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt
python setup.py install
Get Started
Deep Dive
Advanced Topics
Publications
The full publication list is available here.
Intel® Neural Compressor supports systems based on Intel 64 architecture or compatible processors, and is especially optimized for the following CPUs: Cascade Lake, Cooper Lake, Skylake, and Ice Lake.
Intel® Neural Compressor requires the Intel-optimized version of the deep learning framework you use: TensorFlow, PyTorch, MXNet, or ONNX Runtime.
Note: Intel Neural Compressor supports Intel-optimized and official frameworks for some TensorFlow versions. Refer to Supported Frameworks for specifics.
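Platform | OS | Python | Framework | Version |
---|---|---|---|---|
Cascade Lake, Cooper Lake, Skylake, Ice Lake | CentOS 8.3, Ubuntu 18.04 | 3.6, 3.7, 3.8, 3.9 | TensorFlow | 2.7.0, 2.6.2, 2.5.0, 1.15.0UP3 |
Cascade Lake, Cooper Lake, Skylake, Ice Lake | CentOS 8.3, Ubuntu 18.04 | 3.6, 3.7, 3.8, 3.9 | PyTorch | 1.10.0+cpu, 1.9.0+cpu, 1.8.0+cpu, IPEX |
Cascade Lake, Cooper Lake, Skylake, Ice Lake | CentOS 8.3, Ubuntu 18.04 | 3.6, 3.7, 3.8, 3.9 | MXNet | 1.8.0, 1.7.0, 1.6.0 |
Cascade Lake, Cooper Lake, Skylake, Ice Lake | CentOS 8.3, Ubuntu 18.04 | 3.6, 3.7, 3.8, 3.9 | ONNX Runtime | 1.9.0, 1.8.0, 1.7.0 |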
Intel® Neural Compressor provides numerous examples that demonstrate small accuracy loss alongside substantial performance gains. A full list of quantized models on various frameworks is available in the Model List.
Model | Framework | Support | Example |
---|---|---|---|
ResNet50 v1.5 | TensorFlow | Yes | Link |
ResNet50 v1.5 | PyTorch | Yes | Link |
DLRM | PyTorch | Yes | Link |
BERT-large | TensorFlow | Yes | Link |
BERT-large | PyTorch | Yes | Link |
SSD-ResNet34 | TensorFlow | Yes | Link |
SSD-ResNet34 | PyTorch | Yes | Link |
RNN-T | PyTorch | Yes | Link |
3D-UNet | TensorFlow | WIP | |
3D-UNet | PyTorch | Yes | Link |
Throughput in samples/sec, measured on ICX8380 (1s4c10ins1bs).
Framework | Version | Model | INT8 Accuracy | FP32 Accuracy | Acc Ratio [(INT8-FP32)/FP32] | INT8 Throughput | FP32 Throughput | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|---|
tensorflow | 2.6.0 | resnet50v1.0 | 74.11% | 74.27% | -0.22% | 1287.00 | 495.29 | 2.60x |
tensorflow | 2.6.0 | resnet50v1.5 | 76.82% | 76.46% | 0.47% | 1218.03 | 420.34 | 2.90x |
tensorflow | 2.6.0 | resnet101 | 77.50% | 76.45% | 1.37% | 849.62 | 345.54 | 2.46x |
tensorflow | 2.6.0 | inception_v1 | 70.48% | 69.74% | 1.06% | 2202.64 | 1058.20 | 2.08x |
tensorflow | 2.6.0 | inception_v2 | 74.36% | 73.97% | 0.53% | 1751.31 | 827.81 | 2.11x |
tensorflow | 2.6.0 | inception_v3 | 77.28% | 76.75% | 0.69% | 868.06 | 384.17 | 2.26x |
tensorflow | 2.6.0 | inception_v4 | 80.40% | 80.27% | 0.16% | 569.48 | 197.28 | 2.89x |
tensorflow | 2.6.0 | inception_resnet_v2 | 80.44% | 80.40% | 0.05% | 269.03 | 137.25 | 1.96x |
tensorflow | 2.6.0 | mobilenetv1 | 71.79% | 70.96% | 1.17% | 3831.42 | 1189.06 | 3.22x |
tensorflow | 2.6.0 | mobilenetv2 | 71.79% | 71.76% | 0.04% | 2570.69 | 1237.62 | 2.07x |
tensorflow | 2.6.0 | ssd_resnet50_v1 | 37.86% | 38.00% | -0.37% | 65.52 | 24.01 | 2.73x |
tensorflow | 2.6.0 | ssd_mobilenet_v1 | 22.97% | 23.13% | -0.69% | 842.46 | 404.04 | 2.08x |
tensorflow | 2.6.0 | ssd_resnet34 | 21.69% | 22.09% | -1.81% | 41.23 | 10.75 | 3.83x |
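The two ratio columns follow the formulas in their headers. The sketch below (function names are illustrative) recomputes the resnet50v1.5 row to show how to read them.

```python
def accuracy_ratio(int8_acc: float, fp32_acc: float) -> float:
    """Relative accuracy change (INT8 - FP32) / FP32, in percent."""
    return (int8_acc - fp32_acc) / fp32_acc * 100


def performance_ratio(int8_tput: float, fp32_tput: float) -> float:
    """Throughput speedup INT8 / FP32."""
    return int8_tput / fp32_tput


# resnet50v1.5 row from the table above
print("{:.2f}%".format(accuracy_ratio(76.82, 76.46)))        # 0.47%
print("{:.2f}x".format(performance_ratio(1218.03, 420.34)))  # 2.90x
```

The accuracy ratio is relative, not a difference in percentage points: ssd_resnet34 loses 0.40 points (21.69% vs. 22.09%), which is 1.81% of its FP32 score. The drop% columns in the pruning tables appear to use the same relative definition (for example, SST-2: (91.97 - 92.32) / 92.32 is about -0.38%).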
Throughput in samples/sec, measured on ICX8380 (1s4c10ins1bs).
Framework | Version | Model | INT8 Accuracy | FP32 Accuracy | Acc Ratio [(INT8-FP32)/FP32] | INT8 Throughput | FP32 Throughput | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|---|
pytorch | 1.9.0+cpu | resnet18 | 69.59% | 69.76% | -0.24% | 692.04 | 363.64 | 1.90x |
pytorch | 1.9.0+cpu | resnet50 | 76.00% | 76.13% | -0.17% | 453.10 | 186.67 | 2.43x |
pytorch | 1.9.0+cpu | resnext101_32x8d | 79.02% | 79.31% | -0.36% | 196.27 | 70.08 | 2.80x |
pytorch | 1.9.0+cpu | bert_base_mrpc | 88.12% | 88.73% | -0.69% | 199.32 | 107.34 | 1.86x |
pytorch | 1.9.0+cpu | bert_base_cola | 59.06% | 58.84% | 0.37% | 198.53 | 105.29 | 1.89x |
pytorch | 1.9.0+cpu | bert_base_sts-b | 88.72% | 89.27% | -0.62% | 203.29 | 107.03 | 1.90x |
pytorch | 1.9.0+cpu | bert_base_sst-2 | 91.74% | 91.86% | -0.13% | 197.86 | 105.31 | 1.88x |
pytorch | 1.9.0+cpu | bert_base_rte | 70.40% | 69.68% | 1.04% | 192.90 | 107.25 | 1.80x |
pytorch | 1.9.0+cpu | bert_large_mrpc | 87.66% | 88.33% | -0.75% | 94.08 | 33.84 | 2.78x |
pytorch | 1.9.0+cpu | bert_large_squad | 92.69 | 93.05 | -0.38% | 20.93 | 11.18 | 1.87x |
pytorch | 1.9.0+cpu | bert_large_qnli | 91.12% | 91.82% | -0.76% | 93.75 | 33.73 | 2.78x |
pytorch | 1.9.0+cpu | bert_large_rte | 72.20% | 72.56% | -0.50% | 52.80 | 33.62 | 1.57x |
pytorch | 1.9.0+cpu | bert_large_cola | 62.07% | 62.57% | -0.80% | 94.97 | 33.77 | 2.81x |
pytorch | 1.9.0+cpu | inception_v3 | 69.48% | 69.54% | -0.09% | 418.59 | 207.77 | 2.01x |
pytorch | 1.9.0+cpu | peleenet | 71.61% | 72.08% | -0.66% | 461.47 | 359.58 | 1.28x |
pytorch | 1.9.0+cpu | yolo_v3 | 24.50% | 24.54% | -0.17% | 98.11 | 37.50 | 2.62x |
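Tasks | FWK | Model | fp32 baseline | Gradient sensitivity, 20% sparsity: accuracy% | Gradient sensitivity, 20% sparsity: drop% | Gradient sensitivity, 20% sparsity: perf gain (sample/s) | + ONNX dynamic quantization on pruned model: accuracy% | + ONNX dynamic quantization on pruned model: drop% | + ONNX dynamic quantization on pruned model: perf gain (sample/s) |
---|---|---|---|---|---|---|---|---|---|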
SST-2 | pytorch | bert-base | accuracy = 92.32 | accuracy = 91.97 | -0.38 | 1.30x | accuracy = 92.20 | -0.13 | 1.86x |
QQP | pytorch | bert-base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [89.97, 86.54] | [-1.24, -1.71] | 1.32x | [accuracy, f1] = [89.75, 86.60] | [-1.48, -1.65] | 1.81x |
Tasks | FWK | Model | fp32 baseline | Pattern Lock on 70% Unstructured Sparsity: accuracy% | Pattern Lock on 70% Unstructured Sparsity: drop% | Pattern Lock on 50% 1:2 Structured Sparsity: accuracy% | Pattern Lock on 50% 1:2 Structured Sparsity: drop% |
---|---|---|---|---|---|---|---|
MNLI | pytorch | bert-base | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51, -1.80] | [m, mm] = [83.20, 84.11] | [-1.62, -0.80] |
SST-2 | pytorch | bert-base | accuracy = 92.32 | accuracy = 91.51 | -0.88 | accuracy = 92.20 | -0.13 |
QQP | pytorch | bert-base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68, -1.12] | [accuracy, f1] = [90.92, 87.78] | [-0.20, -0.31] |
QNLI | pytorch | bert-base | accuracy = 91.54 | accuracy = 90.39 | -1.26 | accuracy = 90.87 | -0.73 |
QnA | pytorch | bert-base | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61, -1.54] | [em, f1] = [78.03, 86.50] | [-1.65, -0.69] |
Framework | Model | fp32 baseline | Compression | Dataset | Accuracy % (relative drop %) |
---|---|---|---|---|---|
PyTorch | resnet18 | 69.76 | 30% sparsity on magnitude | ImageNet | 69.47 (-0.42) |
PyTorch | resnet18 | 69.76 | 30% sparsity on gradient sensitivity | ImageNet | 68.85 (-1.30) |
PyTorch | resnet50 | 76.13 | 30% sparsity on magnitude | ImageNet | 76.11 (-0.03) |
PyTorch | resnet50 | 76.13 | 30% sparsity on magnitude and post training quantization | ImageNet | 76.01 (-0.16) |
PyTorch | resnet50 | 76.13 | 30% sparsity on magnitude and quantization aware training | ImageNet | 75.90 (-0.30) |
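"30% sparsity on magnitude" refers to magnitude pruning: the weights with the smallest absolute values are set to zero until the target sparsity is reached. The sketch below shows only that generic selection rule on a flat list; `magnitude_prune` is a hypothetical name, and it is not Neural Compressor's pruner, which in practice is typically applied gradually during fine-tuning so the model can recover accuracy.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest absolute value.

    `weights` is a flat list of floats; a new list is returned. Ties in
    magnitude are broken by position (sorted() is stable).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08, 0.6, -0.15]
p = magnitude_prune(w, 0.30)
print(p)  # [0.9, 0.0, 0.3, -0.7, 0.0, 0.2, -0.4, 0.0, 0.6, -0.15]
```

With 10 weights and 30% sparsity, the three smallest magnitudes (0.01, -0.05, 0.08) are zeroed regardless of sign.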
Example Name | Dataset | Student (Accuracy) | Teacher (Accuracy) | Student With Distillation (Accuracy Improvement) |
---|---|---|---|---|
ResNet example | ImageNet | ResNet18 (0.6739) | ResNet50 (0.7399) | 0.6845 (0.0106) |
BlendCnn example | MRPC | BlendCnn (0.7034) | BERT-Base (0.8382) | 0.7034 (0) |
BiLSTM example | SST-2 | BiLSTM (0.7913) | RoBERTa-Base (0.9404) | 0.8085 (0.0172) |
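In the table above, the improvement is the distilled student's accuracy minus the student baseline (for example, 0.6845 - 0.6739 = 0.0106 for ResNet18). A common way to distill knowledge is the soft-target loss sketched below for a single example. This is the generic technique, not necessarily the loss Neural Compressor uses; the function names and the defaults `temperature=2.0` and `alpha=0.5` are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Soft-target distillation loss for one example (hypothetical defaults).

    alpha weights KL(teacher || student) on temperature-softened outputs,
    scaled by temperature**2; (1 - alpha) weights the ordinary cross-entropy
    of the student against the hard label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    hard = -math.log(softmax(student_logits)[label])
    return alpha * (temperature ** 2) * soft + (1 - alpha) * hard


loss = distillation_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0], label=0)
print("loss = {:.4f}".format(loss))
```

A higher temperature flattens the teacher's distribution so the student also learns from the relative scores of wrong classes; the temperature**2 factor keeps the soft term's gradient scale comparable as the temperature changes.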
Throughput in samples/sec, measured on ICX8380 with seq_len 128 under two configurations: 1s4c10ins1bs and 2s4c20ins64bs.
model | INT8 Accuracy | FP32 Accuracy | Acc Ratio [(INT8-FP32)/FP32] | INT8 Throughput (1s4c10ins1bs) | FP32 Throughput (1s4c10ins1bs) | Performance Ratio [INT8/FP32] (1s4c10ins1bs) | INT8 Throughput (2s4c20ins64bs) | FP32 Throughput (2s4c20ins64bs) | Performance Ratio [INT8/FP32] (2s4c20ins64bs) |
---|---|---|---|---|---|---|---|---|---|
bert_large_squad | 90.74 | 90.87 | -0.14% | 44.9 | 12.33 | 3.64x | 362.21 | 88.38 | 4.10x |
distilbert_base_uncased_sst2 | 90.14% | 90.25% | -0.12% | 1003.01 | 283.69 | 3.54x | 2104.26 | 606.58 | 3.47x |
minilm_l6_h384_uncased_sst2 | 89.33% | 90.14% | -0.90% | 2739.73 | 999 | 2.74x | 5389.98 | 2333.14 | 2.31x |
roberta_base_mrpc | 89.46% | 88.97% | 0.55% | 506.07 | 142.13 | 3.56x | 1167.09 | 311.5 | 3.75x |
bert_base_nli_mean_tokens_stsb | 89.27% | 89.55% | -0.31% | 503.52 | 140.98 | 3.57x | 1096.46 | 332.54 | 3.30x |
bert_base_sparse_mrpc | 70.34% | 70.59% | -0.35% | 506.59 | 142.33 | 3.56x | 1133.04 | 339.96 | 3.33x |
distilroberta_base_wnli | 56.34% | 56.34% | 0.00% | 1026.69 | 290.7 | 3.53x | 2309.9 | 620.81 | 3.72x |
paraphrase_xlm_r_multilingual_v1_stsb | 86.72% | 87.23% | -0.58% | 509.68 | 142.73 | 3.57x | 1169.45 | 311.59 | 3.75x |
distilbert_base_uncased_mrpc | 84.07% | 84.07% | 0.00% | 1002 | 280.27 | 3.58x | 2107.96 | 606.95 | 3.47x |
finbert_financial_phrasebank | 82.74% | 82.80% | -0.07% | 919.12 | 272.48 | 3.37x | 1101.13 | 331.88 | 3.32x |
distilbert_base_uncased_emotion | 93.85% | 94.20% | -0.37% | 1003.01 | 283.53 | 3.54x | 2103.22 | 607.08 | 3.46x |
We are hiring. Please send your resume to inc.maintainers@intel.com if you are interested in model compression techniques.