Automatic Compression Example for Semantic Segmentation Models

1. Introduction
2. Benchmark
3. Automatic Compression Workflow
4. Inference Deployment
5. FAQ

1. Introduction

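This example uses the semantic segmentation model PP-LiteSeg to show how to run automatic compression with ACT (Auto Compression Toolkit) in PaddleSlim. The compression strategy used here is quantized distillation training.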

2. Benchmark

Model	Strategy	Total IoU (%)	CPU latency (ms), thread=10, mkldnn=on	NVIDIA GPU latency (ms), TRT=on	Config	Inference model
OCRNet_HRNetW48	Baseline	82.15	4332.2	154.9	-	model
OCRNet_HRNetW48	Quantized distillation training	82.03	3728.7	59.8	config	model
SegFormer-B0*	Baseline	75.27	285.4	34.3	-	model
SegFormer-B0*	Quantized distillation training	75.22	284.1	35.7	config	model
PP-LiteSeg-Tiny	Baseline	77.04	640.72	11.9	-	model
PP-LiteSeg-Tiny	Quantized distillation training	77.14	450.19	7.5	config	model
PP-MobileSeg-Base	Baseline	41.55	311.1	17.8	-	model
PP-MobileSeg-Base	Quantized distillation training	39.08	303.6	16.2	config	model

* SegFormer-B0 was tested on CPU with gpu_cpu_map_matmul_v2_to_mul_pass deleted, because that pass raises an error.
PP-MobileSeg-Base is evaluated on the ADE20K dataset; all other models are evaluated on Cityscapes.
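The relative speedups implied by the table can be checked with a few lines of Python. The latencies below are copied from the table; `speedup` is a hypothetical helper that divides baseline latency by quantized latency, so values above 1 mean the quantized model is faster.

```python
# Latencies in ms copied from the benchmark table: (baseline, quantized).
BENCHMARK = {
    "OCRNet_HRNetW48": {"cpu": (4332.2, 3728.7), "gpu": (154.9, 59.8)},
    "SegFormer-B0": {"cpu": (285.4, 284.1), "gpu": (34.3, 35.7)},
    "PP-LiteSeg-Tiny": {"cpu": (640.72, 450.19), "gpu": (11.9, 7.5)},
    "PP-MobileSeg-Base": {"cpu": (311.1, 303.6), "gpu": (17.8, 16.2)},
}


def speedup(baseline_ms: float, quantized_ms: float) -> float:
    """Return how many times faster the quantized model is (>1 means faster)."""
    return baseline_ms / quantized_ms


for model, devices in BENCHMARK.items():
    cpu = speedup(*devices["cpu"])
    gpu = speedup(*devices["gpu"])
    print(f"{model:18s} CPU x{cpu:.2f}  GPU x{gpu:.2f}")
```

With these numbers, PP-LiteSeg-Tiny comes out at roughly 1.59x on GPU and 1.42x on CPU, while SegFormer-B0 is slightly below 1x on GPU, i.e. its quantized model was not faster in that measurement.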

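CPU test environment:
- Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
- CPU threads: 10
NVIDIA GPU test environment:
- Hardware: a single NVIDIA Tesla V100
- Software: CUDA 11.2, cuDNN 8.1.0, TensorRT 8.0.3.4
- Test configuration: batch_size: 4
Timing requirements:
- Average over many runs: latency measured on a single image fluctuates, so run 10 warmup iterations and then average over 100 runs. The batch test in test_seg.py already does this.
- Confirm TensorRT acceleration: check that the INT8 model actually runs with TensorRT INT8 mode enabled and that TensorRT passes appear during inference, for example trt_delete_weight_dequant_linear_op_pass.
- Check whether dynamic shape is enabled. If it is, run the test twice: the first run collects the shape ranges, so use the timing from the second run.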
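The warmup-then-average rule can be sketched as below. `benchmark` is a hypothetical helper, not part of test_seg.py, and `run_once` stands for one complete inference call. For GPU timing, `run_once` should block until outputs are copied back to host memory; otherwise asynchronous execution can make the numbers look too small.

```python
import time
from typing import Callable


def benchmark(run_once: Callable[[], object], warmup: int = 10, repeats: int = 100) -> float:
    """Return the mean latency of run_once in milliseconds.

    The first `warmup` calls are discarded because they include one-off costs
    (memory allocation, engine building); only the next `repeats` calls are timed.
    """
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000.0 / repeats


if __name__ == "__main__":
    # Stand-in workload; replace with a function that runs the predictor once.
    mean_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
    print(f"mean latency: {mean_ms:.3f} ms")
```

With dynamic shape enabled, the same call would simply be executed twice and only the second result kept.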

The following sections use an open-source dataset to show how to run automatic compression on PP-LiteSeg.

3. Automatic Compression Workflow

3.1 Prepare the Environment

PaddlePaddle == 2.5 (can be installed from the PaddlePaddle website)
PaddleSlim == 2.5
PaddleSeg == develop
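A quick way to confirm the installed versions is to query pip metadata. This sketch uses only the standard library (`importlib.metadata`, Python 3.8+). The distribution names are assumptions: the CPU and GPU builds of Paddle are published as `paddlepaddle` and `paddlepaddle-gpu`, and a source install of PaddleSeg is assumed to register as `paddleseg`.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Only one of paddlepaddle / paddlepaddle-gpu is expected to be installed.
    for name in ("paddlepaddle", "paddlepaddle-gpu", "paddleslim", "paddleseg"):
        print(f"{name:18s} {installed_version(name) or 'not installed'}")
```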

Install PaddlePaddle:

# CPU
python -m pip install paddlepaddle==2.5.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
# GPU, using Ubuntu and CUDA 10.2 as an example
python -m pip install paddlepaddle-gpu==2.5.1.post102 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

Install PaddleSlim 2.5:

pip install paddleslim@git+https://gitee.com/paddlepaddle/PaddleSlim.git@release/2.5

Install the PaddleSeg develop branch and its dependencies:

cd ..
git clone https://github.com/PaddlePaddle/PaddleSeg.git -b develop
cd PaddleSeg/
python setup.py install

3.2 Prepare the Dataset

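You can download an open-source dataset (such as Cityscapes) or follow the PaddleSeg data preparation guide to build a custom semantic segmentation dataset.
This example uses a sample of the open-source Cityscapes dataset to show how to run automatic compression on PP-LiteSeg-Tiny. The sample dataset is only meant for running the compression workflow end to end quickly; it cannot reproduce the compression results in the benchmark table. Download link
After preparing the data, place it in the deploy/slim/act/data/cityscapes directory.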
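Before launching compression, it can help to check that the data landed where the dataset config expects it. The sketch below assumes the standard Cityscapes layout (`leftImg8bit/<split>/<city>/*_leftImg8bit.png` images and `gtFine/<split>/<city>/*_gtFine_labelTrainIds.png` labels); confirm the actual paths against the dataset config you pass with `--config_path`. `count_cityscapes_pairs` is a hypothetical helper.

```python
import glob
import os
from typing import Dict


def count_cityscapes_pairs(root: str, split: str = "val") -> Dict[str, int]:
    """Count images and train-id label maps for one split of a Cityscapes-style tree.

    Assumed layout (standard Cityscapes):
      <root>/leftImg8bit/<split>/<city>/*_leftImg8bit.png
      <root>/gtFine/<split>/<city>/*_gtFine_labelTrainIds.png
    """
    images = glob.glob(os.path.join(root, "leftImg8bit", split, "*", "*_leftImg8bit.png"))
    labels = glob.glob(os.path.join(root, "gtFine", split, "*", "*_gtFine_labelTrainIds.png"))
    return {"images": len(images), "labels": len(labels)}


if __name__ == "__main__":
    # Path relative to the PaddleSeg repository root.
    counts = count_cityscapes_pairs("deploy/slim/act/data/cityscapes")
    print(counts)
    if counts["images"] == 0 or counts["images"] != counts["labels"]:
        print("Warning: no images found, or image/label counts differ")
```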

3.3 Prepare the Inference Model

The following commands export the PP-LiteSeg-Tiny model. For other models, refer to the export guide:

cd PaddleSeg/
wget https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc1_cityscapes_1024x512_scale1.0_160k/model.pdparams

python tools/export.py --config configs/pp_liteseg/pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k.yml --model_path model.pdparams  --save_dir ppliteseg_tiny_scale1.0_export

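After exporting the model, set the model path and the model_filename and params_filename fields in the config file accordingly.
An exported inference model consists of two files: model.pdmodel (the model structure) and model.pdiparams (the weights).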
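Because model_filename and params_filename must match the exported files exactly, a small helper can read them off the export directory. `find_inference_files` is hypothetical; it only assumes the two-file layout described above and ignores companion files that may sit next to them, such as `model.pdiparams.info` or `deploy.yaml`.

```python
import os
from typing import Tuple


def find_inference_files(model_dir: str) -> Tuple[str, str]:
    """Return (model_filename, params_filename) from an exported model directory.

    Expects exactly one *.pdmodel and one *.pdiparams file. Names such as
    model.pdiparams.info do not end in .pdiparams and are therefore skipped.
    """
    files = sorted(os.listdir(model_dir))
    models = [f for f in files if f.endswith(".pdmodel")]
    params = [f for f in files if f.endswith(".pdiparams")]
    if len(models) != 1 or len(params) != 1:
        raise FileNotFoundError(
            f"expected one .pdmodel and one .pdiparams in {model_dir}, "
            f"found {models} and {params}"
        )
    return models[0], params[0]
```

For the export command above, calling `find_inference_files("ppliteseg_tiny_scale1.0_export")` from the PaddleSeg directory would be expected to return the two file names to put into the config.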

3.4 Run Automatic Compression and Export the Model

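The automatic compression example is launched with the run_seg.py script, which calls the paddleslim.auto_compression.AutoCompression interface to compress the model. First fill in the config file: the model path, the dataset path, and the distillation, quantization, sparsity, and training sections. Once it is configured, the model can be compressed with unstructured sparsity plus distillation, or with quantization plus distillation.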
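A parsed ACT config can be sanity-checked before a long training run. The section names below (`Global`, `Distillation`, `QuantAware`, `TrainConfig`) and the `Global` keys are assumptions based on typical ACT QAT configs, not a schema taken from PaddleSlim; compare them with configs/ppliteseg/ppliteseg_qat.yaml in your checkout. `missing_config_entries` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Assumed section and key names for a quantized distillation (QAT) config.
REQUIRED_SECTIONS = ("Global", "Distillation", "QuantAware", "TrainConfig")
REQUIRED_GLOBAL_KEYS = ("model_dir", "model_filename", "params_filename")


def missing_config_entries(cfg: Dict[str, Any]) -> List[str]:
    """List sections/keys missing from a parsed ACT config (empty list = looks complete)."""
    missing = [section for section in REQUIRED_SECTIONS if section not in cfg]
    global_cfg = cfg.get("Global") or {}
    missing += [f"Global.{key}" for key in REQUIRED_GLOBAL_KEYS if key not in global_cfg]
    return missing
```

To check the real file, load it first (for example with PyYAML's `yaml.safe_load`) and pass the resulting dict.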

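To run quantized distillation training with your own quantization settings (see the ACT hyperparameter documentation for the meaning of each parameter), use the commands below: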

# Single-GPU launch
export CUDA_VISIBLE_DEVICES=0
cd PaddleSeg/deploy/slim/act/
python run_seg.py \
      --act_config_path='./configs/ppliteseg/ppliteseg_qat.yaml' \
      --save_dir='./save_quant_model_qat'  \
      --config_path="configs/datasets/pp_liteseg_1.0_data.yml"

# Multi-GPU launch
export CUDA_VISIBLE_DEVICES=0,1
cd PaddleSeg/deploy/slim/act/
python -m paddle.distributed.launch run_seg.py \
      --act_config_path='./configs/ppliteseg/ppliteseg_qat.yaml' \
      --save_dir='./save_quant_model_qat'  \
      --config_path="configs/datasets/pp_liteseg_1.0_data.yml"

When compression finishes, the compressed inference model is written to save_dir and can be deployed directly.

4. Inference Deployment

4.1 Verify Performance with Paddle Inference

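The exported quantized model is also a static-graph model. Static-graph models can be accelerated with TensorRT on GPU and with MKLDNN on CPU. For inference details, refer to the inference documentation.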

TensorRT inference environment:

To use the TensorRT inference engine, you need a Paddle build compiled with WITH_TRT=ON; the Paddle 2.5 package installed above meets this requirement.
TensorRT itself must also be installed; see the TensorRT installation guide.

The following fields configure inference:

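Parameter	Description
model_path	Directory containing the inference model; it must contain the .pdmodel and .pdiparams files
model_filename	Name of the model file in the model_path directory
params_filename	Name of the parameter file in the model_path directory
dataset	Dataset type; options: `human`, `cityscapes`, `ade`
dataset_config	Dataset config file
image_file	Path of a single image to test; if image_file is set, dataset_config is ignored
device	Inference device; options: `CPU`, `GPU`
use_trt	Whether to use the TensorRT inference engine; takes effect only when device is `GPU`
use_mkldnn	Whether to enable the `MKL-DNN` acceleration library; takes effect only when device is `CPU`
cpu_threads	Number of threads for CPU inference; default 10
precision	Inference precision; options: `fp32`, `fp16`, `int8`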
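How the fields in this table interact (TensorRT only on GPU, MKL-DNN only on CPU, image_file overriding the dataset config) can be illustrated with a hypothetical argparse parser. This is not test_seg.py's actual parser: the defaults are placeholders, and the dataset config is exposed as `--config` only because that is the flag used in the commands in this document. The `str2bool` helper shows why flags written as `--use_trt=True` need a custom converter: argparse's `type=bool` would turn the string `'False'` into `True`.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings into real booleans."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the parameter table; defaults are placeholders.
    p = argparse.ArgumentParser(description="Sketch of test_seg.py-style options")
    p.add_argument("--model_path", required=True)
    p.add_argument("--model_filename", default="model.pdmodel")
    p.add_argument("--params_filename", default="model.pdiparams")
    p.add_argument("--dataset", choices=["human", "cityscapes", "ade"], default="cityscapes")
    p.add_argument("--config", help="dataset config; ignored when --image_file is set")
    p.add_argument("--image_file")
    p.add_argument("--device", choices=["CPU", "GPU"], default="GPU")
    p.add_argument("--use_trt", type=str2bool, default=False)
    p.add_argument("--use_mkldnn", type=str2bool, default=False)
    p.add_argument("--cpu_threads", type=int, default=10)
    p.add_argument("--precision", choices=["fp32", "fp16", "int8"], default="fp32")
    return p


def effective_settings(args: argparse.Namespace) -> dict:
    """Apply the table's rules: TRT only on GPU, MKL-DNN only on CPU, image_file wins."""
    return {
        "use_trt": args.use_trt and args.device == "GPU",
        "use_mkldnn": args.use_mkldnn and args.device == "CPU",
        "mode": "single_image" if args.image_file else "dataset",
    }
```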

Once the inference model is ready and the dataset path in dataset_config points to the right location, start the test:

4.1.1 GPU batch test of the compressed model:

cd PaddleSeg/deploy/slim/act/
python test_seg.py \
      --model_path=save_quant_model_qat \
      --dataset='cityscapes' \
      --config="configs/datasets/pp_liteseg_1.0_data.yml" \
      --precision=int8 \
      --use_trt=True

Expected result:

4.1.2 GPU batch test of the model before compression:

cd PaddleSeg/deploy/slim/act/
python test_seg.py \
      --model_path=ppliteseg_tiny_scale1.0_export/ \
      --dataset='cityscapes' \
      --config="configs/datasets/pp_liteseg_1.0_data.yml" \
      --precision=fp32 \
      --use_trt=True

Expected result:

4.1.3 CPU batch test of the compressed model:

MKLDNN inference:

cd PaddleSeg/deploy/slim/act/
python test_seg.py \
      --model_path=save_quant_model_qat \
      --dataset='cityscapes' \
      --config="configs/datasets/pp_liteseg_1.0_data.yml" \
      --device=CPU \
      --use_mkldnn=True \
      --precision=int8 \
      --cpu_threads=10

4.2 Test a Single Image with Paddle Inference

4.2.1 Test a single image with the model before compression:

cd PaddleSeg/deploy/slim/act/

wget https://paddleseg.bj.bcebos.com/dygraph/demo/cityscapes_demo.png

python test_seg.py \
      --model_path=ppliteseg_tiny_scale1.0_export \
      --dataset='cityscapes' \
      --image_file=cityscapes_demo.png \
      --use_trt=True \
      --precision=fp32 \
      --save_file res_qat_fp32.png

Expected result:

4.2.2 Test a single image with the compressed model:

cd PaddleSeg/deploy/slim/act/

wget https://paddleseg.bj.bcebos.com/dygraph/demo/cityscapes_demo.png

python test_seg.py \
      --model_path=save_quant_model_qat \
      --dataset='cityscapes' \
      --image_file=cityscapes_demo.png \
      --use_trt=True \
      --precision=int8 \
      --save_file res_qat_int8.png

Expected result:

4.2.3 Comparison of Results

Original image
FP32 inference result
INT8 inference result

4.3 More Deployment Tutorials

5. FAQ

1. PaddleSlim and PaddleSeg pin conflicting OpenCV versions?

A: Remove the OpenCV version constraint from PaddleSlim's requirements.txt and reinstall.

2. Error: Distill_node_pair config wrong, the length need to be an even number

A: The node in the distillation config must be set to an output node of the network:

Open the static-graph model model.pdmodel with Netron;
set node in the QAT config to the output name of the last convolution layer.

3. Quantized distillation training yields very low accuracy?

A: Delete the outputs of the quantization training run and run it again. This happens when training gets stuck in a local optimum.

4. TensorRT inference error: TensorRT dynamic library not found.

A: Follow the TensorRT installation guide and check whether there is a version mismatch or the library path is not configured.

5. ImportError: cannot import name 'MSRA' from 'paddle.fluid.initializer':

A: Install PaddleSlim 2.5, which is compatible with Paddle 2.5.

6. ValueError: The axis is expected to be in range of [0,0) but got:

A: Install the develop version of PaddleSeg. If you are sure it is already installed, uninstall it with pip uninstall paddleseg and reinstall it.

7. NotImplementedError: delete weight dequant op pass is not supported for per channel quantization

A: See https://github.com/PaddlePaddle/Paddle/issues/56619, and install TensorRT following the [TensorRT installation guide](../../../docs/deployment/installtrt.md).

8. Severe accuracy drop in CPU inference

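A: An accuracy drop in CPU inference is usually caused by the set of ops quantized at inference time. Make sure the ops quantized during inference are the same as those quantized during training; only then will inference accuracy match training accuracy. Using PP-LiteSeg from this document as an example: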

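The quantization training config is configs/ppliteseg/ppliteseg_qat.yaml, in which the quantized ops are conv2d and depthwise_conv2d, so these two ops must also be quantized at inference time. You can set this with the following call: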

# deploy/slim/act/test_seg.py:64
# Quantize only the op types that were quantized during training.
pred_cfg.enable_mkldnn_int8({"conv2d", "depthwise_conv2d"})
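To keep the inference-time op set in sync with training, the set passed to enable_mkldnn_int8 could be read from the QAT config instead of being hard-coded. This is a sketch: it assumes the quantized op types are listed under `QuantAware.quantize_op_types` in the ACT config, and `mkldnn_int8_ops` is a hypothetical helper that falls back to the two ops named above.

```python
from typing import Any, Dict, Set

# Fallback: the op types this document says the PP-LiteSeg QAT config quantizes.
DEFAULT_QUANT_OPS = {"conv2d", "depthwise_conv2d"}


def mkldnn_int8_ops(act_config: Dict[str, Any]) -> Set[str]:
    """Return the op types to quantize at inference, taken from a parsed QAT config.

    Assumes the config lists them under QuantAware.quantize_op_types.
    """
    ops = (act_config.get("QuantAware") or {}).get("quantize_op_types")
    return set(ops) if ops else set(DEFAULT_QUANT_OPS)
```

The result would then be passed as `pred_cfg.enable_mkldnn_int8(mkldnn_int8_ops(act_config))`, where `act_config` is the parsed YAML.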

It is also best to quantize only these two ops; quantizing additional ops may reduce accuracy. Below are the results of a simple experiment:

	Original model, FP32	Original model, FP32 + MKLDNN	Quantized model, INT8 (conv2d, depthwise_conv2d)	Quantized model, INT8 (conv2d, depthwise_conv2d, elementwise_mul)	Quantized model, INT8 (conv2d, depthwise_conv2d, elementwise_mul, pool2d)
mIoU	0.7704	0.7704	0.7658	0.7657	0.7372
Latency (ms)	1216.8	1191.3	434.5	439.6	505.8
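For reference, mIoU averages the per-class intersection-over-union between predicted and ground-truth label maps. The pure-Python sketch below shows the computation on flattened label maps; it skips pixels labeled 255 (assumed here to be the ignore label) and averages only over classes that appear in either map. PaddleSeg's own evaluation may treat classes absent from both maps differently, so this is for intuition, not for reproducing the numbers above.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], label: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over classes present in the prediction or the ground truth.

    pred and label are flattened label maps of equal length; pixels whose
    ground-truth value equals ignore_index are excluded.
    """
    if len(pred) != len(label):
        raise ValueError("pred and label must have the same number of pixels")
    intersection = [0] * num_classes
    pred_area = [0] * num_classes
    label_area = [0] * num_classes
    for p, l in zip(pred, label):
        if l == ignore_index:
            continue
        pred_area[p] += 1
        label_area[l] += 1
        if p == l:
            intersection[l] += 1
    ious = []
    for c in range(num_classes):
        union = pred_area[c] + label_area[c] - intersection[c]
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)` gives class IoUs of 1/2 and 2/3, so the mean is 7/12.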
