AMCT-quantized YOLOv5 model shows little inference speedup

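1. Symptom (with log context):
The YOLOv5m model can be quantized to INT8 with AMCT.
Comparing the YOLOv5m model compressed with AMCT (ONNX) against the same model without compression, the inference speedup is small: with a 4x3x640x640 input, inference is only about 9% faster. The Atlas 200 A2 DK offers 11 TOPS of FP16 compute and 22 TOPS of INT8 compute, so an INT8-quantized model should not gain this little. How can the performance of the INT8-quantized YOLOv5 model be improved?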
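To put that expectation in numbers, here is a rough Amdahl's-law sketch. It assumes, purely for illustration, that INT8 kernels run at most 2x faster than their FP16 counterparts (the ratio of the 22 TOPS and 11 TOPS figures above) and that the rest of the model is unchanged. The function names are hypothetical and nothing here is measured on the device.

```python
def overall_speedup(accelerated_fraction: float, kernel_speedup: float = 2.0) -> float:
    """Amdahl's law: whole-model speedup when only part of the runtime is accelerated.

    kernel_speedup=2.0 is an assumption taken from the 22 TOPS INT8 / 11 TOPS FP16 ratio.
    """
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / kernel_speedup)


def implied_fraction(observed_speedup: float, kernel_speedup: float = 2.0) -> float:
    """Invert Amdahl's law: share of the original runtime that must have been accelerated."""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / kernel_speedup)


# A 9% shorter inference time means the new time is 0.91 of the old one.
observed = 1.0 / 0.91
fraction = implied_fraction(observed)
print(f"observed speedup: {observed:.3f}x")
print(f"runtime share that behaves as if accelerated 2x: {fraction:.0%}")
```

Under these assumptions, a 9% gain corresponds to only about 18% of the original runtime benefiting from the full 2x, which suggests that most of the time is spent in parts that quantization does not speed up (for example non-quantized operators, quant/dequant overhead, or memory-bound work).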

Inference log for the model quantized to INT8 with amct_onnx:

```bash
[INFO] acl init success
[INFO] open device 0 success
[INFO] load model ./yolov5m6_640x640_sim_deploy_model.om success
[INFO] create model description success
[INFO] try get model batchsize:4
[INFO] warm up 1 done
Inference array Processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.35s/it]
[INFO] -----------------Performance Summary------------------
[INFO] NPU_compute_time (ms): min = 55.69599914550781, max = 55.69599914550781, mean = 55.69599914550781, median = 55.69599914550781, percentile(99%) = 55.69599914550781
[INFO] throughput 1000*batchsize.mean(4)/NPU_compute_time.mean(55.69599914550781): 71.8184440780002
[INFO] ------------------------------------------------------
[INFO] unload model success, model Id is 1
[INFO] end to destroy context
[INFO] end to reset device is 0
[INFO] end to finalize acl
```

Inference log for the model without amct_onnx INT8 quantization:

```bash
[INFO] acl init success
[INFO] open device 0 success
[INFO] load model ./yolov5m6_640x640_sim.om success
[INFO] create model description success
[INFO] try get model batchsize:4
[INFO] warm up 1 done
Inference array Processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.31s/it]
[INFO] -----------------Performance Summary------------------
[INFO] NPU_compute_time (ms): min = 61.2599983215332, max = 61.2599983215332, mean = 61.2599983215332, median = 61.2599983215332, percentile(99%) = 61.2599983215332
[INFO] throughput 1000*batchsize.mean(4)/NPU_compute_time.mean(61.2599983215332): 65.29546375442814
[INFO] ------------------------------------------------------
[INFO] unload model success, model Id is 1
[INFO] end to destroy context
[INFO] end to reset device is 0
[INFO] end to finalize acl
```
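The two summaries above can be compared directly. The sketch below extracts the mean `NPU_compute_time` from each log line and derives the latency reduction and throughput gain. The regular expression and helper name are illustrative, not part of any Ascend tool.

```python
import re

# Matches the "mean = <number>" field of the NPU_compute_time summary line.
MEAN_RE = re.compile(r"NPU_compute_time \(ms\):.*?mean = ([0-9.]+)")


def mean_latency_ms(log_text: str) -> float:
    """Return the mean NPU compute time (ms) found in a benchmark log."""
    match = MEAN_RE.search(log_text)
    if match is None:
        raise ValueError("no NPU_compute_time summary line found")
    return float(match.group(1))


# Summary lines copied from the two logs above (trailing fields shortened).
INT8_LOG = "[INFO] NPU_compute_time (ms): min = 55.69599914550781, max = 55.69599914550781, mean = 55.69599914550781, median = 55.69599914550781"
BASELINE_LOG = "[INFO] NPU_compute_time (ms): min = 61.2599983215332, max = 61.2599983215332, mean = 61.2599983215332, median = 61.2599983215332"

int8_ms = mean_latency_ms(INT8_LOG)
baseline_ms = mean_latency_ms(BASELINE_LOG)

latency_reduction = (baseline_ms - int8_ms) / baseline_ms
throughput_gain = baseline_ms / int8_ms - 1.0

print(f"latency reduction: {latency_reduction:.1%}")
print(f"throughput gain:   {throughput_gain:.1%}")
```

This gives roughly 9.1% lower latency, or equivalently about 10.0% higher throughput. Note that each log reports a single measured iteration (min, max, and mean are identical), so repeating the measurement over more iterations would make the comparison more reliable.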

The models and the performance test files are in the attachment.

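The profiler data shows that fused operators such as `Conv_3Conv_3.dequantSigmoid_4Mul_5Conv_6.quant` take the most time. As a side question: why is a dequant operator inserted after the Conv computation and before the Sigmoid computation?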
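For context on the dequant question, the sketch below illustrates one general property of INT8 inference: an integer convolution produces an integer accumulator that only becomes a real value after it is multiplied by the input and weight scales, and a nonlinear function such as Sigmoid needs that real value. This is a generic plain-Python illustration of symmetric per-tensor quantization, not the actual AMCT or CANN implementation; the function names, sample values, and scales are all hypothetical.

```python
import math


def quantize(values, scale):
    """Symmetric INT8 quantization: round(x / scale), clamped to [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_dot(q_x, q_w):
    """Integer multiply-accumulate, standing in for one Conv output element."""
    return sum(a * b for a, b in zip(q_x, q_w))


def dequantize(acc, x_scale, w_scale):
    """Map the integer accumulator back to a real value."""
    return acc * x_scale * w_scale


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical activations, weights, and scales (max-abs value / 127).
x = [0.5, -1.25, 2.0]
w = [0.8, 0.1, -0.3]
x_scale, w_scale = 2.0 / 127, 0.8 / 127

acc = int8_dot(quantize(x, x_scale), quantize(w, w_scale))  # integer, large magnitude
real = dequantize(acc, x_scale, w_scale)                    # close to the float result
reference = sum(a * b for a, b in zip(x, w))                # float computation

print("integer accumulator:", acc)
print("dequantized:", real, "reference:", reference)
print("sigmoid(dequantized):", sigmoid(real), "sigmoid(reference):", sigmoid(reference))
```

The raw accumulator is far outside the range where Sigmoid is meaningful, while the dequantized value tracks the floating-point result closely. Whether this is the exact reason for the operator placement in this model is an assumption that the profiler data alone does not confirm.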

2. Software versions:
-- CANN version: 6.3.RC1.alpha001
-- Hardware: Atlas 200 A2 DK

3. Test steps:

Model input shape: 4x3x640x640

Model conversion command:

```bash
atc --model=./yolov5m6_640x640_sim.onnx --output=./yolov5m6_640x640_sim \
--input_shape="images:4,3,640,640" \
--framework=5 --input_format=NCHW --soc_version=Ascend310 --log=error \
--output_type=FP16
```

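4. Attachments:

The ONNX files before and after amct_onnx quantization, the OM files, and the profiler performance data are available at the link below:

Link: https://pan.baidu.com/s/1rP4uYaM2G22GNQnpUkBCvA?pwd=azlg
Extraction code: azlg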
