# add_ops

**Repository Path**: neoming/add_ops

## Basic Information

- **Project Name**: add_ops
- **Description**: Huawei CANN single-operator invocation
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
- **Created**: 2021-12-25
- **Last Updated**: 2022-02-17

## README

# Neoming: CANN Training Camp (Season 4), Lesson 3 Final Assignment Submission

Huawei Cloud account: neoming

Contact: liyiming1998@qq.com, 18918028073

This assignment uses the `Add` operator. The operator configuration file is as follows:

```json
[
  {
    "op": "Add",
    "input_desc": [
      {
        "format": "NHWC",
        "shape": [2, 4, 1024, 1024],
        "type": "float16"
      },
      {
        "format": "NHWC",
        "shape": [2, 4, 1024, 1024],
        "type": "float16"
      }
    ],
    "output_desc": [
      {
        "format": "NHWC",
        "shape": [2, 4, 1024, 1024],
        "type": "float16"
      }
    ]
  }
]
```

Shared image: ecs-course

## Submission Directory Layout

```text
.
├── build                       # build directory
├── data                        # data files; inputs are generated by generate_data_bin.py in scripts/
│   ├── cae_output_tensor.bin   # output produced by aclopCompileAndExecute
│   ├── ev2_output_tensor.bin   # output produced by aclopExecuteV2
│   ├── input_x1_tensor.bin     # input data x1
│   └── input_x2_tensor.bin     # input data x2
├── inc                         # include files, based on the project provided by the course
│   └── utils.h                 # helper functions, based on the course project
├── model                       # output directory of the atc single-operator conversion
│   ├── 0_Add_1_1_2_4_1024_1024_1_1_2_4_1024_1024_1_1_2_4_1024_1024.om  # om model (generated by atc)
│   ├── add_op.json             # operator configuration file (custom)
│   ├── fusion_result.json      # fusion result (generated by atc)
│   └── kernel_meta             # kernel_meta (generated by atc)
├── out                         # compiled main binary
│   ├── JOBADBBEECDHCGCEBCHHJCAAAAAAAAAA  # log of a main run
│   ├── JOBEBHFEDEHACABIFEDCFGAAAAAAAAAA  # log of a main run
│   └── main                    # compiled main
├── README.md                   # this file
├── scripts                     # scripts
│   ├── 1_get_om.sh             # converts the Add operator with atc to obtain the om model
│   ├── 2_cmake_build.sh        # creates the build directory and compiles to obtain main
│   ├── 3_inference.sh          # calls generate_data_bin.py to generate data, then runs Add via aclopExecuteV2 and aclopCompileAndExecute
│   ├── 4_eval.sh               # calls eval.py to verify the aclopCompileAndExecute and aclopExecuteV2 outputs using tf.Add
│   ├── 5_profiling.sh          # runs the profiling tool
│   ├── 6_run_all_scripts.sh    # runs all of scripts 1-5
│   ├── eval.py                 # checks the aclopCompileAndExecute and aclopExecuteV2 outputs against tf.Add and prints the time of 1000 runs
│   └── generate_data_bin.py    # generates the input .bin files
└── src                         # C++ source code
    ├── acl.json                # enables the profiler
    ├── CMakeLists.txt          # CMake project file
    ├── main.cpp                # calls the Add operator via aclopCompileAndExecute and aclopExecuteV2
    └── utils.cpp               # helper implementation, based on the sample project
```

## Completed Items

1. Selected an operator from the Ascend operator library and implemented single-operator invocation (om approach); it works and the results are correct.
2. Performed an accuracy comparison against the original framework operator.
3. Performed a performance comparison against the original framework operator.
4. Implemented single-operator invocation with aclopCompileAndExecute; it works and the results are correct.

## How to Run

### 1. `1_get_om.sh`: compile the om model

```bash
cd scripts
./1_get_om.sh
```

Terminal output:

```bash
root@ecs-f400:~/add_ops/scripts# ./1_get_om.sh
ATC start working now, please wait for a moment.
ATC run success, welcome to the next use.
```

### 2. `2_cmake_build.sh`: build the `Add` operator invocation code in `src`

```bash
cd scripts
./2_cmake_build.sh
```

Terminal output:

```bash
root@ecs-f400:~/add_ops/scripts# ./2_cmake_build.sh
-- env INC_PATH: /usr/local/Ascend/ascend-toolkit/latest
-- env LIB_PATH: /usr/local/Ascend/ascend-toolkit/latest/acllib/lib64/stub
-- Configuring done
-- Generating done
-- Build files have been written to: /root/add_ops/build
[100%] Built target main
root@ecs-f400:~/add_ops/scripts# ./2_cmake_build.sh
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- env INC_PATH: /usr/local/Ascend/ascend-toolkit/latest
-- env LIB_PATH: /usr/local/Ascend/ascend-toolkit/latest/acllib/lib64/stub
-- Configuring done
-- Generating done
-- Build files have been written to: /root/add_ops/build
Scanning dependencies of target main
[ 33%] Building CXX object CMakeFiles/main.dir/utils.cpp.o
[ 66%] Building CXX object CMakeFiles/main.dir/main.cpp.o
[100%] Linking CXX executable /root/add_ops/out/main
[100%] Built target main
```

### 3. `3_inference.sh`: run operator inference

`src/main.cpp` accepts four arguments:

+ input_x1_file: string, path of input file 1
+ input_x2_file: string, path of input file 2
+ output_file: string, path of the output file
+ use_aclopCompileAndExecute: bool, whether to use aclopCompileAndExecute; if false, aclopExecuteV2 is used

`3_inference.sh` first calls `generate_data_bin.py` to generate test data in the `data` directory, then runs inference twice: first with aclopExecuteV2, then with aclopCompileAndExecute. In `src/main.cpp`, the same data is run through the operator 1000 times in a loop, and the execution time is measured with `clock`.

```bash
cd scripts
./3_inference.sh
```

In the terminal output, the execution time for 1000 runs is 1.4434689999999999 s with aclopExecuteV2 and 4.1163400000000001 s with aclopCompileAndExecute:

```bash
root@ecs-f400:~/add_ops/scripts# ./3_inference.sh
[INFO] generate input_x1_tensor.bin and input_x2_tensor.bin in ../data dir
[INFO] inference using aclopExecuteV2
[INFO] argv[0]: ../out/main
[INFO] argv[1]: ../data/input_x1_tensor.bin
[INFO] argv[2]: ../data/input_x2_tensor.bin
[INFO] argv[3]: ../data/ev2_output_tensor.bin
[INFO] argv[4]: false
[INFO] using aclopExecuteV2
[INFO] acl init success
[INFO] acl SetDevice success
[INFO] acl CreateContext success
[INFO] acl CreateStream success
[INFO] acl SetModelDir success
[INFO] acl time: 1.4434689999999999
[INFO] execute Add success
[INFO] end to destroy stream
[INFO] end to destroy context
[INFO] end to reset device
[INFO] end to finalize acl
***********************************
[INFO] inference using aclopCompileAndExecute
[INFO] argv[0]: ../out/main
[INFO] argv[1]: ../data/input_x1_tensor.bin
[INFO] argv[2]: ../data/input_x2_tensor.bin
[INFO] argv[3]: ../data/cae_output_tensor.bin
[INFO] argv[4]: true
[INFO] using aclopCompileAndExecute
[INFO] acl init success
[INFO] acl SetDevice success
[INFO] acl CreateContext success
[INFO] acl CreateStream success
[INFO] acl time: 4.1163400000000001
[INFO] execute Add success
[INFO] end to destroy stream
[INFO] end to destroy context
[INFO] end to reset device
[INFO] end to finalize acl
```
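The data round trip behind steps 3 and 4 (raw float16 `.bin` inputs in, a raw float16 `.bin` output out, checked against a host-side reference) can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the repository's code: the contents of `generate_data_bin.py` and `eval.py` are not shown here, so the helper names `write_inputs` and `check_output`, the value range, the seed and the tolerances are all hypothetical, NumPy stands in for the `tf.Add` reference used by `eval.py`, and the `.bin` files are assumed to be headerless float16 buffers of shape `[2, 4, 1024, 1024]`.

```python
import os

import numpy as np

# Shape and dtype taken from add_op.json; the headerless raw-buffer layout is an assumption.
SHAPE = (2, 4, 1024, 1024)
DTYPE = np.float16


def write_inputs(data_dir, shape=SHAPE, seed=0):
    """Hypothetical stand-in for generate_data_bin.py: write x1 and x2 as raw float16."""
    rng = np.random.default_rng(seed)
    os.makedirs(data_dir, exist_ok=True)
    arrays = []
    for name in ("input_x1_tensor.bin", "input_x2_tensor.bin"):
        # Values in [-1, 1) keep float16 sums far from overflow (range is an assumption).
        x = rng.uniform(-1.0, 1.0, size=shape).astype(DTYPE)
        x.tofile(os.path.join(data_dir, name))  # raw bytes, C order, no header
        arrays.append(x)
    return arrays


def check_output(data_dir, output_name, shape=SHAPE, rtol=1e-3, atol=1e-3):
    """Hypothetical check in the spirit of eval.py, using NumPy instead of tf.Add."""
    def load(name):
        return np.fromfile(os.path.join(data_dir, name), dtype=DTYPE).reshape(shape)

    x1 = load("input_x1_tensor.bin")
    x2 = load("input_x2_tensor.bin")
    out = load(output_name)
    # Build the reference in float32 so it is not itself rounded to float16.
    expected = x1.astype(np.float32) + x2.astype(np.float32)
    return bool(np.allclose(out.astype(np.float32), expected, rtol=rtol, atol=atol))
```

Run from `scripts/`, `check_output("../data", "ev2_output_tensor.bin")` would compare the aclopExecuteV2 output with the NumPy reference; the loose tolerance allows for the float16 rounding of the device result.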
### 4. `4_eval.sh`: verify the inference results and time TensorFlow's Add operator

```bash
cd scripts
./4_eval.sh
```

The terminal output shows that both results are correct; TensorFlow takes 38.84127140045166 s for 1000 runs:

```bash
root@ecs-f400:~/add_ops/scripts# ./4_eval.sh
WARNING:tensorflow:From eval.py:21: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
...
...
[INFO] tensorflow time: 38.84127140045166
[INFO] aclopExecuteV2 result correct!
[INFO] aclopCompileAndExecute result correct!
[INFO] eval finisned!
```

### 5. `5_profiling.sh`: export performance data with the profiling tool

```bash
cd scripts
./5_profiling.sh
```

Terminal output:

```bash
Sat 25 Dec 2021 14:54:03 [INFO] [MSVP] [4249] msprof_common.py: Start analyzing data in "/root/add_ops/out/JOBBIDEDAHADFEIEEECDFDAAAAAAAAAA" ...
Sat 25 Dec 2021 14:54:03 [INFO] [MSVP] [4249] msprof_common.py: It may take few minutes, please be patient ...
Sat 25 Dec 2021 14:54:04 [INFO] [MSVP] [4249] msprof_common.py: Analysis data in "/root/add_ops/out/JOBBIDEDAHADFEIEEECDFDAAAAAAAAAA" finished.
Sat 25 Dec 2021 14:54:04 [WARNING] [MSVP] [4249] msprof_import.py: Invalid parsing dir("/root/add_ops/out/JOBDIBJBBFDAEGAFGEJCCBAAAAAAAAAA"), dir must be profiling data dir or it's parent directory
Sat 25 Dec 2021 14:54:04 [INFO] [MSVP] [4249] msprof_common.py: Start analyzing data in "/root/add_ops/out/JOBHIGHDBBBABJCFEHCHABAAAAAAAAAA" ...
Sat 25 Dec 2021 14:54:04 [INFO] [MSVP] [4249] msprof_common.py: It may take few minutes, please be patient ...
Sat 25 Dec 2021 14:54:04 [INFO] [MSVP] [4249] msprof_common.py: Analysis data in "/root/add_ops/out/JOBHIGHDBBBABJCFEHCHABAAAAAAAAAA" finished.
Sat 25 Dec 2021 14:54:04 [INFO] [MSVP] [4314] msprof_export.py: The data in "/root/add_ops/out/JOBBIDEDAHADFEIEEECDFDAAAAAAAAAA" has been analyzed.
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: Start to export task_time summary data ...
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: The task_time summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBBIDEDAHADFEIEEECDFDAAAAAAAAAA/summary/task_time_0_1.csv".
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: Start to export op_summary summary data ...
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: The op_summary summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBBIDEDAHADFEIEEECDFDAAAAAAAAAA/summary/op_summary_0_1.csv".
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: Start to export op_statistic summary data ...
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: The op_statistic summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBBIDEDAHADFEIEEECDFDAAAAAAAAAA/summary/op_statistic_0_1.csv".
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: Start to export acl summary data ...
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: The acl summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBBIDEDAHADFEIEEECDFDAAAAAAAAAA/summary/acl_0_1.csv".
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: Start to export acl_statistic summary data ...
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: The acl_statistic summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBBIDEDAHADFEIEEECDFDAAAAAAAAAA/summary/acl_statistic_0_1.csv".
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: Start to export runtime_api summary data ...
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: The runtime_api summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBBIDEDAHADFEIEEECDFDAAAAAAAAAA/summary/runtime_api_0_1.csv".
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: Start to export ai_stack_time summary data ...
Sat 25 Dec 2021 14:54:05 [INFO] [MSVP] [4314] msprof_export.py: The ai_stack_time summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBBIDEDAHADFEIEEECDFDAAAAAAAAAA/summary/ai_stack_time_0_1.csv".
Performance Summary Report:
1. Model/Operator Computation: N/A
2. Model/Operator Memory: N/A
3. Operator Schedule:
 1)Task wait time has reached the upper limit: [Add]
4. Operator Processing: N/A
5. Operator Metrics: N/A
Sat 25 Dec 2021 14:54:06 [WARNING] [MSVP] [4314] msprof_export.py: Invalid parsing dir("/root/add_ops/out/JOBDIBJBBFDAEGAFGEJCCBAAAAAAAAAA"), dir must be profiling data dir or it's parent directory
Sat 25 Dec 2021 14:54:06 [INFO] [MSVP] [4314] msprof_export.py: The data in "/root/add_ops/out/JOBHIGHDBBBABJCFEHCHABAAAAAAAAAA" has been analyzed.
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: Start to export task_time summary data ...
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: The task_time summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBHIGHDBBBABJCFEHCHABAAAAAAAAAA/summary/task_time_0_1.csv".
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: Start to export op_summary summary data ...
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: The op_summary summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBHIGHDBBBABJCFEHCHABAAAAAAAAAA/summary/op_summary_0_1.csv".
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: Start to export op_statistic summary data ...
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: The op_statistic summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBHIGHDBBBABJCFEHCHABAAAAAAAAAA/summary/op_statistic_0_1.csv".
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: Start to export acl summary data ...
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: The acl summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBHIGHDBBBABJCFEHCHABAAAAAAAAAA/summary/acl_0_1.csv".
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: Start to export acl_statistic summary data ...
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: The acl_statistic summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBHIGHDBBBABJCFEHCHABAAAAAAAAAA/summary/acl_statistic_0_1.csv".
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: Start to export runtime_api summary data ...
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: The runtime_api summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBHIGHDBBBABJCFEHCHABAAAAAAAAAA/summary/runtime_api_0_1.csv".
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: Start to export ai_stack_time summary data ...
Sat 25 Dec 2021 14:54:07 [INFO] [MSVP] [4314] msprof_export.py: The ai_stack_time summary data of device 0 for iteration 1 has been exported to "/root/add_ops/out/JOBHIGHDBBBABJCFEHCHABAAAAAAAAAA/summary/ai_stack_time_0_1.csv".
Performance Summary Report:
1. Model/Operator Computation: N/A
2. Model/Operator Memory: N/A
3. Operator Schedule:
 1)Task wait time has reached the upper limit: [Add]
4. Operator Processing: N/A
5. Operator Metrics: N/A
```

### 6. `6_run_all_scripts.sh`: run all of scripts 1-5

```bash
cd scripts
./6_run_all_scripts.sh
```

## Performance Analysis

| Method | Time for 1000 runs (s) |
| ---- | ---- |
| aclopExecuteV2 | 1.4434689999999999 |
| aclopCompileAndExecute | 4.1163400000000001 |
| TensorFlow (CPU) | 38.84127140045166 |
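Each total in the table covers 1000 runs, so it can be turned into a per-call latency and a ratio against the fastest method with a little arithmetic. The sketch below only rearranges the reported numbers; it assumes every total covers exactly 1000 calls with negligible loop overhead, and it does not try to reconcile how the `clock`-based timing in `main.cpp` and the timer in `eval.py` were each measured. The helper names `per_call_ms` and `relative_time` are illustrative.

```python
# Totals reported in the Performance Analysis table (seconds for 1000 runs).
TOTALS_S = {
    "aclopExecuteV2": 1.4434689999999999,
    "aclopCompileAndExecute": 4.1163400000000001,
    "TensorFlow (CPU)": 38.84127140045166,
}
RUNS = 1000  # iteration count stated for main.cpp and eval.py


def per_call_ms(totals, runs=RUNS):
    """Average latency per call, in milliseconds."""
    return {name: total / runs * 1000.0 for name, total in totals.items()}


def relative_time(totals, baseline):
    """Total time of each method divided by the baseline's total (baseline = 1.0)."""
    base = totals[baseline]
    return {name: total / base for name, total in totals.items()}


if __name__ == "__main__":
    for name, ms in per_call_ms(TOTALS_S).items():
        print(f"{name:24s} {ms:8.3f} ms/call")
    for name, ratio in relative_time(TOTALS_S, "aclopExecuteV2").items():
        print(f"{name:24s} {ratio:6.2f}x the aclopExecuteV2 time")
```

With these totals, aclopExecuteV2 averages about 1.44 ms per call; aclopCompileAndExecute takes roughly 2.85 times as long, and the TensorFlow CPU run roughly 26.9 times as long.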