For how to build CoreMark with maple, see:
https://gitee.com/openarkcompiler/OpenArkCompiler/wikis/lmbc-testsuits
perf collection commands:
perf record -e cycles,instructions -g ./coremark.exe 0x0 0x0 0x66 200000 7 1 2000
perf report -n --no-children -f -i perf.data
Samples: 41K of event 'cycles', Event count (approx.): 26784499276, DSO: coremark.exe
Overhead Samples Command Symbol
+ 20.38% 8459 coremark.exe [.] core_list_find
+ 17.18% 7136 coremark.exe [.] core_bench_list
+ 15.48% 6421 coremark.exe [.] core_state_transition
+ 13.28% 5517 coremark.exe [.] crcu16
+ 7.83% 3251 coremark.exe [.] matrix_mul_matrix_bitextract
+ 6.44% 2673 coremark.exe [.] matrix_test
+ 6.09% 2529 coremark.exe [.] matrix_mul_matrix
+ 4.83% 2006 coremark.exe [.] core_bench_state
+ 4.20% 1743 coremark.exe [.] core_list_mergesort
+ 1.37% 568 coremark.exe [.] calc_func
+ 1.16% 481 coremark.exe [.] cmp_idx
+ 0.65% 268 coremark.exe [.] matrix_mul_vect
+ 0.57% 236 coremark.exe [.] cmp_complex
0.31% 129 coremark.exe [.] crcu32
0.16% 68 coremark.exe [.] crc16
0.02% 10 coremark.exe [.] core_bench_matrix
0.01% 3 coremark.exe [.] iterate
Samples: 77K of event 'cycles', Event count (approx.): 50332083930, DSO: maple_origin_coremark.exe
Overhead Samples Command Symbol
+ 28.76% 22409 maple_origin_co [.] crcu16
+ 21.81% 16990 maple_origin_co [.] core_state_transition
+ 13.85% 10788 maple_origin_co [.] core_bench_list
+ 12.13% 9447 maple_origin_co [.] core_list_find
+ 8.10% 6313 maple_origin_co [.] matrix_test
+ 6.85% 5336 maple_origin_co [.] matrix_mul_matrix_bitextract
+ 2.98% 2318 maple_origin_co [.] core_list_mergesort
+ 2.56% 1994 maple_origin_co [.] core_bench_state
+ 1.04% 811 maple_origin_co [.] calc_func
+ 0.67% 520 maple_origin_co [.] cmp_idx
0.49% 381 maple_origin_co [.] crcu32
0.32% 249 maple_origin_co [.] crc16
0.29% 227 maple_origin_co [.] cmp_complex
0.05% 40 maple_origin_co [.] core_bench_matrix
0.01% 11 maple_origin_co [.] iterate
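To read the two `cycles` reports above side by side, the rows can be parsed and the per-symbol share subtracted. The following is a minimal sketch: the row text is copied from the reports above, while the regular expression and the helper names (`ROW`, `parse`, `share_deltas`) are illustrative assumptions and not part of perf or CoreMark.

```python
import re

# Rows copied verbatim from the gcc report (DSO: coremark.exe).
GCC = """
+ 20.38% 8459 coremark.exe [.] core_list_find
+ 17.18% 7136 coremark.exe [.] core_bench_list
+ 15.48% 6421 coremark.exe [.] core_state_transition
+ 13.28% 5517 coremark.exe [.] crcu16
+ 7.83% 3251 coremark.exe [.] matrix_mul_matrix_bitextract
+ 6.44% 2673 coremark.exe [.] matrix_test
+ 6.09% 2529 coremark.exe [.] matrix_mul_matrix
+ 4.83% 2006 coremark.exe [.] core_bench_state
+ 4.20% 1743 coremark.exe [.] core_list_mergesort
+ 1.37% 568 coremark.exe [.] calc_func
+ 1.16% 481 coremark.exe [.] cmp_idx
+ 0.65% 268 coremark.exe [.] matrix_mul_vect
+ 0.57% 236 coremark.exe [.] cmp_complex
0.31% 129 coremark.exe [.] crcu32
0.16% 68 coremark.exe [.] crc16
0.02% 10 coremark.exe [.] core_bench_matrix
0.01% 3 coremark.exe [.] iterate
"""

# Rows copied verbatim from the maple report (DSO: maple_origin_coremark.exe).
MAPLE = """
+ 28.76% 22409 maple_origin_co [.] crcu16
+ 21.81% 16990 maple_origin_co [.] core_state_transition
+ 13.85% 10788 maple_origin_co [.] core_bench_list
+ 12.13% 9447 maple_origin_co [.] core_list_find
+ 8.10% 6313 maple_origin_co [.] matrix_test
+ 6.85% 5336 maple_origin_co [.] matrix_mul_matrix_bitextract
+ 2.98% 2318 maple_origin_co [.] core_list_mergesort
+ 2.56% 1994 maple_origin_co [.] core_bench_state
+ 1.04% 811 maple_origin_co [.] calc_func
+ 0.67% 520 maple_origin_co [.] cmp_idx
0.49% 381 maple_origin_co [.] crcu32
0.32% 249 maple_origin_co [.] crc16
0.29% 227 maple_origin_co [.] cmp_complex
0.05% 40 maple_origin_co [.] core_bench_matrix
0.01% 11 maple_origin_co [.] iterate
"""

# Optional '+', overhead percent, sample count, command, '[.]', symbol.
ROW = re.compile(r"^\+?\s*(\d+\.\d+)%\s+(\d+)\s+\S+\s+\[\.\]\s+(\S+)$")


def parse(report):
    """Return {symbol: overhead_percent} for every line matching ROW."""
    shares = {}
    for line in report.strip().splitlines():
        match = ROW.match(line.strip())
        if match:
            shares[match.group(3)] = float(match.group(1))
    return shares


def share_deltas(gcc, maple):
    """Percentage-point change in share (maple minus gcc), largest first."""
    symbols = set(gcc) | set(maple)
    deltas = {s: maple.get(s, 0.0) - gcc.get(s, 0.0) for s in symbols}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


gcc_shares = parse(GCC)
maple_shares = parse(MAPLE)
for symbol, delta in share_deltas(gcc_shares, maple_shares)[:3]:
    print(f"{symbol:24s} {delta:+.2f} pp")
```

Each share is relative to its own binary's total, and the maple run reports roughly 1.9 times as many cycles overall (50332083930 versus 26784499276), so a symbol whose share is unchanged still costs more absolute cycles under maple. A symbol missing from one report (for example `matrix_mul_matrix` in the maple report) is treated as 0% here, which is only an approximation if the function was inlined elsewhere.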
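The profiles show that three functions account for a clearly larger share of samples under maple than under gcc.
The crcu16 function is defined in core_util.c. Use the cross-compilation toolchain to compile core_util.c to core_util.s, then replace the crcu16 assembly in the maple-generated core_util.s with the gcc-generated version.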
# gcc
$MAPLE_ROOT/tools/gcc-linaro-7.5.0/bin/aarch64-linux-gnu-gcc -O2 -Ilinux -Iposix -I. -DFLAGS_STR=\""-O2 -DPERFORMANCE_RUN=1 -lrt"\" -DITERATIONS=0 -DPERFORMANCE_RUN=1 -S core_util.c -o core_util.s -lrt
# maple
$MAPLE_ROOT/build/tools/common/maplec -O2 -Ilinux -Iposix -I. -DFLAGS_STR=\""-O2 -DPERFORMANCE_RUN=1 -lrt"\" -DITERATIONS=0 -DPERFORMANCE_RUN=1 core_util.c -c -s core_util.s -lrt
Then relink.
The final run results are as follows:

| | gcc | maple | maple_replaced(crcu16) | maple_replaced(crcu16+core_state_transition) |
|---|---|---|---|---|
| total time | 10.262000 | 19.163932 | 14.582007 | 11.222703 |
| percent | 100% | 53.54% | 70.37% | 91.51% |

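The percent row appears to be the gcc time divided by each variant's time. The sketch below recomputes it under that assumption; the dictionary keys are shorthand labels for the four columns and `percent_of_gcc` is a hypothetical helper name.

```python
# Total times copied from the table above (CoreMark reports total time in seconds).
times = {
    "gcc": 10.262000,
    "maple": 19.163932,
    "maple + gcc crcu16": 14.582007,
    "maple + gcc crcu16 + core_state_transition": 11.222703,
}


def percent_of_gcc(times, baseline="gcc"):
    """Relative performance: 100 * baseline_time / variant_time."""
    base = times[baseline]
    return {name: 100.0 * base / t for name, t in times.items()}


for name, pct in percent_of_gcc(times).items():
    print(f"{name:45s} {pct:6.2f}%")
```

The recomputed values agree with the table to within about 0.1 percentage point; the small residual suggests the published row was derived from slightly different or rounded inputs.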
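Combined with the conclusions of Wenquan's earlier analysis, crcu8 has a similar problem (different arguments may lead to different execution paths). The code for both functions is about twice the size of gcc's, and both are hot functions.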
maple:
Samples: 57K of event 'branch-misses', Event count (approx.): 266571429, DSO: replaced_crcu16_maple_coremark.exe
Children Self Command Symbol
+ 100.00% 0.00% replaced_crcu16 [.] _start
+ 33.99% 33.97% replaced_crcu16 [.] core_state_transition
+ 29.24% 29.23% replaced_crcu16 [.] core_bench_list
+ 10.10% 10.08% replaced_crcu16 [.] core_list_find
+ 6.33% 6.32% replaced_crcu16 [.] matrix_test
+ 5.87% 5.86% replaced_crcu16 [.] core_list_mergesort
+ 4.19% 4.19% replaced_crcu16 [.] core_bench_state
+ 3.29% 3.29% replaced_crcu16 [.] matrix_mul_matrix_bitextract
+ 2.57% 2.57% replaced_crcu16 [.] calc_func
+ 1.86% 1.86% replaced_crcu16 [.] crcu16
+ 1.79% 1.79% replaced_crcu16 [.] cmp_idx
+ 0.61% 0.61% replaced_crcu16 [.] cmp_complex
0.13% 0.13% replaced_crcu16 [.] crc16
0.01% 0.01% replaced_crcu16 [.] crcu32
0.01% 0.01% replaced_crcu16 [.] core_bench_matrix
0.00% 0.00% replaced_crcu16 [.] main
gcc:
Samples: 41K of event 'branch-misses', Event count (approx.): 90178643, DSO: coremark.exe
Children Self Command Symbol
+ 99.99% 0.00% coremark.exe [.] _start
+ 99.99% 0.00% coremark.exe [.] main
+ 99.98% 0.00% coremark.exe [.] iterate
+ 89.04% 6.06% coremark.exe [.] core_bench_list
+ 77.76% 17.44% coremark.exe [.] core_list_mergesort
+ 59.98% 2.34% coremark.exe [.] cmp_complex
+ 56.32% 4.75% coremark.exe [.] calc_func
+ 28.92% 0.02% coremark.exe [.] core_bench_matrix
+ 11.87% 5.52% coremark.exe [.] core_bench_state
+ 10.92% 10.90% coremark.exe [.] crcu16
+ 10.45% 10.45% coremark.exe [.] matrix_test
+ 9.78% 9.73% coremark.exe [.] core_state_transition
+ 9.27% 9.24% coremark.exe [.] matrix_mul_matrix_bitextract
+ 9.22% 9.19% coremark.exe [.] core_list_find
+ 8.32% 8.30% coremark.exe [.] matrix_mul_matrix
+ 4.96% 4.96% coremark.exe [.] cmp_idx
+ 0.86% 0.85% coremark.exe [.] matrix_mul_vect
0.02% 0.02% coremark.exe [.] crcu32
0.01% 0.01% coremark.exe [.] crc16
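Because the two `branch-misses` reports have very different event totals, the Self percentages are easier to compare after scaling them to absolute counts. This is a rough sketch: the totals and Self values are copied from the reports above (perf labels the totals as approximate), only a handful of symbols are included, and `absolute_misses` is an illustrative helper name.

```python
# Event totals from the two report headers.
MAPLE_TOTAL = 266_571_429  # replaced_crcu16_maple_coremark.exe
GCC_TOTAL = 90_178_643     # coremark.exe

# Self column, in percent, for symbols present in both reports.
MAPLE_SELF = {
    "core_state_transition": 33.97,
    "core_bench_list": 29.23,
    "core_list_find": 10.08,
    "matrix_test": 6.32,
    "core_list_mergesort": 5.86,
    "crcu16": 1.86,
}
GCC_SELF = {
    "core_state_transition": 9.73,
    "core_bench_list": 6.06,
    "core_list_find": 9.19,
    "matrix_test": 10.45,
    "core_list_mergesort": 17.44,
    "crcu16": 10.90,
}


def absolute_misses(self_percent, total_events):
    """Scale Self percentages to approximate event counts."""
    return {sym: total_events * pct / 100.0 for sym, pct in self_percent.items()}


maple_abs = absolute_misses(MAPLE_SELF, MAPLE_TOTAL)
gcc_abs = absolute_misses(GCC_SELF, GCC_TOTAL)
ratios = {sym: maple_abs[sym] / gcc_abs[sym] for sym in MAPLE_SELF}

for sym, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{sym:24s} maple={maple_abs[sym]:12.0f} gcc={gcc_abs[sym]:12.0f} x{ratio:.2f}")
```

On these figures the maple binary records roughly three times as many branch misses in total, and `core_state_transition` and `core_bench_list` account for most of the difference. Inlining decisions differ between the two binaries, so misses may be attributed to different symbols; the per-symbol ratios are indicative only.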
maple:
mbc:
lmbc:
gcc:
result:

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.605992 | 19.268120 | 19.090785 | 19.305054 |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.581693 | 19.197274 | 18.812054 | 19.202711 |
| size | 29024 | 29712 | 31534 | 30274 |

Please use:
maple --run=me --option=-O2 --genlmbc
for producing the .lmbc file. The size should be smaller.
The inlining optimization can be applied after the mbc/lmbc file is read in and before mplcg is invoked:
maple foo.lmbc --run=mpl2mpl:mplcg --option=-O2:-O2

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.591438 | 18.864150 | 18.920330 | 19.220367 |
| size | 29024 | 29728 | 26504 | 25282 |

With PR1153 added:

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.591438 | 17.972678 | 17.797790 | 18.674899 |
| size | 29024 | 29728 | 26504 | 25282 |

PR1153 has not been merged; its CodeReady status has been withdrawn.

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.592007 | 18.236831 | 18.222710 | 18.735336 |
| size | 29024 | 29728 | 26455 | 25242 |

PR1153 has been abandoned because it had too many conflicts. It is superseded by other PRs.

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.574882 | 17.925919 | 17.945642 | 18.384274 |
| size | 29024 | 29728 | 26465 | 25252 |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.593235 | 17.945715 | 17.976522 | 18.442803 |
| size | 29024 | 29728 | 26465 | 25252 |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.585182 | 14.989517 | 15.002101 | Segmentation fault |
| size | 29024 | 29968 | 27482 | 26219 |

mplcg has regressed over the past week. Starting from the .lmbc file and using the mplcg from a week ago, the problem does not appear.

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.598195 | 14.996783 | 14.983015 | 14.689677 |
| size | 29024 | 29968 | 27465 | 26319 |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.590252 | 15.024478 | 14.994562 | 14.645256 |
| size | 29024 | 29968 | 27309 | 26142 |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.596213 | 15.010635 | 14.997347 | Segmentation fault |
| size | 29024 | 29968 | 27365 | 26496 |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.597565 | 15.015716 | 15.009774 | 13.724592 |
| size | 29024 | 29968 | 27365 | 26496 |

Data after adding PR1220.
An error occurs when running lmbc.

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.584101 | 15.036662 | 14.996647 | 13.740912 |
| size | 29024 | 29968 | 27357 | 26488 |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.588940 | 14.660109 | 14.674740 | 14.564830 |
| size | 29024 | 29968 | 27326 | 26457 |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.583092 | 14.656732 | 14.678527 | 14.451514 |
| size | 29024 | 29968 | 27326 | 26457 |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.599833 | 14.650416 | 14.669978 | 14.527161 |
| size | 29024 | 29968 | 27326 | 26457 |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.603636 | 14.860165 | 14.848948 | 14.648584 |
| size | 29024 | 29968 | 27326 | 26457 |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.614331 | 14.877399 | 14.851216 | 14.683168 |
| size | 29024 | 29968 | 27326 | 26457 |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.625604 | 14.872185 | 14.878506 | 14.668003 |
| size | 29024 | 29992 | 27326 | 26457 |


| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.598562 | 13.904174 | 13.891701 | 14.460430 |
| size | 29024 | 29992 | | |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.599657 | 11.487677 | 11.449768 | 11.328723 |
| size | 29024 | 29992 | | |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.590581 | 11.478291 | 11.464892 | 11.346839 |
| size | 29024 | 29992 | | |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.615672 | 11.475406 | 11.450076 | 11.330784 |
| size | 29024 | 29992 | | |

| | gcc | maple | mbc | lmbc |
|---|---|---|---|---|
| total time | 10.613873 | 11.477163 | 11.452273 | 11.340038 |
| size | 29024 | 29992 | | |

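The run of tables above is easier to follow as a slowdown factor relative to gcc. The sketch below takes four of the snapshots (the first mbc/lmbc table, the one with PR1153 added, the first one where lmbc hit a segmentation fault, and the last table); the labels and the helper `slowdowns` are hypothetical names chosen for illustration, and the segmentation fault is represented as `None`.

```python
# (label, gcc, maple, mbc, lmbc) total times copied from four of the tables above.
# None stands for the run reported as "Segmentation fault".
SNAPSHOTS = [
    ("first table", 10.605992, 19.268120, 19.090785, 19.305054),
    ("PR1153 added", 10.591438, 17.972678, 17.797790, 18.674899),
    ("lmbc segfault", 10.585182, 14.989517, 15.002101, None),
    ("last table", 10.613873, 11.477163, 11.452273, 11.340038),
]


def slowdowns(snapshots):
    """Return one dict per snapshot with time/gcc_time for each maple variant."""
    rows = []
    for label, gcc, maple, mbc, lmbc in snapshots:
        row = {"label": label}
        for name, value in (("maple", maple), ("mbc", mbc), ("lmbc", lmbc)):
            # Keep missing runs as None instead of inventing a number.
            row[name] = None if value is None else value / gcc
        rows.append(row)
    return rows


def fmt(ratio):
    return "   n/a" if ratio is None else f"{ratio:6.3f}"


for row in slowdowns(SNAPSHOTS):
    print(f"{row['label']:14s} maple={fmt(row['maple'])} mbc={fmt(row['mbc'])} lmbc={fmt(row['lmbc'])}")
```

Read this way, the maple build goes from roughly 1.8 times the gcc time in the first table to under 1.1 times in the last one, with the mbc and lmbc paths tracking it closely.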