# parellel-learn

**Repository Path**: rainrime/parallel-learn

## Basic Information

- **Project Name**: parellel-learn
- **Description**: Learning parallel programming
- **Primary Language**: C++
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2021-07-12
- **Last Updated**: 2021-11-13

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Superpixel Algorithm (SLIC) Optimization

## Dependencies

- Tested only on Linux (x64) with gcc; no other dependencies are currently required.
- Note: because libmvec is used, the Glibc version must be >= 2.22.
- Code in folders other than SLIC may require tbb, openblas, eigen, etc.; these are not needed for now.

## Build and Run

The original program is in SLIC_BK, with a Makefile added.

```sh
# All commands below start from the project root directory
cd SLIC_BK
make clean && make
./SLIC.out
```

The optimized program is in the SLIC directory. The macro `#define THE_THREAD_NUMS` in utils.h sets the number of threads; adjust it to suit your machine.

```
cd SLIC
make clean && make
./SLIC.out
```

On the supercomputing cluster, run it as follows. Only a single node is used, and some memory-access optimizations were made for the NUMA architecture.

```
export OMP_PLACES=cores
srun -p amd_256 -N 1 -t 10 ./SLIC.out
```

Note:

- If the platform's default Glibc is not >= 2.22, you will need to work out how to link and run the program yourself. During testing, I statically linked it on my local machine (WSL Ubuntu 20.04 LTS) and uploaded only the executable.

## Results

The program was run on an `AMD EPYC 7452 32-Core Processor (2 sockets)` machine, i.e. a dual-socket machine with 32 cores per socket, for a total of 64 cores and 64 threads.

The original program takes about 5700 ms. With 32 threads the run time is about 78 ms, an overall speedup of about 73; with 62 threads it is about 57 ms, an overall speedup of about 100. (Note: this speedup does not come from parallelism alone. In fact, once some single-threaded optimizations have been applied, the speedup from parallelism itself is not impressive.)

With 62 threads and the environment variable `OMP_PLACES=cores`, the output is:

```
$ export OMP_PLACES=cores
$ chmod +x ./SLIC.out
$ srun -p amd_256 -N 1 -t 10 ./SLIC.out
srun: job 555293 queued and waiting for resources
srun: job 555293 has been allocated resources
width = 2599, height = 3898
sz = 10130902
Initial time = 0 ms
Conversion time = 20 ms
DeleteEdges and Get_Seeds time = 0 ms
numk = 196
Dist iter time=4(4) ms
Dist iter time=6(2) ms
Dist iter time=8(2) ms
Dist iter time=10(2) ms
Dist iter time=12(2) ms
Dist iter time=14(2) ms
Dist iter time=16(2) ms
Dist iter time=18(2) ms
Dist iter time=20(2) ms
Dist iter time=22(2) ms
Computing time=28 ms
STEP = 227
Segmentation time = 29 ms
EC1 time=0 ms
EC2 time=3 ms
EC3 time=0 ms
EC4 time=1 ms
EnforceLabelConnectivity time = 6 ms
Computing time=57 ms
There are 0 points' labels are different from original file.
```

Output of the original program:

```
$ srun -p amd_256 -N 1 -t 10 ./SLIC.out
srun: job 438538 queued and waiting for resources
srun: job 438538 has been allocated resources
Computing time=5780 ms
There are 0 points' labels are different from original file.
```

Output at one intermediate stage of the optimization:

```
$ srun -p amd_256 -N 1 -t 10 ./SLIC.out
srun: job 431514 queued and waiting for resources
srun: job 431514 has been allocated resources
width = 2599, height = 3898
sz = 10130902
Initial time = 3 ms
Conversion time = 80 ms
DeleteEdges and Get_Seeds time = 17 ms
numk = 196
Dist iter time=18(18) ms
Dist iter time=0(0) ms
Dist iter time=28(10) ms
Dist iter time=0(0) ms
Dist iter time=38(10) ms
Dist iter time=0(0) ms
Dist iter time=48(10) ms
Dist iter time=0(0) ms
Dist iter time=56(8) ms
Dist iter time=0(0) ms
Dist iter time=66(10) ms
Dist iter time=0(0) ms
Dist iter time=75(9) ms
Dist iter time=0(0) ms
Dist iter time=77(2) ms
Dist iter time=0(0) ms
Dist iter time=79(2) ms
Dist iter time=0(0) ms
Dist iter time=81(2) ms
Dist iter time=0(0) ms
STEP = 227
Segmentation time = 194 ms
EnforceLabelConnectivity time = 125 ms
Computing time=424 ms
There are 0 points' labels are different from original file.
```
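
As a quick check on the arithmetic behind the speedup figures, the sketch below recomputes them from the `Computing time` values quoted in this README. It is not part of the repository: the names `speedup` and `print_speedups` are made up for illustration, and the timings are copied from the text and logs above, not measured again.

```cpp
#include <cassert>
#include <cstdio>

// Overall speedup = baseline wall time / optimized wall time.
// Both arguments are in milliseconds. (Hypothetical helper, not from the repo.)
double speedup(double baseline_ms, double optimized_ms) {
    assert(optimized_ms > 0.0);
    return baseline_ms / optimized_ms;
}

// Prints the speedups implied by the timings quoted in this README.
// (Hypothetical helper, not from the repo.)
void print_speedups() {
    // Rounded figures from the prose: about 5700 ms, 78 ms, and 57 ms.
    std::printf("32 threads:         %.1f\n", speedup(5700.0, 78.0));   // about 73.1
    std::printf("62 threads:         %.1f\n", speedup(5700.0, 57.0));   // 100.0
    // Figures from the logs: 5780 ms original, 424 ms intermediate stage.
    std::printf("intermediate stage: %.1f\n", speedup(5780.0, 424.0));  // about 13.6
}
```

Calling `print_speedups()` from a `main` function reproduces the roughly 73x and 100x figures. These are overall ratios against the original program, so they combine the single-threaded optimizations with the parallel ones and should not be read as parallel scaling alone.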