# ChipSum **Repository Path**: chip-sum/ChipSum ## Basic Information - **Project Name**: ChipSum - **Description**: ChipSum is a framework intigrating HPC libraries using C++ template programming. - **Primary Language**: C++ - **License**: MIT - **Default Branch**: main - **Homepage**: https://chipsum.readthedocs.io/en/latest/ - **GVP Project**: No ## Statistics - **Stars**: 7 - **Forks**: 1 - **Created**: 2021-08-11 - **Last Updated**: 2024-02-21 ## Categories & Tags **Categories**: Uncategorized **Tags**: 异构架构, 基础代数库 ## README # 快速开始 ## 一、编译环境准备 1. Linux系统 目前`ChipSum`仅支持Linux系统 2. CMake `ChipSum`以及第三方库`kokkos`和`kokkos-kernels`的编译, 都需要`Cmake>=3.20`,编译器需要支持C++17。 若您使用成都或昆山超算上使用DCU进行编译和运行,需要将hip版本>4.5且在环境变量设置hipcc为默认编译器。 3. 获取`ChipSum`代码 ``` # get ChipSum git clone https://gitee.com/chip-sum/ChipSum.git # get kokkos and kokkos-kernels cd ChipSum/ git submodule init git submodule update cd tpls/kokkos git checkout ``` ## 二、编译 为了方便用户编译,`ChipSum`使用了一个python脚本帮助完成编译过程。已分别在`AMD VEGA906`和`NVIDIA 2080ti`等架构上完成编译并运行。 1 `AMD VEGA906` ``` # AMD Vega906/900 python3 setup.py arch=VEGA906 compiler=/path/to/your/rocm(i.e. 5.4.1)/bin/hipcc hip=/path/to/your/rocm(i.e. 5.4.1)/hip ``` 2 `NVIDIA 2080ti` ``` # NVIDIA 2080ti python3 setup.py arch=Turing75 cuda=/Path/To/Your/Cuda ``` 3 `Volta72` ``` # for Volta72 python3 setup.py cuda=/Path/To/Your/Cuda arch=Volta72 ``` 4 `MX350(Pascal61)` ``` # for MX350(Pascal61) python3 setup.py cuda=/Path/To/Your/Cuda arch=Pascal61 ``` 5 `CPU`编译 根据设备CPU架构输入arch参数和编译器参数,以WestMere(WSM)架构和g++编译器为例 ``` python3 setup.py arch=WSM compiler=g++ j=8 ``` 6 后端编译 1)若想后端使用BLAS,可以使用'use_tpls=blas'参数,注意如默认路径找不到BLAS,则需给出BLAS路径, 以SkyLake(SKX) CPU, g++编译器为例 ``` python3 setup.py arch=SKX compiler=g++ use_tpls=blas j=8 or python3 setup.py arch=SKX compiler=g++ use_tpls=blas blas_path=/your/BLAS/lib/PATH blas_name=openblas or blas ``` 2)若想后端使用MKL,可以使用'use_tpls=mkl'参数 以SkyLake(SKX) CPU, intel编译器为例 ``` python3 setup.py arch=SKX compiler=icc use_tpls=mkl j=8 ``` 3)若想后端使用cublas,可以使用'use_tpls=cublas/rocblas'参数 以2080ti(Turing75) GPU为例 ``` python3 setup.py arch=Turing75 cuda=/Path/To/Your/Cuda use_tpls=cublas/rocblas j=8 ``` 4)若想后端使用cusparse,可以使用'use_tpls=cusparse/rocsparse'参数 以2080ti(Turing75) GPU为例 ``` python3 setup.py arch=Turing75 cuda=/Path/To/Your/Cuda use_tpls=cusparse/rocsparse j=8 ``` 7 其他 若想指定安装目录,可以使用`prefix`参数 ``` mkdir /anywhere/you/like python3 setup.py prefix=/anywhere/you/like ``` 若想指定核编译,可以使用`j`参数 ``` python3 setup.py j=32 ``` 若想后端使用OpenBLAS,可以使用'use_tpls'参数,注意如默认路径找不到OpenBLAS,则需给出OpenBLAS路径 ``` python3 setup.py use_tpls=blas or python3 setup.py use_tpls=blas blas_path=/your/OpenBLAS/lib/PATH ``` 注意:第一次编译时需编译kokkos和kokkos-kernels,耗时较久。后续使用时仅编译ChipSum内容,耗时很快。 ## 三、算例 ### 1. 验证安装 `ChipSum`在`test.cpp`中提供了一个简单的用例示范,默认路径编译完成后,可以在`./build`中查看编译结果。 ``` # default path cd ./build ./ChipSum ``` 预期输出: ``` origin matrix A: densemat_0_mirror(5,5): [0.840188, 0.394383, 0.783099, 0.79844, 0.911647] [0, 0.197551, 0.335223, 0.76823, 0.277775] [0, 0, 0.55397, 0.477397, 0.628871] [0, 0, 0, 0.364784, 0.513401] [0, 0, 0, 0, 0.95223] return expect 0: 0 inverse:A densemat_0_mirror(5,5): [1.19021, -2.37608, -0.244662, 2.71905, -1.75077] [0, 5.06197, -3.06314, -6.65166, 4.13262] [0, 0, 1.80515, -2.36242, 0.0815571] [0, 0, 0, 2.74134, -1.47801] [0, 0, 0, 0, 1.05017] ``` ### 2. Conjugate Gradient 算例 按默认路径完成编译后,在` ./build/examples/chipsumSolver`路径下存在cg算法的可执行文件,可以实现一个简单的求解算例: ``` cd path_to_chipsum ./build/examples/chipsumSolver/cg ``` ### 3. BiConjugate Gradient 算例 按默认路径完成编译后,在` ./build/examples/chipsumSolver`路径下存在BiCG算法的可执行文件,可以实现一个简单的求解算例: ``` cd path_to_chipsum ./build/examples/chipsumSolver/bicg ``` ### 4. BiConjugate Gradient STAB 算例 按默认路径完成编译后,在` ./build/examples/chipsumSolver`路径下存在BiCGSTAB算法的可执行文件,可以实现一个简单的求解算例: ``` cd path_to_chipsum ./build/examples/chipsumSolver/bicgstab ``` ### 5. GMRES算例 按默认路径完成编译后,在` ./build/examples/chipsumSolver`路径下存在GMRES算法的可执行文件,可以实现一个简单的求解算例: ``` cd path_to_chipsum ./build/examples/chipsumSolver/gmres ``` # 四、应用 我们实现了一个基于`ChipSum`的应用,`chipsumAI`。该应用利用`ChipSum`的函数方法和数据结构等,完成了一个mnist手写体识别样例。该部分将介绍如何编译和运行`chipsumAI`。 ### 编译`ChipSum` 首先需要将`ChipSum`编译并make install至指定文件夹,如:examples/chipsumAI/install_lib ``` cd path_to_chipsum cd ./build cmake -DCMAKE_INSTALL_PREFIX=../examples/chipsumAI/install_lib .. make install ll ../examples/chipsumAI/install_lib ``` 成功make install后,会在目标文件夹下(例子中,install_lib文件夹下)新生成以下文件。 ``` . ├── bin │   └── ChipSum ├── include │   ├── chipsum │   └── ChipSum.hpp ├── lib │   └── libchipsum.a ``` 此时,若需在其他文件中使用`ChipSum`,仅需`#include "ChipSum.hpp"`即可使用相应数据结构和函数。 ### 编译`chipsumAI` 在运行手写体识别代码样例前,需要对项目进行编译,已获得可运行的可执行文件。 ``` cd path_to_chipsum cd ./examples/chipsumAI mkdir build && cd build # ChipSum_DIR should be absolute path export ChipSum_DIR=/path/to/chipsum/ # ChipSumLib_DIR should be absolute path export ChipSumLib_DIR=/path/to/install_lib/ cmake -DChipSum_DIR=${ChipSum_DIR} -DChipSumLib_DIR=${ChipSumLib_DIR} .. make -j8 ./mnist ``` 使用命令`./mnist`运行mnist实现手写体识别,预期输出示例如下: ``` ******input is****** : 9 [ ] [ ] [ ] [ ] [ ] [ ] [ # # # # # # # # ] [ # # # # # # # # # # ] [ # # # # # # # # # # # ] [ # # # # # # # # # # # ] [ # # # # # # # # # # ] [ # # # # # # # # # # # ] [ # # # # # # # # # # ] [ # # # # # # # # # # ] [ # # # # # # # ] [ # # # # # # ] [ # # # # # # # ] [ # # # # # # ] [ # # # # # # ] [ # # # # # # ] [ # # # # # # # ] [ # # # # # # ] [ # # # # # # ] [ # # # # # # ] [ # # # # # ] [ # # # # ] [ ] [ ] *****prediction is***** : 9 ``` # 五、附录 ### 1. setup中,硬件与arch对应参数 ``` [AMD: CPU]" AMDAVX = AMD CPU" ZEN = AMD Zen-Core CPU" ZEN2 = AMD Zen2-Core CPU" [AMD: GPU]" VEGA900 = AMD GPU MI25 GFX900" VEGA906 = AMD GPU MI50/MI60 GFX906" VEGA908 = AMD GPU MI100 GFX908" VEGA90A = AMD GPU MI200 GFX90A" [ARM]" ARMV80 = ARMv8.0 Compatible CPU" ARMV81 = ARMv8.1 Compatible CPU" ARMV8_THUNDERX = ARMv8 Cavium ThunderX CPU" ARMV8_THUNDERX2 = ARMv8 Cavium ThunderX2 CPU" [Intel]" WSM = Intel Westmere CPUs" SNB = Intel Sandy/Ivy Bridge CPUs" HSW = Intel Haswell CPUs" BDW = Intel Broadwell Xeon E-class CPUs" SKX = Intel Sky Lake Xeon E-class HPC CPU(AVX512)" [Intel Xeon Phi]" KNC = Intel Knights Corner Xeon Phi" KNL = Intel Knights Landing Xeon Phi" [NVIDIA]" Kepler30 = NVIDIA Kepler generation CC 3.0" Kepler32 = NVIDIA Kepler generation CC 3.2" Kepler35 = NVIDIA Kepler generation CC 3.5" Kepler37 = NVIDIA Kepler generation CC 3.7" Maxwell50 = NVIDIA Maxwell generation CC 5.0" Maxwell52 = NVIDIA Maxwell generation CC 5.2" Maxwell53 = NVIDIA Maxwell generation CC 5.3" Pascal60 = NVIDIA Pascal generation CC 6.0" Pascal61 = NVIDIA Pascal generation CC 6.1" Volta70 = NVIDIA Volta generation CC 7.0" Volta72 = NVIDIA Volta generation CC 7.2" Turing75 = NVIDIA Turing generation CC 7.5" Ampere80 = NVIDIA Ampere generation CC 8.0" Ampere86 = NVIDIA Ampere generation CC 8.6" ```