# dgemm

**Repository Path**: hillgao/dgemm

## Basic Information

- **Project Name**: dgemm
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 4
- **Created**: 2025-02-25
- **Last Updated**: 2025-02-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

This document uses N×N double-precision matrix multiplication (dgemm) as an example to walk through the optimization process step by step. The code runs on x86 Linux and macOS.

The optimization proceeds in the following steps:

- Vectorization (AVX on x86)
- Loop unrolling
- Cache blocking
- Multithreading (OpenMP)

Build and run (note that `cmake` needs the path to the source directory, here the parent of `build`):

    mkdir build && cd build
    cmake .. && make && ./dgemm_test

Measured run times in ms. The first column is the initial version, the second was measured after converting the matrices to row-major storage, and the third after adding the AVX-512 kernels (that run also re-measures the AVX kernels):

| Kernel | Initial | Row-major storage | With AVX-512 added |
|---|---:|---:|---:|
| dgemm_c | 7988 | 8201 | 8755 |
| dgemm_avx | 3299 | 3847 | 3638 |
| dgemm_avx_unroll | 2763 | 2996 | 2757 |
| dgemm_avx_unroll_blk | 1641 | 1963 | 1865 |
| dgemm_avx_unroll_blk_omp | 125 | 137 | 137 |
| dgemm_avx512 | n/a | n/a | 2500 |
| dgemm_avx512_unroll | n/a | n/a | 1605 |
| dgemm_avx512_unroll_blk | n/a | n/a | 1509 |
| dgemm_avx512_unroll_blk_omp | n/a | n/a | 121 |

NOTE:

- The code in this document comes from *Computer Organization and Design, RISC-V Edition*.
- Intel Intrinsics Guide: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
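The vectorization, cache-blocking and multithreading steps can be illustrated with a minimal C sketch. It follows the column-major layout used by the textbook code (element (i, j) stored at `M[i + j*n]`), but it is not the repository's code. `dgemm_c` and `dgemm_avx` borrow their names from the timing table, and their bodies are a reconstruction. `dgemm_blk_omp`, `do_block`, `BLOCKSIZE = 32`, `check_kernels` and the integer test data are hypothetical. The GCC/Clang extensions `__attribute__((target("avx")))` and `__builtin_cpu_supports` are assumed so the file compiles without `-mavx`. Loop unrolling and AVX-512 are left out, and the blocked kernel keeps a scalar inner loop for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <immintrin.h>

/* Column-major layout: element (i, j) of an n x n matrix is M[i + j * n]. */

/* Baseline: C += A * B with three nested scalar loops. */
static void dgemm_c(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double cij = C[i + j * n];
            for (size_t k = 0; k < n; ++k)
                cij += A[i + k * n] * B[k + j * n];
            C[i + j * n] = cij;
        }
}

/* AVX: one 256-bit register holds C[i..i+3][j], so each inner iteration
   performs 4 multiply-adds. Assumes n % 4 == 0. Unaligned loads/stores
   mean the buffers need no 32-byte alignment. */
static __attribute__((target("avx")))
void dgemm_avx(size_t n, const double *A, const double *B, double *C) {
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4)
        for (size_t j = 0; j < n; ++j) {
            __m256d c0 = _mm256_loadu_pd(C + i + j * n);
            for (size_t k = 0; k < n; ++k) {
                __m256d a = _mm256_loadu_pd(A + i + k * n);     /* A[i..i+3][k] */
                __m256d b = _mm256_broadcast_sd(B + k + j * n); /* B[k][j] x 4 */
                c0 = _mm256_add_pd(c0, _mm256_mul_pd(a, b));
            }
            _mm256_storeu_pd(C + i + j * n, c0);
        }
}

/* Cache blocking: work on BLOCKSIZE x BLOCKSIZE sub-matrices that fit in
   cache. Hypothetical tuning value; assumes n % BLOCKSIZE == 0. */
#define BLOCKSIZE 32

static void do_block(size_t n, size_t si, size_t sj, size_t sk,
                     const double *A, const double *B, double *C) {
    for (size_t i = si; i < si + BLOCKSIZE; ++i)
        for (size_t j = sj; j < sj + BLOCKSIZE; ++j) {
            double cij = C[i + j * n];
            for (size_t k = sk; k < sk + BLOCKSIZE; ++k)
                cij += A[i + k * n] * B[k + j * n];
            C[i + j * n] = cij;
        }
}

static void dgemm_blk_omp(size_t n, const double *A, const double *B, double *C) {
    assert(n % BLOCKSIZE == 0);
    /* Each thread owns a distinct band of columns of C, so there are no
       write races. The pragma is ignored unless compiled with -fopenmp. */
    #pragma omp parallel for
    for (long sj = 0; sj < (long)n; sj += BLOCKSIZE)
        for (size_t si = 0; si < n; si += BLOCKSIZE)
            for (size_t sk = 0; sk < n; sk += BLOCKSIZE)
                do_block(n, si, (size_t)sj, sk, A, B, C);
}

static int same(const double *x, const double *y, size_t len) {
    for (size_t t = 0; t < len; ++t)
        if (x[t] != y[t]) return 0;
    return 1;
}

/* Returns how many optimized kernels disagree with dgemm_c. Integer-valued
   inputs keep every partial sum exact, so results must match exactly even
   though the kernels add terms in different orders. */
static int check_kernels(size_t n) {
    size_t nn = n * n;
    double *A = malloc(nn * sizeof *A), *B = malloc(nn * sizeof *B);
    double *ref = calloc(nn, sizeof *ref), *out = malloc(nn * sizeof *out);
    assert(A && B && ref && out);
    for (size_t t = 0; t < nn; ++t) {
        A[t] = (double)(t % 7) - 3.0;
        B[t] = (double)(t % 5) - 2.0;
    }
    dgemm_c(n, A, B, ref);
    int bad = 0;
    if (__builtin_cpu_supports("avx")) { /* skip AVX path on older CPUs */
        memset(out, 0, nn * sizeof *out);
        dgemm_avx(n, A, B, out);
        bad += !same(out, ref, nn);
    }
    memset(out, 0, nn * sizeof *out);
    dgemm_blk_omp(n, A, B, out);
    bad += !same(out, ref, nn);
    free(A); free(B); free(ref); free(out);
    return bad;
}
```

Calling `check_kernels(n)` with `n` a multiple of 32 should return 0 when both the AVX kernel (if the CPU supports AVX) and the blocked kernel reproduce `dgemm_c`. Building with `-fopenmp` lets the blocked kernel run on several threads; parallelizing over column bands of C is one simple choice that avoids any locking.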