2 Star 0 Fork 0

mirrors_intel/Deep-learning-math-kernel-research

加入 Gitee
与超过 1400万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
文件
克隆/下载
TODO 1.69 KB
一键复制 编辑 原始数据 按行查看 历史
Wino:
- Tile-size: 6
Blocking:
- tail oc/ic handling
Build:
- arch-specific compiling option
Fusion:
- relu
- sum
OMP:
- kmp_malloc(?) allocate from thread-local heap
- omp_in_final()
- omp_get_num_procs
Modularity (2018-07-05): Done
- trans_input/trans_output: common function for offset computing
- code reorg, break transform kernels into small kernel files:
elk_conv_wino_4x4_3x3_input.hxx
...
- Memory size calculation for different xopt
Plain format (nchw/oihw) support (2018-7-11): Done
- fused reorder
- scatter/gather
- double buffer (drop as no effect)
- enable plain fmt for 0xa0e0/0xa0e1
- enable plain fmt for 0xa073
- fix performance regression in block16: format as template parameter
- format-as-blocked (esp. to avoid false sharing in fused output reorder)
IC/OC != 16x (2018-7-16): Done
- Support DNN first layer (drop as it can not achieve good performance with Winograd)
IC < 16, OC = 16x, nchw + Oihw16o => nChw16o
- Support blocked format with padded tensor
IC|OC != 16x, nChw16c + OIhw16i16o => nChw16c
- Support plain format in format-as path
IC|OC != 16x, nchw + oihw => nchw
- Support plain format in format-is path
IC|OC != 16x, nchw + oihw => nchw
MD-Array (2018-7-22): Done
- Cross platform/compiler MD-array
- Improve MD-Array performance for ICC
Conv_1x1 (2018-8-1): Done
- Uni-stride: kernel=1x1, stride=1, padding=0
- Stride=2, padding=0
- Blocked format
- plain format, IC/OC != 16x
- TODO: code clean and Perf tuning
- TODO: padding support
GEMM kernel (2018-8-23): Done
- Rewrite gemm kernel with better readability and modularity
- Apply new gemm kernel to conv1x1
- Apply new gemm kernel to Winograd
- Perf tuning for Winograd: xopt-A072
- Refactoring elx_conv_wino_t
Loading...
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
1
https://gitee.com/mirrors_intel/Deep-learning-math-kernel-research.git
git@gitee.com:mirrors_intel/Deep-learning-math-kernel-research.git
mirrors_intel
Deep-learning-math-kernel-research
Deep-learning-math-kernel-research
master

搜索帮助