# cutlass-notes

**Repository Path**: magicor/cutlass-notes

## Basic Information

- **Project Name**: cutlass-notes
- **Description**: https://github.com/ArthurinRUC/cutlass-notes.git
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-06
- **Last Updated**: 2025-11-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# CUTLASS Notes

The CUTLASS notes series begins with a minimal GEMM implementation and gradually expands to incorporate CuTe and various CUTLASS components, as well as features of newer architectures such as Hopper and Blackwell, ultimately arriving at a high-performance fused GEMM operator.

## Usage

```bash
git clone https://github.com/ArthurinRUC/cutlass-notes.git

# clone cutlass
cd cutlass-notes
git submodule update --init --recursive
```

## Run sample code

All example code in this GitHub repository can be compiled and run by executing the corresponding Python script. For example:

```bash
cd 01-minimal-gemm
python minimal_gemm.py
```

## Note list

| Notes | Summary | Links |
|-------|---------|-------|
| **00-Intro** | Brief introduction to CUTLASS | [intro](https://zhuanlan.zhihu.com/p/1937220431728845963) |
| **01-minimal-gemm** | • Introduces CuTe fundamentals<br>• Implements a 16x8x8 GEMM kernel from scratch using a single MMA instruction<br>• Python kernel invocation, precision validation, and performance benchmarking<br>• Profiling with Nsight Compute (ncu) | [minimal-gemm](https://zhuanlan.zhihu.com/p/1937517614084650073) |
| **02-mixed-precision-gemm** | • Implements a mixed-precision GEMM supporting different input, output, and accumulation precisions<br>• Explores the technical details of numerical precision conversion within kernels<br>• Demonstrates a custom FP8 GEMM kernel implemented via PTX instructions (for MMA ops not supported by CUTLASS) | [mixed-precision-gemm](https://zhuanlan.zhihu.com/p/1940158874255602181) |
| **03-tiled-mma** | • Introduces the key conceptual model of the GEMM operator: three-level tiling<br>• Details the implementation of Tiled MMA operations in CUTLASS CuTe<br>• Explains the usage and semantics of the parameters in the Tiled MMA API<br>• Extends the GEMM kernel from a single instruction to a single-tile operation | [tiled-mma](https://zhuanlan.zhihu.com/p/1950555644814946318) |
| **04-tiled-copy** | *Coming soon* | *Stay tuned* |

## License

This project is licensed under the MIT License - see the LICENSE file for details.