# cutlass-notes

**Repository Path**: magicor/cutlass-notes

## Basic Information

- **Project Name**: cutlass-notes
- **Description**: https://github.com/ArthurinRUC/cutlass-notes.git
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-06
- **Last Updated**: 2025-11-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# CUTLASS Notes

The CUTLASS notes series will begin with a minimal GEMM implementation, then gradually expand to incorporate CuTe, various CUTLASS components, and features of newer architectures such as Hopper and Blackwell, ultimately arriving at a high-performance fused GEMM operator.

## Usage

```bash
git clone https://github.com/ArthurinRUC/cutlass-notes.git
# clone cutlass
cd cutlass-notes
git submodule update --init --recursive
```

## Run sample code

All example code in this GitHub repository can be compiled and run by simply executing the corresponding Python script. For example:

```bash
cd 01-minimal-gemm
python minimal_gemm.py
```

## Note list

| Notes | Summary | Links |
|-------|---------|-------|
| **00-Intro** | Brief introduction to CUTLASS | [intro](https://zhuanlan.zhihu.com/p/1937220431728845963) |
| **01-minimal-gemm** | <ul><li>Introduces CuTe fundamentals</li><li>Implements a 16x8x8 GEMM kernel from scratch using a single MMA instruction</li><li>Covers invoking the kernel from Python, precision validation, and performance benchmarking</li><li>Profiling with Nsight Compute (ncu)</li></ul> | [minimal-gemm](https://zhuanlan.zhihu.com/p/1937517614084650073) |
| **02-mixed-precision-gemm** | <ul><li>Implements a mixed-precision GEMM that supports different input, output, and accumulation precisions</li><li>Explores the technical details of numerical precision conversion within kernels</li><li>Demonstrates a custom FP8 GEMM kernel implemented via PTX instructions (for MMA ops that CUTLASS does not support)</li></ul> | [mixed-precision-gemm](https://zhuanlan.zhihu.com/p/1940158874255602181) |
| **03-tiled-mma** | <ul><li>Introduces the key conceptual model of the GEMM operator: three-level tiling</li><li>Details the implementation of Tiled MMA operations in CUTLASS CuTe</li><li>Explains the usage and semantics of the parameters in the Tiled MMA API</li><li>Extends the GEMM kernel from a single instruction to a single-tile operation</li></ul> |