# mirage

**Repository Path**: underdogs/mirage

## Basic Information

- **Project Name**: mirage
- **Description**: No description available
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-11
- **Last Updated**: 2025-04-11

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Mirage: Automatically Generating Fast GPU Kernels without Programming in CUDA/Triton

Mirage is a tool that automatically generates fast GPU kernels for PyTorch programs through superoptimization techniques. For example, to get fast GPU kernels for attention, users only need to write a few lines of Python code to describe attention's computation. For a given PyTorch program, Mirage automatically searches the space of potential GPU kernels that are functionally equivalent to the input program and discovers highly optimized kernel candidates. This approach allows Mirage to find new custom kernels that outperform existing expert-designed ones.

## Quick Installation

The quickest way to try Mirage is to install the latest stable release from pip:

```bash
pip install mirage-project
```

We also provide pre-built binary wheels on the [Release Page](https://github.com/mirage-project/mirage/releases/latest). For example, to install Mirage 0.2.2 compiled with CUDA 12.2 for Python 3.10, use the following command:

```bash
pip install https://github.com/mirage-project/mirage/releases/download/v0.2.2/mirage_project-0.2.2+cu122-cp310-cp310-linux_x86_64.whl
```

You can also install Mirage from source:

```bash
git clone --recursive https://www.github.com/mirage-project/mirage
cd mirage
pip install -e . -v
```

## Quickstart

Mirage can automatically generate fast GPU kernels for arbitrary PyTorch programs. The Mirage-generated kernels can be integrated into a PyTorch program with a few lines of code changes.
As an example, we show how to use Mirage to generate kernels that fuse [RMSNorm](https://arxiv.org/pdf/1910.07467) and Linear to accelerate Transformer-based large language model computation. More examples are available in the [tutorials](https://mirage-project.readthedocs.io/en/latest/tutorials/index.html). The following code snippet shows a native PyTorch implementation of a Transformer layer in LLaMA-3-8B:

```python
rms_norm_1 = torch.nn.RMSNorm(4096)
rms_norm_2 = torch.nn.RMSNorm(4096)
Y = rms_norm_1(X)
Z = torch.matmul(Y, Wqkv)
O = attention(Z)
U = rms_norm_2(O)
V = torch.matmul(U, W13)
V1, V3 = V.chunk(2, -1)  # split omitted in the above figure
output = torch.matmul(silu(V1) * V3, W2)  # silu and this matmul omitted in the above figure
```
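For reference, RMSNorm normalizes each row of its input by that row's root-mean-square and applies a per-feature scale; the fusion opportunity is that this normalization can be computed in the same kernel as the matmul that follows it, instead of in two separate kernel launches. Below is a minimal NumPy sketch of the computation being fused (the function name, small shapes, and `eps` default are illustrative only, not part of Mirage's API):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Normalize each row by its root-mean-square, then apply a per-feature scale.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# The RMSNorm + Linear pattern that a fused kernel would compute in one pass:
x = np.random.randn(2, 8).astype(np.float32)        # activations
W = np.random.randn(8, 4).astype(np.float32)        # linear-layer weights
g = np.ones(8, dtype=np.float32)                    # RMSNorm scale parameters
fused_target = rms_norm(x, g) @ W                   # shape (2, 4)
```

A fused kernel produces `fused_target` directly from `x`, `g`, and `W`, avoiding the round trip of the normalized intermediate through GPU memory.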