# McFlashInfer

**Repository Path**: metax-maca/McFlashInfer

## Basic Information

- **Project Name**: McFlashInfer
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-27
- **Last Updated**: 2026-04-27

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

McFlashInfer is a library and kernel generator for Large Language Models that provides high-performance implementations of LLM GPU kernels, such as FlashAttention, SparseAttention, PageAttention, Sampling, and more, on the MACA platform. McFlashInfer focuses on LLM serving and inference, and delivers state-of-the-art performance across diverse scenarios.

## Create Conda Env

```
conda create -n your_env_name python=3.10
```

## Activate Conda Env

```
conda activate your_env_name
```

## Install Dependencies

```
pip install your_maca_torch2.4/2.6_whl --force-reinstall --no-deps
pip install your_maca_triton.whl --force-reinstall --no-deps
pip install ninja
pip install einops
pip install setuptools==75.8.2
pip install numpy==1.24.2
pip install pytest
pip install packaging
pip install SentencePiece
pip install accelerate
pip install wheel
pip install build
pip install black
pip install cpplint
pip install pylint
```

## Set Environment Variables

```
export MACA_PATH=/your/maca/path
export MACA_CLANG_PATH=${MACA_PATH}/mxgpu_llvm/bin
export LD_LIBRARY_PATH=${MACA_PATH}/lib:${MACA_PATH}/mxgpu_llvm/lib:$LD_LIBRARY_PATH
export CUDA_PATH=$MACA_PATH/tools/cu-bridge
export PATH=$MACA_PATH/mxgpu_llvm/bin:$MACA_PATH/bin:$PATH
```

## Build

Clean build artifacts if needed:

```
./clean.sh
```

Build the AOT kernels and create the FlashInfer distribution:

```sh
python -m flashinfer.aot
python -m build --no-isolation --wheel
```

Please do not use JIT mode; it is not yet stable.

## Install Wheel

```
pip install dist/flashinfer-*.whl
```
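
## Verify Installation

As a quick check that the wheel installed correctly, the sketch below runs a single-request decode-attention kernel. This is a minimal example, assuming the MACA build preserves the upstream FlashInfer Python API (`flashinfer.single_decode_with_kv_cache`) and that the MACA device is exposed through the usual `cuda` device strings via cu-bridge; shapes and dtypes are illustrative.

```
import torch
import flashinfer  # assumes the upstream FlashInfer Python API is preserved

num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 1024

# Single-request decode: one query vector per head attends to a KV cache
# of kv_len tokens (default NHD layout: [kv_len, num_kv_heads, head_dim]).
# device="cuda" assumes the MACA torch wheel maps the GPU to CUDA device strings.
q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

o = flashinfer.single_decode_with_kv_cache(q, k, v)  # -> [num_qo_heads, head_dim]
print(o.shape)
```

If the script prints the output shape without raising an import or kernel-launch error, the AOT kernels were built and installed correctly.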