# flash-attention

**Repository Path**: ceci3/flash-attention

## Basic Information

- **Project Name**: flash-attention
- **Description**: xxxxxxxxxxxxxx
- **Primary Language**: Unknown
- **License**: BSD-3-Clause
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-11-08
- **Last Updated**: 2025-10-31

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# FlashAttention for vLLM

This is a fork of https://github.com/Dao-AILab/flash-attention customized for vLLM. It carries the following customizations:

- Build: CMake, torch library (this package is bundled into vLLM).
- Size: reduced templating and removal of training (backward-pass) kernels.
- Features: small page size support for the paged KV cache (FA2), DCP support (FA3); see the sketch after this list for how the paged-KV decode path is typically invoked.
- Performance: decode-specific optimizations for the sizes we care about, as well as mixed-batch performance optimizations.

(Upstream is understandably hesitant to specialize for inference, since it also has to support training; we, on the other hand, compile out the backward-pass kernels and do not test that our optimizations leave them intact.)
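As an illustration of the paged-KV decode path that the small-page-size and decode work targets, the sketch below calls upstream flash-attention's `flash_attn_with_kvcache` against a block-table-indexed KV cache. The tensor shapes, the `page_size = 16` choice, and the plain `flash_attn` import are illustrative assumptions; inside vLLM this fork is consumed through vLLM's own bindings rather than imported directly, and accepting a page size this small on FA2 is exactly what the fork's small-page-size support is for.

```python
# Minimal decode-step sketch against a paged KV cache (assumed shapes and sizes).
import torch
from flash_attn import flash_attn_with_kvcache

batch, nheads, nheads_k, headdim = 4, 32, 8, 128   # GQA: 32 query heads, 8 KV heads
page_size = 16                                      # small page of the kind vLLM's block allocator uses
num_pages = 256
max_pages_per_seq = 8

# Paged KV cache: (num_pages, page_size, nheads_k, headdim)
k_cache = torch.randn(num_pages, page_size, nheads_k, headdim,
                      dtype=torch.float16, device="cuda")
v_cache = torch.randn_like(k_cache)

# One new query token per sequence (a decode step): (batch, 1, nheads, headdim)
q = torch.randn(batch, 1, nheads, headdim, dtype=torch.float16, device="cuda")

# Which cache pages each sequence owns, and how many tokens are already cached.
block_table = torch.randint(0, num_pages, (batch, max_pages_per_seq),
                            dtype=torch.int32, device="cuda")
cache_seqlens = torch.randint(1, page_size * max_pages_per_seq, (batch,),
                              dtype=torch.int32, device="cuda")

out = flash_attn_with_kvcache(
    q, k_cache, v_cache,
    cache_seqlens=cache_seqlens,
    block_table=block_table,
    causal=True,
)
print(out.shape)  # (batch, 1, nheads, headdim)
```

Small pages keep KV-cache fragmentation low for vLLM's block allocator, which is why FA2 small-page support matters for this fork.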