# xformers
**Repository Path**: jerrylinkun/xformers
## Basic Information
- **Project Name**: xformers
- **Description**: XXXXXXXXXXXXXXXXXXXXXXXXX
- **Primary Language**: Unknown
- **License**: BSD-3-Clause-Clear
- **Default Branch**: att_mask
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-12-08
- **Last Updated**: 2024-10-16
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
There are regular builds of xformers as it is developed on the `main` branch.
To use these, you must be on Linux and have a conda environment with Python 3.9 or 3.10, CUDA 11.3 or 11.6, and PyTorch 1.12.1.
You can install the latest with
```bash
conda install xformers -c xformers/label/dev
```
The following commands fetch the latest version of the code and then install xFormers from source.
If you want to build the sparse attention CUDA kernels, please make sure that the CUDA-related advice further below is covered before running these instructions.
```bash
git clone git@github.com:facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
conda create --name xformer_env python=3.8
conda activate xformer_env
pip install -r requirements.txt
pip install -e .
# or, for OSX
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ pip install -e .
```
Installing the CUDA-based sparse attention kernels may require extra care, as this relies on the CUDA toolchain. As a reminder, these kernels are built when you run `pip install -e .` and the CUDA build chain (the NVCC compiler) is available. Re-building can be done via `python3 setup.py clean && python3 setup.py develop`; similarly, you can wipe the `build` folder and re-run `pip install -e .`.
Some advice on building these CUDA-specific components, tentatively addressing common pitfalls. Please make sure that:
* NVCC and the current CUDA runtime match. Depending on your setup, you may be able to change the CUDA runtime with `module unload cuda; module load cuda/xx.x`, and possibly also `nvcc` (a quick check is sketched right after this list)
* the version of GCC that you're using matches the current NVCC capabilities
* the `TORCH_CUDA_ARCH_LIST` env variable is set to the architectures that you want to support. A suggested setup (slow to build but comprehensive) is `export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;8.0;8.6"`
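As a quick way to cross-check the first point, PyTorch reports the CUDA version it was built against, which should line up with the toolkit providing `nvcc` on your `PATH` (compare with `nvcc --version`). A minimal check:
```python
import torch

# The CUDA version PyTorch was compiled against; it should match the toolkit
# that provides nvcc on your PATH (compare with `nvcc --version`).
print("torch:", torch.__version__)
print("built against CUDA:", torch.version.cuda)
print("GPU visible:", torch.cuda.is_available())
```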
Some parts of xFormers use [Triton](http://www.triton-lang.org), and will only expose themselves if Triton is installed and a compatible GPU is present (an NVIDIA GPU with tensor cores). If Triton was not installed as part of the testing procedure, you can install it directly by running `pip install triton`. You can optionally check that the installation is successful by running one of the Triton-related benchmarks, for instance `python3 xformers/benchmarks/benchmark_triton_softmax.py`.
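As an illustration of one of these Triton-backed parts, the snippet below assumes the fused softmax exposed in `xformers.triton`; it requires a CUDA GPU and a working Triton install, and the exact import should be verified against your installed version:
```python
import torch
from xformers.triton import softmax  # only importable when Triton is available

# Fused, Triton-backed softmax over the last dimension; a drop-in for torch.softmax
x = torch.randn(8, 128, 128, device="cuda", dtype=torch.float16)
y = softmax(x)
print(torch.allclose(y.float(), torch.softmax(x.float(), dim=-1), atol=1e-2))
```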
Triton will cache the compiled kernels to `/tmp/triton` by default. If this becomes an issue, this path can be specified through the `TRITON_CACHE_DIR` environment variable.
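For instance, if `/tmp` is small or wiped between jobs, the cache location can be redirected before the kernels are first compiled (the path below is only an example):
```python
import os

# Hypothetical cache location; set this before Triton compiles its first kernel
os.environ["TRITON_CACHE_DIR"] = "/scratch/triton_cache"
```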
Some parts of xFormers use AOT Autograd from the [FuncTorch](https://pytorch.org/functorch/stable/) library, and will only expose themselves if FuncTorch is installed, and a compatible GPU is present. If functorch was not installed as part of the testing procedure, you can install it directly through pip.
```bash
pip install functorch
```
Once installed, set the flag `_is_functorch_available = True` in `xformers/__init__.py`. You can optionally check that the installation is successful by running one of the functorch-related benchmarks, for instance `python3 xformers/benchmarks/benchmark_nvfuser.py`.
If you are importing the xFormers library in a script, you can modify the flag as such:
```python
import xformers
xformers._is_functorch_available = True
```



[Open the minGPT example notebook in Colab](https://colab.research.google.com/github/facebookresearch/xformers/blob/main/docs/source/xformers_mingpt.ipynb)
--------------------------------------------------------------------------------
## Description
xFormers is a modular and field-agnostic library to flexibly generate transformer architectures from interoperable and optimized building blocks. These blocks are not limited to xFormers and can also be cherry-picked as the user sees fit.
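As a minimal sketch of this cherry-picking (the registry name and config keys below are assumptions to double-check against the attention documentation), a single attention component can be built from a config dict and used standalone:
```python
import torch
from xformers.components.attention import build_attention

# Build one attention component from its config, without any surrounding model.
# "scaled_dot_product" and the keys below are assumed from the attention registry.
attention = build_attention({
    "name": "scaled_dot_product",
    "dropout": 0.0,
    "causal": False,
})

q = k = v = torch.randn(2, 128, 64)  # (batch, sequence, embedding)
print(attention(q, k, v).shape)      # same shape as the inputs
```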
## Getting started
The full [documentation](https://facebookresearch.github.io/xformers/) contains instructions for getting started, deep dives and tutorials about the various APIs.
If in doubt, please check out the [HOWTO](HOWTO.md). Only some general considerations are laid out in the README.
For recent changes, you can have a look at the [changelog](CHANGELOG.md).
### Installation
To install xFormers, it is recommended to use a dedicated virtual environment, as is common with Python projects, for instance through `python-virtualenv` or `conda`.
PyTorch must be installed first. Using conda, for example:
```bash
conda create --name xformers python=3.10
conda activate xformers
conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1
```
Please note that PyTorch 1.12 or newer is required.
There are two ways you can install xFormers locally:
- Conda dev packages
- Build from source (dev mode)

Both are covered above, together with the optional components:
- Sparse attention kernels
- Triton
- AOT Autograd / NVFuser
Attention mechanisms, interchangeable and composable:
- [Scaled dot product](xformers/components/attention/scaled_dot_product.py) - *[Attention is all you need, Vaswani et al., 2017](https://arxiv.org/abs/1706.03762)*
- [Sparse](xformers/components/attention/scaled_dot_product.py) - whenever a sparse enough mask is passed
- [BlockSparse](xformers/components/attention/blocksparse.py) - *courtesy of [Triton](www.triton-lang.org)*
- [Linformer](xformers/components/attention/linformer.py) - *[Linformer, self-attention with linear complexity, Wang et al., 2020](https://arxiv.org/abs/2006.04768)*
- [Nystrom](xformers/components/attention/nystrom.py) - *[Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention, Xiong et al., 2021](https://arxiv.org/abs/2102.03902)*
- [Local](xformers/components/attention/local.py) - notably used in (among many others) *[Longformer: The Long-Document Transformer, Beltagy et al., 2020](https://arxiv.org/abs/2004.05150)* and *[BigBird, Transformer for longer sequences, Zaheer et al., 2020](https://arxiv.org/abs/2007.14062)*
- [Favor/Performer](xformers/components/attention/favor.py) - *[Rethinking Attention with Performers, Choromanski et al., 2020](https://arxiv.org/abs/2009.14794v1)*
- [Orthoformer](xformers/components/attention/ortho.py) - *[Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers, Patrick et al., 2021](https://arxiv.org/abs/2106.05392)*
- [Random](xformers/components/attention/random.py) - see BigBird, Longformer, ...
- [Global](xformers/components/attention/global_tokens.py) - see BigBird, Longformer, ...
- [FourierMix](xformers/components/attention/fourier_mix.py) - *[FNet: Mixing Tokens with Fourier Transforms, Lee-Thorp et al.](https://arxiv.org/abs/2105.03824v1)*
- [CompositionalAttention](xformers/components/attention/compositional.py) - *[Compositional Attention: Disentangling search and retrieval, S. Mittal et al.](https://arxiv.org/pdf/2110.09419v1.pdf)*
- [2D Pooling](xformers/components/attention/pooling.py) - *[Metaformer is actually what you need for vision, Yu et al.](https://arxiv.org/pdf/2111.11418v1.pdf)*
- [Visual Attention](xformers/components/attention/visual.py) - *[Visual Attention Network, Guo et al.](https://arxiv.org/pdf/2202.09741.pdf)*
- ... or add a new one, see [CONTRIBUTING.md](CONTRIBUTING.md)
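Since each mechanism above is registered under a name, switching between them is in principle just a config change; the sketch below wraps one of them in the generic multi-head dispatcher (the `nystrom` name and the constructor arguments are assumptions to verify against the linked sources):
```python
import torch
from xformers.components import MultiHeadDispatch
from xformers.components.attention import build_attention

SEQ, DIM, HEADS = 256, 384, 6

# Assumed registry name and config keys; other mechanisms take different keys
# (Linformer needs a seq_len, for instance).
attention = build_attention({"name": "nystrom", "dropout": 0.0, "num_heads": HEADS})

# Generic multi-head wrapper around the chosen attention mechanism
multi_head = MultiHeadDispatch(
    dim_model=DIM,
    num_heads=HEADS,
    attention=attention,
    residual_dropout=0.0,
)

x = torch.randn(2, SEQ, DIM)  # (batch, sequence, embedding)
print(multi_head(query=x, key=x, value=x).shape)
```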
Feed-forward mechanisms:
- [MLP](xformers/components/feedforward/mlp.py)
- [Fused](xformers/components/feedforward/fused_mlp.py)
- [Mixture of Experts](xformers/components/feedforward/mixture_of_experts.py)
- [Conv2DFeedforward](xformers/components/feedforward/conv_mlp.py)
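A hedged sketch of using one of these on its own (the constructor arguments mirror the linked MLP source at the time of writing and should be checked against it):
```python
import torch
from xformers.components.activations import Activation
from xformers.components.feedforward.mlp import MLP

# Position-wise MLP: expands to hidden_layer_multiplier * dim_model, then back
feedforward = MLP(
    dim_model=384,
    dropout=0.1,
    activation=Activation("gelu"),
    hidden_layer_multiplier=4,
)

x = torch.randn(2, 256, 384)  # (batch, sequence, embedding)
print(feedforward(x).shape)   # shape is preserved
```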
Positional embeddings:
- [Sine](xformers/components/positional_embedding/sine.py)
- [Vocabulary](xformers/components/positional_embedding/vocab.py)
- [Rotary](xformers/components/positional_embedding/rotary.py)
- [Simplicial](xformers/components/simplicial_embedding.py)
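As an example of the rotary variant (shapes and the import path are assumptions based on the linked file), rotary embeddings are applied to the per-head query and key tensors rather than added to the token embeddings:
```python
import torch
from xformers.components.positional_embedding.rotary import RotaryEmbedding

rotary = RotaryEmbedding(dim_model=64)  # per-head dimension

q = torch.randn(2, 6, 256, 64)  # (batch, heads, sequence, head_dim)
k = torch.randn(2, 6, 256, 64)
q_rot, k_rot = rotary(q, k)     # rotations applied along the sequence axis
print(q_rot.shape, k_rot.shape)
```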
Residual paths:
- [Pre](https://arxiv.org/pdf/2002.04745v1.pdf)
- [Post](https://arxiv.org/pdf/2002.04745v1.pdf)
- [DeepNorm](https://arxiv.org/pdf/2203.00555v1.pdf)
Weight initialization is completely optional, and will only occur when generating full models through xFormers, not when picking parts individually. There are basically two initialization mechanisms exposed, but the user is free to initialize weights as they see fit after the fact.
- Parts can expose an `init_weights()` method, which defines sane defaults.
- xFormers supports [specific init schemes](xformers/factory/weight_init.py) which *can take precedence* over `init_weights()`.

If the second code path is used (constructing the model through the model factory), we check that all the weights have been initialized, and possibly error out if that's not the case (if you set `xformers.factory.weight_init.__assert_if_not_initialized = True`).

Supported initialization schemes are:
- [Small init](https://arxiv.org/abs/1910.05895)
- [Timm defaults](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py)
- [ViT defaults](https://github.com/google-research/vision_transformer)
- [Moco v3 defaults](https://github.com/facebookresearch/moco-v3)

One way to specify the init scheme is to set the `config.weight_init` field to the matching enum value. This could easily be extended, feel free to submit a PR!