This implementation has two main features:
| | Python Version | C++ Version |
|---|:---:|:---:|
| Layer Norm and Residual Add Variant | X | X |
| Includes Linear Biases | X | |
| Reduces CPU Overheads | | X |
| Fuses masking with Softmax | | X |
| Removes Transposes and Copies | X | X |
| Includes Self and Encoder/Decoder Variants | X | X |
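"Fuses masking with Softmax" means the attention mask is applied inside the softmax pass rather than by first materializing a masked score tensor in a separate step. A minimal pure-Python sketch of the idea (illustrative only — this is not the CUDA kernel, and the function names are made up):

```python
import math

def softmax_then_mask_naive(scores, keep):
    # Unfused reference: write -inf into masked positions first,
    # then run a standard softmax over the result (two passes over memory).
    masked = [s if k else float('-inf') for s, k in zip(scores, keep)]
    m = max(masked)
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

def fused_masked_softmax(scores, keep):
    # Fused: the mask is consulted inside the softmax itself, so masked
    # positions contribute 0 without an intermediate masked-score buffer.
    m = max(s for s, k in zip(scores, keep) if k)
    exps = [math.exp(s - m) if k else 0.0 for s, k in zip(scores, keep)]
    total = sum(exps)
    return [e / total for e in exps]

scores = [0.5, 2.0, -1.0, 3.0]
keep = [True, True, False, True]  # False = position is masked out
fused = fused_masked_softmax(scores, keep)
```

Both functions produce the same distribution; the fused form just avoids the extra masking pass, which is what the C++ version does on the GPU.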
```python
SelfMultiheadAttn(hidden_dim, heads, dropout=prob, bias=bool, include_norm_add=bool, impl='fast')
EncdecMultiheadAttn(hidden_dim, heads, dropout=prob, bias=bool, include_norm_add=bool, impl='fast')
```
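A hypothetical usage sketch based on the signatures above, assuming apex was built with the `--fast_multihead_attn` extension and a CUDA device is available. The `(seq_len, batch, hidden_dim)` input layout and the `torch.nn.MultiheadAttention`-style forward call are assumptions, not confirmed by this README:

```python
# Shape bookkeeping: heads must divide hidden_dim evenly.
seq_len, batch, hidden_dim, heads = 64, 10, 1024, 16
assert hidden_dim % heads == 0
head_dim = hidden_dim // heads  # per-head feature size, 64 here

try:
    import torch
    from apex.contrib.multihead_attn import SelfMultiheadAttn

    # impl='fast' selects the C++ version; bias must be False there
    # (per the feature table, linear biases are Python-version only).
    attn = SelfMultiheadAttn(hidden_dim, heads, dropout=0.1, bias=False,
                             include_norm_add=False, impl='fast').cuda().half()
    inputs = torch.randn(seq_len, batch, hidden_dim,
                         device='cuda', dtype=torch.half)
    # Assumed forward signature, mirroring torch.nn.MultiheadAttention.
    outputs, _ = attn(inputs, inputs, inputs, key_padding_mask=None,
                      need_weights=False, attn_mask=None)
except Exception as exc:  # apex/CUDA not available in this environment
    print(f'skipping forward pass: {exc}')
```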
`impl` has two options:

- `fast`: uses the C++ version
- `default`: uses the Python version

To install, build apex with the `--fast_multihead_attn` extension:

```shell
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_multihead_attn" ./
```
The perf test script is found in `contrib/examples/multihead_attn`:

```shell
cd contrib/examples/multihead_attn

# Reference Python version
python perf_test_multihead_attn.py --ref

# Fast C++ version
python perf_test_multihead_attn.py

# Native torch.nn.MultiheadAttn baseline
python perf_test_multihead_attn.py --native

# Sweep over the number of sequences at a fixed sequence length
python perf_test_multihead_attn.py --seq-length 64 --num-seqs-start 10 --num-seqs-stop 120 --num-seqs-inc 5
```
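Assuming the sweep flags behave like a Python `range` over the number of sequences per batch (an interpretation from the flag names, not confirmed by this README), the last command would benchmark these batch sizes:

```python
# Hypothetical interpretation of --num-seqs-start/--num-seqs-stop/--num-seqs-inc.
num_seqs_start, num_seqs_stop, num_seqs_inc = 10, 120, 5
batch_sizes = list(range(num_seqs_start, num_seqs_stop, num_seqs_inc))
# batch_sizes -> 10, 15, ..., 115 (22 values; stop is exclusive here)
```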