DCFormer

This repository contains the code in both PyTorch and Jax for our paper.

Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan
ICML 2024 (oral)


About

We propose Dynamically Composable Multi-Head Attention (DCMHA), a parameter- and computation-efficient attention architecture that tackles the shortcomings of Multi-Head Attention (MHA) and increases the expressive power of the model by dynamically composing attention heads. At the core of DCMHA is a Compose function that transforms the attention score and weight matrices in an input-dependent way. DCMHA can be used as a drop-in replacement for MHA in any transformer architecture to obtain the corresponding DCFormer.
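
As a rough intuition only (not the released implementation), the sketch below shows one way an input-dependent, low-rank mixing across the head dimension can be applied to a stack of per-head attention matrices; all module names, shapes, and the query-pooling choice are illustrative assumptions. See the paper and the code in this repository for the actual Compose function.

```python
import torch
import torch.nn as nn


class ComposeSketch(nn.Module):
    """Illustrative only: query-conditioned, low-rank mixing across attention heads.

    NOT the paper's Compose implementation; it just sketches the idea that a
    head-composition map is predicted from the input and applied to the stack of
    per-head attention matrices, with an identity (residual) path so the module
    starts out behaving like plain MHA.
    """

    def __init__(self, num_heads: int, head_dim: int, rank: int = 2):
        super().__init__()
        self.num_heads, self.rank = num_heads, rank
        # Predict the two low-rank factors of the head-mixing map per query position.
        self.to_w1 = nn.Linear(head_dim, num_heads * rank, bias=False)
        self.to_w2 = nn.Linear(head_dim, num_heads * rank, bias=False)

    def forward(self, attn: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # attn: [B, H, Tq, Tk]  per-head attention scores (or post-softmax weights)
        # q:    [B, H, Tq, Dh]  per-head queries used to condition the mixing
        B, H, Tq, Tk = attn.shape
        q_pooled = q.mean(dim=1)                              # [B, Tq, Dh]
        w1 = self.to_w1(q_pooled).view(B, Tq, H, self.rank)   # [B, Tq, H, R]
        w2 = self.to_w2(q_pooled).view(B, Tq, H, self.rank)   # [B, Tq, H, R]
        # Low-rank mixing across the head dimension, per query position.
        mixed = torch.einsum('bhqk,bqhr->bqkr', attn, w2)     # project H heads onto R basis
        mixed = torch.einsum('bqkr,bqhr->bhqk', mixed, w1)    # expand back to H heads
        return attn + mixed                                   # identity path preserves vanilla MHA


# Usage inside a standard attention block (shapes illustrative):
#   scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5      # [B, H, Tq, Tk]
#   scores = compose(scores, q)                               # compose attention scores
#   probs  = compose(scores.softmax(dim=-1), q)               # and/or attention weights
```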

In practice, we train DCFormer on TPU for efficiency and run inference on GPU for convenience, so we open-source the Jax training code and the PyTorch inference code in two separate folders.

Jax

  • The source code is in the jax/ folder and supports training DCFormer on TPU or GPU with google/MaxText.
  • Please refer to jax/README.md for details.

PyTorch

  • The source code is in the pytorch/ folder, supporting accelerated inference with torch.compile.
  • We also uploaded the pretrained models DCFormer-2.8B (DCFormer++2.8B in the paper) and DCPythia-6.9B to Huggingface 🤗 (a loading sketch follows this list).
  • Please refer to pytorch/README.md for details.
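
The snippet below sketches how such a checkpoint might be loaded with Huggingface transformers and compiled with torch.compile. The repository id, dtype, and generation settings are assumptions for illustration; follow pytorch/README.md for the supported procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id for illustration; check the Huggingface links in
# pytorch/README.md for the actual model cards.
model_id = "Caiyun-AI/DCFormer-2.8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,        # DCMHA is a custom architecture shipped with the checkpoint
    torch_dtype=torch.float16,
).eval().to("cuda")

model = torch.compile(model)       # accelerated inference, as noted above

inputs = tokenizer("DCFormer is", return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```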

Synthetic tasks

The synthetic tasks dataset used in the paper is located at data/synthetic_dataset.jsonl. Each line in the file contains one in-context learning sample; the words inside the brackets [] are compared to calculate accuracy. E.g.

< Elizabeth has jeans. Steven has strawberries. Steven has a fox. >. Steven does not have a kind of clothing
< Karen has a taxi. Karen has a pig. Paul has a cocktail. >. Karen does not have a kind of drink
< Ruth has a sweater. Steven has a taxi. Ruth has a handgun. >. Ruth does not have a kind of [ vehicle ]
. . .
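
For reference, a minimal way to read such a file and pull out the bracketed target word is sketched below. The JSON field name ("text") is an assumption for illustration and may differ from the actual keys in data/synthetic_dataset.jsonl.

```python
import json
import re


def bracketed_target(sample_text: str):
    """Return the word(s) inside [ ] that accuracy is computed against."""
    match = re.search(r"\[\s*(.+?)\s*\]", sample_text)
    return match.group(1) if match else None


with open("data/synthetic_dataset.jsonl") as f:
    for line in f:
        record = json.loads(line)
        text = record.get("text", "")   # field name is an assumption, not verified
        print(bracketed_target(text))   # e.g. "vehicle" for the last example above
```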
MIT License

Copyright (c) 2024 Caiyun-AI

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

