1 Star 3 Fork 3

Gitee 极速下载 / tinyllama-1-1b

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
此仓库是为了提升国内下载速度的镜像仓库,每日同步一次。 原始仓库: https://github.com/jzhang38/TinyLlama
克隆/下载
贡献代码
同步代码
取消
提示: 由于 Git 不支持空文件夾,创建文件夹后会生成空的 .keep 文件
Loading...
README
Apache-2.0

TinyLlama-1.1B

English | 中文

Chat Demo

TinyLlama项目旨在在3万亿tokens上进行预训练,构建一个拥有11亿参数的Llama模型。经过精心优化,我们"仅"需16块A100-40G的GPU,便可在90天内完成这个任务🚀🚀。训练已于2023-09-01开始。

我们采用了与Llama 2完全相同的架构和分词器。这意味着TinyLlama可以在许多基于Llama的开源项目中即插即用。此外,TinyLlama只有1.1B的参数,体积小巧,适用于需要限制计算和内存占用的多种应用。

新闻

  • 2023-12-18:
    • 添加两个文档 1, 2 说明训练曲线、项目时间表和错误修复的变化。
  • 2023-10-03:
    • 在speculative decoding中添加llama.cpp的代码示例。具体请查看 speculative_decoding/README.md
    • 2023-10-02: 1. 1T-token检查点刚发布。2. 我们在huggingface上记录了所有中间检查点。
    • 2023-09-28: 启用Discord服务器。
  • 2023-09-18:
    • 发布了一个 chat demo,欢迎点击链接来尝试我们的模型。
  • 2023-09-16:

发布时间表

我们会根据以下计划逐步发布中间checkpoint。我们也列了一些基线模型进行比较。

基座模型:

Date 模型权重 Tokens Step Commonsense Avg
2023-09-01 Pythia-1.0B 300B 143k 48.30
2023-09-04 TinyLlama-1.1B-intermediate-step-50k-105b (ModelScope) 105B 50k 46.11
2023-09-16 TinyLlama-1.1B-intermediate-step-240k-503b (ModelScope) 503B 240K 48.28
2023-10-01 TinyLlama-1.1B-intermediate-step-480k-1T 1T 480k 50.22
2023-11-04 TinyLlama-1.1B-intermediate-step-715k-1.5T 1.5T 715k 51.28
2023-11-20 TinyLlama-1.1B-intermediate-step-955k-2T 2T 955k 51.64
2023-12-11 TinyLlama-1.1B-intermediate-step-1195k-2.5T 2.5T 1195k 53.86
2023-12-28 TinyLlama-1.1B-intermediate-step-1431k-3T 3T 1431k 52.99

对话模型:

Date 模型权重 Tokens Step Commonsense Avg
2023-09-16 TinyLlama-1.1B-Chat-V0.1 (ModelScope) 503B 240K 49.57
2023-10-1 TinyLlama-1.1B-Chat-V0.3 1T 480K 51.36
2023-11-04 TinyLlama-1.1B-Chat-V0.4 1.5T 715K 52.30

需要注意的是,由于我们的现在模型还处于训练初期,学习率并没有完全稳定下来,为了更好的体验我们的模型,您可以下载我们 聊天模型 或者通过 chat demo 来尝试我们的模型。

你们也可以在这里实时跟踪TinyLlama的训练损失。

潜在场景

小型但强大的语言模型对许多应用都很有用。以下是一些潜在的场景:

  • 帮助对大型模型进行speculative decoding。
  • 在边缘装置上运行,比如离线的实时机器翻译 (TinyLlama的4比特量化版本的模型权重只需要550MB的内存)。
  • 在游戏中实现实时对话生成(因为还得给游戏本身留显存所以模型要小)。

此外,我们的代码可以给初学者做一个入门预训练的简洁参考。如果你要训练50亿以下参数的语言模型, 你其实不需要Megatron-LM。

训练细节

以下是我们训练设置的一些细节:

Setting Description
Parameters 1.1B
Attention Variant Grouped Query Attention
Model Size Layers: 22, Heads: 32, Query Groups: 4, Embedding Size: 2048, Intermediate Size (Swiglu): 5632
Sequence Length 2048
Batch Size 2 million tokens (2048 * 1024)
Learning Rate 4e-4
Learning Rate Schedule Cosine with 2000 warmup steps
Training Data Slimpajama & Starcoderdata
Data Preprocessing Excluded GitHub subset of Slimpajama; Sampled all code from Starcoderdata
Combined Dataset Size Around 950B tokens
Total Tokens During Training 3 trillion (slightly more than 3 epochs/143k steps)
Natural Language to Code Ratio 7:3
Hardware 16 A100-40G GPUs

速度极快

我们的代码库支持以下特性:

  • 使用FSDP进行多GPU和多节点分布式训练
  • flash attention 2
  • 融合层归一化 (fused layernorm)
  • 融合swiglu (fused swiglu)
  • 融合交叉熵损失 (fused cross entropy loss)
  • 融合旋转位置嵌入 (fused rotary positional embedding)

致谢:flash attention 2、融合层归一化、融合交叉熵损失和融合旋转位置嵌入来自于FlashAttention仓库;融合swiglu来自于xformers

有了这些优化, 我们可以达到24k tokens/秒/A100的训练速度,也就是56%的MFU(在A100-80G上的MFU会更高)。这个速度可以让你可以在8个A100上用32小时训练一个chinchilla-optimial的模型(11亿参数,220亿token)。这些优化也大大减少了显存占用, 我们可以把11亿参数的模型塞入40GB的GPU里面还能同时维持16k tokens的per-gpu batch size。只需要把batch size改小一点, 你就可以在RTX 3090/4090上面训练TinyLlama。 下面是我们的代码库与Pythia和MPT的训练速度的比较。

Model A100 GPU hours taken on 300B tokens
TinyLlama-1.1B 3456
Pythia-1.0B 4830
MPT-1.3B 7920

Pythia的数字来自他们的论文。MPT的数字来自这里,作者说MPT-1.3B"was trained on 440 A100-40GBs for about half a day" on 200B tokens。

TinyLlama是一个相对较小的模型, 同时我们用了GQA, 这意味着它在推理期间也很快。以下是我们测量的一些推理速度:

Framework Device Settings Throughput (tokens/sec)
Llama.cpp Mac M2 16GB RAM batch_size=1; 4-bit inference 71.8
vLLM A40 GPU batch_size=100, n=10 7094.5

开始预训练

请参考PRETRAIN.md

微调

  • 我们在 sft 中添加了我们进行微调和推理的代码。并且基于这个代码我们在openassistant-guanaco 数据集上进行了微调,得到了我们的第一版聊天模型
  • 如果您希望在 RAM 小于 4GB 的 GPU 上对用我们的模型进行微调,可以参考并使用 Qlorabitsandbytes 项目。
  • 目前微调的时候我们并没有广泛对超参进行搜索,也没有选择潜在更优的 instruction 数据集。我们希望促进 NLP 社区对于我们的TinyLlama模型的开放研究,并开源更好的微调聊天模型。我们也会把这些模型放在这个项目中。

TODO

该项目仍在积极开发中。我们团队很小,非常欢迎社区的反馈和贡献。以下是我们计划进行的一些工作:

  • Add scripts for pretraining on other datasets.
  • Sequence length extrapolation.
  • Test out speculative decoding for Llama-2-7B.
  • Test the throughput on RTX 3090/4090.
  • Add fine-tuning scripts.
  • Properly evaluate the model on downstream tasks.
  • A demo running on mobile phones.
  • Explore retrieval-augmentation.

Star History

Star History Chart

Acknowledgements

这个仓库基于出色的开源项目lit-gptflash-attention构建.

@online{lit-gpt,
  author    = {Lightning AI},
  title     = {Lit-GPT},
  url       = {https://github.com/Lightning-AI/lit-gpt},
  year      = {2023},
}
@article{dao2023flashattention2,
  title     ={Flash{A}ttention-2: Faster Attention with Better Parallelism and Work Partitioning},
  author    ={Dao, Tri},
  year      ={2023}
}

Citation

此项目目前由Peiyuan ZhangGuangtao ZengTianduo WangWei Lu贡献。

如果您觉得我们的工作有价值, 可以引用:

@misc{zhang2024tinyllama,
      title={TinyLlama: An Open-Source Small Language Model}, 
      author={Peiyuan Zhang and Guangtao Zeng and Tianduo Wang and Wei Lu},
      year={2024},
      eprint={2401.02385},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [2023] Lightning AI Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

简介

TinyLlama 项目旨在在 3 万亿 tokens 上进行预训练,构建一个拥有11亿参数的Llama模型 展开 收起
Python
Apache-2.0
取消

发行版

暂无发行版

贡献者

全部

近期动态

加载更多
不能加载更多了
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
Python
1
https://gitee.com/mirrors/tinyllama-1-1b.git
git@gitee.com:mirrors/tinyllama-1-1b.git
mirrors
tinyllama-1-1b
tinyllama-1-1b
main

搜索帮助

344bd9b3 5694891 D2dac590 5694891