# ModelLink-0712-weight-mcore-2

**Repository Path**: liuyanghan/ModelLink-0712-weight-mcore-2

## Basic Information

- **Project Name**: ModelLink-0712-weight-mcore-2
- **Description**: Ascend large model repository
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: 1.1
- **Homepage**: https://gitee.com/ascend/ModelLink/
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1207
- **Created**: 2024-07-19
- **Last Updated**: 2024-07-20

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README


ModelLink aims to provide end-to-end large language model solutions for Huawei Ascend chips, including models, algorithms, and downstream tasks.

## ModelLink Solution Overview

---

### Supported Features

ModelLink currently supports the following features for large model usage:

* [Dataset Preparation for Pre-training](#jump11) / [Instruction Dataset Preparation for Fine-tuning](#jump12)
* [Pre-training](#jump13) / [Full-parameter Fine-tuning](#jump14) / [Low-parameter Fine-tuning](#jump15)
* [Inference: human-machine dialogue](#jump16)
* [Evaluation with numerous benchmarks](#jump17)
* [Utilizing Acceleration Features (Acceleration Algorithms + Fusion Operators)](#jump18)
* [Profiling data based on Ascend chips](#jump19)
* [Converting checkpoints between Hugging Face and Megatron](#jump20) (see the sketch after this list)
* [Enabling deterministic computing for Ascend](#jump21)

More novel and useful features for LLM training on Ascend are under development ...
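The checkpoint conversion feature deserves a concrete illustration. The sketch below shows what a Hugging Face → Megatron conversion call can look like; the script path, loader/saver names, and flag values are assumptions for illustration only, so consult the per-model READMEs under `examples` for the exact interface:

```bash
# Illustrative sketch only: the script path and flag names are assumptions,
# not a verified interface; see the per-model READMEs for the real invocation.
python tools/checkpoint/convert_ckpt.py \
    --model-type GPT \
    --loader llama2_hf \
    --saver megatron \
    --target-tensor-parallel-size 8 \
    --target-pipeline-parallel-size 1 \
    --load-dir ./model_from_hf/llama2-7b-hf \
    --save-dir ./model_weights/llama2-7b-mcore
```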
### Supported Models

ModelLink currently supports pre-training and fine-tuning for the following models:

| Model | Parameters | Sequence Length | Pretrain | Inference | LoRA | SFT | Chat | Evaluation | Contributor |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Aquila | 7B | 2K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Aquila2 | 7B | 2K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| Aquila2 | 34B | 4K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| Baichuan | 7B | 4K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Baichuan | 13B | 4K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Baichuan2 | 7B | 4K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Baichuan2 | 13B | 4K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Bloom | 7B1 | 2K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Bloom | 176B | 2K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| ChatGLM3 | 6B | 8K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| CodeLlama | 34B | 4K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| InternLM | 7B | 2K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| InternLM | 65B | 2K | pretrain | -- | -- | -- | -- | -- | 【Ascend】 |
| LLaMA | 7B | 2K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| LLaMA | 13B | 2K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| LLaMA | 33B | 2K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| LLaMA | 65B | 2K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| LLaMA2 | 7B | 4K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| LLaMA2 | 13B | 4K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| LLaMA2 | 34B | 4K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| LLaMA2 | 70B | 4K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| LLaMA3 | 8B | 8K | pretrain | generate | -- | -- | chat | eval | 【Ascend】 |
| LLaMA3 | 70B | 8K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Qwen | 7B | 8K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Qwen | 14B | 2K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Qwen | 72B | 8K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Qwen1.5 | 0.5B | 8K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| Qwen1.5 | 1.8B | 8K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| Qwen1.5 | 4B | 8K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| Qwen1.5 | 7B | 8K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| Qwen1.5 | 14B | 8K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| Qwen1.5 | 32B | 8K | pretrain | generate | lora | -- | -- | eval | 【Community】 |
| Qwen1.5 | 72B | 8K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| Yi | 34B | 4K | pretrain | generate | -- | -- | -- | eval | 【Community】 |
| Mixtral | 8x7B | 32K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Mistral | 7B | 32K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Gemma | 2B | 8K | pretrain | generate | -- | -- | -- | eval | 【Ascend】 |
| Gemma | 7B | 8K | pretrain | generate | lora | -- | -- | eval | 【Ascend】 |
| GPT3 | 175B | 2K | pretrain | -- | -- | -- | -- | -- | 【Community】 |
### Script Naming Rules

| Script | Rule |
|:-----------------:|:-------------------:|
| pretrain_xxx.sh | Pre-training Script |
| tune_xxx.sh | Fine-tuning Script |
| generate_xxx.sh | Inference Script |
| xxx_chat_xxx.sh | Chat Script |
| evaluation_xxx.sh | Evaluation Script |

---

# Model Usage Guide and Version Notes

For the supported models listed above, we provide training scripts and README instructions in the `examples` folder, which contain detailed processes for model training, inference, and evaluation.

【Please note the corresponding environment versions for model usage, as follows】

| Software | [Version](https://www.hiascend.com/zh/) |
| :-----------------------: |:----------------------------------:|
| Python | 3.8 |
| driver | Ascend HDK 24.1.RC2 |
| firmware | Ascend HDK 24.1.RC2 |
| CANN | CANN 8.0.RC2 |
| torch | 2.1.0, 2.2.0 |
| torch_npu | release v6.0.RC2 |
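As a hypothetical illustration of these naming rules (the actual file names under `examples` differ per model; check the corresponding README), a LLaMA2-7B workflow might be driven like this:

```bash
# Hypothetical script names that follow the naming rules above;
# the real files live under examples/<model>/ in this repository.
bash examples/llama2/pretrain_llama2_7b.sh     # pre-training
bash examples/llama2/tune_llama2_7b.sh         # fine-tuning
bash examples/llama2/generate_llama2_7b.sh     # inference: human-machine dialogue
bash examples/llama2/evaluation_llama2_7b.sh   # benchmark evaluation
```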
【Based on the current version of Megatron, the performance statistics from our testing are as follows (hardware: Atlas 900 A2 PODc; throughput in tokens/s/p)】

| Model | Parameters | Sequence Length | Cluster Scale | Precision Mode | Performance | Reference Performance |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Aquila | 7B | 2K | 1x8 | BF16 | 2849 | 2874 |
| Aquila2 | 7B | 2K | 1x8 | FP16 | 3323 | 2673 |
| Aquila2 | 34B | 4K | 2x8 | BF16 | 854 | 732 |
| Baichuan | 7B | 4K | 1x8 | FP16 | 2685 | 2036 |
| Baichuan | 13B | 4K | 1x8 | FP16 | 1213 | 862 |
| Baichuan2 | 7B | 4K | 1x8 | BF16 | 2664 | 3969 |
| Baichuan2 | 13B | 4K | 1x8 | BF16 | 1668 | 2062 |
| Bloom | 7B1 | 2K | 1x8 | FP16 | 2034 | 2525 |
| Bloom | 176B | 2K | 12x8 | BF16 | 100 | 107 |
| ChatGLM3 | 6B | 8K | 1x8 | FP16 | 4297 | 4267 |
| CodeLlama | 34B | 4K | 2x8 | BF16 | 837 | 762 |
| InternLM | 7B | 2K | 1x8 | BF16 | 2776 | 2854 |
| InternLM | 65B | 2K | 4x8 | BF16 | 341 | 414 |
| LLaMA | 7B | 2K | 1x8 | FP16 | 3600 | 3804 |
| LLaMA | 13B | 2K | 1x8 | FP16 | 1895 | 2012 |
| LLaMA | 33B | 2K | 4x8 | FP16 | 621 | 776 |
| LLaMA | 65B | 2K | 4x8 | BF16 | 348 | 426 |
| LLaMA2 | 7B | 4K | 1x8 | BF16 | 4200 | 3850 |
| LLaMA2 | 13B | 4K | 1x8 | BF16 | 1990 | 1920 |
| LLaMA2 | 34B | 4K | 2x8 | BF16 | 749 | 796 |
| LLaMA2 | 70B | 4K | 4x8 | BF16 | 420 | 430 |
| LLaMA3 | 8B | 8K | 1x8 | BF16 | 2483 | 2674 |
| LLaMA3 | 70B | 8K | 8x8 | BF16 | 283 | 355 |
| Qwen | 7B | 8K | 1x8 | BF16 | 2499 | 2867 |
| Qwen | 14B | 2K | 1x8 | BF16 | 1560 | 1578 |
| Qwen | 72B | 8K | 16x8 | BF16 | 285 | 345 |
| Qwen1.5 | 0.5B | 8K | 1x8 | BF16 | 22834 | 25306 |
| Qwen1.5 | 1.8B | 8K | 1x8 | BF16 | 13029 | 12181 |
| Qwen1.5 | 4B | 8K | 1x8 | BF16 | 5033 | 5328 |
| Qwen1.5 | 7B | 8K | 1x8 | BF16 | 2862 | 2621 |
| Qwen1.5 | 14B | 8K | 1x8 | BF16 | 1717 | 1702 |
| Qwen1.5 | 32B | 8K | 4x8 | BF16 | 751 | 708 |
| Qwen1.5 | 72B | 8K | 8x8 | BF16 | 301 | 317 |
| Yi | 34B | 4K | 2x8 | BF16 | 809 | 730 |
| Mixtral | 8x7B | 32K | 2x8 | BF16 | 487 | 610 |
| Mistral | 7B | 32K | 1x8 | BF16 | 2806 | 2734 |
| Gemma | 2B | 8K | 1x8 | BF16 | 6821 | 7602 |
| Gemma | 7B | 8K | 1x8 | BF16 | 2938 | 2607 |
| GPT3 | 175B | 2K | 16x8 | FP16 | 153 | -- |
---

### Acceleration Features

ModelLink supports various acceleration algorithms such as tensor parallelism, pipeline parallelism, sequence parallelism, recomputation, distributed optimizer, and more. The table below shows the enable switches corresponding to each acceleration feature:

| Acceleration Feature | Enable Parameter |
|:------------------------------------:|:------------------------------:|
| Tensor Parallel | --tensor-model-parallel-size |
| Pipeline Parallel | --pipeline-model-parallel-size |
| Dynamic division for PP | --num-layer-list |
| Sequence Parallel | --sequence-parallel |
| Recomputation | --recompute-granularity |
| Distributed Optimizer | --use-distributed-optimizer |
| Overlap DDP allreduce | --overlap-grad-reduce |
| Flash attention | --use-flash-attn |
| Fused rmsnorm | --use-fused-rmsnorm |
| Fused swiglu | --use-fused-swiglu |
| mc2 | --use-mc2 |
| Fused rotary position embedding | --use-fused-rotary-pos-emb |
| Sliding window attention | --sliding-window |
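These switches are passed directly to the training entry point and compose freely. In the launch command below, `--num-layer-list` assigns an explicit number of layers to each pipeline stage instead of an even split, and full recomputation is applied block-wise: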
```bash
torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \
    --tensor-model-parallel-size ${TP} \
    --pipeline-model-parallel-size ${PP} \
    --num-layer-list 1,2,2,2,1 \
    --sequence-parallel \
    --recompute-granularity full \
    --recompute-method block \
    --recompute-num-layers 72 \
    --use-distributed-optimizer \
    --use-flash-attn \
    --use-fused-rmsnorm \
    --use-fused-swiglu \
    --overlap-grad-reduce \
    --use-fused-rotary-pos-emb \
    --use-mc2 \
    --sliding-window 4096 \
    ... \
    ...
```

Note: To enable mc2, ensure the following:

1. The environment version matches the description on the repository homepage;
2. Comment out line 283 in the `validate_args_decorator` function within `modellink/arguments.py`: `#args.use_mc2 = False`

## Analyze profiling data based on Ascend chips

ModelLink supports collecting profiling data on Ascend chips, which is useful for performance modeling:

```bash
--profile                          # enable profiling
--profile-step-start 5             # the start step
--profile-step-end 6               # the end step
--profile-ranks 0 1 2 3 4          # ranks to profile
--profile-level level2             # profiling level: level0, level1, or level2
--profile-with-cpu                 # profile CPU information
--profile-with-stack               # profile stack information
--profile-with-memory              # profile memory information
--profile-record-shapes            # profile shape information
--profile-save-path ./profile_dir  # path to save data
```

## Enable deterministic computing based on Ascend chips

- Add the switch to the training script:

```shell
--use-deter-comp
```

- Add the environment variable:

```shell
export HCCL_DETERMINISTIC=True
```

## Acknowledgments

---

ModelLink is jointly contributed by the following departments of Huawei Corporation:

- Ascend Computing Product Unit
- Algorithm Unit of Computing Product Unit
- Research Unit of Computing Product Unit
- Open Computing Kit of Computing Product Unit
- General Development Department
- Global Technical Service Department

We appreciate every PR from the community and welcome contributions to ModelLink.

## Appendix

---

- Safety Statement: [Safety Statement](https://gitee.com/ascend/ModelLink/wikis/%E5%AE%89%E5%85%A8%E5%A3%B0%E6%98%8E)