# distill-bloom-deepspeed

**Repository Path**: mirrors_huggingface/distill-bloom-deepspeed

## Basic Information

- **Project Name**: distill-bloom-deepspeed
- **Description**: Teacher-student distillation using DeepSpeed
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-10-24
- **Last Updated**: 2026-01-03

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# distill-bloom-deepspeed

Teacher-student distillation using DeepSpeed. This repository is partially based on the [BLOOM DeepSpeed repository](https://github.com/huggingface/transformers-bloom-inference/tree/main/bloom-inference-scripts), and we follow the same setup as that repository.

## Setup

```pip install transformers huggingface_hub==0.9.0```

```pip install "deepspeed>=0.7.3"```

## Install teacher checkpoints

Download the DeepSpeed teacher checkpoints from [here]() to perform fast loading as described [here](https://github.com/huggingface/transformers-bloom-inference/tree/main/bloom-inference-scripts#run). Store them locally and follow the instructions below to run the training.

### Teacher inference

We highly recommend storing the teacher and student weights locally, so that you do not have to download them again. After installing the teacher weights, run the following command to perform inference on the teacher model:

```
deepspeed --num_gpus NUM_GPUS teacher-inference-script.py --teacher-model-path [PATH_TO_BLOOM] --train-weighted-split-paths-path [PATH_TO_DATA] --train-iters [TRAIN_ITERS] --global-batch-size [GLOBAL_BATCH_SIZE] --eval-iters [EVAL_ITERS] --seq-length [SEQ_LEN]
```

#### Processing the dataset

##### Download the dataset

Here we use the dataset that was used to train the BLOOM model, which is available on Jean Zay. First, download the dataset from its S3 bucket. The raw dataset consists of 1.6TB of numpy arrays. If you want to train on your own custom dataset, please build your own dataloader structure (a rough sketch is given at the end of this README).

##### Get the splits

For now we recommend generating the splits by running the following commands:

```
export DATAPATH=[PATH_TO_DATASET]
git clone https://github.com/bigscience-workshop/bigscience.git
cd bigscience/
python data/catalogue/load_ratios_meg_ds_format.py --dataset-ratios-path ./data/catalogue/training_dataset_ratios_merged_nigercongo_v3.json --split train --output-meg-ds-ratio-file $DATAPATH/train.txt
python data/catalogue/load_ratios_meg_ds_format.py --dataset-ratios-path ./data/catalogue/training_dataset_ratios_merged_nigercongo_v3.json --split val --output-meg-ds-ratio-file $DATAPATH/val.txt
```

##### Test the data loading script

```
deepspeed --num_gpus 8 test.py --train-weighted-split-paths-path $DATAPATH/train.txt --train-iters 200 --global-batch-size 64 --eval-iters 20 --seq-length 2048
```

This test should output the length of the combined dataset as well as the total number of epochs.

#### Training

Once the dataset is ready, we can start training the student model (a sketch of the distillation objective is given at the end of this README).

## Roadmap

- [ ] Add support for teacher inference
- [ ] Add support for student inference
- [ ] Add support for communicating teacher logits to student node
- [ ] Add support for student training (Ds-Zero)
- [ ] Add support for distributed training (`hostfile`)
- [x] Add support for loading Jean-Zay dataset
- [ ] Add support for loading custom dataset
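
### Sketch: custom dataloader

As mentioned in the dataset section, custom datasets require your own dataloader structure. The snippet below is only a minimal sketch of what such a structure might look like, assuming pre-tokenized numpy arrays of token ids already cut to the sequence length; the class and file names (`CustomTokenDataset`, `my_tokens.npy`) are hypothetical and not part of this repository.

```python
# Minimal sketch of a custom dataset wrapper (hypothetical names, not this repo's API).
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class CustomTokenDataset(Dataset):
    """Serves fixed-length token sequences from a memory-mapped numpy array."""

    def __init__(self, path: str, seq_length: int = 2048):
        # mmap avoids loading the full (potentially very large) array into RAM
        self.tokens = np.load(path, mmap_mode="r")
        assert self.tokens.shape[1] == seq_length, "rows must match --seq-length"

    def __len__(self):
        return self.tokens.shape[0]

    def __getitem__(self, idx):
        ids = torch.from_numpy(np.asarray(self.tokens[idx], dtype=np.int64))
        # For causal LM distillation, inputs and labels are the same sequence;
        # the one-token shift is handled inside the model / loss.
        return {"input_ids": ids, "labels": ids.clone()}


if __name__ == "__main__":
    dataset = CustomTokenDataset("my_tokens.npy", seq_length=2048)  # hypothetical file
    loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=2)
    batch = next(iter(loader))
    print(batch["input_ids"].shape)  # e.g. torch.Size([8, 2048])
```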
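
### Sketch: distillation training step

Student training is still on the roadmap, so the repository does not yet fix the exact objective. The snippet below is a generic knowledge-distillation step (KL divergence between temperature-scaled teacher and student logits mixed with the usual LM loss), given purely as an illustration of the technique the Training section refers to; `teacher`, `student`, `alpha` and `temperature` are assumptions, not this repository's implementation.

```python
# Generic knowledge-distillation step (illustrative, not this repo's code).
# `teacher` and `student` are assumed to be causal LMs (e.g. Hugging Face models)
# returning logits of shape (batch, seq_len, vocab); `batch` comes from a dataloader
# like the sketch above.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """KL divergence between temperature-scaled teacher and student distributions."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean + T^2 keeps gradient magnitudes comparable across temperatures
    return F.kl_div(s, t, reduction="batchmean") * (temperature ** 2)


def train_step(student, teacher, batch, optimizer, alpha: float = 0.5):
    """One optimization step mixing the distillation loss with the hard-label LM loss."""
    with torch.no_grad():
        teacher_logits = teacher(input_ids=batch["input_ids"]).logits
    out = student(input_ids=batch["input_ids"], labels=batch["labels"])
    loss = alpha * distillation_loss(out.logits, teacher_logits) + (1 - alpha) * out.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```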