# DeepSeek-R1-TrainingSuite

**Repository Path**: gapyanpeng/DeepSeek-R1-TrainingSuite

## Basic Information

- **Project Name**: DeepSeek-R1-TrainingSuite
- **Description**: https://github.com/mkantwala/DeepSeek-R1-TrainingSuite.git
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-02-02
- **Last Updated**: 2025-02-02

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# DeepSeek-R1 Implementation 🧠➗

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![PyTorch 2.0+](https://img.shields.io/badge/pytorch-2.0+-red.svg)](https://pytorch.org/)
[![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97-HuggingFace-yellow)](https://huggingface.co/)

An implementation of the DeepSeek-R1 mathematical reasoning model, trained with Group Relative Policy Optimization (GRPO).

## Key Features ✨

- **GRPO Training**: Group-based RL training that scores each response relative to the other responses sampled for the same prompt
- **Multi-modal Rewards**: Combined format + accuracy rewards
  - Mathematical verification
  - Secure code execution
- **Safety Distillation**: Knowledge transfer with safety constraints
- **LoRA Support**: Parameter-efficient fine-tuning
- **Distributed Training**: Accelerate integration for multi-GPU

## Installation ⚙️

```bash
git clone https://github.com/mkantwala/DeepSeek-R1-TrainingSuite.git
cd DeepSeek-R1-TrainingSuite
pip install -r requirements.txt
```

## Quick Start 🚀

### Training Configuration

```yaml
# configs/base_config.yaml
model_config:
  base_model: "deepseek-ai/deepseek-math-7b-base"
  max_length: 2048
  lora:
    r: 8
    lora_alpha: 32
    target_modules: ["q_proj", "v_proj"]

training_params:
  epochs: 1000
  batch_size: 16
  learning_rate: 2e-5
  group_size: 4
```

### Start Training

```bash
accelerate launch scripts/train.py \
    --config configs/base_config.yaml \
    --dataset_path math_dataset
```

### Distillation

```bash
python scripts/distill.py \
    --teacher_model trained_teacher \
    --student_model deepseek-ai/deepseek-math-7b-base \
    --dataset math_dataset
```

## Project Structure 📂

```bash
DeepSeek-R1-TrainingSuite/
├── configs/           # Training configurations
├── data/              # Data processing modules
├── training/          # Core training logic
├── reward/            # Reward calculation system
├── distillation/      # Safety distillation
├── models/            # Model architectures
├── scripts/           # Operational scripts
├── tests/             # Unit tests
└── docs/              # Documentation
```

## Advanced Features 🔥

### Custom Reward Components

Implement custom reward functions by subclassing `BaseRewardCalculator`:

```python
from reward.reward_calculator import BaseRewardCalculator


class CustomReward(BaseRewardCalculator):
    def calculate_reward(self, response, ground_truth):
        # Placeholder logic (exact string match); replace with your own scoring
        custom_score = float(response.strip() == ground_truth.strip())
        # Add further reward components as extra keys alongside "total"
        return {"total": custom_score}
```

### Multi-GPU Training

Use Accelerate for distributed training:

```bash
accelerate config  # Set up distributed environment
accelerate launch scripts/train.py
```

### LoRA Configuration

Edit `configs/base_config.yaml` to modify the LoRA parameters:

```yaml
lora:
  r: 16
  lora_alpha: 64
  # deepseek-math-7b-base uses the LLaMA architecture, whose attention
  # output projection is named o_proj
  target_modules: ["q_proj", "v_proj", "o_proj"]
  bias: "none"
```

## Testing 🧪

Run the test suite:

```bash
pytest tests/ -v
```

## Contribution 🤝

Contributions are welcome. Please follow these steps:

1. Fork the repository
2. Create your feature branch
3. Submit a pull request
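
## Appendix: GRPO Reward and Advantage Sketch

The Key Features section mentions combined format + accuracy rewards and group-based training with `group_size: 4`. The following is a minimal, standalone sketch of how those two ideas might fit together; it is not taken from this repository's `reward/` or `training/` modules. The `<think>`/`<answer>` tag layout, the function names, the `format_weight` parameter, and the use of the population standard deviation are all assumptions made for illustration.

```python
import re
from statistics import mean, pstdev

# Assumed response layout (hypothetical): <think>...</think><answer>...</answer>
RESPONSE_PATTERN = re.compile(
    r"^<think>.*?</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed tag layout, else 0.0."""
    return 1.0 if RESPONSE_PATTERN.match(response.strip()) else 0.0


def accuracy_reward(response: str, ground_truth: str) -> float:
    """Return 1.0 if the extracted answer equals the ground truth, else 0.0."""
    match = RESPONSE_PATTERN.match(response.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0


def total_reward(response: str, ground_truth: str, format_weight: float = 0.5) -> float:
    """Combine accuracy and format rewards (the weighting is an assumption)."""
    return accuracy_reward(response, ground_truth) + format_weight * format_reward(response)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and std of its own group.

    This is the group-relative step of GRPO: responses sampled for the same
    prompt are compared with each other instead of with a learned value baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # One prompt, four sampled responses (matches group_size: 4)
    group = [
        "<think>2+2</think><answer>4</answer>",
        "<think>2+2</think><answer>5</answer>",
        "4",
        "<think>two plus two</think> <answer> 4 </answer>",
    ]
    rewards = [total_reward(r, "4") for r in group]
    for r, a in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={r:.2f} advantage={a:+.3f}")
```

Responses that beat the group average get a positive advantage and the rest get a negative one. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.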