
Open RS

This repository hosts the code and datasets for the Open RS project, accompanying the paper Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t. The project explores enhancing reasoning capabilities in small large language models (LLMs) using reinforcement learning (RL) under resource-constrained conditions.

We focus on a 1.5-billion-parameter model, DeepSeek-R1-Distill-Qwen-1.5B, trained on 4 NVIDIA A40 GPUs (48 GB VRAM each) within 24 hours. By adapting the Group Relative Policy Optimization (GRPO) algorithm and leveraging a curated, compact mathematical reasoning dataset, we conducted three experiments to assess performance and behavior. Key findings include:

  • Significant reasoning improvements, e.g., AMC23 accuracy rising from 63% to 80% and AIME24 reaching 46.7%, outperforming o1-preview.
  • Efficient training with just 7,000 samples at a cost of $42, compared to thousands of dollars for baseline models.
  • Challenges like optimization instability and length constraints with extended training.

These results showcase RL-based fine-tuning as a cost-effective approach for small LLMs, making reasoning capabilities accessible in resource-limited settings. We open-source our code, models, and datasets to support further research.
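
For context, GRPO replaces a learned value baseline with a group-relative one: for each prompt, G completions are sampled and each completion's reward is normalized against the others in the group. A minimal sketch of the advantage in standard GRPO notation (the exact objective and reward terms used in our experiments are defined in the paper and in src/open_r1/grpo.py):

\hat{A}_i = \frac{r_i - \mathrm{mean}(r_1, \dots, r_G)}{\mathrm{std}(r_1, \dots, r_G)}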

[Figure: Performance Metrics]

Resources

  • Models
  • Datasets
  • Collection

Installation

Prerequisites

Install uv for managing virtual environments:

curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

Set up a virtual environment with Python 3.11:

uv venv openr1 --python 3.11
source openr1/bin/activate
uv pip install --upgrade pip
export UV_LINK_MODE=copy

Dependencies

Install vLLM and FlashAttention:

uv pip install vllm==0.7.2
uv pip install setuptools
uv pip install flash-attn --no-build-isolation

Note: This installs PyTorch v2.5.1, which is required for vLLM compatibility. Using a different version may cause issues.
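
As an optional sanity check that the pinned versions are the ones active in the environment:

python -c "import torch, vllm; print(torch.__version__, vllm.__version__)"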

Install the remaining project dependencies for your use case; the [dev] extra shown here pulls in everything needed for development (GIT_LFS_SKIP_SMUDGE=1 skips downloading large Git LFS artifacts during installation):

GIT_LFS_SKIP_SMUDGE=1 uv pip install -e ".[dev]"

Authentication

Log in to Hugging Face and Weights & Biases:

huggingface-cli login
wandb login
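
On headless machines you can log in non-interactively instead; this assumes your credentials are exported as HF_TOKEN and WANDB_API_KEY:

huggingface-cli login --token "$HF_TOKEN"
wandb login "$WANDB_API_KEY"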

Git LFS

Ensure Git LFS is installed for model/dataset management:

git-lfs --version

If not installed:

sudo apt-get install git-lfs
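
Then enable the Git LFS hooks once for your user account:

git lfs install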

Training

Train models using a YAML config on 4 GPUs. Set num_processes=3 so that three GPUs run training while the fourth remains free for vLLM generation:

ACCELERATE_LOG_LEVEL=info accelerate launch \
  --config_file recipes/accelerate_configs/zero2.yaml \
  --num_processes=3 \
  src/open_r1/grpo.py \
  --config recipes/grpo.yaml

For Experiment 3, add the cosine_max_len parameter, which caps the completion length used by the cosine length-scaled reward (see the sketch after the command):

ACCELERATE_LOG_LEVEL=info accelerate launch \
  --config_file recipes/accelerate_configs/zero2.yaml \
  --num_processes=3 \
  src/open_r1/grpo.py \
  --config recipes/grpo.yaml \
  --cosine_max_len 3584
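
A sketch of how a cosine length-scaled reward of this kind typically interpolates between a minimum and maximum value as the completion length \ell approaches the cap L_{\max} = cosine_max_len (the exact variant is defined in the repository's reward code):

r(\ell) = r_{\min} + \tfrac{1}{2}\,(r_{\max} - r_{\min})\left(1 + \cos\frac{\pi \ell}{L_{\max}}\right)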

Evaluation

Evaluate models using lighteval with custom tasks in src/open_r1/evaluate.py. For single-GPU setups:

MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"
OUTPUT_DIR=data/evals/$MODEL

# Example: AIME 2024
TASK=aime24
lighteval vllm "$MODEL_ARGS" "custom|$TASK|0|0" \
  --custom-tasks src/open_r1/evaluate.py \
  --use-chat-template \
  --output-dir "$OUTPUT_DIR"

Important: Set max_model_length=32768 to match max_new_tokens, or lighteval will fail.
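
To sweep several benchmarks in one pass, you can loop over task names. The identifiers below are illustrative; check src/open_r1/evaluate.py for the exact task names it registers:

for TASK in aime24 math_500 amc23; do
  lighteval vllm "$MODEL_ARGS" "custom|$TASK|0|0" \
    --custom-tasks src/open_r1/evaluate.py \
    --use-chat-template \
    --output-dir "$OUTPUT_DIR"
done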

For multi-GPU evaluation with data parallelism:

NUM_GPUS=4
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,data_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"
TASK=aime24
OUTPUT_DIR=data/evals/$MODEL

lighteval vllm "$MODEL_ARGS" "custom|$TASK|0|0" \
  --custom-tasks src/open_r1/evaluate.py \
  --use-chat-template \
  --output-dir "$OUTPUT_DIR"
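
If a model is too large for a single GPU, lighteval's vLLM backend also supports sharding it across GPUs with tensor parallelism instead of replicating it; a variant of the command above:

NUM_GPUS=4
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,tensor_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}"

lighteval vllm "$MODEL_ARGS" "custom|$TASK|0|0" \
  --custom-tasks src/open_r1/evaluate.py \
  --use-chat-template \
  --output-dir "$OUTPUT_DIR"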

Alternatively, use the evaluation script:

sh eval.sh

Modify tasks in eval.sh (line 8) as needed.

Performance Highlights

  • Open-RS1: 53.0% avg. score
  • Open-RS2: 55.7% avg. score, 80.0% on AMC23
  • Open-RS3: 56.3% avg. score, 46.7% on AIME24 (outperforms o1-preview at 44.6%)
  • Competitive MATH-500 scores; Minerva scores lag behind those of 7B models.

[Figure: Performance Metrics]

Cost Efficiency

Our approach uses 7,000 samples (42,000 total outputs, i.e., six sampled completions per prompt) and costs ~$42 on 4 A40 GPUs over 24 hours, compared to:

  • 7B models: Qwen2.5-7B-SimpleRL ($1,633), Eurus-2-7B-PRIME ($1,088)
  • 1.5B models: DeepScaleR-1.5B-Preview ($3,629), Still-3-1.5B-Preview ($2,268)
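
As a back-of-the-envelope check on that figure (assuming the cost is purely GPU rental): 4 GPUs × 24 hours = 96 GPU-hours, and $42 / 96 GPU-hours ≈ $0.44 per GPU-hour.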

[Figures: 7B Model Costs; 1.5B Model Costs]

Acknowledgements

Thanks to the Hugging Face team for their open-r1 project.

Citation

Coming soon.

MIT License

Copyright (c) 2025 Knovel Engineering

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
