# openr **Repository Path**: liao_1995/openr ## Basic Information - **Project Name**: openr - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: 15-small-bugs-about-string-post-processing-in-rmremotecaller - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2024-10-18 - **Last Updated**: 2024-10-18 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README

Logo

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Paper · Tutorial · Code · Docs · Data · Model · Issue

--- [![GitHub contributors](https://img.shields.io/github/contributors/openreasoner/openr)][contributors-url] [![arXiv](https://img.shields.io/badge/ArXiv-2410.09671-b31b1b.svg)](https://arxiv.org/pdf/2410.09671) ![GitHub License](https://img.shields.io/github/license/openreasoner/openr) [![GitHub Issues or Pull Requests](https://img.shields.io/github/issues/openreasoner/openr)][issues-url] [![GitHub forks](https://img.shields.io/github/forks/openreasoner/openr)][forks-url] [![GitHub Repo stars](https://img.shields.io/github/stars/openreasoner/openr)][stars-url] [![HuggingFace Dataset](https://img.shields.io/badge/Hugging%20Face-FFD21E?logo=huggingface&logoColor=000)](https://huggingface.co/openreasoner) [![X](https://img.shields.io/badge/openreasoner-%23000000.svg?logo=X&logoColor=white)](https://x.com/openreasoner) [![WeChat](https://img.shields.io/badge/WeChat_Group-07C160?logo=wechat&logoColor=white)](#community) [//]: # (
) Table of Contents
  1. News and Updates
  2. Features
  3. Plots
  4. Datasets and Models
  5. Getting Started
  6. Usage
  7. Contact
  8. License
  9. Response Examples
  10. Community
  11. Reference
[//]: # (
) ## News and Updates - **[15/10/2024]** Our report is on [**Arxiv**](https://arxiv.org/abs/2410.09671)! - **[12/10/2024]** ***OpenR*** has been released! 🚀 ## Features

Description

## Plots

PRM_Results Inference_Results

## Provided Datasets and Models [//]: # ([PRM800K](https://github.com/openai/prm800k) (Process Supervision Dataset)) [MATH-APS](https://huggingface.co/datasets/mengfang/MATH-APS) (Our Dataset) [MATH-psa](https://huggingface.co/openreasoner/Math-psa) (Our Process Reward Model) ## Getting Started ### Installation ``` conda create -n open_reasoner python=3.10 conda activate open_reasoner pip install -r requirements.txt pip3 install "fschat[model_worker,webui]" pip install -U pydantic cd envs/MATH/latex2sympy pip install -e . cd - ``` ### Download Base Models Before running the project, please ensure that all required base models are downloaded. The models used in this project include: - `Qwen2.5-Math-1.5B-Instruct`, `Qwen2.5-Math-7B-Instruct` - `Qwen2.5-Math-RM-72B` - `peiyi9979/mistral-7b-sft` - `peiyi9979/math-shepherd-mistral-7b-prm` To download these models, please refer to the [Hugging Face model downloading tutorial](https://huggingface.co/docs/hub/models-downloading) for step-by-step guidance on downloading models from the Hugging Face Hub. Please make sure that all models are saved in their directories according to the project setup before proceeding. ### Quickstart Before running inference, please modify the following variables in the scripts under `reason/llm_service/` to set the appropriate base models for your usage: - `$MODEL_BASE`: Set this to the directory where your models are stored. - `$POLICY_MODEL_NAME`: Set this to the name of the policy model you wish to use. - `$VALUE_MODEL_NAME`: Set this to the name of the value model you wish to use. - `$NUM_LM_WORKER`: Set this to the number of language model (LM) workers to start. - `$NUM_RM_WORKER`: Set this to the number of reward model (RM) workers to start. Then it prepares and runs inference using different techniques. #### Start LM & RM Services For example, to start the LM and RM services for the Math Shepherd model, run the following command: ```bash sh reason/llm_service/create_service_math_shepherd.sh ``` ## Usage #### Run Inference ⚠️ Make sure the input (`--LM`, `--RM`) in the script aligns with the variables (`$POLICY_MODEL_NAME`, `$VALUE_MODEL_NAME`) in the pending worker! ```bash export PYTHONPATH=$(pwd) sh scripts/eval/cot_greedy.sh # Method: cot. Average result: ({'majority_vote': 0.734, 'total_completion_tokens': 559.13},) sh scripts/eval/cot_rerank.sh # Method: best_of_n. Average result: ({'majority_vote': 0.782, # 'prm_min_max': 0.772, # 'prm_min_vote': 0.792, # 'prm_last_max': 0.776, # 'prm_last_vote': 0.792, # 'total_completion_tokens': 4431.268},) sh scripts/eval/beam_search.sh # Method: beam_search. Average result: ({'majority_vote': 0.74, 'total_completion_tokens': 2350.492},) ``` #### Run Training ⚠️ Before training, please modify the `$dataset_path`, `$model_name_or_path` and `$prm_name_or_path` in `train/mat/scripts/train_llm.sh`. ```bash cd train/mat/scripts bash train_llm.sh ``` #### Run PRM Learning ```bash cd prm/code \\ single gpu python finetune_qwen_single_gpu.py --model_path $YOUR_MODEL_PATH \ --train_data_path $TRAIN_DATA_PATH \ --test_data_path $TEST_DATA_PATH \\ multi gpu torchrun --nproc_per_node=2 finetune_qwen.py --model_path $YOUR_MODEL_PATH \ --data_path $YOUR_DATA_FOLDER_PATH \ --datasets both \ ``` ## Future Plan - Add More Comprehensive Evaluations on RL Training and Search Strategies - Scaling the Prove-Verifier Model Size - Support Self-improvement Training ## Contact The OpenR community is maintained by: - **Openreasoner Team** (openreasoner@gmail.com) ## License OpenR is released under the MIT License. ## Citation If you do find our resources helpful, please cite our paper: ``` @article{openr2024, title = {OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models}, url = {https://arxiv.org/pdf/2410.09671}, author = {Jun Wang, Meng Fang, Ziyu Wan, Muning Wen, Jiachen Zhu, Anjie Liu, Ziqin Gong, Yan Song, Lei Chen, Lionel M. Ni, Linyi Yang, Ying Wen, Weinan Zhang}, year = {2024} } ``` ## Response Examples ### Comparing PRM, Math-psa (Ours) V.S. Math-Shepherd

QA 1 QA 2

### Justifing RL Training

QA 3 QA 4

### Exploring Test-time Computation

QA 5 QA 6 QA 7

## Community **WeChat**: ## Reference ### Inference-time Computing [1] [Alphazero-like tree-search can guide large language model decoding and training.](https://arxiv.org/pdf/2309.17179) [2] [Reasoning with language model is planning with world model.](https://arxiv.org/pdf/2305.14992) [3] [Scaling LLM test-time compute optimally can be more effective than scaling model parameters](https://arxiv.org/pdf/2408.03314?) [4] [Think before you speak: Training language models with pause tokens](https://arxiv.org/pdf/2310.02226) ### From Outcome Supervision to Process Supervision [1] [Training verifiers to solve math word problems](https://arxiv.org/pdf/2110.14168) [2] [Solving math word problems with process-and outcome-based feedback](https://arxiv.org/pdf/2211.14275) [3] [Let’s verify step by step](https://arxiv.org/pdf/2305.20050) [4] [Making large language models better reasoners with step-aware verifier](https://arxiv.org/pdf/2206.02336) [5] [Ovm, outcome-supervised value models for planning in mathematical reasoning](https://aclanthology.org/2024.findings-naacl.55.pdf) [6] [Generative verifiers: Reward modeling as next-token prediction](https://arxiv.org/pdf/2408.15240) ### Data Acquisition [1] [Star: Bootstrapping reasoning with reasoning](https://proceedings.neurips.cc/paper_files/paper/2022/file/639a9a172c044fbb64175b5fad42e9a5-Paper-Conference.pdf) [2] [Quiet-star: Language models can teach themselves to think before speaking](https://arxiv.org/pdf/2403.09629) [3] [Improve mathematical reasoning in language models by automated process supervision](https://arxiv.org/pdf/2406.06592) [4] [Shepherd: A critic for language model generation](https://arxiv.org/abs/2308.04592) [5] [Math-shepherd: Verify and reinforce llms step-by-step without human annotations](https://aclanthology.org/2024.acl-long.510.pdf) [contributors-shield]: https://img.shields.io/github/contributors/openreasoner/openr.svg?style=for-the-badge [contributors-url]: https://github.com/openreasoner/openr/graphs/contributors [forks-shield]: https://img.shields.io/github/forks/openreasoner/openr.svg?style=for-the-badge [forks-url]: https://github.com/openreasoner/openr/network/members [stars-shield]: https://img.shields.io/github/stars/openreasoner/openr.svg?style=for-the-badge [stars-url]: https://github.com/openreasoner/openr/stargazers [issues-shield]: https://img.shields.io/github/issues/openreasoner/openr.svg?style=for-the-badge [issues-url]: https://github.com/openreasoner/openr/issues [license-shield]: https://img.shields.io/github/license/openreasoner/openr.svg?style=for-the-badge [license-url]: https://github.com/openreasoner/openr/blob/main/LICENSE.txt