# FAST
[NeurIPS 2025 Spotlight] Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning
## Overview
This repository contains the official implementation of **FAST-GRPO** (Fast-Slow Thinking Group Relative Policy Optimization), which applies fast-slow thinking to both visual and textual reasoning.
## Table of Contents
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Core Components](#core-components)
- [Training](#training)
- [Model Zoo](#model-zoo)
- [Citation](#citation)
## Installation
### Setup Environment
```bash
# Clone the repository
git clone https://github.com/Mr-Loevan/FAST-GRPO.git
cd FAST-GRPO
# Create conda environment
conda create -n fast_grpo python=3.11
conda activate fast_grpo
# Install dependencies (Refer to EasyR1 installation)
pip install -r requirements.txt
pip install -e .
```
## Quick Start
```bash
# Run training with default configuration
bash examples/train_fast_llm.sh
```
## Core Components
FAST-GRPO introduces three key innovations that work together to achieve fast-slow reasoning:
### 1. Thinking Reward Function
The Thinking Reward Function (`examples/reward_function/thinking_reward.py`) implements an adaptive difficulty-aware reward mechanism:
- **Adaptive Difficulty**: `difficulty = (1 - pass_rate) * normalized_complexity`
- **Differentiated Rewards**:
  - Easy problems (difficulty below the 80th percentile) answered correctly: rewards concise solutions
  - Hard problems (difficulty above the 80th percentile) answered incorrectly: rewards exploration
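As a rough sketch of this mechanism (not the repository's actual code in `thinking_reward.py`; the function name, arguments, and reward magnitudes are illustrative assumptions), the difficulty-aware reward could look like:

```python
def thinking_reward(pass_rate, complexity, is_correct, response_len,
                    difficulty_threshold, max_len=8192):
    """Illustrative difficulty-aware reward (names and shapes are assumptions).

    difficulty = (1 - pass_rate) * normalized_complexity, as in the README.
    - Easy problem (difficulty below the threshold) answered correctly:
      shorter responses earn a larger bonus (fast thinking).
    - Hard problem (difficulty above the threshold) answered incorrectly:
      longer reasoning earns a small exploration reward (slow thinking).
    """
    normalized_complexity = min(complexity / max_len, 1.0)
    difficulty = (1.0 - pass_rate) * normalized_complexity

    if difficulty < difficulty_threshold and is_correct:
        # Easy + correct: reward brevity.
        return 1.0 - response_len / max_len
    if difficulty >= difficulty_threshold and not is_correct:
        # Hard + incorrect: reward exploration effort.
        return 0.1 * (response_len / max_len)
    return 0.0
```

In practice the threshold would be set to the 80th percentile of difficulty over the current batch, so the easy/hard split adapts as training progresses.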
### 2. Dynamic KL Penalty
Implements group-based adaptive KL divergence control for stable training:
```yaml
# Configuration in config.yaml
algorithm:
  kl_penalty: low_var_kl
  kl_coef: 1.0e-2
  kl_type: "group_accuracy_based"
  kl_min_coef: 0.001  # β_min
  kl_max_coef: 0.01   # β_max
```
- **Group-based Adaptation**: Adjusts KL coefficient based on group performance
### 3. Slow2Fast Sampling
Progressive curriculum learning that gradually increases training difficulty:
```yaml
# Configuration in config.yaml
algorithm:
  online_filtering: true
  filter_key: accuracy
  dynamic_filter_schedule:
    - epoch_ratio: 0.5
      filter_low: 0.3
      filter_high: 0.99
    - epoch_ratio: 1.0
      filter_low: 0.01
      filter_high: 0.7
```
- **Phase 1 (0-50% of training)**: learn slow thinking from medium-to-high difficulty samples
- **Phase 2 (50-100%)**: admit easy samples to encourage fast thinking
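The two-phase schedule above can be mimicked with a small filter (illustrative only; the dictionary keys mirror the YAML fields, but the function itself is an assumption, not the repository's implementation):

```python
def keep_sample(group_accuracy, epoch_ratio, schedule=(
        {"epoch_ratio": 0.5, "filter_low": 0.3, "filter_high": 0.99},
        {"epoch_ratio": 1.0, "filter_low": 0.01, "filter_high": 0.7},
)):
    """Return True if a prompt's group accuracy falls in the active band.

    Phase 1 (epoch_ratio <= 0.5) keeps medium-to-hard prompts
    (accuracy in [0.3, 0.99]); Phase 2 widens the band toward easy
    prompts (accuracy in [0.01, 0.7]).
    """
    for phase in schedule:
        if epoch_ratio <= phase["epoch_ratio"]:
            return phase["filter_low"] <= group_accuracy <= phase["filter_high"]
    return True
```

Filtering on group accuracy (the `filter_key` above) uses the pass rate of each prompt's sampled group as an online difficulty estimate, so no offline difficulty labels are needed.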
## Training
### Run Training Example
```bash
# Use provided script (recommended)
bash examples/train_fast_llm.sh
```
## Model Zoo
| Model | Base Model | Download |
|-------|------------|----------|
| FAST-1.5B | DeepSeek-R1-Distill-Qwen-1.5B | [ModelScope](https://modelscope.cn/models/ruiruiL/FAST_DS_1_5b) |
| FAST-3B | Qwen-2.5-VL-3B | [ModelScope](https://modelscope.cn/models/xiaowenyi/FAST-3B) |
| FAST-7B | Qwen-2.5-VL-7B | Coming Soon |
| FAST-4B | Qwen-3-VL-4B | Coming Soon |
## Evaluation Results
### Performance on Reasoning Benchmarks
| Method | GSM8K (Acc) | GSM8K (Length) | MATH 500 (Acc) | MATH 500 (Length) | AIME 2024 (Acc) | AIME 2024 (Length) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **FAST-1.5B** | **86.8** | **851** | **85.8** | **2645** | **34.17** | **8003** |
> **Note:** Length denotes the number of generated tokens.
## Citation
If you find this work useful, please cite our paper:
```bibtex
@inproceedings{xiao2025fastslow,
title={Fast-Slow Thinking {GRPO} for Large Vision-Language Model Reasoning},
author={Wenyi Xiao and Leilei Gan},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=MI1uT5rReV}
}
```
## License
This project is licensed under the Apache 2.0 License.
## Acknowledgments
- The results reported in our paper were originally implemented with [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF).
- This repository provides a reimplementation using the [EasyR1](https://github.com/hiyouga/EasyR1) framework.
- Thanks to the [VeRL](https://github.com/volcengine/verl) and [EasyR1](https://github.com/hiyouga/EasyR1) teams for the base training framework.