
# SimpleVLA-RL: Open RL Framework for Vision–Language–Action Models
[Paper](https://arxiv.org/abs/2509.09674) | [GitHub](https://github.com/PRIME-RL/SimpleVLA-RL) | [Models (Hugging Face)](https://huggingface.co/collections/Haozhan72/simplevla-rl-6833311430cd9df52aeb1f86) | [X Thread](https://x.com/stingning/status/1927770654385860804) | [WeChat Group](figs/wechat-group.png)
**SimpleVLA-RL** is an efficient RL framework for vision–language–action (VLA) models that improves long-horizon planning under data scarcity. Using reinforcement learning with simple outcome rewards, it substantially outperforms SFT in both simulation and real-world tasks, reveals a "pushcut" phenomenon in which new action patterns emerge during training, and strengthens spatial, object, and goal generalization.
# 🎉News
- **[2025-10-01]** **SimpleVLA-RL** now supports RoboTwin2.0 Benchmark. Feel free to experiment with it!
- **[2025-09-12]** Excited to release the **SimpleVLA-RL** paper! Check it out: [Paper](https://arxiv.org/abs/2509.09674).
- **[2025-05-27]** We release the code of **SimpleVLA-RL**.
# 📌Highlights
#### Efficient and Effective VLA Reinforcement Learning Framework
- End-to-end VLA RL pipeline built on [veRL](https://github.com/volcengine/verl) with VLA-specific optimizations
- Multi-environment parallel rendering significantly accelerates VLA trajectory sampling
- Leverages [veRL](https://github.com/volcengine/verl)'s state-of-the-art infrastructure: efficient distributed training (FSDP), hybrid communication patterns, and optimized memory management for fast training/inference
#### Model and Environment Support
- **VLA Models**: [OpenVLA](https://github.com/openvla/openvla), [OpenVLA-OFT](https://github.com/moojink/openvla-oft)
- **Benchmarks**: [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO), [RoboTwin 1.0/2.0](https://github.com/TianxingChen/RoboTwin)
- Modular architecture for easy integration of new VLA models, benchmarks, and RL algorithms (upcoming)
#### Minimal Reward Engineering and Exploration Strategies
- Binary (0/1) outcome rewards - no complex reward design needed
- Exploration strategies: dynamic sampling, adaptive clipping, temperature tuning
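As a concrete illustration of one of these strategies, the sketch below shows dynamic sampling over binary outcome rewards: rollout groups whose rewards are all 0 or all 1 carry no learning signal under group-relative advantages, so they are filtered out and resampled. Names and interfaces here are illustrative, not the repository's actual API.
```python
import numpy as np

def filter_uninformative_groups(rewards: np.ndarray) -> np.ndarray:
    """Dynamic sampling: keep only rollout groups with mixed 0/1 outcomes.

    rewards: (num_groups, group_size) array of binary outcome rewards,
    one group per task prompt. All-success or all-failure groups yield
    zero group-relative advantage, so they are dropped and resampled.
    """
    group_mean = rewards.mean(axis=1)
    keep = (group_mean > 0.0) & (group_mean < 1.0)
    return keep  # boolean mask over groups

# Example: 4 groups of 8 rollouts each; group 0 is all-success -> dropped
rng = np.random.default_rng(0)
rewards = (rng.random((4, 8)) > 0.5).astype(np.float32)
rewards[0] = 1.0
print(filter_uninformative_groups(rewards))
```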
# 🔧Key Implementations
SimpleVLA-RL extends veRL with VLA-specific components across the following modules:
**[`verl/trainer/main_ppo.py`](verl/trainer/main_ppo.py)**
- Main entry point with ray initialization
- `RobRewardManager` for reward distribution
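With binary outcomes, the reward manager's job is conceptually simple. As a hedged sketch of the idea (the class name and interface below are simplified assumptions, not the actual `RobRewardManager` signature), the episode's 0/1 success is placed on the last valid step of each trajectory:
```python
import torch

class BinaryOutcomeRewardManager:
    """Simplified stand-in for a binary outcome reward manager."""

    def __call__(self, successes: torch.Tensor, response_mask: torch.Tensor) -> torch.Tensor:
        # successes: (batch,) tensor of 0.0/1.0 episode outcomes
        # response_mask: (batch, T) with 1 for valid steps, 0 for padding
        rewards = torch.zeros_like(response_mask, dtype=torch.float32)
        last_idx = (response_mask.sum(dim=1).long() - 1).clamp(min=0)
        rewards[torch.arange(len(successes)), last_idx] = successes
        return rewards  # sparse per-step rewards, nonzero only at episode end
```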
**[`verl/trainer/ppo/ray_trainer.py`](verl/trainer/ppo/ray_trainer.py)**
- Main RL training loop: data loading, VLA rollout, model updates, evaluation, checkpointing
- RL algorithm-specific advantage computation
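With binary outcome rewards, a group-relative estimator (GRPO-style) is a natural choice for the advantage. The sketch below is one such estimator under that assumption, not a transcription of the repository's code:
```python
import torch

def group_relative_advantage(rewards: torch.Tensor, group_size: int, eps: float = 1e-6) -> torch.Tensor:
    """Normalize each rollout's scalar reward by the mean/std of its group
    (the set of rollouts that share the same task prompt)."""
    grouped = rewards.view(-1, group_size)    # (num_groups, group_size)
    mean = grouped.mean(dim=1, keepdim=True)
    std = grouped.std(dim=1, keepdim=True)
    return ((grouped - mean) / (std + eps)).view(-1)
```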
**[`verl/workers/fsdp_workers.py`](verl/workers/fsdp_workers.py)**
- Source of core functions called in `ray_trainer.py`
- VLA model/optimizer initialization, `generate_sequences`, `compute_entropy`, `update_actor`
**[`verl/workers/actor/dp_rob.py`](verl/workers/actor/dp_rob.py)**
- Specific implementation of functions in `fsdp_workers.py`
- RL loss computation, policy updates, `compute_log_prob`, `compute_entropy`
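For the loss itself, a PPO-style clipped surrogate is the standard building block; the asymmetric clipping range below illustrates the "adaptive clipping" exploration strategy mentioned earlier (a looser upper bound encourages exploration). Clip values and signatures are illustrative assumptions:
```python
import torch

def ppo_clip_loss(logp_new: torch.Tensor, logp_old: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_low: float = 0.2, clip_high: float = 0.28) -> torch.Tensor:
    """Token-level PPO surrogate with an asymmetric clipping range.

    logp_new/logp_old: log-probs of sampled action tokens under the current
    and rollout policies; advantages: per-token advantage estimates.
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high) * advantages
    return -torch.minimum(unclipped, clipped).mean()
```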
**[`verl/workers/rollout/rob_rollout.py`](verl/workers/rollout/rob_rollout.py)**
- VLA rollout implementation: environment creation, multi-environment parallel rendering, VLA action generation, environment interaction, video saving, trajectory and 0/1 reward collection
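The rollout loop's key efficiency trick is stepping all environments in lockstep so the VLA runs one batched forward pass per step. A minimal sketch of that pattern, assuming hypothetical `policy` and `env` interfaces rather than the repository's actual classes:
```python
import numpy as np

def rollout_parallel(policy, envs, max_steps: int) -> np.ndarray:
    """Step all environments together; collect binary 0/1 outcome rewards."""
    obs = [env.reset() for env in envs]
    done = np.zeros(len(envs), dtype=bool)
    rewards = np.zeros(len(envs), dtype=np.float32)
    for _ in range(max_steps):
        actions = policy.predict_batch(obs)   # one batched forward pass for all envs
        for i, env in enumerate(envs):
            if done[i]:
                continue
            obs[i], success, done[i] = env.step(actions[i])
            if done[i]:
                rewards[i] = float(success)   # 0/1 outcome reward at episode end
        if done.all():
            break
    return rewards
```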
**[`verl/utils/dataset/rob_dataset.py`](verl/utils/dataset/rob_dataset.py)**
- Dataset construction for training/testing across benchmarks
**[`verl/utils/vla_utils/`](verl/utils/vla_utils/)**
- VLA model implementations (OpenVLA-OFT/OpenVLA from official code)
# ✨Getting Started
#### 1. Set Up the Environment
See [SETUP.md](SETUP.md) for detailed instructions on setting up the conda environment.
#### 2. Prepare the SFT Model
An **SFT (Supervised Fine-Tuning)** VLA model is required for RL training. Below are the available options:
* **OpenVLA-OFT SFT Models**
  Download from the [SimpleVLA-RL Collection](https://huggingface.co/collections/Haozhan72/simplevla-rl-6833311430cd9df52aeb1f86); a download snippet follows this list. Available models include:
  - `libero-10 traj1/trajall SFT`
  - `libero-goal/object/spatial traj1 SFT`
  - `Robotwin2.0 tasks traj1000 SFT`
* **OpenVLA SFT Models**
  Download from the [OpenVLA Hugging Face page](https://huggingface.co/openvla).
* **Other Models**
  For other models, you may need to fine-tune them yourself.
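For the Hugging Face checkpoints above, a download sketch using `huggingface_hub` (the repo id is a placeholder; substitute the actual model id from the collection):
```python
from huggingface_hub import snapshot_download

# Placeholder repo id: pick the actual checkpoint from the SimpleVLA-RL collection.
local_path = snapshot_download(repo_id="Haozhan72/<sft-model-name>", local_dir="./sft_model")
print("SFT model downloaded to:", local_path)
```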
#### 3. Train with SimpleVLA-RL
Before running the training script, ensure the following configurations are properly set:
- **Set Your Weights and Biases (WandB) API Key**
  Replace the `WANDB_API_KEY` field in `SimpleVLA-RL/align.json` with your own WandB API key.
- **Modify Key Variables**
  Update the following variables in `examples/run_openvla_oft_rl_libero.sh` or `examples/run_openvla_oft_rl_twin2.sh` as needed:
  - `WANDB_API_KEY`: Your WandB API key.
  - `EXPERIMENT_NAME`: The name of your experiment; choose any name.
  - `SFT_MODEL_PATH`: Path to your SFT model.
  - `CKPT_PATH`: Path where your checkpoints will be saved.
  - `DATASET_NAME`: See the script you are running for the available options.
  - `ALIGN_PATH`: Path to the `SimpleVLA-RL/align.json` file.
  - `NUM_GPUS`: Number of GPUs available per node (e.g., `8`).
  - `NUM_NODES`: Number of nodes used for RL training (e.g., `1`).
> [!NOTE]
>
> - The script has been tested on the following configurations:
> - Single-node setup: `NUM_NODES=1`, `NUM_GPUS=8` (1 node with 8 NVIDIA A800 GPUs, each having 80GB memory).
> - Multi-node setup: `NUM_NODES=2`, `NUM_GPUS=8` (2 nodes with 16 NVIDIA A800 GPUs, each having 80GB memory).
> - The driver version used was `470.161.03` with CUDA `12.4`; matching these exact versions is not required.
- **Run RL Training**
  Use one of the following commands to start RL training for OpenVLA-OFT on the LIBERO or RoboTwin2.0 benchmark:
```bash
bash examples/run_openvla_oft_rl_libero.sh
# or
bash examples/run_openvla_oft_rl_twin2.sh
```
#### 4. Run Evaluation
To evaluate a trained model, enable evaluation mode by setting `trainer.val_only=True` in `examples/run_openvla_oft_rl_libero.sh` or `examples/run_openvla_oft_rl_twin2.sh`, then execute the same script:
```bash
bash examples/run_openvla_oft_rl_libero.sh
# or
bash examples/run_openvla_oft_rl_twin2.sh
```
# 📃 Main Results
We evaluate SimpleVLA-RL on the LIBERO benchmark using OpenVLA-OFT. SimpleVLA-RL improves OpenVLA-OFT to 97.6 points on LIBERO-Long, setting a new state of the art. Remarkably, with only one trajectory per task for cold-start SFT, SimpleVLA-RL raises OpenVLA-OFT's score from 17.3 to 91.7, an improvement of 74.4 points (430.1% relative).
# 🌻Acknowledgement
We develop this preview version of the code based on [veRL](https://github.com/volcengine/verl), [OpenVLA-OFT](https://github.com/moojink/openvla-oft), [RoboTwin2.0](https://github.com/RoboTwin-Platform/RoboTwin.git), and [PRIME](https://github.com/PRIME-RL/PRIME). We acknowledge their significant contributions!
For further details and updates, please refer to the official documentation and repositories of the respective projects.
# 📝Roadmap
### Expanding Model Support
- [ ] Support advanced diffusion-based RL: [pi0](https://github.com/Physical-Intelligence/openpi) and [pi0.5](https://github.com/Physical-Intelligence/openpi) with flow-matching RL
- [ ] Support more VLA models: especially for lightweight ones (e.g. [VLA-Adapter](https://github.com/OpenHelix-Team/VLA-Adapter), [SmolVLA](https://huggingface.co/docs/lerobot/smolvla))
### Expanding Environment Support
- [ ] Support more benchmarks: e.g. [SimplerEnv](https://github.com/simpler-env/SimplerEnv), [BEHAVIOR](https://github.com/behavior-robot-suite/brs-algo), [Calvin](https://github.com/mees/calvin)
- [ ] Support real-world RL.
### Expanding Framework
- [ ] Additional online and offline RL algorithms
- [ ] Modular environment and VLA interface for easy adaptation
- [ ] Further optimize the RL framework to achieve more efficient training
# 📨Contact
- Haozhan Li: zhan72426@gmail.com
- Ning Ding: dingning@mail.tsinghua.edu.cn
# 🎈Citation
If you find SimpleVLA-RL helpful, please cite us:
```bibtex
@article{li2025simplevla,
  title={SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning},
  author={Li, Haozhan and Zuo, Yuxin and Yu, Jiale and Zhang, Yuhao and Yang, Zhaohui and Zhang, Kaiyan and Zhu, Xuekai and Zhang, Yuchen and Chen, Tianxing and Cui, Ganqu and others},
  journal={arXiv preprint arXiv:2509.09674},
  year={2025}
}
```
# 🌟Star History
[Star History Chart](https://www.star-history.com/#PRIME-RL/SimpleVLA-RL&Date)