# RoboBrainOpen
**Repository Path**: flagopen/robo-brain-open
## Basic Information
- **Project Name**: RoboBrainOpen
- **License**: Apache-2.0
- **Default Branch**: master
- **Created**: 2025-03-27
- **Last Updated**: 2025-03-28
## README
# [CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete.
  ⭐️ Project   |   🤗 Hugging Face   |   🤖 ModelScope   |   📑 Paper   |   💬 WeChat
Recent advancements in Multimodal Large Language Models (MLLMs) have shown remarkable capabilities across various multimodal contexts. However, their application in robotic scenarios, particularly for long-horizon manipulation tasks, reveals significant limitations. These limitations arise because current MLLMs lack three essential robotic brain capabilities: **(1) Planning Capability**, which involves decomposing complex manipulation instructions into manageable sub-tasks; **(2) Affordance Perception**, the ability to recognize and interpret the affordances of interactive objects; and **(3) Trajectory Prediction**, the foresight to anticipate the complete manipulation trajectory necessary for successful execution. To enhance the robotic brain's core capabilities from abstract to concrete, we introduce ShareRobot, a high-quality heterogeneous dataset that labels multi-dimensional information such as task planning, object affordance, and end-effector trajectory. ShareRobot's diversity and accuracy have been meticulously refined by three human annotators. Building on this dataset, we developed RoboBrain, an MLLM-based model that combines robotic and general multimodal data, employs a multi-stage training strategy, and incorporates long videos and high-resolution images to improve its robotic manipulation capabilities. Extensive experiments demonstrate that RoboBrain achieves state-of-the-art performance across various robotic tasks, highlighting its potential to advance robotic brain capabilities.
## 🚀 Features
This repository supports:
- **`Data Preparation`**: Please refer to [Dataset Preparation](#Dataset) for how to prepare the dataset.
- **`Training for RoboBrain`**: Please refer to [Training Section](#Training) for the usage of training scripts.
- **`Evaluation for RoboBrain`**: Please refer to [Evaluation Section](#Evaluation) for how to prepare the benchmarks.
- **`Support vLLM Inference`**: We now support inference with [vLLM](https://github.com/vllm-project/vllm); please see the [Inference Section](#Inference) for usage.
- **`ShareRobot Generation`**: Please refer to [ShareRobot](https://github.com/FlagOpen/ShareRobot) for details.
## 🗞️ News
- **`2025-03-26`**: 🔥 We have released the [RoboBrain](https://superrobobrain.github.io/) repository.
- **`2025-02-27`**: 🌍 Our [RoboBrain](https://superrobobrain.github.io/) was accepted to CVPR 2025.
## 🤖 Models
- **[`Base Planning Model`](https://superrobobrain.github.io/)**: The model was trained on general datasets in Stages 1–2 and on the Robotic Planning dataset in Stage 3, which is designed for Planning prediction.
- **[`A-LoRA for Affordance`](https://superrobobrain.github.io/)**: Based on the Base Planning Model, Stage 4 involves LoRA-based training with our Affordance dataset to predict affordance.
- **[`T-LoRA for Trajectory`](https://superrobobrain.github.io/)**: Based on the Base Planning Model, Stage 4 involves LoRA-based training with our Trajectory dataset to predict trajectory. (A sketch of loading these adapters follows the checkpoint table below.)
| Models | Checkpoint | Description |
|----------|----------------|----------------|
| Base Planning Model | [Planning Checkpoint](https://superrobobrain.github.io/) | Used for Planning prediction in our paper |
| A-LoRA for Affordance | [Affordance Checkpoint](https://superrobobrain.github.io/) | Used for Affordance prediction in our paper |
| T-LoRA for Trajectory | [Trajectory Checkpoint](https://superrobobrain.github.io/) | Used for Trajectory prediction in our paper |
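For illustration, the snippet below sketches how a Stage-4 adapter could be applied on top of the Base Planning Model using Hugging Face `transformers` and `peft`. This is a hypothetical sketch, not the repository's published interface: it assumes the released LoRA checkpoints are PEFT-compatible, and the model class and checkpoint paths are placeholders.

```python
# Hypothetical sketch: attach a Stage-4 LoRA adapter (A-LoRA or T-LoRA)
# to the Base Planning Model. All paths below are placeholders.
from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel  # pip install peft

BASE_CKPT = "path/to/base-planning-model"  # placeholder
LORA_CKPT = "path/to/a-lora-affordance"    # placeholder; use the T-LoRA path for trajectory

processor = AutoProcessor.from_pretrained(BASE_CKPT, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_CKPT, device_map="auto", trust_remote_code=True
)

# PeftModel overlays the low-rank adapter weights on the frozen base model.
model = PeftModel.from_pretrained(base_model, LORA_CKPT)
model.eval()
```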
## 🛠️ Setup
```bash
conda create -n robobrain python=3.10
conda activate robobrain
pip install -r requirements.txt
```
## 🤖 Training
### 1. Data Preparation
Each training stage reads its data from a yaml manifest, where each stage's yaml config is paired with the json annotation files it draws from:
```yaml
datasets:
  - yaml_path: /path/to/stage_1.yaml
    json_path:
      - xxx.json
      - xxx.json
  - yaml_path: /path/to/stage_1_5.yaml
    json_path:
      - xxx.json
      - xxx.json
  - yaml_path: /path/to/stage_2_si.yaml
    json_path:
      - xxx.json
      - xxx.json
  - yaml_path: /path/to/stage_2_ov.yaml
    json_path:
      - xxx.json
      - xxx.json
  - yaml_path: /path/to/stage_3_planning.yaml
    json_path:
      - xxx.json
      - xxx.json
  - yaml_path: /path/to/stage_4_affordance.yaml
    json_path:
      - xxx.json
      - xxx.json
  - yaml_path: /path/to/stage_4_trajectory.yaml
    json_path:
      - xxx.json
      - xxx.json
```
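Before launching a run, a small script like the following can sanity-check a manifest. This is a hypothetical helper written against the nested layout sketched above, not a script shipped with the repository:

```python
# Hypothetical manifest checker: verifies that every json file referenced
# by the datasets manifest exists on disk. Assumes the nested yaml layout
# sketched above.
import sys
from pathlib import Path

import yaml  # pip install pyyaml


def check_manifest(manifest_path: str) -> bool:
    config = yaml.safe_load(Path(manifest_path).read_text())
    ok = True
    for entry in config.get("datasets", []):
        for json_file in entry.get("json_path", []):
            if not Path(json_file).is_file():
                print(f"missing: {json_file} (stage: {entry.get('yaml_path')})")
                ok = False
    return ok


if __name__ == "__main__":
    sys.exit(0 if check_manifest(sys.argv[1]) else 1)
```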
### 2. Training
```bash
# Training on Stage 1:
bash scripts/train/stage_1_0_pretrain.sh
# Training on Stage 1.5:
bash scripts/train/stage_1_5_direct_finetune.sh
# Training on Stage 2_si:
bash scripts/train/stage_2_0_resume_finetune_si.sh
# Training on Stage 2_ov:
bash scripts/train/stage_2_0_resume_finetune_ov.sh
# Training on Stage 3_plan:
bash scripts/train/stage_3_0_resume_finetune_robo.sh
# Training on Stage 4_aff:
bash scripts/train/stage_4_0_resume_finetune_lora_a.sh
# Training on Stage 4_traj:
bash scripts/train/stage_4_0_resume_finetune_lora_t.sh
```
## 🤖 Evaluation
### 1. Data Preparation
```bash
```
### 2. Evaluation for Robotic Benchmarks
```bash
```
### 3. Evaluation for General Benchmarks
```bash
```
## 🤖 Inference
### Option 1: HF inference
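The exact loading code depends on the released checkpoint format. Below is a minimal, hypothetical sketch using the Hugging Face `transformers` API, assuming the checkpoint loads via `AutoModelForCausalLM`/`AutoProcessor` with `trust_remote_code=True`; the checkpoint path and prompt are placeholders:

```python
# Hypothetical HF inference sketch; checkpoint path and prompt format are
# placeholders, not the repository's published interface.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

CKPT = "path/to/robobrain-checkpoint"  # placeholder

processor = AutoProcessor.from_pretrained(CKPT, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    CKPT, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

image = Image.open("example.jpg")
prompt = "What are the sub-tasks needed to pick up the mug?"  # placeholder

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

Note that the prompt template (e.g. any image placeholder tokens) depends on the checkpoint's processor; consult the released model card for the exact format.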
### Option 2: vLLM inference
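For higher-throughput serving, a hypothetical vLLM sketch is shown below. Whether a given RoboBrain checkpoint runs under vLLM's multimodal runtime depends on its architecture; the checkpoint path and prompt are placeholders:

```python
# Hypothetical vLLM inference sketch; paths and prompt are placeholders.
from PIL import Image
from vllm import LLM, SamplingParams

CKPT = "path/to/robobrain-checkpoint"  # placeholder

llm = LLM(model=CKPT, trust_remote_code=True)
params = SamplingParams(temperature=0.0, max_tokens=256)

image = Image.open("example.jpg")
# The prompt template (including image placeholder tokens) is model-specific.
prompt = "What are the sub-tasks needed to pick up the mug?"  # placeholder

# vLLM accepts multimodal inputs as a dict with a `multi_modal_data` field.
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
```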
## 📑 Citation
If you find this project useful, please consider citing our work.
```bib
@article{ji2025robobrain,
  title={RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete},
  author={Ji, Yuheng and Tan, Huajie and Shi, Jiayu and Hao, Xiaoshuai and Zhang, Yuan and Zhang, Hengyuan and Wang, Pengwei and Zhao, Mengdi and Mu, Yao and An, Pengju and others},
  journal={arXiv preprint arXiv:2502.21257},
  year={2025}
}
```