# human_assisted_preference_optimization

**Repository Path**: ByteDance/human_assisted_preference_optimization

## Basic Information

- **Project Name**: human_assisted_preference_optimization
- **License**: MIT
- **Default Branch**: main

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-12
- **Last Updated**: 2025-09-24

## README


# [NeurIPS 2025] Human-assisted Action Preference Optimization

[Project Page](https://xwinks.github.io/human_assisted_preference_optimization/) 

[arXiv](https://arxiv.org/abs/2506.07127)

![HAPO](./pipeline.png)


## Introduction

In this work, we propose Human-assisted Action Preference Optimization (HAPO), a method that corrects interaction failures and achieves stable optimization for Vision-Language-Action (VLA) models.

## Installation

To install the dependencies for training, run the following command:
```bash
pip install -r requirements.txt
```

To install the dependencies for inference, please install the following packages:
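
The commands in the following subsections expand `${project_path}`, which this README does not define. A minimal sketch, assuming it is meant to point at the root of your clone of this repository:

```bash
# Assumption: you run this from the root of your clone of this repository.
# `project_path` is the variable the install commands in this README expand.
export project_path="$(pwd)"
echo "project_path=${project_path}"
```

If your clone lives elsewhere, set `project_path` to that absolute path instead of `$(pwd)`.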

### MimicGen Installation
```bash
mkdir -p ${project_path}/deps
cd ${project_path}/deps
git clone https://github.com/NVlabs/mimicgen.git
cd mimicgen
pip install -e .
```

### Robosuite Installation
```bash
cd ${project_path}/deps
git clone https://github.com/ARISE-Initiative/robosuite.git
cd robosuite
git checkout b9d8d3de5e3dfd1724f4a0e6555246c460407daa
pip install -e .
```

### Robomimic Installation
```bash
cd ${project_path}/deps
git clone https://github.com/ARISE-Initiative/robomimic.git
cd robomimic
git checkout d0b37cf214bd24fb590d182edb6384333f67b661
pip install -e .
```

### Robosuite_task_zoo Installation
```bash
cd ${project_path}/deps
git clone https://github.com/ARISE-Initiative/robosuite-task-zoo
cd robosuite-task-zoo
git checkout 74eab7f88214c21ca1ae8617c2b2f8d19718a9ed
pip install -e .
```
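
Because robosuite, robomimic, and robosuite-task-zoo are pinned to specific commits above, it can be worth confirming that each clone really sits at the pinned commit before running inference. The sketch below is one way to do that; `verify_commit` and `verify_pinned_deps` are hypothetical helpers, not part of this repository, and the directory layout assumes the clones live under `${project_path}/deps` as in the commands above.

```bash
# Hypothetical helper (not part of this repository): check that a clone is at
# the expected commit. Returns 0 on match, 1 on mismatch, and 2 if the
# directory cannot be read as a git repository.
verify_commit() {
  local repo_dir="$1" expected="$2" actual
  if ! actual="$(git -C "$repo_dir" rev-parse HEAD 2>/dev/null)"; then
    echo "not a git repository: $repo_dir"
    return 2
  fi
  if [ "$actual" = "$expected" ]; then
    echo "ok: $repo_dir"
  else
    echo "mismatch: $repo_dir is at $actual, expected $expected"
    return 1
  fi
}

# Check the three pinned dependencies; assumes project_path is already set.
verify_pinned_deps() {
  local failed=0
  verify_commit "${project_path}/deps/robosuite" b9d8d3de5e3dfd1724f4a0e6555246c460407daa || failed=1
  verify_commit "${project_path}/deps/robomimic" d0b37cf214bd24fb590d182edb6384333f67b661 || failed=1
  verify_commit "${project_path}/deps/robosuite-task-zoo" 74eab7f88214c21ca1ae8617c2b2f8d19718a9ed || failed=1
  return "$failed"
}
```

After cloning, run `verify_pinned_deps`; a non-zero exit status means at least one dependency is missing or is not at its pinned commit.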

### Flash-Attn Installation
```bash
pip install packaging ninja
# ninja must work before building flash-attn; this should print exit code 0
ninja --version; echo $?
pip install "flash-attn==2.5.5" --no-build-isolation
```
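
Once the steps above have finished, a quick check that the packages are visible to your Python environment can catch a failed editable install early. This is a sketch under assumptions: the import names listed (in particular `robosuite_task_zoo` and `flash_attn`) are assumed to match the installed packages, `check_module` is a hypothetical helper, and `PYTHON_BIN` should point at the interpreter you ran `pip` with.

```bash
# Assumption: this interpreter is the one the packages were installed into.
PYTHON_BIN="${PYTHON_BIN:-python3}"

# Hypothetical helper: exit 0 if the module can be located, 1 otherwise.
# find_spec only locates a top-level module; it does not import it.
check_module() {
  "$PYTHON_BIN" -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# Assumed import names for the dependencies installed above.
for mod in mimicgen robosuite robomimic robosuite_task_zoo flash_attn; do
  if check_module "$mod"; then
    echo "found:   $mod"
  else
    echo "missing: $mod"
  fi
done
```

A `missing` line only means the module was not found on the interpreter's search path; it does not say why the install failed.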

## Training

To train the HAPO model, run the following command:
```bash
bash scripts/hapo_train.sh ${task_name}
```

In this work, we evaluate HAPO on the MimicGen dataset.

## Inference

To evaluate the performance of the HAPO model, run the following command:
```bash
bash scripts/inference.sh ${task_name} ${adapter_path}
```

## Citation

```bibtex
@article{xia2025robotic,
  title={Robotic Policy Learning via Human-assisted Action Preference Optimization},
  author={Xia, Wenke and Yang, Yichu and Wu, Hongtao and Ma, Xiao and Kong, Tao and Hu, Di},
  journal={arXiv preprint arXiv:2506.07127},
  year={2025}
}
```