# MM-RL

**Repository Path**: sunnylee219/MM-RL

## Basic Information

- **Project Name**: MM-RL
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 8
- **Created**: 2025-05-19
- **Last Updated**: 2025-05-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Environment Setup

## Installing torch and torch_npu

Install torch 2.5.1.

For torch_npu, install the 2.5.1 build from release v0.7.0: https://gitee.com/ascend/pytorch/releases

## Installing the CANN packages

Download the recommended CANN version (8.1.RC1) and install it with the following commands:

```shell
bash Ascend-cann-toolkit_*.run --install-path=/path/to/install --full
source /path/to/install/ascend-toolkit/set_env.sh
bash Ascend-cann-kernels_*.run --install-path=/path/to/install --install
bash Ascend-cann-nnal_*.run --install-path=/path/to/install --install
```

## Installing MindSpeed-MM

Clone the MindSpeed-MM repository (https://gitee.com/ascend/MindSpeed-MM):

```shell
git clone https://gitee.com/ascend/MindSpeed-MM.git
```

## Installing Megatron

Clone the Megatron-LM repository and check out the required branch:

```shell
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
git checkout core_r0.8.0
```

## Installing MindSpeed

Run the following commands:

```shell
git clone https://gitee.com/ascend/MindSpeed.git
cd MindSpeed
git checkout 6f11a6c9
pip install -r requirements.txt
pip3 install -e .
```

Replace dot_product_attention.py in MindSpeed with the version from MindSpeed-MM:

```shell
cp MindSpeed-MM/examples/qwen2vl/dot_product_attention.py MindSpeed/mindspeed/core/transformer/dot_product_attention.py
```

## Integrating the MindSpeed-RL code files

(1) First, git clone this repository.

(2) Copy the megatron folder from Megatron, the mindspeed folder from MindSpeed, and the mindspeed_rl, cli, and configs folders from this repository into the MindSpeed-MM folder:

```
|—— checkpoint
|—— ci
|—— cli          (copied from MM-RL)
|—— configs      (copied from MM-RL)
|—— docs
|—— examples
|—— megatron     (copied from Megatron)
|—— mindspeed    (copied from MindSpeed)
|—— mindspeed_mm
|—— mindspeed_rl (copied from MM-RL)
```

(3) Copy the examples/grpo folder from this repository into the examples folder of the MindSpeed-MM repository (a shell sketch of these copies follows below):

```
|—— cogvideox
|—— deepseekvl2
|—— diffusers
|—— grpo
|—— hunyuanvideo
|—— internvl2
|—— ...
|—— ...
```
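The copy steps in (2) and (3) could be scripted roughly as follows. This is only a sketch: it assumes Megatron-LM, MindSpeed, MindSpeed-MM, and this repository (cloned as MM-RL) all sit side by side under the same parent directory; adjust the paths to your own layout.

```shell
# Sketch only: assumes Megatron-LM, MindSpeed, MindSpeed-MM, and MM-RL were
# cloned into the current directory with their default directory names.
cp -r Megatron-LM/megatron MindSpeed-MM/
cp -r MindSpeed/mindspeed MindSpeed-MM/
cp -r MM-RL/mindspeed_rl MM-RL/cli MM-RL/configs MindSpeed-MM/
cp -r MM-RL/examples/grpo MindSpeed-MM/examples/
```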
## Installing VLLM

Run the following commands:

```shell
git clone --depth 1 --branch v0.8.5.post1 https://github.com/vllm-project/vllm.git
cd vllm
VLLM_TARGET_DEVICE=empty pip install . --extra-index-url https://download.pytorch.org/whl/cpu
```

## Installing VLLM-ASCEND

Run the following commands:

```shell
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
pip install vllm-ascend==0.8.5rc1
pip install torch_npu-2.5.1*-cp310-cp310-linux_aarch64.whl  # installing vllm-ascend may overwrite torch_npu, so reinstall it
```

> Note: use the actual installation paths of CANN and nnal.

In the actual installation directory of vllm_ascend, make the following changes to vllm_ascend/models/__init__.py:

```python
from vllm import ModelRegistry


def register_model():
    from .deepseek_mtp import CustomDeepSeekMTP  # noqa: F401
    from .deepseek_v2 import CustomDeepseekV2ForCausalLM  # noqa: F401
    from .deepseek_v2 import CustomDeepseekV3ForCausalLM  # noqa: F401
    from .qwen2_5_vl import \
        AscendQwen2_5_VLForConditionalGeneration  # noqa: F401
    from .qwen2_vl import AscendQwen2VLForConditionalGeneration  # noqa: F401

    ModelRegistry.register_model(
        "DeepSeekMTPModel",
        "vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP")

    ModelRegistry.register_model(
        "Qwen2VLForConditionalGeneration",
        "vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration")

    # Comment out the following lines:
    # ModelRegistry.register_model(
    #     "Qwen2_5_VLForConditionalGeneration",
    #     "vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration"
    # )

    ModelRegistry.register_model(
        "DeepseekV2ForCausalLM",
        "vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM")

    ModelRegistry.register_model(
        "DeepseekV3ForCausalLM",
        "vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM")
```

## Installing transformers

Run the following commands:

```shell
git clone -b verl-fa-npu https://github.com/as12138/transformers.git
cd transformers
pip install huggingface-hub==0.30.0
python setup.py develop
```

## Installing apex

Install apex by following https://gitee.com/ascend/apex:

```shell
git clone -b master https://gitee.com/ascend/apex.git
cd apex/
bash scripts/build.sh --python=3.xx  # replace 3.xx with your Python version
```

## Installing the remaining dependencies

```shell
pip install -r requirements.txt
```

The requirements.txt file is located in this repository.

# Launch

## Basic launch

Change into the MindSpeed-MM directory and, taking qwen2_5_vl_3b as an example, run:

```shell
bash examples/grpo/grpo_trainer_qwen25vl_3b_integrated.sh
```

Note that the CANN and nnal environment setup in this script must be replaced with your own installation.

The corresponding RL configuration file is configs/grpo_trainer_qwen25vl_3b_integrated.yaml.
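For reference, if CANN and nnal were installed to a custom prefix as in the CANN installation step above, the environment lines to adjust in the launch script would look roughly like the sketch below. This is an assumption about the script's environment setup, not its exact contents; check the shipped script for the precise lines.

```shell
# Sketch only: point the script's environment setup at your own install prefix
# (here assumed to be /path/to/install, as in the CANN installation step).
source /path/to/install/ascend-toolkit/set_env.sh
source /path/to/install/nnal/atb/set_env.sh
```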