# WebRL

**Repository Path**: eipoz-opensource/WebRL

## Basic Information

- **Project Name**: WebRL
- **Description**: https://github.com/THUDM/WebRL
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-12-30
- **Last Updated**: 2024-12-30

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

📃 Paper | 🤗 WebRL-GLM-4-9B | WebRL-LLaMA-3.1-8B | ModelScope

***

WebRL is a self-evolving online curriculum learning framework for training web agents, targeting the WebArena environment.

## 🚀 Quick Start

### Dependencies

First, create a conda environment and install all pip package requirements.

```bash
conda create -n webrl python==3.10
conda activate webrl

cd WebRL
pip install -e .
```

### Model checkpoints

#### Actor checkpoints

The following actor checkpoints have been released:

- [WebRL-GLM-4-9B checkpoint](https://huggingface.co/THUDM/webrl-glm-4-9b)
- [WebRL-Llama-3.1-8B checkpoint](https://huggingface.co/THUDM/webrl-llama-3.1-8b)
- [WebRL-Llama-3.1-70B checkpoint](https://huggingface.co/THUDM/webrl-llama-3.1-70b)

#### ORM checkpoint

The checkpoint for the Outcome-supervised Reward Model (ORM) is as follows:

- [ORM-Llama-3.1-8B checkpoint](https://huggingface.co/THUDM/webrl-orm-llama-3.1-8b/tree/main)

### ✈️ Train SFT model

We use LLaMA-Factory to train the SFT baseline, which is the starting model for WebRL. We release the code and data used for training. You can train the SFT baseline with the following commands:

```bash
cd LLaMA-Factory
bash run.sh examples/train_full/llama3_full_policy_web.yaml
```

### ✈️ Train WebRL

After training the SFT baseline, use it as the initial model for both the actor and the critic. You can train WebRL with the following command:

```bash
bash run_multinode.sh
```

This command trains the actor and critic in each phase.

### 💡 Generating New Instructions

You can generate new instructions with the following command:

```bash
python scripts/gen_task.py
```

### 🛜 Interaction and Evaluation

The instructions and scripts for interacting with WebArena are provided in [VAB-WebArena-Lite](https://github.com/THUDM/VisualAgentBench/tree/main/VAB-WebArena-Lite).
You can implement the interaction process of WebRL according to the [``Evaluating in WebRL Setting (Text Modal)``](https://github.com/THUDM/VisualAgentBench/tree/main/VAB-WebArena-Lite#-evaluating-in-webrl-setting-text-modal) section of VAB-WebArena-Lite.

To enable interaction with WebArena, you need to configure each task in the same format as the sample test case provided in the ``test_webarena_lite.raw.json`` file in VAB-WebArena-Lite. Below is the template for a task configuration:

```python
{
  "sites": [