# RealWeb
**Repository Path**: christinexc/real-web
## Basic Information
- **Project Name**: RealWeb
- **Description**: Code and datasets for the manuscript submitted to ICASSP 2024
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 1
- **Created**: 2023-12-12
- **Last Updated**: 2023-12-12
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# RealWeb
### Introduction
Code and datasets for our manuscript submitted to ICASSP 2024.
RealWeb is a multimodal Chinese dataset for automatic web navigation, including visual and textual annotations of real-world websites across 15 domains. It consists of 40 websites, 119 pages, and 11,739 language instructions.
### Demonstration
Here are six examples of universal web navigation with our multimodal framework on real-world websites.
### Dataset Description
The annotation of each page in RealWeb consists of three parts: a) a page screenshot, b) a slot tree, and c) user instructions.

### Dataset Composition
The universal web navigation task is divided into four sub-tasks: object detection (OD), slot tree maintenance (STM), instruction parsing (IP), and web execution (WE).
- **OD**: the dataset is stored in `./dataset/Object Detection/`.
- **STM**: the dataset is stored in `./dataset/Slot Tree Maintenance/ocr_glm_dataset.jsonl`; the complete slot tree of each page is stored in `./dataset/Slot Tree Maintenance/slot_tree.json`.
- **IP**: the dataset is stored in `./dataset/Instruction Parsing/Instructions.json`.
- **WE**: the dataset is stored in `./dataset/Web Execution/realweb.json`.

The mapping between the IDs in the dataset and the actual webpage URLs is stored in `./dataset/id2url-mapping.json`.
`realweb.json` is our dataset for universal web navigation under real-world settings, where only the page URLs and user instructions are known. The other files in `./dataset` belong to auxiliary tasks set up to complete the universal navigation task.
All of these public datasets are released in full and are not split into training and test sets.
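All of the files above are plain JSON or JSON Lines, so they can be loaded with the standard library alone. The sketch below only assumes the file layout described in this README; the fields inside each record are not specified here, so it returns untyped Python objects:

```python
import json

DATASET_DIR = "./dataset"

def load_json(path):
    """Load a whole-file JSON dataset (e.g. Instructions.json, realweb.json)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def load_jsonl(path):
    """Load a JSON Lines dataset, one record per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Paths from the README (note the spaces in the directory names):
# id2url = load_json(f"{DATASET_DIR}/id2url-mapping.json")
# stm    = load_jsonl(f"{DATASET_DIR}/Slot Tree Maintenance/ocr_glm_dataset.jsonl")
```

The commented-out calls show the intended usage once the repository is checked out.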
### Multimodal Framework
Our framework comprises four modules: object detection, slot tree maintenance, instruction parsing, and web execution.

The source code of our framework is stored in `./codes`.
The parameter settings of the object detection module are shown in the table below:
| model| epochs | batch | imgsz | lr0 | lrf | momentum | weight_decay | $\lambda$ | box | cls | dfl |
| :----: | :----:| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| yolov8n | 100 | 16 | 640 | 0.01 | 0.01 | 0.937 | 0.0005 | 0.5 | 7.5 | 0.5 | 1.5 |
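Most of these hyperparameters map directly onto the Ultralytics YOLOv8 training API. The sketch below is a hypothetical reconstruction, not the repo's actual training script: the dataset YAML name is a placeholder, and the table's $\lambda$ column is a paper-specific weight with no direct Ultralytics argument, so it is omitted here.

```python
# Hyperparameters from the table above (the lambda weight is handled
# inside the paper's method and has no Ultralytics counterpart).
OD_PARAMS = dict(
    epochs=100, batch=16, imgsz=640,
    lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005,
    box=7.5, cls=0.5, dfl=1.5,  # loss-component gains
)

def train_od(data_yaml="realweb_od.yaml"):
    """Fine-tune YOLOv8n on the OD dataset; data_yaml is a placeholder name."""
    from ultralytics import YOLO  # pip install ultralytics
    model = YOLO("yolov8n.pt")
    return model.train(data=data_yaml, **OD_PARAMS)
```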
The parameter settings of the slot tree maintenance module are shown in the table below:
| model| finetune method | lora_rank | batch_size | max_steps | learning_rate |
| :----: | :----:| :----: | :----: | :----: |:----: |
| ChatGLM-6B | LoRA | 8 | 6 | 1000 | 1e-4 |
The parameter settings of the instruction parsing module are shown in the table below:
| model| finetune method | lora_rank | batch_size | max_steps | learning_rate |
| :----: | :----:| :----: | :----: | :----: |:----: |
| ChatGLM-6B | LoRA | 8 | 6 | 50000 | 1e-4 |
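The two ChatGLM-6B modules share the same LoRA recipe and differ only in `max_steps`. As one possible realization (an assumption — the repo may instead use ChatGLM's own tuning scripts), the rank-8 adapter could be declared with the Hugging Face `peft` library:

```python
# Shared LoRA settings for the STM and IP modules (values from the tables).
STM_MAX_STEPS = 1000
IP_MAX_STEPS = 50000
TRAIN_ARGS = dict(per_device_train_batch_size=6, learning_rate=1e-4)

def make_lora_config(lora_rank=8):
    """Build a rank-8 causal-LM LoRA config; target modules are model-specific
    and left to peft's defaults here."""
    from peft import LoraConfig  # pip install peft
    return LoraConfig(r=lora_rank, task_type="CAUSAL_LM")
```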
The parameter settings of the web execution module are shown in the table below:
| interval time per action | key context window | value context window |
| :----: | :----:| :----: |
| 4s | 5 | 3 |
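The table implies two kinds of bookkeeping in the executor: pacing (one action every 4 s so pages can load) and bounded context windows over recent keys and values. The sketch below is a hypothetical illustration of that bookkeeping only; dispatching actions to a real browser driver is repo-specific and left as a placeholder.

```python
import time
from collections import deque

ACTION_INTERVAL = 4.0  # seconds between actions (from the table)
KEY_WINDOW = 5         # key context window size
VALUE_WINDOW = 3       # value context window size

class WebExecutor:
    """Sketch of the pacing and context windows implied by the table."""

    def __init__(self):
        # Fixed-size windows: old entries fall off automatically.
        self.key_ctx = deque(maxlen=KEY_WINDOW)
        self.value_ctx = deque(maxlen=VALUE_WINDOW)

    def record(self, key, value):
        """Remember the most recent keys and values seen on the page."""
        self.key_ctx.append(key)
        self.value_ctx.append(value)

    def step(self, action, sleep=time.sleep):
        """Wait out the action interval, then hand the action off."""
        sleep(ACTION_INTERVAL)  # pace actions so the page can settle
        return action  # placeholder: real dispatch to a browser is repo-specific
```

Injecting `sleep` as a parameter keeps the pacing testable without real delays.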