# RealWeb

**Repository Path**: christinexc/real-web

## Basic Information

- **Project Name**: RealWeb
- **Description**: Codes and datasets for the manuscript submitted to ICASSP 2024!
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2023-12-12
- **Last Updated**: 2023-12-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# RealWeb

### Introduction

Code and datasets for our manuscript submitted to ICASSP 2024.

RealWeb is a multimodal Chinese dataset for automatic web navigation, including visual and textual annotations of real-world websites across 15 domains. It consists of 40 websites, 119 pages, and 11,739 language instructions.

### Demonstration

Below are six examples of universal web navigation on real-world websites using our multimodal framework.
Demonstrations: demo1, demo2, demo3, demo4, demo5, demo6
### Dataset Description

The annotation of each page in RealWeb consists of three parts: (a) a page screenshot, (b) a slot tree, and (c) user instructions.

![The sample data in RealWeb](/demos/overview.png)

### Dataset Composition

Universal web navigation is divided into four sub-tasks: object detection (OD), slot tree maintenance (STM), instruction parsing (IP), and web execution (WE).

- For the OD task, the dataset is stored in `./dataset/Object Detection/`.
- For the STM task, the dataset is stored in `./dataset/Slot Tree Maintenance/ocr_glm_dataset.jsonl`. The complete slot tree of each page is stored in `./dataset/Slot Tree Maintenance/slot_tree.json`.
- For the IP task, the dataset is stored in `./dataset/Instruction Parsing/Instructions.json`.
- For the WE task, the dataset is stored in `./dataset/Web Execution/realweb.json`.

The mapping between the IDs in the dataset and the actual webpage URLs is stored in `./dataset/id2url-mapping.json`.

`realweb.json` is our dataset for universal web navigation under real-world settings, where only the page URLs and user instructions are known. The other files in `./dataset` are datasets for the auxiliary tasks set up to accomplish universal navigation. All of these public datasets are released in full and are not split into training and test sets.

### Multimodal Framework

Our framework comprises four modules: object detection, slot tree maintenance, instruction parsing, and web execution.
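The dataset files listed under Dataset Composition can be loaded with a few lines of standard-library Python. This is a minimal sketch, not code from `./codes`: it assumes each file is plain JSON or JSON Lines, as its extension suggests, and it does not interpret any fields because the record schema is not documented here. `task_files`, `load_file`, and `load_dataset` are hypothetical helper names. The OD data is a directory rather than a single file, so it is left out.

```python
import json
from pathlib import Path


def task_files(root="./dataset"):
    """Map each sub-task to the file this README lists for it (hypothetical keys)."""
    root = Path(root)
    return {
        "STM": root / "Slot Tree Maintenance" / "ocr_glm_dataset.jsonl",
        "STM_slot_tree": root / "Slot Tree Maintenance" / "slot_tree.json",
        "IP": root / "Instruction Parsing" / "Instructions.json",
        "WE": root / "Web Execution" / "realweb.json",
        "id2url": root / "id2url-mapping.json",
    }


def load_file(path):
    """Read a .jsonl file as a list of records; read any other file as one JSON value."""
    path = Path(path)
    with path.open(encoding="utf-8") as f:
        if path.suffix == ".jsonl":
            # One JSON object per non-blank line.
            return [json.loads(line) for line in f if line.strip()]
        return json.load(f)


def load_dataset(root="./dataset"):
    """Load every listed file that exists; missing files are skipped rather than raising."""
    return {name: load_file(p) for name, p in task_files(root).items() if p.exists()}
```

For example, `load_dataset()["id2url"]` would return whatever JSON value `id2url-mapping.json` contains, assuming the script runs from the repository root.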
![Our Multimodal Framework](/demos/new-framework.png)

The source code of our framework is stored in `./codes`.

The parameter settings of the object detection module are shown in the following table:

| model | epochs | batch | imgsz | lr0 | lrf | momentum | weight_decay | $\lambda$ | box | cls | dfl |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| yolov8n | 100 | 16 | 640 | 0.01 | 0.01 | 0.937 | 0.0005 | 0.5 | 7.5 | 0.5 | 1.5 |

The parameter settings of the slot tree maintenance module are shown in the following table:

| model | fine-tuning method | lora_rank | batch_size | max_steps | learning_rate |
| :----: | :----: | :----: | :----: | :----: | :----: |
| ChatGLM-6B | LoRA | 8 | 6 | 1000 | 1e-4 |

The parameter settings of the instruction parsing module are shown in the following table:

| model | fine-tuning method | lora_rank | batch_size | max_steps | learning_rate |
| :----: | :----: | :----: | :----: | :----: | :----: |
| ChatGLM-6B | LoRA | 8 | 6 | 50000 | 1e-4 |

The parameter settings of the web execution module are shown in the following table:

| interval time per action | key context window | value context window |
| :----: | :----: | :----: |
| 4 s | 5 | 3 |
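The hyperparameter tables above can be collected into a single Python configuration. This is a minimal sketch, not the code in `./codes`. It assumes the object detection module was trained with the Ultralytics package (the column names match Ultralytics `YOLO.train()` arguments, but the README does not say so) and leaves out the $\lambda$ column because the argument it corresponds to is not given. `LoraFinetuneConfig`, `ExecutionConfig`, `train_detector`, and `run_actions` are hypothetical names. The execution loop only illustrates the 4 s pacing between actions; the key and value context windows are stored but not interpreted, since their semantics are not described here.

```python
import time
from dataclasses import dataclass

# Object detection (yolov8n): table values keyed by Ultralytics train() argument names.
# Assumption: Ultralytics was used. The lambda column is omitted (mapping not stated).
OD_TRAIN_ARGS = dict(
    epochs=100, batch=16, imgsz=640, lr0=0.01, lrf=0.01,
    momentum=0.937, weight_decay=0.0005, box=7.5, cls=0.5, dfl=1.5,
)


def train_detector(data_yaml):
    """Fine-tune YOLOv8n; `data_yaml` is a hypothetical dataset config path."""
    from ultralytics import YOLO  # imported lazily so the rest runs without it

    model = YOLO("yolov8n.pt")
    return model.train(data=data_yaml, **OD_TRAIN_ARGS)


@dataclass(frozen=True)
class LoraFinetuneConfig:
    """LoRA fine-tuning settings for ChatGLM-6B, as listed in the tables."""
    lora_rank: int = 8
    batch_size: int = 6
    max_steps: int = 1000
    learning_rate: float = 1e-4


STM_CONFIG = LoraFinetuneConfig(max_steps=1000)   # slot tree maintenance
IP_CONFIG = LoraFinetuneConfig(max_steps=50000)   # instruction parsing


@dataclass(frozen=True)
class ExecutionConfig:
    """Web execution settings from the table."""
    interval_s: float = 4.0        # interval time per action
    key_context_window: int = 5    # semantics not described in this README
    value_context_window: int = 3  # semantics not described in this README


def run_actions(actions, perform, cfg=ExecutionConfig(), sleep=time.sleep):
    """Perform `actions` in order, waiting cfg.interval_s seconds between consecutive ones.

    `perform` is a hypothetical callback (e.g. one browser-automation step).
    Assumption: the wait applies between actions, not after the last one.
    """
    results = []
    for i, action in enumerate(actions):
        results.append(perform(action))
        if i < len(actions) - 1:
            sleep(cfg.interval_s)
    return results
```

Passing `sleep` as a parameter keeps the pacing logic testable without real delays; in normal use it defaults to `time.sleep`.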