# MobileVLM

![My Local Image](./mobilevlm.png)

## Paper Link

[MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding](https://aclanthology.org/2024.findings-emnlp.599/)

### News

- **2024.11.12** - Partial training data and random walk code for Mobile3M released!
- **2024.10.4** - Test data for Mobile3M released!
- **2024.9.26** - Our work was accepted to EMNLP 2024 Findings!

### 1. Quick Start

#### Requirements

- transformers==4.32.0
- accelerate
- tiktoken
- einops
- transformers_stream_generator==0.0.4
- scipy
- torchvision
- pillow
- tensorboard
- matplotlib

### 2. Mobile3M Dataset

![Dataset Image](./mobilevlm_table.png)

#### Training Data

Training data is available at the following link: [data](https://huggingface.co/datasets/xwk123/Mobile3M/tree/main). We will gradually upload data for all apps.

#### Training JSON

Training JSON files are available at the following link: [training_jsons](https://huggingface.co/datasets/wuqinzhuo/MobileVLM_traindata/tree/main).

#### Corpus Collection Script

To start collecting data, run the script `main/corpus/googleCreatDataset/arm_graph_para_lock.py`. Example usage:

```bash
python googleCreatDataset/arm_graph_para_lock.py --device_name 10.53.89.79:6532 --systemPort 8112 --appid 8201 --command_executor http://127.0.0.1:4812/wd/hub --appPackage com.lucky.luckyclient --name_en lucky --diff_max 0.5 --diff_png 0.3 --waitadb 8 --prefix lucky0_3_1_2_ --recheck -1
```

Running the collection command above requires the following additional setup:

- Install Node.js and Appium:

```bash
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs
sudo apt install npm
npm install -g appium@1.18.3
```

- Install graphical libraries: `sudo apt-get install xorg`
- Activate the Python virtual environment: `source /path/to/new/virtual/environment/bin/activate`
- Install Appium Python Client 1.3.0: `pip install Appium-Python-Client==1.3.0` (a minimal session sketch follows the parameter list below)

#### Parameter Descriptions

- **--device_name**: Name of the emulator.
- **--appid**: Storage ID of the app being collected, e.g., 8201.
- **--command_executor**: URL of the Appium server endpoint.
- **--diff_max 0.5 --diff_png 0.3**: Page-similarity thresholds used to decide whether two screens count as the same page (see the sketches after this list).
- **--prefix lucky0_3_1_2_**: Distributed starting path for data collection.
- **--recheck -1**: Whether to recheck previously collected data; set to `-1` for no recheck.
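With Node.js, Appium, and the Python client installed, the collection script drives the emulator through an Appium session. Below is a minimal sketch of what such a session looks like with Appium-Python-Client 1.3.0; the capability values are assumptions copied from the example command, not the exact configuration used by `arm_graph_para_lock.py` (which will also set app-specific capabilities such as `appPackage`):

```python
# Minimal Appium session sketch (Appium-Python-Client 1.3.0).
# Capability values are illustrative assumptions taken from the example
# command above, not the collection script's exact configuration.
from appium import webdriver

desired_caps = {
    "platformName": "Android",
    "deviceName": "10.53.89.79:6532",   # --device_name
    "systemPort": 8112,                 # --systemPort
    "automationName": "UiAutomator2",
}

# Connect to the Appium server from the example command (--command_executor)
driver = webdriver.Remote(
    command_executor="http://127.0.0.1:4812/wd/hub",
    desired_capabilities=desired_caps,
)

# One step of a random walk: dump the UI tree, screenshot the page, tap a point
xml = driver.page_source
driver.get_screenshot_as_file("page_0.png")
driver.tap([(540, 960)])
driver.quit()
```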
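The `--diff_max` / `--diff_png` thresholds compare screenshots to decide whether the walker has landed on a page it has already seen. As a rough, purely illustrative sketch of how a pixel-level comparison against `--diff_png` could work (an assumed stand-in for the repository's actual comparison logic; `png_diff_ratio` and the file names are hypothetical):

```python
# Illustrative sketch: treat two screenshots as the "same" page when the
# fraction of differing pixels is below a threshold. This is an assumed
# stand-in for the repository's actual --diff_png comparison.
from PIL import Image, ImageChops

def png_diff_ratio(path_a: str, path_b: str) -> float:
    a = Image.open(path_a).convert("RGB")
    b = Image.open(path_b).convert("RGB").resize(a.size)
    diff = ImageChops.difference(a, b).convert("L")  # per-pixel difference
    pixels = list(diff.getdata())
    changed = sum(1 for p in pixels if p > 0)
    return changed / len(pixels)

# Mirroring the example value --diff_png 0.3: fewer than 30% of pixels
# differing means the screens are considered duplicates.
if png_diff_ratio("screen_a.png", "screen_b.png") < 0.3:
    print("same page: skip re-exploration")
```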
#### Appium

![Appium](./appium.jpg)

---

### 3. Data Generation Code for Each Task

![Task](./taSK.png)

The code for generating data for each task can be found in the following directories:

#### Our Test Data

Our test data is available at [data](https://huggingface.co/datasets/xwk123/mobilevlm_test).

### 4. License

The dataset of this project is licensed under the [**Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)**](https://creativecommons.org/licenses/by-nc-sa/4.0/) license.
The source code of this project is licensed under the [**Apache 2.0**](http://www.apache.org/licenses/LICENSE-2.0) license.

#### Summary of Terms

- **Attribution**: You must give appropriate credit, provide a link to the license, and indicate if changes were made.
- **NonCommercial**: You may not use the material for commercial purposes.
- **ShareAlike**: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

#### License Badge

[![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)

### 5. Citation

If you'd like to use our benchmark or cite this paper, please use the reference below:

```bibtex
@article{wu2024mobilevlm,
  title={Mobilevlm: A vision-language model for better intra- and inter-ui understanding},
  author={Wu, Qinzhuo and Xu, Weikai and Liu, Wei and Tan, Tao and Liu, Jianfeng and Li, Ang and Luan, Jian and Wang, Bin and Shang, Shuo},
  journal={arXiv preprint arXiv:2409.14818},
  year={2024}
}
```