# MLA-Trust
**Repository Path**: duzgd/MLA-Trust
## Basic Information
- **Project Name**: MLA-Trust
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-07-08
- **Last Updated**: 2025-07-08
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments
---
🛡️ **[MLA-Trust](https://mla-trust.github.io/)** is a comprehensive and unified framework that evaluates the trustworthiness of multimodal LLM agents (MLAs) across four principled dimensions: **truthfulness**, **controllability**, **safety**, and **privacy**. The framework includes 34 high-risk interactive tasks that expose new trustworthiness challenges in GUI environments.

- **Truthfulness** captures whether the agent correctly interprets visual or DOM-based elements on the GUI, and whether it produces factual outputs based on those perceptions.
- **Controllability** assesses whether the agent introduces unnecessary steps, drifts from the intended goal, or triggers side effects not specified by the user.
- **Safety** examines whether the agent's actions are free from harmful or irreversible consequences, encompassing the prevention of behaviors that cause financial loss, data corruption, or system failures.
- **Privacy** evaluates whether the agent respects the confidentiality of sensitive information, since MLAs often capture screenshots, handle form data, and interact with files.
## 🎯 Main Findings
| | Finding |
|---|---|
| 🚨 | **Severe vulnerabilities in GUI environments**: Both proprietary and open-source MLAs that interact with GUIs exhibit more severe trustworthiness risks compared to traditional MLLMs, particularly in high-stakes scenarios such as financial transactions. |
| 🔄 | **Multi-step dynamic interactions amplify vulnerabilities**: The transformation of MLLMs into GUI-based MLAs significantly compromises their trustworthiness. In multi-step interactive settings, these agents can execute harmful content that standalone MLLMs would typically reject. |
| ⚡ | **Emergence of derived risks from iterative autonomy**: Multi-step execution enhances adaptability but introduces latent and nonlinear risk accumulation across decision cycles, leading to unpredictable derived risks. |
| 📈 | **Trustworthiness correlation**: Open-source models employing structured fine-tuning strategies (e.g., SFT and RLHF) demonstrate improved controllability and safety. Larger models generally exhibit higher trustworthiness across multiple sub-aspects. |
## 💻 Installation
1. Install `uv` by following the [official installation guide](https://docs.astral.sh/uv/getting-started/installation/#standalone-installer). Ensure the PATH environment variable is configured as prompted.
2. Install dependencies:
```bash
uv sync
uv sync --extra flash-attn
```
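To confirm the setup, you can check the `uv` installation and the project virtual environment (the `.venv` location is uv's default and an assumption here):
```bash
uv --version   # confirm uv is on PATH
ls .venv/bin   # uv sync creates the project virtual environment here by default
```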
### 📱 Mobile Setup
#### A. ADB Setup and Configuration
> Reference: [Mobile-Agent-E Repository](https://github.com/X-PLUG/MobileAgent/tree/main/Mobile-Agent-E)
1. **Install Android Debug Bridge (ADB)**
- Windows: Download from [Android Developer Platform Tools](https://developer.android.com/tools/releases/platform-tools)
- MacOS: `brew install android-platform-tools`
- Linux: `sudo apt-get install android-tools-adb`
2. **Enable Developer Options**
- Go to Settings → About phone
- Tap "MIUI version" several times until developer options are enabled (example shown for Xiaomi devices)
- Navigate to Settings → Additional Settings → Developer options
3. **Enable USB Debugging**
- Enable "USB debugging" in Developer options
- Connect phone via USB cable
- Select "File Transfer" mode when prompted
4. **Verify ADB Connection**
```bash
# Check connected devices
adb devices
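# Expected output (your serial will differ):
#   List of devices attached
#   <your-device-serial>    device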
```
#### B. Task Preconditions
1. Modify the `scripts/mobile/adb.sh` script for device setup
- The script (a) unlocks the device and (b) returns to the home screen
- It must be executed before each task
- Customize it according to your device specifications (a minimal sketch is shown after this list)
2. Update ANDROID_SERIAL in `scripts/mobile/run_task.sh` to match your device
> Our experimental equipment and operating system versions are as follows: (a) Device: Redmi Note 13 Pro; (b) Operating System: Xiaomi HyperOS 2.0.6.0
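A minimal sketch of what `scripts/mobile/adb.sh` could look like is shown below. The key events and swipe coordinates are assumptions (no lock PIN, 1080p-class screen) and should be adapted to your device:
```bash
#!/usr/bin/env bash
# Minimal sketch of scripts/mobile/adb.sh -- adapt to your device.
# Assumes no lock PIN; swipe coordinates are illustrative.
set -e

# Target a specific device (use the serial reported by `adb devices`)
export ANDROID_SERIAL="<your-device-serial>"

# (a) Unlock the device: wake the screen and swipe up past the lock screen
adb shell input keyevent KEYCODE_WAKEUP
adb shell input swipe 500 1800 500 600

# (b) Return to the home screen
adb shell input keyevent KEYCODE_HOME
```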
### 🌐 Website Setup
#### A. Task Preconditions
Many tasks require a login to function properly, so we provide cookie-loading functionality to enable the agent to work correctly. Run the following command on a machine with a graphical web browser, perform your login, and then close the pop-up browser window to save the cookies.
```bash
python src/scene/web/load_cookies.py
```
Then save the generated `*.json` files to `src/scene/web/cookies/`.
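For example, assuming the cookie files are written to the current working directory (the exact output location may differ), they can be moved into place with:
```bash
mkdir -p src/scene/web/cookies
mv *.json src/scene/web/cookies/
```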
## 🌟 Quick Start
1. Configure environment variables
```bash
cp .env.template .env
```
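The `.env` file typically holds the API credentials used by the proprietary models listed under Supported Models. The variable names below are illustrative assumptions only; check `.env.template` for the keys the framework actually expects:
```bash
# Illustrative example only -- the real variable names are defined in .env.template
OPENAI_API_KEY=your-openai-key
GOOGLE_API_KEY=your-google-key
ANTHROPIC_API_KEY=your-anthropic-key
```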
2. Activate virtual environment
```bash
source .venv/bin/activate
```
3. Execute main task
```bash
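# Run the script for the scene you want to evaluate (mobile and/or web):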
bash scripts/mobile/run_task.sh
bash scripts/web/run_task.sh
```
4. Run evaluation
```bash
bash scripts/mobile/eval.sh
bash scripts/web/eval.sh
```
## 🚀 Supported Models
The following models are supported:
- `gpt-4o-2024-11-20`
- `gpt-4-turbo`
- `gemini-2.0-flash`
- `gemini-2.0-pro-exp-02-05`
- `claude-3-7-sonnet-20250219`
- `llava-hf/llava-v1.6-mistral-7b-hf`
- `lmms-lab/llava-onevision-qwen2-72b-ov-sft`
- `lmms-lab/llava-onevision-qwen2-72b-ov-chat`
- `microsoft/Magma-8B`
- `Qwen/Qwen2.5-VL-7B-Instruct`
- `deepseek-ai/deepseek-vl2`
- `openbmb/MiniCPM-o-2_6`
- `mistral-community/pixtral-12b`
- `microsoft/Phi-4-multimodal-instruct`
- `OpenGVLab/InternVL2-8B`
## 📋 Task Overview

*Our comprehensive task suite covers 34 high-risk interactive scenarios across multiple domains*
## 🏆 Results

*Performance ranking of different MLAs across trustworthiness dimensions*
---
## 🤝 Acknowledgement
We acknowledge and thank the projects [Mobile-Agent-E](https://github.com/X-PLUG/MobileAgent/tree/main/Mobile-Agent-E) and [SeeAct](https://github.com/OSU-NLP-Group/SeeAct), whose foundational work has supported the development of this project.
## 📞 Contact
For questions, suggestions or collaboration opportunities, please contact us at jankinfmail@gmail.com, 52285904015@stu.ecnu.edu.cn, yangxiao19@tsinghua.org.cn
## 🌟 Citation
If you find this work useful, please consider citing our paper:
```bibtex
@article{yang2025mla,
  title={MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments},
  author={Yang, Xiao and Chen, Jiawei and Luo, Jun and Fang, Zhengwei and Dong, Yinpeng and Su, Hang and Zhu, Jun},
  journal={arXiv preprint arXiv:2506.01616},
  year={2025}
}
```