
# LatentMAS

**Latent Collaboration in Multi-Agent Systems**


---

## 💡 Introduction

**LatentMAS** is a multi-agent reasoning framework that **moves agent collaboration from token space into the model's latent space**. Instead of producing long textual reasoning traces, agents communicate by **passing latent thoughts** through their own **working memory**. LatentMAS has the following key features:

- **Efficient** multi-step reasoning with drastically fewer tokens
- **Training-free** latent-space alignment for stable generation
- **A general technique** compatible with **any HF model** and, optionally, **vLLM** backends

Overall, LatentMAS achieves **superior performance**, **lower token usage**, and **major wall-clock speedups** for multi-agent systems.
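
To make the mechanism concrete, here is a minimal, hedged sketch of a latent "thought" rollout on top of the plain HF `transformers` API. Everything here (the `latent_rollout` name, its signature, and the choice to feed the last hidden state back as the next input embedding) is our illustrative reading, not the repository's actual implementation; see `models.py` and `methods/latent_mas.py` for that.

```python
# Illustrative sketch only: names and signatures are assumptions, not the repo's API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-14B"  # any HF causal LM should work in principle
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

@torch.no_grad()
def latent_rollout(prompt: str, latent_steps: int = 20, past=None):
    """Run `latent_steps` forward passes, feeding the last hidden state back
    as the next input embedding instead of sampling a token. The returned
    KV cache serves as the agent's "working memory"."""
    ids = tok(prompt, return_tensors="pt").input_ids.to(model.device)
    out = model(input_ids=ids, past_key_values=past,
                output_hidden_states=True, use_cache=True)
    for _ in range(latent_steps):
        h = out.hidden_states[-1][:, -1:, :]  # last layer, last position
        out = model(inputs_embeds=h, past_key_values=out.past_key_values,
                    output_hidden_states=True, use_cache=True)
    return out.past_key_values  # handed to the next agent as shared context
```

The key point is that the loop never round-trips through the LM head or the tokenizer, which is where the token savings come from: collaboration stays in hidden-state space until a final answer is decoded.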

## 🔔 News

- **[2025-11-25]** We have released our paper and the code implementation for LatentMAS! Stay tuned for more model-backbone support and advanced features!
- **[2025-11-25]** We are featured as 🤗 [**HuggingFace 1st Paper of the Day**](https://huggingface.co/papers/2511.20639)!

## 📊 Experiments Overview

### ⭐ Main Results

Three main tables from our paper span 9 tasks across math & science reasoning, commonsense reasoning, and code generation (a minimal sketch of the sequential setting follows this list):

- **Table 1 – LatentMAS under the Sequential MAS setting**

- **Table 2 – LatentMAS under the Hierarchical MAS setting**

- **Table 3 – Main Results on Reasoning-Intensive Tasks**
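
To clarify what the sequential setting means operationally, the hedged sketch below chains several agents through the `latent_rollout` helper from the introduction, each agent reading its predecessors' latent thoughts through the shared KV cache. The roles and prompts are invented for illustration and do not mirror `prompts.py`.

```python
# Illustrative sequential pipeline; roles and prompts are invented, not from prompts.py.
AGENT_ROLES = [
    "You are a planner. Break the problem into steps:\n{q}",
    "You are a solver. Carry out the plan in detail.",
    "You are a verifier. Check the solution and state the final answer.",
]

def sequential_latent_mas(question: str, latent_steps: int = 20):
    memory = None  # shared working memory, grown agent by agent
    for role in AGENT_ROLES:
        memory = latent_rollout(role.format(q=question), latent_steps, past=memory)
    return memory  # decode the final textual answer from this cache as usual
```

The hierarchical setting would instead route work through a coordinator agent and sub-agents; we omit that sketch since the orchestration details live in `methods/latent_mas.py`.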

### ⚡ Superior Efficiency on **Time and Tokens**

Compared to standard Text-MAS or chain-of-thought baselines, LatentMAS overall reduces:

- **token usage by ~50–80%**
- **wall-clock time by ~3×–7×**

## 🛠️ Getting Started

This repository provides all code for reproducing LatentMAS, TextMAS, and baseline single-agent experiments across GSM8K, AIME24/25, GPQA, ARC-Easy/Challenge, MBPP+, HumanEval+, and MedQA.

### ⚙️ Setup Environment Variables

We recommend setting your HF cache directory to avoid repeated downloads:

```bash
export HF_HOME=/path/to/huggingface
export TRANSFORMERS_CACHE=$HF_HOME
export HF_DATASETS_CACHE=$HF_HOME
```

Models and datasets will automatically be downloaded into `$HF_HOME`.

### 📦 Install Packages

```bash
conda create -n latentmas python=3.10 -y
conda activate latentmas
pip install -r requirements.txt
```

If you want **vLLM support**, also install:

```bash
pip install vllm
```

## 🚀 Quick Start

### 1. Clone the repo

```bash
git clone https://github.com/YourRepo/LatentMAS.git
cd LatentMAS
```

### 2. Repository Structure

```
LatentMAS/
│── run.py              # Main entry for experiments
│── models.py           # Wrapper for HF + vLLM + latent realignment
│── methods/
│   ├── baseline.py     # Single-agent baseline
│   ├── text_mas.py     # Token-space multi-agent method
│   └── latent_mas.py   # Latent-space multi-agent (our method)
│── prompts.py          # Prompt constructors
│── data.py             # Dataset loaders
│── data/               # Provided data + figures (medqa.json is given as an example)
│── utils.py            # Answer parsing / timeout / helpers
│── example_logs/       # Example logs from LatentMAS
│── requirements.txt
```

## 🧪 Running Experiments (standard HF backend)

### 🔹 **Baseline (single model)**

```bash
python run.py --method baseline --model_name Qwen/Qwen3-14B --task gsm8k --max_samples 100
```

### 🔹 **TextMAS (text-based multi-agent system)**

```bash
python run.py --method text_mas --model_name Qwen/Qwen3-14B --task gsm8k --prompt sequential --max_samples 100
```

### 🔹 **LatentMAS (our latent MAS method)**

```bash
python run.py --method latent_mas --model_name Qwen/Qwen3-14B --task gsm8k --latent_steps 20 --prompt sequential --max_samples 100
```

#### Notes:

* **`--latent_steps`** ∈ [0, 80]. Tune for best performance; typically **20–40** works well.
* **`--latent_space_realign`** enables latent→embedding alignment. We treat this as a **hyperparameter**; enable or disable it depending on the task and model (a sketch of the idea follows the example below):

```bash
python run.py --method latent_mas --model_name Qwen/Qwen3-14B --task gsm8k --latent_steps 20 --prompt sequential --max_samples 100 --latent_space_realign
```
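
As rough intuition for what realignment might do, here is a hedged sketch: after each latent step, snap the hidden state to its nearest input-token embedding by cosine similarity, so the vector fed back in stays on the embedding manifold the model was trained on. This is our illustrative reading of "latent→embedding alignment"; the actual training-free realignment lives in `models.py` and may differ.

```python
import torch
import torch.nn.functional as F

def realign_to_embeddings(h: torch.Tensor, model) -> torch.Tensor:
    """Snap a latent state of shape (1, 1, d) onto the nearest row of the
    input-embedding matrix (cosine similarity). Illustrative only; the repo's
    realignment may use a different projection."""
    emb = model.get_input_embeddings().weight                          # (vocab, d)
    sims = F.cosine_similarity(h.reshape(1, -1).float(), emb.float())  # (vocab,)
    return emb[sims.argmax()].to(h.dtype).view(1, 1, -1)
```

Under this reading, the rollout loop would apply `h = realign_to_embeddings(h, model)` before each latent forward pass whenever `--latent_space_realign` is set.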
## 📘 Example Logs

Two example LatentMAS logs are provided for reference:

* `example_logs/qwen3_14b_mbppplus_sequential.txt`
* `example_logs/qwen3_14b_humanevalplus_hierarchical.txt`

You can open them to view the full agent interaction traces and outputs.

## ⚡ vLLM Integration

LatentMAS supports vLLM for faster inference.

### 🔹 Baseline with vLLM

```bash
python run.py --method baseline --model_name Qwen/Qwen3-14B --task gsm8k --max_samples 100 --use_vllm
```

### 🔹 TextMAS with vLLM

```bash
python run.py --method text_mas --model_name Qwen/Qwen3-14B --task gsm8k --prompt sequential --max_samples 100 --use_vllm
```

### 🔹 LatentMAS with vLLM

LatentMAS supports a **hybrid HF + vLLM pipeline** for fast inference:

- vLLM handles **final text generation** (with prefix caching, tensor parallelism, etc.)
- A HuggingFace model handles **latent-space rollout** and hidden-state alignment

For this setup, we recommend using two GPUs:

- One GPU for vLLM (`--device`, e.g., `cuda:0`)
- One GPU for the auxiliary HF model (`--device2`, e.g., `cuda:1`)

```bash
CUDA_VISIBLE_DEVICES=0,1 python run.py --method latent_mas --model_name Qwen/Qwen3-14B --task gsm8k --latent_steps 20 --prompt sequential --max_samples 100 \
    --use_vllm \
    --use_second_HF_model \
    --enable_prefix_caching \
    --device2 cuda:1
```

**📍 Important Note:**

> vLLM does **not** officially support modifying the KV cache or prompting via latent embeddings.
> We patch parts of the vLLM backend internals for our implementation.
> Minor numeric differences may arise compared to the official HF backend due to different decoding (generation) strategies. Please use the HF backend to reproduce the official published results.

## 📚 Citation

💫 If you find **LatentMAS** helpful, please kindly give us a star ⭐️ and cite us below. Thanks!

```
@article{zou2025latentmas,
  title={Latent Collaboration in Multi-Agent Systems},
  author={Zou, Jiaru and Yang, Xiyuan and Qiu, Ruizhong and Li, Gaotang and Tieu, Katherine and Lu, Pan and Shen, Ke and Tong, Hanghang and Choi, Yejin and He, Jingrui and Zou, James and Wang, Mengdi and Yang, Ling},
  journal={arXiv preprint arXiv:2511.20639},
  year={2025}
}
```

## 🤝 Acknowledgement

This code is partially based on the amazing work of [vLLM](https://github.com/vllm-project/vllm).