# DuoGuard

**License**: Apache-2.0
Figure 1. Overview of the two-player training pipeline. The generator produces synthetic data from seed data. The classifier makes predictions on these examples, and each prediction is marked correct or incorrect according to the label of the seed example it was generated from. We train the generator with DPO to produce increasingly challenging examples, which in turn improve the classifier through iterative training.
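The caption does not spell out how the DPO preference data for the generator is built. Below is a minimal sketch of one plausible construction, assuming (this is not stated in the caption) that, for each seed, generations the classifier gets wrong are treated as *chosen* (harder) and generations it gets right as *rejected*, with pairs formed only within the same seed prompt. The `Sample` record, `build_preference_pairs`, and `softplus` are hypothetical names; `dpo_loss` is the standard DPO objective computed from sequence log-probabilities under the policy and a frozen reference model, not code taken from this repository.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass
class Sample:
    """One generator output with the classifier's verdict (hypothetical record)."""
    seed_id: int          # index of the seed example the generator was conditioned on
    prompt: str           # generator input built from that seed
    text: str             # synthetic example produced by the generator
    seed_label: int       # label inherited from the seed (assumed: 1 = unsafe, 0 = safe)
    predicted_label: int  # classifier prediction on `text`


def build_preference_pairs(samples):
    """Assumed rule: within each seed, misclassified outputs are 'chosen'
    (harder for the classifier) and correctly classified ones are 'rejected'."""
    by_seed = defaultdict(list)
    for s in samples:
        by_seed[s.seed_id].append(s)

    pairs = []
    for group in by_seed.values():
        hard = [s for s in group if s.predicted_label != s.seed_label]
        easy = [s for s in group if s.predicted_label == s.seed_label]
        # Seeds whose outputs are all hard or all easy yield no pair.
        for h, e in product(hard, easy):
            pairs.append({"prompt": h.prompt, "chosen": h.text, "rejected": e.text})
    return pairs


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given summed sequence log-probs
    under the policy (pi_*) and the frozen reference model (ref_*)."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return softplus(-margin)  # equals -log(sigmoid(margin))


samples = [
    Sample(0, "seed-0 prompt", "generation A", seed_label=1, predicted_label=0),  # missed
    Sample(0, "seed-0 prompt", "generation B", seed_label=1, predicted_label=1),  # caught
    Sample(1, "seed-1 prompt", "generation C", seed_label=0, predicted_label=0),  # caught
]
pairs = build_preference_pairs(samples)
print(pairs)  # [{'prompt': 'seed-0 prompt', 'chosen': 'generation A', 'rejected': 'generation B'}]
print(dpo_loss(0.0, 0.0, 0.0, 0.0))  # log(2), about 0.6931
```

Pairing within a seed keeps the topic and label fixed, so the preference signal rewards the generator for difficulty rather than for drifting to unrelated content; whether DuoGuard's actual pipeline pairs samples this way should be checked against the paper or the training code.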
## Setup

### Environment Installation

```bash
conda create -n duoguard python=3.10 -y
conda activate duoguard
pip install -r requirements.txt
```

## Evaluation

`evaluation/test_single_input.py` shows how to run DuoGuard on a single input and obtain its full probability output.

### Run Evaluation Script

```bash
bash scripts/eval.sh
```

### Run Language-Specific Evaluations

```bash
python evaluation/evaluate_duoguard.py --language En
python evaluation/evaluate_duoguard.py --language Fr
python evaluation/evaluate_duoguard.py --language Es
python evaluation/evaluate_duoguard.py --language De
```

## 📊 Results

Among the guardrail models compared below, DuoGuard achieves the highest average F1 in all four languages while having the lowest per-input latency. F1 is averaged over six benchmarks (XSTest, OpenAI Moderation, ToxicChat, BeaverTails, RTP-LX, XSafety); latency is measured in milliseconds per input (lower is better).

| Model | Size | En-F1 | Fr-F1 | Es-F1 | De-F1 | Latency (ms/input) |
|-------|------|-------|-------|-------|-------|--------------------|
| LlamaGuard3 | 1B | 45.2 | 44.6 | 45.0 | 44.7 | 45.6 |
| ShieldGemma | 2B | 43.1 | 37.4 | 37.0 | 36.8 | 61.8 |
| LlamaGuard2 | 8B | 59.7 | 56.6 | 56.5 | 55.4 | 52.3 |
| LlamaGuard3 | 8B | 63.4 | 61.9 | 61.5 | 61.3 | 72.1 |
| **DuoGuard** | **0.5B** | **74.9** | **72.7** | **73.9** | **71.9** | **16.0** |

## 📄 Citation

If you use DuoGuard in your research, please cite:

```
@misc{deng2025duoguardtwoplayerrldrivenframework,
      title={DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails},
      author={Yihe Deng and Yu Yang and Junkai Zhang and Wei Wang and Bo Li},
      year={2025},
      eprint={2502.05163},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.05163},
}
```
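The Evaluation section notes that `evaluation/test_single_input.py` yields DuoGuard's full probability output for one input. A minimal sketch of reducing such an output to a single decision follows, assuming (not confirmed by this README) that the output is one independent probability per risk category and that an input is flagged unsafe when any category exceeds a threshold of 0.5. The function name `summarize_probabilities`, the threshold, and the category names are hypothetical placeholders, not DuoGuard's actual label set or API.

```python
from typing import Dict, List, Tuple


def summarize_probabilities(
    probs: Dict[str, float], threshold: float = 0.5
) -> Tuple[bool, List[Tuple[str, float]]]:
    """Hypothetical post-processing of a per-category probability output.

    Returns (is_unsafe, flagged), where `flagged` lists the categories whose
    probability exceeds `threshold`, sorted from most to least likely.
    """
    for name, p in probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {name!r} out of range: {p}")
    flagged = sorted(
        ((name, p) for name, p in probs.items() if p > threshold),
        key=lambda item: item[1],
        reverse=True,
    )
    # Multi-label view (assumed): the input is unsafe if any category fires.
    return bool(flagged), flagged


# Placeholder category names and values, not real DuoGuard outputs.
example = {"category_a": 0.91, "category_b": 0.12, "category_c": 0.64}
is_unsafe, flagged = summarize_probabilities(example)
print(is_unsafe, flagged)  # True [('category_a', 0.91), ('category_c', 0.64)]
```

Keeping the per-category list alongside the overall verdict preserves the "full probability output" view, so a deployment can tune the threshold per category later without changing the interface.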