# AlignLab

**Repository Path**: mirrors/AlignLab

## Basic Information

- **Project Name**: AlignLab
- **Description**: AlignLab is a comprehensive model alignment framework that gives researchers and practitioners easy-to-use tools to evaluate and improve AI models across safety, truthfulness, bias, toxicity, and agentic robustness
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: https://www.oschina.net/p/alignlab
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2025-08-29
- **Last Updated**: 2026-01-03

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# AlignLab, by OpenAlign @agi_atharv

AlignLab is a comprehensive alignment framework that provides researchers and practitioners with easy-to-use tools for aligning models across multiple dimensions: safety, truthfulness, bias, toxicity, and agentic robustness.

There is a lot more to add to AlignLab! Please work with us and contribute to the project.

## 🏗️ Repository Layout (Monorepo)

```
alignlab/
├── packages/
│   ├── alignlab-core/    # Core Python library (datasets, runners, scoring, reports)
│   ├── alignlab-cli/     # CLI wrapping core functionality
│   ├── alignlab-evals/   # Thin adapters to lm-eval-harness, OpenAI Evals, JailbreakBench, HarmBench
│   ├── alignlab-guards/  # Guard models (Llama-Guard adapters), rule engines, ensembles
│   ├── alignlab-dash/    # Streamlit/FastAPI dashboard + report viewer
│   └── alignlab-agents/  # Sandboxed agent evals (tools, browsing-locked simulators)
├── benchmarks/           # YAML registry (sources, splits, metrics, licenses)
├── templates/            # Cookiecutter templates for new evals/benchmarks
├── docs/                 # MkDocs documentation
└── examples/             # Runnable notebooks & scripts
```

## 🎯 Core Promises

### Registry-First Design
Every benchmark is a single YAML entry with fields for taxonomy, language coverage, scoring, and version pinning. Runs are reproducible by default (see the sketch at the end of this section).

### Adapters, Not Reinvention
First-class adapters to:

- **lm-evaluation-harness** for general QA/MC/MT (EleutherAI)
- **OpenAI Evals** for templated eval flows & rubric judging
- **JailbreakBench** (attacks/defenses, scoring, leaderboard spec)
- **HarmBench** (automated red teaming + robust refusal)

### Multilingual Support
Plug-in loaders for multilingual toxicity, truthfulness, and bias datasets.

### Guard-Stack
A unified API to run Llama-Guard-3 (and alternatives) as pre/post-filters or judges, with ensemble and calibration support.

### Agent Evals
Safe, containerized tools and scenarios with metrics such as ASR (attack success rate), query efficiency, and over-refusal.

### Reports That Matter
One command yields a NeurIPS-style PDF/HTML report with tables, confidence intervals, error buckets, and taxonomy-level breakdowns.
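To make the registry-first promise concrete, here is a minimal sketch of a version-pinned run. This is a sketch, not the library's confirmed API: the `version` keyword is an assumption added for illustration, while `load_benchmark`, `EvalRunner`, and the pinned version string come from the API examples and the registry entry shown later in this README.

```python
from alignlab.core import EvalRunner, load_benchmark

# Registry-first, reproducible-by-default sketch: the benchmark id and a pinned
# registry version are the only inputs. NOTE: the `version` keyword is an
# assumption for illustration; elsewhere in this README only load_benchmark(id)
# is shown.
bench = load_benchmark("truthfulqa", version="2025-01-15")

# Because the registry entry records source, splits, judges, metrics, and
# license, the id plus the pinned version is intended to be enough to
# reproduce the run elsewhere.
runner = EvalRunner(model="meta-llama/Llama-3.1-8B-Instruct", provider="hf")
result = runner.run(bench, split="validation", judge="llm_rubric")
result.to_report("reports/truthfulqa_2025-01-15")
```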
## 🚀 Quick Start

### Installation (Development)

```bash
# Install uv for dependency management
pipx install uv

# Create a virtual environment and install the packages
uv venv
uv pip install -e packages/alignlab-core -e packages/alignlab-cli \
  -e packages/alignlab-evals -e packages/alignlab-guards \
  -e packages/alignlab-agents -e packages/alignlab-dash
```

### Run a Full Safety Sweep

```bash
# Run the comprehensive safety evaluation suite
alignlab eval run --suite alignlab:safety_core_v1 \
  --model meta-llama/Llama-3.1-8B-Instruct --provider hf \
  --guards llama_guard_3 --max-samples 200 \
  --report out/safety_core_v1

# Generate reports
alignlab report build out/safety_core_v1 --format html,pdf
```

### CLI Examples

```bash
# List available benchmarks and models
alignlab benchmarks ls --filter safety,multilingual
alignlab models ls

# Run a single benchmark
alignlab eval run truthfulqa --split validation --judge llm_rubric

# Agentic robustness (jailbreaks)
alignlab jailbreak run --suite jb_default_v1 --defense none --report out/jb
```

## 📊 Prebuilt Suites

### `alignlab:safety_core_v1`

- **Harm/Risk**: HarmBench, JailbreakBench, UltraSafety
- **Toxicity**: RealToxicityPrompts, PolygloToxicityPrompts
- **Truthfulness**: TruthfulQA + multilingual extensions
- **Bias/Fairness**: BBQ, CrowS-Pairs, StereoSet, Multilingual HolisticBias
- **Guards**: Optional pre/post Llama-Guard-3

### `alignlab:agent_robustness_v1`

- JailbreakBench canonical config
- HarmBench automated red-teaming subset
- Prompt-injection battery

## 🔧 Key Modules & API

### Core Usage

```python
from alignlab.core import EvalRunner, load_benchmark

# Load and run a benchmark
bench = load_benchmark("truthfulqa")
runner = EvalRunner(model="meta-llama/Llama-3.1-8B-Instruct", provider="hf")
result = runner.run(bench, split="validation", judge="exact_match|llm_rubric")
result.to_report("reports/run_001")
```

### Guard Pipeline

```python
from alignlab.guards import LlamaGuard, RuleGuard, EnsembleGuard

guard = EnsembleGuard([
    LlamaGuard("meta-llama/Llama-Guard-3-8B"),
    RuleGuard.from_yaml("mlc_taxonomy.yaml")
])
safe_out = guard.wrap(model).generate(prompt)
```

## 📋 Benchmark Registry

### Harm/Safety & Jailbreak

- **HarmBench** – Automated red teaming + refusal robustness
- **JailbreakBench** – NeurIPS'24 D&B artifacts, threat model, scoring
- **UltraSafety** – 3k harmful prompts with paired jailbreak prompts

### Toxicity (Mono + Multi)

- **RealToxicityPrompts** – Canonical 100k prompts
- **PolygloToxicityPrompts** – 425k prompts, 17 languages

### Truthfulness / Hallucination

- **TruthfulQA** – 817 questions, 38 categories, with MC/binary variants

### Bias/Fairness (Mono + Multi)

- **BBQ**, **CrowS-Pairs**, **StereoSet**, **Multilingual HolisticBias**

## 📝 Adding a New Benchmark

Create a YAML file in `benchmarks/`:

```yaml
# benchmarks/truthfulqa.yaml
id: truthfulqa
source:
  type: huggingface
  repo: "sylinrl/TruthfulQA"
splits: [validation]
task: freeform_qa
judges:
  - exact_match
  - llm_rubric: {model: "gpt-4o-mini", rubric: "truthfulqa_v1"}
metrics: [truthful, informative, truthful_informative]
taxonomy: [truthfulness]
license: "MIT"
version: "2025-01-15"
```

## 📊 Reporting & UX

- Per-taxonomy dashboards with macro/micro scores and CIs
- Guard deltas and hazard-class confusion matrices
- Multilingual heatmaps
- Exportable bundles (JSONL, PDF, HTML); see the sketch below
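Because runs can be exported as JSONL bundles, downstream analysis does not have to go through the dashboard. Below is a minimal sketch, assuming the bundle is a directory of JSONL files whose records carry `benchmark` and `score` fields; the path and field names are assumptions for illustration, not the documented export schema.

```python
import json
from pathlib import Path

# Post-hoc analysis of an exported JSONL bundle. The directory layout and the
# "benchmark"/"score" field names are assumptions; adjust them to the schema
# that `alignlab report build` actually emits.
records = []
for path in Path("out/safety_core_v1").glob("*.jsonl"):  # hypothetical export dir
    with path.open() as f:
        records.extend(json.loads(line) for line in f if line.strip())

# Mean score per benchmark.
scores_by_bench = {}
for rec in records:
    scores_by_bench.setdefault(rec["benchmark"], []).append(rec["score"])

for bench, scores in sorted(scores_by_bench.items()):
    print(f"{bench}: {sum(scores) / len(scores):.3f} (n={len(scores)})")
```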
## 🤝 Contributing

Thanks for your interest in contributing! To keep the project stable and manageable, please follow these rules.

### 🔀 Workflow

1. **Fork** the repository (do not create branches directly in this repo).
2. **Create a branch** in your fork using the naming convention below.
3. **Make your changes** and ensure tests pass.
4. **Open a Pull Request (PR)** into the `dev` branch (or `main` if no dev branch exists).
5. Wait for CI to pass and for maintainers to review.

### 🌱 Branch Naming

Branches must follow these patterns:

- `feat/` → New features
- `fix/` → Bug fixes
- `docs/` → Documentation changes
- `test/` → Testing-related changes

✅ Good examples:

- `feat/add-user-auth`
- `fix/navbar-overlap`
- `docs/update-readme`

❌ Bad examples:

- `patch1`
- `update-stuff`

### ✅ Requirements Before Opening a PR

- Run all tests locally and ensure they pass.
- Follow the code style guidelines (`npm run lint` or `make lint`).
- Write clear commit messages.
- The PR title should match the branch type (`feat: ...`, `fix: ...`, `docs: ...`).

### 🔒 Protected Branches

- `main` is protected: no direct commits allowed.
- All code must be merged via Pull Requests.
- PRs require review approval plus passing checks.

### 📜 Code of Conduct

Be respectful and constructive.

## 📚 Documentation

See the `docs/` directory for comprehensive documentation, including:

- API reference
- Benchmark development guide
- Guard model integration
- Agent evaluation setup

## 📄 License

MIT License - see the LICENSE file for details.