# AlignTree

**Repository Path**: xiao-mingyu/AlignTree

## Basic Information

- **Project Name**: AlignTree
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-26
- **Last Updated**: 2026-04-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# AlignTree: Efficient Defense Against LLM Jailbreak Attacks

**Content warning**: This repository contains text that is offensive, harmful, or otherwise inappropriate in nature.

This repository contains code and results accompanying the paper "AlignTree: Efficient Defense Against LLM Jailbreak Attacks". In the spirit of scientific reproducibility, we provide code to reproduce the main results from the paper.

- [Paper](https://arxiv.org/abs/2511.12217)

## Setup

To use this project:

1) Run the following command to install dependencies (we used Python 3.10.8):

   ```bash
   pip install -r requirements.txt
   ```

2) Fill in your Hugging Face token and OpenAI token in `pipeline/config.py`.
3) Experiments are defined in `run_experiments.py` as a list of dicts.
4) For example, to use our main strategy, AlignTree, set up the experiment as follows:

   ```python
   {
       "preprocess_defenses": [
           {
               "defense": "AlignTree",
               "class_parameters": {
                   "parameters_path": get_model_artifact_path(guard_model_path)
               },
           }
       ],
       "inprocess_defenses": [],
       "postprocess_defenses": []
   },
   ```

5) This config format allows multiple defenses to be used together if needed.

## Reproducing main results

To reproduce the main results from the paper, run the following command:

```bash
python3 -m run_experiments
```

Each experiment performs the following steps:

1. Generate the refusal direction for the model if needed (that is, if it has not already been created and the experiment requires it)
   - Refusal artifacts are saved inside `pipeline/runs/{model_alias}`
2. Run each defense in the experiment
   - RefusalClassifier involves training a random forest.
   - AlignTree requires training SVMs and a random forest.
   - Artifacts are saved inside `pipeline/runs/{model_alias}`
3. Write the experiment output
   - Completions are saved in `pipeline/runs/{model_alias}/defenses/{dataset}/{strategy}.json`
   - Summarized stats are saved in `pipeline/runs/{model_alias}/{dataset}/stats.json`

## Credits

This work took great inspiration from [Refusal in Language Models Is Mediated by a Single Direction](https://github.com/andyrdt/refusal_direction/tree/main/dataset).

The following defenses were implemented using prior works:

1. [SelfDefense](https://github.com/poloclub/llm-self-defense)
2. [AutoDefense](https://github.com/XHMY/AutoDefense)
3. [PerplexityDefense](https://github.com/neelsjain/baseline-defenses)
4. [SmoothLLM](https://github.com/arobey1/smooth-llm)
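
## Example: locating experiment outputs

The output locations listed under "Reproducing main results" can be wrapped in a small helper for inspecting results after a run. The sketch below is not part of the repository: the function names `completions_path`, `stats_path`, and `load_json_if_present` are hypothetical, and the only things it assumes are the two path templates stated above and that the outputs are ordinary JSON files. The model alias, dataset, and strategy values in the usage section are placeholders, and the paths are relative, so the script is meant to be run from the repository root.

```python
import json
from pathlib import Path
from typing import Any, Optional

# Root directory taken from the path templates in this README.
RUNS_ROOT = Path("pipeline/runs")


def completions_path(model_alias: str, dataset: str, strategy: str) -> Path:
    """Build pipeline/runs/{model_alias}/defenses/{dataset}/{strategy}.json."""
    return RUNS_ROOT / model_alias / "defenses" / dataset / f"{strategy}.json"


def stats_path(model_alias: str, dataset: str) -> Path:
    """Build pipeline/runs/{model_alias}/{dataset}/stats.json."""
    return RUNS_ROOT / model_alias / dataset / "stats.json"


def load_json_if_present(path: Path) -> Optional[Any]:
    """Return the parsed JSON at `path`, or None if the file does not exist."""
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Placeholder values: substitute the alias, dataset, and strategy
    # names that your own experiment actually produced.
    model_alias, dataset, strategy = "my-model", "my-dataset", "AlignTree"

    for path in (
        completions_path(model_alias, dataset, strategy),
        stats_path(model_alias, dataset),
    ):
        data = load_json_if_present(path)
        status = "not found" if data is None else f"loaded {type(data).__name__}"
        print(f"{path.as_posix()}: {status}")
```

The helper makes no assumption about the structure inside the JSON files, so it only reports whether each file exists and what top-level type it contains.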