# AlignTree

**Repository Path**: xiao-mingyu/AlignTree

## Basic Information

- **Project Name**: AlignTree
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-26
- **Last Updated**: 2026-04-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# AlignTree: Efficient Defense Against LLM Jailbreak Attacks

**Content warning**: This repository contains text that is offensive, harmful, or otherwise inappropriate in nature.

This repository contains code and results accompanying the paper "AlignTree: Efficient Defense Against LLM Jailbreak Attacks".
In the spirit of scientific reproducibility, we provide code to reproduce the main results from the paper.

- [Paper](https://arxiv.org/abs/2511.12217)

## Setup
To use this project:
1) Run the following command to install dependencies (we used Python 3.10.8):
```bash
pip install -r requirements.txt
```
2) Fill in your Hugging Face token and OpenAI token in `pipeline/config.py`.
3) Experiments are defined in `run_experiments.py` as a list of dicts.
4) For example, to use our main strategy, AlignTree, configure the experiment as follows:
```python
# One entry in the list of experiment dicts in run_experiments.py
{
    "preprocess_defenses": [
        {
            "defense": "AlignTree",
            "class_parameters": {"parameters_path": get_model_artifact_path(guard_model_path)},
        }
    ],
    "inprocess_defenses": [],
    "postprocess_defenses": [],
},
```
5) This config format lets you combine multiple defenses in a single experiment if needed.
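
As a rough illustration of combining defenses, the sketch below builds a list with two experiment entries that use the same three keys as the example above. Several details are assumptions rather than repository code: the `get_model_artifact_path` stub, the placeholder `guard_model_path`, the placement of `SelfDefense` in the post-process stage with empty `class_parameters`, and the `defense_names` helper are all hypothetical.

```python
# Hypothetical stand-in for the repository helper of the same name;
# the real function may resolve artifact paths differently.
def get_model_artifact_path(model_path: str) -> str:
    return f"pipeline/runs/{model_path.split('/')[-1]}"


guard_model_path = "org/guard-model"  # placeholder, not a real model ID

STAGES = ("preprocess_defenses", "inprocess_defenses", "postprocess_defenses")

experiments = [
    # AlignTree on its own, mirroring the example above.
    {
        "preprocess_defenses": [
            {"defense": "AlignTree",
             "class_parameters": {"parameters_path": get_model_artifact_path(guard_model_path)}},
        ],
        "inprocess_defenses": [],
        "postprocess_defenses": [],
    },
    # Two defenses in one experiment. Which stage SelfDefense belongs to and
    # which class_parameters it accepts are assumptions made for illustration.
    {
        "preprocess_defenses": [
            {"defense": "AlignTree",
             "class_parameters": {"parameters_path": get_model_artifact_path(guard_model_path)}},
        ],
        "inprocess_defenses": [],
        "postprocess_defenses": [
            {"defense": "SelfDefense", "class_parameters": {}},
        ],
    },
]


def defense_names(experiment: dict) -> list[str]:
    """Check that all three stage keys exist and list defense names in stage order."""
    names = []
    for stage in STAGES:
        if stage not in experiment:
            raise KeyError(f"missing stage: {stage}")
        for entry in experiment[stage]:
            names.append(entry["defense"])
    return names


for experiment in experiments:
    print(defense_names(experiment))
```

A small check like `defense_names` can catch a misspelled stage key before a long run starts; the actual pipeline may already perform its own validation.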

## Reproducing main results

To reproduce the main results from the paper, run the following command:

```bash
python3 -m run_experiments
```

Each experiment performs the following steps:
1. Generates the refusal direction for the model if needed (that is, if the experiment requires it and it has not already been created)
    - Saves refusal artifacts inside `pipeline/runs/{model_alias}`
2. Runs each defense in the experiment
    - RefusalClassifier involves training a random forest.
    - AlignTree requires training SVMs and a random forest.
    - Artifacts will be saved inside `pipeline/runs/{model_alias}`
3. Writes the experiment output
    - Completions will be saved in `pipeline/runs/{model_alias}/defenses/{dataset}/{strategy}.json`
    - Summarized stats will be saved in `pipeline/runs/{model_alias}/{dataset}/stats.json`
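
To inspect results after a run, the output locations listed above can be built programmatically. The following is a minimal sketch that assumes the path templates are exactly as written in this README; the helper names and the placeholder values for the model alias, dataset, and strategy are hypothetical, and the structure of the JSON files is not assumed.

```python
import json
from pathlib import Path

RUNS_DIR = Path("pipeline/runs")


def completions_path(model_alias: str, dataset: str, strategy: str,
                     runs_dir: Path = RUNS_DIR) -> Path:
    """Path of the completions file, following the template in this README."""
    return runs_dir / model_alias / "defenses" / dataset / f"{strategy}.json"


def stats_path(model_alias: str, dataset: str, runs_dir: Path = RUNS_DIR) -> Path:
    """Path of the summarized stats file, following the template in this README."""
    return runs_dir / model_alias / dataset / "stats.json"


def load_json_if_present(path: Path):
    """Return the parsed JSON content, or None if the file does not exist yet."""
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as f:
        return json.load(f)


# Placeholder values; substitute the alias, dataset, and strategy of your own run.
stats_file = stats_path("my-model", "my-dataset")
stats = load_json_if_present(stats_file)
print(stats_file.as_posix(), "found" if stats is not None else "not found")
```

Returning `None` for a missing file keeps the helper usable before any experiment has finished, at the cost of the caller having to check for it.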

## Credits
This work took great inspiration from [Refusal in Language Models Is Mediated by a Single Direction](https://github.com/andyrdt/refusal_direction/tree/main/dataset).

The following defenses were implemented based on prior work:
1. [SelfDefense](https://github.com/poloclub/llm-self-defense)
2. [AutoDefense](https://github.com/XHMY/AutoDefense)
3. [PerplexityDefense](https://github.com/neelsjain/baseline-defenses)
4. [SmoothLLM](https://github.com/arobey1/smooth-llm)