# smooth-llm

**Repository Path**: zzikang/smooth-llm

## Basic Information

- **Project Name**: smooth-llm
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-06-11
- **Last Updated**: 2024-06-11

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# SmoothLLM

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

This is the official source code for "[SmoothLLM: Defending LLMs Against Jailbreaking Attacks](https://arxiv.org/abs/2310.03684)" by [Alex Robey](https://arobey1.github.io/), [Eric Wong](https://riceric22.github.io/), [Hamed Hassani](https://www.seas.upenn.edu/~hassani/), and [George J. Pappas](https://www.georgejpappas.org/). To learn more about our work, see [our blog post](https://debugml.github.io/smooth-llm/).

![Introduction to SmoothLLM](assets/introduction.gif)

## Installation

**Step 1:** Create an empty virtual environment.

```bash
conda create -n smooth-llm python=3.10
conda activate smooth-llm
```

**Step 2:** Install the source code for "[Universal and Transferable Adversarial Attacks on Aligned Language Models](https://arxiv.org/abs/2307.15043)."

```bash
git clone https://github.com/llm-attacks/llm-attacks.git
cd llm-attacks
pip install -e .
```

**Step 3:** Download the weights for [Vicuna](https://huggingface.co/lmsys/vicuna-13b-v1.5) and/or [Llama2](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) from HuggingFace.

**Step 4:** Change the paths to the model and tokenizer in `lib/model_configs.py` depending on which set(s) of weights you downloaded in Step 3.

```python
# Replace these paths with the locations of the weights you downloaded.
MODELS = {
    'llama2': {
        'model_path': '/shared_data0/arobey1/llama-2-7b-chat-hf',
        'tokenizer_path': '/shared_data0/arobey1/llama-2-7b-chat-hf',
        'conversation_template': 'llama-2'
    },
    'vicuna': {
        'model_path': '/shared_data0/arobey1/vicuna-13b-v1.5',
        'tokenizer_path': '/shared_data0/arobey1/vicuna-13b-v1.5',
        'conversation_template': 'vicuna'
    }
}
```

The `conversation_template` value is used to initialize a `fastchat` conversation template.

## Experiments

We provide ten adversarial suffixes, generated by running GCG for Vicuna and Llama2, in the `data/` directory. You can run SmoothLLM with:

```bash
python main.py \
    --results_dir ./results \
    --target_model vicuna \
    --attack GCG \
    --attack_logfile data/GCG/vicuna_behaviors.json \
    --smoothllm_pert_type RandomSwapPerturbation \
    --smoothllm_pert_pct 10 \
    --smoothllm_num_copies 10
```

You can also change SmoothLLM's hyperparameters (the number of copies, the perturbation percentage, and the perturbation function) by changing the corresponding named arguments: `--smoothllm_num_copies`, `--smoothllm_pert_pct`, and `--smoothllm_pert_type`. At present, we support three kinds of perturbations: swaps, patches, and insertions. For more details, see Algorithm 2 in [our paper](https://arxiv.org/abs/2310.03684). To select one, set the `--smoothllm_pert_type` value to `RandomSwapPerturbation`, `RandomPatchPerturbation`, or `RandomInsertPerturbation`.

## Reproducibility

The following codebases have reimplemented our results:

* https://gist.github.com/deadbits/4ab3f807441d72a2cf3105d0aea9de48

## Citation

If you find this codebase useful in your research, please consider citing:

```bibtex
@article{robey2023smoothllm,
  title={SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks},
  author={Robey, Alexander and Wong, Eric and Hassani, Hamed and Pappas, George J},
  journal={arXiv preprint arXiv:2310.03684},
  year={2023}
}
```

## License

`smooth-llm` is licensed under the terms of the MIT license. See LICENSE for more details.
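
## Perturbation Sketch

To make the three perturbation types named under Experiments (swaps, patches, and insertions) more concrete, here is a minimal character-level sketch. It is an illustration only and not necessarily how this repository implements them. The function names `random_swap`, `random_patch`, and `random_insert`, the `ALPHABET` constant, and the choice of `string.printable` as the replacement pool are all assumptions made for this example; `pct` plays the role of `--smoothllm_pert_pct`.

```python
import random
import string

# Assumed pool of replacement characters (hypothetical choice for this sketch).
ALPHABET = string.printable


def random_swap(text: str, pct: float, rng: random.Random) -> str:
    """Overwrite pct% of the characters, at random positions, with random ones."""
    chars = list(text)
    k = int(len(chars) * pct / 100)
    for i in rng.sample(range(len(chars)), k):
        chars[i] = rng.choice(ALPHABET)
    return "".join(chars)


def random_patch(text: str, pct: float, rng: random.Random) -> str:
    """Overwrite one contiguous window covering pct% of the text."""
    width = int(len(text) * pct / 100)
    start = rng.randint(0, len(text) - width)
    patch = "".join(rng.choice(ALPHABET) for _ in range(width))
    return text[:start] + patch + text[start + width:]


def random_insert(text: str, pct: float, rng: random.Random) -> str:
    """Insert random characters at pct% of the positions, lengthening the text."""
    chars = list(text)
    k = int(len(chars) * pct / 100)
    # Insert from right to left so the earlier sampled indices stay valid.
    for i in sorted(rng.sample(range(len(chars)), k), reverse=True):
        chars.insert(i, rng.choice(ALPHABET))
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random(0)
    prompt = "Summarize the plot of a classic novel in two sentences."
    # Ten perturbed copies at 10%, mirroring the example command's settings.
    for copy in (random_swap(prompt, 10, rng) for _ in range(10)):
        print(repr(copy))
```

Swaps and patches keep the prompt length unchanged, while insertions grow it by the number of inserted characters. Passing an explicit `random.Random` instance keeps the sketch reproducible under a fixed seed.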