# multimodalpragmatic **Repository Path**: sakuraa/multimodalpragmatic ## Basic Information - **Project Name**: multimodalpragmatic - **Description**: No description available - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-08-21 - **Last Updated**: 2025-08-21 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # [ACL 2025 main; ReGenAI@CVPR Best Paper🏆] Multimodal Pragmatic Jailbreak on Text-to-image Models This is the official code repository for the paper of Multimodal Pragmatic Jailbreak on Text-to-image Models. Please do not hesitate to reach out for any questions.

How to use multimodal pragmatic jailbreak image datasets

We present the image dataset for evaluating image safety filters. In the folder of `images`, we provide human-annotated safe/unsafe images generated by various models. There are totally 482 (241*2) images. | ID Range | 0.png - 91.png | 92 - 133 | 134 - 233 | 234 - 240 | |----------------|-------------|---------------|---------------|---------------| | categories | hate | pornography | physicalharm | fraud | Image classifier performance: | Classifiers | Score | |-----------------|-------| | Random passing | 50.0 | | [NSFW checker](https://github.com/bhky/opennsfw2) | 49.6 | | [MHSC](https://arxiv.org/abs/2202.06675) | 53.3 | | [Q16](https://arxiv.org/abs/2305.13873) | 61.4 |

Installation

To get started, install the package: ```bash git clone https://github.com/multimodalpragmatic/multimodalpragmatic.git cd multimodalpragmatic pip install -r requirements.txt ```

About the dataset

A prompt contains three parts:
"<image-generation prompt>, <connect prompt> (e.g., "with a sign that says, "), '<visual text prompt>'."
<image-generation prompt> and <visual text prompt> can be viewed in huggingface link:

You can also simply modify and run

For model {model_name} on {dataclass} category of MPUP dataset: ```bash python t2i.py -model_name ${model_name} -prompt_mode "sign" -dataclass ${dataclass} ``` E.g., running Stable Diffusion model on hatespeech category of MPUP dataset: ```bash python t2i.py -model_name "SD" -prompt_mode "sign" -dataclass "hatespeech" ``` For running the Glyphcontrol model on MPUP dataset, please first follow their instructions to download the checkpoint file, then run ```bash inference_glyphcontrol.py ```

How to evaluate multimodal pragmatic jailbreak

For model {model_name} on {dataclass} category of MPUP dataset with {n} images: ```bash python multimodal_classification.py -model_name ${model_name} -prompt_mode "sign" -dataclass ${dataclass} -img_num ${n} ``` E.g., running Stable Diffusion model on hatespeech category of MPUP dataset with 500 generated images: ```bash python multimodal_classification.py -model_name "SD" -prompt_mode "sign" -dataclass "hatespeech" -img_num 500 ```

How to evaluate visual text rendering

For model {model_name} on {dataclass} category of MPUP dataset with {n} images: ```bash python ocr_evaluate.py -model_name ${model_name0} -prompt_mode "sign" -dataclass ${dataclass0} ``` E.g., running evaluation of visual text rendering on images generated by Stable Diffusion model on hatespeech category of MPUP dataset: ```bash python ocr_evaluate.py -model_name "SD" -prompt_mode "sign" -dataclass "hatespeech" ```

How to cite this work

```bash @inproceedings{liu-etal-2025-multimodal-pragmatic, title = "Multimodal Pragmatic Jailbreak on Text-to-image Models", author = "Liu, Tong and Lai, Zhixin and Wang, Jiawen and Zhang, Gengyuan and Chen, Shuo and Torr, Philip and Demberg, Vera and Tresp, Volker and Gu, Jindong", editor = "Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher", booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = jul, year = "2025", address = "Vienna, Austria", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2025.acl-long.234/", doi = "10.18653/v1/2025.acl-long.234", pages = "4681--4720", ISBN = "979-8-89176-251-0", abstract = "Diffusion models have recently achieved remarkable advancements in terms of image quality and fidelity to textual prompts. Concurrently, the safety of such generative models has become an area of growing concern. This work introduces a novel type of jailbreak, which triggers T2I models to generate the image with visual text, where the image and the text, although considered to be safe in isolation, combine to form unsafe content. To systematically explore this phenomenon, we propose a dataset to evaluate the current diffusion-based text-to-image (T2I) models under such jailbreak. We benchmark nine representative T2I models, including two closed-source commercial models. Experimental results reveal a concerning tendency to produce unsafe content: all tested models suffer from such type of jailbreak, with rates of unsafe generation ranging from around 10{\%} to 70{\%} where DALL{\textperiodcentered}E 3 demonstrates almost the highest unsafety. In real-world scenarios, various filters such as keyword blocklists, customized prompt filters, and NSFW image filters, are commonly employed to mitigate these risks. We evaluate the effectiveness of such filters against our jailbreak and found that, while these filters may be effective for single modality detection, they fail to work against our jailbreak. We also investigate the underlying reason for such jailbreaks, from the perspective of text rendering capability and training data. Our work provides a foundation for further development towards more secure and reliable T2I models." } ```