# guard

**Repository Path**: mirrors_naver/guard

## Basic Information

- **Project Name**: guard
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-25
- **Last Updated**: 2026-05-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# GUARD

![alt text](Overview.png "Overview of GUARD")

This is the code for **"[Guaranteed Generation from Large Language Models](https://openreview.net/forum?id=8roRgrjbjv)"**, *ICLR 2025*.

## Create conda environment and install requirements

```
conda create -n guard python=3.10 && conda activate guard
pip install -r requirements.txt
```

## Definition of LLM *a(y)* and constraint *b(y)*

The experiments rely on the [disco](https://github.com/naver/disco) toolbox. Its basic usage is documented in the lexical_constraint.ipynb notebook.

We define our ideal distribution *g* from the LLM *a* and the constraint *b* as follows:

```python
import re
from disco.distributions import LMDistribution
from disco.scorers import BooleanScorer

a = LMDistribution("google/gemma-2b", token=token, LLM=True)  # base LLM a(y)
b = lambda s, c: bool(re.search(r"\bamazing\b", s.text))  # hard constraint on the generated sequence
scorer = BooleanScorer(b)  # wrap the hard (binary) constraint as a boolean scorer (1 or 0)
g = a * scorer  # ideal distribution g
```

We experiment with two types of constraint. The first is a lexical constraint:

```python
b_lexical = lambda s, c: bool(re.search(r"\bamazing\b", s.text))  # the sequence must contain "amazing"
```

The second is a sentiment constraint:

```python
from torch.nn import Softmax
from transformers import AutoModelForSequenceClassification, AutoTokenizer

score_tokenizer = AutoTokenizer.from_pretrained("michellejieli/emotion_text_classifier")
score_model = AutoModelForSequenceClassification.from_pretrained("michellejieli/emotion_text_classifier", num_labels=7).to('cuda')

def sentiment_pipe(story):
    # "joy_class" is the 4th class (index 3) of the output probabilities.
    inputs = score_tokenizer(story, return_tensors="pt", max_length=512, truncation=True).to('cuda')
    return Softmax(dim=-1)(score_model(**inputs).logits)[:, 3].item()

def is_positive(story="", t=0.98, prefix=""):
    story = prefix + story
    story = story.split('<|endoftext|>')[0]  # drop everything after the end-of-text token
    story = story.split('. ')                # split into sentences
    return sentiment_pipe(story[-1]) > t     # only the last sentence is scored

# `prefix` must be defined beforehand.
b_sentiment = lambda s, c: is_positive(story=s.text, t=0.98, prefix=prefix)  # the sequence must have a positivity above the threshold
```

## GUARD training and evaluation

Both notebooks, lexical_constraint.ipynb and sentiment_reversal.ipynb, include the CAP, SFT, DPG, and DPG initialized with CAP methods.

### Constraint-Aware Prompting

A constraint-aware prompt (CAP) raises the acceptance rate in a zero-shot manner.

```python
from disco.samplers import AccumulationSampler

CAP = "Next sentence should contain 'amazing'.\n\n"
distr = AccumulationSampler(distribution=a, total_size=50000)
samples_a, distr_a = distr.sample(sampling_size=250, context="")  # sample from a(·)
samples_a2, distr_a2 = distr.sample(sampling_size=250, context=CAP)  # sample from a(·|CAP)
```

### Supervised Fine-Tuning

Sample many *y* from *a* and keep only those for which *b(y) = 1*.

```python
distr = AccumulationSampler(distribution=a, total_size=800000)
samples_a, distr_a = distr.sample(sampling_size=500, context="")
samples_g = []
for item in samples_a:  # y ~ a
    if b(item, None):  # keep y only if b(y) = 1; b ignores its context argument
        samples_g.append({'text': item.text})
```

Then fine-tune the LLM on samples_g.

### Distributional Policy Gradients

Adaptively sample *y* from the policy *a_theta* and train the policy so that its distribution matches the ideal distribution *g*.
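To make the idea concrete, here is a minimal toy sketch, not the code used in this repository. It assumes a four-outcome categorical policy in place of an LLM and applies the on-policy DPG estimator as it is commonly described: samples from the policy are reweighted by *a(y) b(y) / a_theta(y)*. The names `outcomes`, `dpg_step`, and `train`, as well as all numeric values, are hypothetical and chosen only for illustration.

```python
import math
import random

# Toy outcome space standing in for sequences y (hypothetical values).
outcomes = ["amazing day", "dull day", "amazing trip", "long trip"]
a = [0.1, 0.4, 0.2, 0.3]                                  # base distribution a(y)
b = [1.0 if "amazing" in y else 0.0 for y in outcomes]    # hard constraint b(y)

P = [a_y * b_y for a_y, b_y in zip(a, b)]                 # unnormalized a(y) * b(y)
Z = sum(P)                                                # acceptance rate of a
g = [p / Z for p in P]                                    # ideal distribution g


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dpg_step(logits, n_samples, lr, rng):
    """One on-policy update along E_{y~pi}[P(y)/pi(y) * grad log pi(y)]."""
    pi = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(n_samples):
        y = rng.choices(range(len(pi)), weights=pi)[0]
        w = P[y] / pi[y]                                  # importance weight
        for k in range(len(logits)):
            # d log pi(y) / d logit_k = 1[k == y] - pi[k]
            grad[k] += w * ((1.0 if k == y else 0.0) - pi[k]) / n_samples
    return [l + lr * grad_k for l, grad_k in zip(logits, grad)]


def train(steps=1000, n_samples=64, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [math.log(p) for p in a]                     # initialise the policy at a
    for _ in range(steps):
        logits = dpg_step(logits, n_samples, lr, rng)
    return softmax(logits)


pi = train()
acceptance = sum(p * b_y for p, b_y in zip(pi, b))        # mass the policy puts on b(y) = 1
print("g          :", [round(p, 3) for p in g])
print("a_theta    :", [round(p, 3) for p in pi])
print("acceptance :", round(acceptance, 3))
```

In expectation the update direction is *P − Z·a_theta*, which vanishes exactly when *a_theta = g*, so the policy's acceptance rate rises from that of *a* toward 1 while the relative probabilities of the accepted outcomes stay those of *a*. The actual training in this repository uses disco's `DPGTuner`: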
```python
from disco.tuners import DPGTuner
from disco.tuners.loggers.console import ConsoleLogger

frozen_a = LMDistribution("google/gemma-2b", token=token, LLM=True)  # frozen reference a
a_theta = LMDistribution("google/gemma-2b", token=token, LLM=True, freeze=False)  # trainable policy
b = lambda s, c: bool(re.search(r"\bamazing\b", s.text))  # hard constraint b
scorer = BooleanScorer(b)
g = frozen_a * scorer  # ideal distribution g

# DPG training
tuner = DPGTuner(a_theta, g,
                 context="",
                 n_gradient_steps=400,
                 n_samples_per_step=10000,
                 sampling_size=500,
                 scoring_size=500,
                 divergence_evaluation_interval=10)
ConsoleLogger(tuner)
tuner.tune()
```

For DPG initialized with CAP, see the detailed code in each notebook.

To cite **Guard**, please use:

```
@article{kim2024guaranteed,
  title={Guaranteed Generation from Large Language Models},
  author={Kim, Minbeom and Thonet, Thibaut and Rozen, Jos and Lee, Hwaran and Jung, Kyomin and Dymetman, Marc},
  journal={arXiv preprint arXiv:2410.06716},
  year={2024}
}
```

## License

See the [LICENSE](LICENSE) file.