# fast
**Repository Path**: mirrors_naver/fast
## Repository description
This repository contains the code and datasets associated with the paper titled **FaST: Feature-aware Sampling and Tuning for Personalized Preference Alignment with Limited Data**. For more details, the paper is available [here](https://arxiv.org/abs/2508.04698). If you find the code or datasets useful and want to refer to our work, please use the following citation:
```bibtex
@article{thonet2025fast,
  title={{FaST: Feature-aware Sampling and Tuning for Personalized Preference Alignment with Limited Data}},
  author={Thibaut Thonet and Germ{\'{a}}n Kruszewski and Jos Rozen and Pierre Erbacher and Marc Dymetman},
  journal={arXiv:2508.04698},
  year={2025},
  url={https://arxiv.org/abs/2508.04698}
}
```
---
NEWS: Our paper has been accepted to the EMNLP 2025 Main Conference and will be presented in Suzhou, China, from November 4 to 9, 2025!
---
## Data description
The repository includes a zip archive: `datasets.zip`. The archive is password-protected to avoid [potential contamination](https://arxiv.org/abs/2310.18018) of LLMs trained on web-scraped data. To unzip the archive, run the command `unzip datasets.zip` at the root of the repository and **enter '_fast_' as the password.**
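If you prefer a non-interactive call, Info-ZIP's `unzip` also accepts the password on the command line via the `-P` option (note that this makes the password visible to other local users):
```bash
# Replace PASSWORD with the password given above; -P avoids the interactive prompt.
unzip -P PASSWORD datasets.zip
```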
The folder `datasets` created after unzipping contains two subfolders: `dnd` and `elip`, which correspond to the new datasets introduced in the paper. Each of them contains a subfolder named `json` with two files:
- `picks.json` provides the questionnaire (i.e., the set of questions / proposed answers for ELIP and the set of situations / possible actions for DnD). Additionally, this file indicates the preferred choice for each user or character.
- `users.json` (for ELIP) or `characters.json` (for DnD) define the users/characters used in our experiments by describing their names and characteristics. These files are used by the oracle approaches and the LLM evaluator.
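As a quick sanity check after unzipping, these JSON files can be inspected with the standard library (a minimal sketch, shown here for ELIP; it assumes only the file paths listed above and no particular internal schema):
```python
import json

# Paths assume the archive was unzipped at the repository root.
with open("datasets/elip/json/picks.json") as f:
    picks = json.load(f)
with open("datasets/elip/json/users.json") as f:
    users = json.load(f)

# Report the top-level structure without assuming a specific schema.
print(type(picks).__name__, len(picks))
print(type(users).__name__, len(users))
```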
In addition to the JSON files, the `dnd` and `elip` folders include a subfolder (named `dnd` and `elipa`, respectively) which contains our five different data splits in parquet format, with 50% of questions/situations for training, 25% for validation, and 25% for testing. These files can be directly loaded as Hugging Face datasets using the `load_dataset` function in Python. The splits are provided separately for each user/character (e.g., `elipa-aaa` or `dnd-durin`), although the train/val/test partition is the same across users/characters. For instance, the dataset corresponding to the first splitting (labeled `0`) for user `elipa-aaa` can be loaded as follows:
```python
from datasets import load_dataset
train_ds, validation_ds, test_ds = load_dataset('datasets/elip/elipa/elipa-aaa/0', split=['train', 'validation', 'test'])
```
The columns in this Hugging Face dataset are the following:
- `character`: the name of the user/character;
- `human`: the content of the question/situation;
- `assistant_chosen`: the response/action preferred by the user/character for this question/situation;
- `assistant_rejected1`, ..., `assistant_rejectedn`: the responses/actions rejected by the user/character for this question/situation.
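For illustration, here is a minimal sketch that loads one split and reads these columns (it reuses the example path above and assumes that at least `assistant_rejected1` is present):
```python
from datasets import load_dataset

# Same split as in the example above: user elipa-aaa, splitting 0.
train_ds = load_dataset("datasets/elip/elipa/elipa-aaa/0", split="train")

# Look at the first training example.
row = train_ds[0]
print(row["character"])            # name of the user/character
print(row["human"])                # question (ELIP) or situation (DnD)
print(row["assistant_chosen"])     # preferred response/action
print(row["assistant_rejected1"])  # first rejected response/action
```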
*Note: In several places in the code and data, the ELIP dataset is referred to as ELIPA. This is not a typo: ELIPA is one of several variants of the ELIP dataset that we considered, and it is the variant used in the experiments of the paper.*
## How to run the code
In this section, we detail how to run the code, including the training of the proposed FaST approach and the baselines for preferred response prediction as well as personalized generation, and the evaluation of these different methods. We provide the commands used to run the different scripts, using the ELIP dataset as an example. Running the scripts for the DnD dataset instead is straightforward: the only difference is that the config files should be taken from `config/dnd` instead of `config/elipa`. All commands indicated below should be run from the repository root.
*Note: Some of the scripts (essentially those used for evaluation and oracle generation) assume access to GPT-4o and GPT-4o-mini through OpenAI's API. Running them therefore requires the `OPENAI_API_KEY` variable to be set in your shell environment.*
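For example, the key can be exported in the current shell before running those scripts (the value below is a placeholder):
```bash
# Placeholder value; set your own OpenAI API key.
export OPENAI_API_KEY="sk-..."
```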
### Package and version requirements
This code should be run with Python 3.10. The required Python packages, along with their recommended versions, are provided in `requirements1.txt` and `requirements2.txt`. The packages in `requirements1.txt` should be installed before those in `requirements2.txt`, i.e., one should run the following commands in this order:
```bash
pip install -r requirements1.txt
pip install -r requirements2.txt
```
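For example, assuming Python 3.10 is available as `python3.10`, a clean virtual environment can be set up first (a minimal sketch; any equivalent environment manager works):
```bash
# Create and activate an isolated Python 3.10 environment, then install the requirements.
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements1.txt
pip install -r requirements2.txt
```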
### Preprocessing
The reward models relying on features (namely, the proposed FaRM and the CPM baseline) require some preprocessing steps which are described below.
#### 1. Feature list generation
First, a list of features is directly discovered from the data by prompting a strong LLM (e.g., GPT-4o) with the questionnaire, as described in Section 3.1 of the paper. This is done by running the following command (for the ELIP dataset):
```bash
python -m preprocessing.generate_features --config preprocessing/config/elipa/generate-features.yml
```
#### 2. Feature-wise response scoring
Once the list of features has been generated for the dataset of interest, the different responses of the questionnaire should be scored for each feature. This is done by prompting a small, open LLM (e.g., `Phi-4-Mini-Instruct` or `LLaMA-3.2-3B-Instruct`), as described in Section 3.2 of the paper. In our code, feature-wise response scoring is done in two steps:
- **Feature score annotation**: We score the responses from the dataset associated with a single user. Doing this only once for all users is more efficient and ensures consistency across users; it is possible because all users are given the same questionnaire.
- **Feature score propagation**: We propagate the response scores to the datasets associated with all other users.
The commands we provide below illustrate the case of feature-wise response scoring for `Phi-4-Mini-Instruct` as the LLM used for the feature functions. Using `LLaMA-3.2-3B-Instruct` instead can be done by replacing `phi-4-mini-instruct` with `llama-3.2-3b-instruct` in the config filenames.
##### 2.1. Feature score annotation
The commands to conduct score annotation for the proposed Feature-aware Reward Model (FaRM) and the Compositional Preference Model (CPM) baseline are the following:
- **FaRM**:
```bash
python -m preprocessing.annotate_features --config preprocessing/config/elipa/annotate-features_farm_phi-4-mini-instruct.yml
```
- **CPM**:
```bash
python -m preprocessing.annotate_features --config preprocessing/config/elipa/annotate-features_cpm_phi-4-mini-instruct.yml
```
##### 2.2. Feature score propagation
The propagation of score annotations to the other users is done by running the following commands:
- **FaRM**:
```bash
python -m preprocessing.propagate_annotations --config preprocessing/config/elipa/propagate-annotations_farm_phi-4-mini-instruct.yml
```
- **CPM**:
```bash
python -m preprocessing.propagate_annotations --config preprocessing/config/elipa/propagate-annotations_cpm_phi-4-mini-instruct.yml
```
### Preferred response prediction
This task consists in measuring the ability of preference/reward models to predict the response preferred by a user on unseen contexts (see Section 4.2 of the paper for more details on the task and experimental setup). We provide below the commands used to run the main experiments for this task, which are reported in Table 2 of the paper. The commands are provided for our proposed reward model FaRM, as well as for the following baselines: Manyshot, Manyshot-CoT, RM, RM-LoRA, and CPM. The commands are given for the `Phi-4-Mini-Instruct` model; to use `LLaMA-3.2-3B-Instruct` instead, simply replace `phi-4-mini-instruct` with `llama-3.2-3b-instruct` in the config filenames. Running each response prediction script creates a folder in `response_prediction/out` containing a CSV file that summarizes the prediction accuracy of the approach for each user/character; for RM, RM-LoRA, CPM, and FaRM it also includes the reward model learned for each user/character (a small sketch for inspecting these outputs is given after the command list below).
*Note: In the case of CPM and FaRM, the feature-wise response scoring steps described previously in the Preprocessing section are required prior to running the following commands.*
- **Manyshot**:
```bash
python -m response_prediction.run_manyshot_classification --config response_prediction/config/elipa/run-manyshot-classification_manyshot_phi-4-mini-instruct.yml
```
- **Manyshot-CoT**:
```bash
python -m response_prediction.run_manyshot_classification --config response_prediction/config/elipa/run-manyshot-classification_manyshot-cot_phi-4-mini-instruct.yml
```
- **RM**:
```bash
python -m response_prediction.train_reward --config response_prediction/config/elipa/train-reward_rm_phi-4-mini-instruct.yml
```
- **RM-LoRA**:
```bash
python -m response_prediction.train_reward --config response_prediction/config/elipa/train-reward_rm-lora_phi-4-mini-instruct.yml
```
- **CPM**:
```bash
python -m response_prediction.train_farm --config response_prediction/config/elipa/train-farm_cpm_phi-4-mini-instruct.yml
```
- **FaRM**:
```bash
python -m response_prediction.train_farm --config response_prediction/config/elipa/train-farm_farm_phi-4-mini-instruct.yml
```
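Once one or more of the above scripts have finished, the per-user accuracy summaries can be inspected with a small script like the following (a minimal sketch: the exact folder and CSV filenames depend on the config used, and no particular column layout is assumed):
```python
import glob

import pandas as pd

# Locate all CSV summaries produced by the response prediction runs.
# Exact filenames depend on the approach and config used.
for path in sorted(glob.glob("response_prediction/out/**/*.csv", recursive=True)):
    df = pd.read_csv(path)
    print(f"{path}: {len(df)} rows, columns = {list(df.columns)}")
```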
### Personalized generation
The personalized generation task involves generating user-tailored responses for unseen contexts (questions for ELIP or situations for DnD), guided solely by user preferences derived from chosen responses on a shared questionnaire (see Section 4.3 of the paper for more details on the task and experimental setup). We provide the commands to run personalized generation (both fine-tuning, when relevant, and response generation) for the different approaches considered in our main experiment reported in Table 4 of the paper. These include the oracles Oracle-chosen and Oracle-gen; the baselines Zeroshot, RAG, Manyshot, Manyshot-CoT, SFT, DPO, and RM with Best-of-N / PPO / Online-DPO / RFT; and the proposed FaST approach with Best-of-N / Online-DPO / RFT. Running each of these approaches creates a JSON file in `generation/out` containing the generated responses/actions for each question/situation.
*Note: In the case of FaST- and RM-based approaches, the FaRM and RM models obtained in the preferred response prediction step are required to fine-tune the generation models.*
- **Oracle-chosen**:
```bash
python -m generation.run_oracle_chosen_generation --config generation/config/elipa/run-oracle-chosen-generation_oracle-chosen.yml
```
- **Oracle-gen**:
```bash
python -m generation.run_oracle_gen_generation --config generation/config/elipa/run-oracle-gen-generation_oracle-gen.yml
```
- **Zeroshot**:
```bash
python -m generation.run_zeroshot_generation --config generation/config/elipa/run-zeroshot-generation_zeroshot.yml
```
- **RAG**:
```bash
python -m generation.run_rag_generation --config generation/config/elipa/run-rag-generation_rag.yml
```
- **Manyshot**:
```bash
python -m generation.run_manyshot_generation --config generation/config/elipa/run-manyshot-generation_manyshot.yml
```
- **Manyshot-CoT**:
```bash
python -m generation.run_manyshot_generation --config generation/config/elipa/run-manyshot-generation_manyshot-cot.yml
```
- **SFT**:
```bash
python -m generation.train_sft --config generation/config/elipa/train-sft_sft.yml
```
- **DPO**:
```bash
python -m generation.train_dpo --config generation/config/elipa/train-dpo_dpo.yml
```
- **RM w/ Best-of-N**:
```bash
python -m generation.run_bon_generation --config generation/config/elipa/run-bon-generation_rm-bon.yml
```
- **RM w/ PPO**:
```bash
python -m generation.train_ppo --config generation/config/elipa/train-ppo_rm-ppo.yml
```
- **RM w/ Online-DPO**:
```bash
python -m generation.train_online_dpo --config generation/config/elipa/train-online-dpo_rm-online-dpo.yml
```
- **RM w/ RFT**:
```bash
python -m generation.train_rft --config generation/config/elipa/train-rft_rm-rft.yml
```
- **FaST w/ Best-of-N**:
```bash
python -m generation.run_bon_generation --config generation/config/elipa/run-bon-generation_fast-bon.yml
```
- **FaST w/ Online-DPO**:
```bash
python -m generation.train_online_dpo --config generation/config/elipa/train-online-dpo_fast-online-dpo.yml
```
- **FaST w/ RFT**:
```bash
python -m generation.train_rft --config generation/config/elipa/train-rft_fast-rft.yml
```
### Evaluation
The evaluation of the personalized generation approaches is done using an LLM-as-a-judge, as described in Appendix C.4 of the paper. For comprehensiveness, we considered both a numeric, score-based evaluator and a pairwise, winrate-based evaluator. In practice, we used `GPT-4o-mini` as the LLM evaluator. The commands to run the evaluation in both cases are provided below. In these commands, the `json-path` argument (or `json1-path` and `json2-path` in the pairwise case) should be set to the path of one of the JSON files created in `generation/out` during the personalized generation experiments.
- **Numeric score-based evaluator**:
```bash
python -m evaluation.numeric_evaluation --config evaluation/config/elipa/numeric-evaluation.yml --json-path path/to/json/file
```
- **Pairwise winrate-based evaluator**:
```bash
python -m evaluation.pairwise_evaluation --config evaluation/config/elipa/pairwise-evaluation.yml --json1-path path/to/json/file1 --json2-path path/to/json/file2
```
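For instance, to compare the generations of two approaches head-to-head, the pairwise evaluator can be invoked as follows (the JSON filenames are hypothetical placeholders; substitute the files actually produced in `generation/out`):
```bash
# Hypothetical filenames; replace them with the JSON files produced by your runs.
python -m evaluation.pairwise_evaluation \
    --config evaluation/config/elipa/pairwise-evaluation.yml \
    --json1-path generation/out/fast-bon.json \
    --json2-path generation/out/zeroshot.json
```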