# ReLLa **Repository Path**: LinJianghao/ReLLa ## Basic Information - **Project Name**: ReLLa - **Description**: No description available - **Primary Language**: Python - **License**: Not specified - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2024-06-12 - **Last Updated**: 2024-06-12 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation ## Introduction This is the pytorch implementation of ***ReLLa*** proposed in the paper [ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation](https://arxiv.org/abs/2308.11131). In this repo, we implement ReLLa with ```transformers==4.28.1```. We also provide a newer version of implementation with ```transformers==4.35.2``` in this [repo](https://github.com/CHIANGEL/ReLLa-hf4.35.2). ## Requirements ~~~python pip install -r requirments.txt ~~~ ## Data preprocess You can directly use the processed data from [this link](https://drive.google.com/drive/folders/1av6mZpk0ThmkOKy5Y_dUnsLRdRK8oBjQ?usp=sharing). (including data w/o and w/ retrieval: full testing set, sampled training set, history length 30/30/60 for Ml-1m/Ml-25m/BookCrossing) Or you can preprocess by yourself. Scripts for data preprocessing of [BookCrossing](http://www2.informatik.uni-freiburg.de/~cziegler/BX/), [MovieLens-1M](https://grouplens.org/datasets/movielens/1m/), [MovieLens-25M](https://grouplens.org/datasets/movielens/25m/) are included in [data_preprocess](./data_preprocess/). ## Get semantic embeddings Get semantic item embeddings for retrieval. ~~~python python get_semantic_embed.py --model_path XXX --data_set BookCrossing/ml-1m/ml-25m --pooling average ~~~ ## Retrieval and pre-store the neighbor item indice - BookCrossing ~~~python python topK_relevant_BookCrossing.py ~~~ - MovieLens-1M ~~~python python topK_relevant_ml1m.py ~~~ - MovieLens-25M ~~~python python topK_relevant_ml25m.py ~~~ ## Convert data into text ~~~python python data2json.py --K 10 --temp_type simple --set test --dataset ml-1m ~~~ Demo processed data is under [./data/ml-1m/proc_data/data/test/test_5_simple.json](./data/ml-1m/proc_data/data/test/test_5_simple.json) ## Training_set_construction This step samples training data from the whole training set, and constructs a mixture dataset of both original data and retrieval-enhanced data. ~~~python python training_set_construction.py --K 5 ~~~ ## Quick start You should provide the model path in the scripts. ### Inference ~~~python python scripts/script_inference.py --K 5 --dataset ml-1m --temp_type simple ~~~ ### Finetune ~~~python python scripts/script_finetune.py --dataset ml-1m --K 5 --train_size 64 --train_type simple --test_type simple --epochs 5 --lr 1e-3 --total_batch_size 64 ~~~