# Recommender-System-with-TF2.0
**Repository Path**: MarkBY/Recommender-System-with-TF2.0
## Basic Information
- **Project Name**: Recommender-System-with-TF2.0
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: reclearn
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-01-11
- **Last Updated**: 2023-02-06
## README
## RecLearn
[Simplified Chinese](https://github.com/ZiyaoGeng/Recommender-System-with-TF2.0/blob/reclearn/README_CN.md) | [English](https://github.com/ZiyaoGeng/Recommender-System-with-TF2.0/tree/reclearn)
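RecLearn (Recommender Learning) collects and reorganizes the contents of the [master](https://github.com/ZiyaoGeng/RecLearn/tree/master) branch of `Recommender System with TF2.0`. It is a recommender-learning framework built with Python and TensorFlow 2.x, aimed at students and beginners. **If you prefer the contents of the master branch and want to modify or update them, you can simply clone the whole repository and use it directly.** The implemented recommendation algorithms are grouped by the two application stages used in industry: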
- matching recommendation stage
- ranking recommendation stage
## Updates
**23/04/2022**: Updated all matching models.
## Installation
### Package
RecLearn is published on PyPI and can be installed with `pip`:
```shell
pip install reclearn
```
Dependencies:
- Python 3.8+
- TensorFlow 2.5+ (GPU or CPU build)
- scikit-learn 0.23+
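
To check whether an existing environment meets these minimums, a small script along the following lines can be used. It is a sketch, not part of RecLearn: it assumes TensorFlow is installed under one of the distribution names `tensorflow`, `tensorflow-gpu` or `tensorflow-cpu`, and scikit-learn under `scikit-learn`, and it only compares the leading numeric parts of each version string.

```python
import sys
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '2.5.0' or '2.10.0rc1' into a tuple of ints, dropping non-numeric suffixes."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


# Minimum versions from the dependency list; distribution names are assumptions.
REQUIREMENTS = {
    ("tensorflow", "tensorflow-gpu", "tensorflow-cpu"): (2, 5),
    ("scikit-learn",): (0, 23),
}


def check_environment() -> dict:
    """Return {name: (installed_version or None, meets_minimum)}."""
    report = {"python": (sys.version.split()[0], sys.version_info[:2] >= (3, 8))}
    for names, minimum in REQUIREMENTS.items():
        found = None
        for name in names:
            try:
                found = metadata.version(name)
                break
            except metadata.PackageNotFoundError:
                continue
        ok = found is not None and parse_version(found)[:len(minimum)] >= minimum
        report[names[0]] = (found, ok)
    return report


if __name__ == "__main__":
    for name, (version, ok) in check_environment().items():
        print(f"{name:14s} {version or 'not installed':15s} {'OK' if ok else 'needs upgrade'}")
```

Comparing integer tuples rather than raw strings matters here: as strings, `"2.10"` would sort before `"2.5"`.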
### Local
You can also clone RecLearn locally:
```shell
git clone -b reclearn git@github.com:ZiyaoGeng/RecLearn.git
```
## Quick Start
The `example` directory contains a demo for every recommender model.
### Matching
**1. Split the dataset**
Specify the path to the dataset:
```python
file_path = 'data/ml-1m/ratings.dat'
```
Split the dataset into training, validation, and test sets. If you use the `movielens-1m`, `Amazon-Beauty`, `Amazon-Games`, or `STEAM` dataset, you can call the methods in RecLearn's `data/datasets/*` directly to do the split:
```python
from reclearn.data.datasets import movielens as ml

train_path, val_path, test_path, meta_path = ml.split_seq_data(file_path=file_path)
```
Here `meta_path` is the path to the metadata file, which stores the maximum user index and the maximum item index.
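
As a rough illustration of what a sequential split involves, the sketch below applies the common leave-one-out protocol to ML-1M `ratings.dat` lines (`UserID::MovieID::Rating::Timestamp`): for each user, the latest interaction becomes the test item, the second latest the validation item, and the rest stay in training. Whether `split_seq_data` follows exactly this protocol is an assumption, and `leave_one_out_split` is a hypothetical helper, not part of RecLearn.

```python
from collections import defaultdict


def leave_one_out_split(lines):
    """Split ML-1M style 'UserID::MovieID::Rating::Timestamp' lines per user.

    Interactions are sorted by timestamp; the last item goes to test, the
    second-to-last to validation, the rest to training. Returns
    (train, val, test, meta): the first three map user -> list of items,
    meta holds the maximum user and item indices.
    """
    history = defaultdict(list)
    max_user, max_item = 0, 0
    for line in lines:
        user, item, _rating, ts = (int(x) for x in line.strip().split("::"))
        history[user].append((ts, item))
        max_user, max_item = max(max_user, user), max(max_item, item)

    train, val, test = {}, {}, {}
    for user, events in history.items():
        items = [item for _, item in sorted(events)]
        if len(items) < 3:  # too short to hold out two items: keep all for training
            train[user] = items
            continue
        train[user], val[user], test[user] = items[:-2], [items[-2]], [items[-1]]
    meta = {"max_user_num": max_user, "max_item_num": max_item}
    return train, val, test, meta
```

Ratings are ignored because the matching models here treat every interaction as implicit positive feedback.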
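**2. Load the data**
Read the training, validation, and test sets, and generate several negative samples (by random sampling) for each positive sample. The data is organized as a dictionary: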
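```python
data = {
    'pos_item': ...,   # positive items
    'neg_item': ...,   # sampled negative items for each positive
    # optional keys, depending on the model:
    'user': ...,       # user ids
    'click_seq': ...,  # click sequences (sequential models)
}
```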
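
The negative sampling described above can be sketched as follows: for every positive item, draw `neg_num` item ids uniformly at random and reject any the user has already interacted with. This is an illustrative assumption of how such sampling typically works, not RecLearn's code; `sample_negatives` is hypothetical, and items are assumed to be indexed `1..max_item_num`.

```python
import random


def sample_negatives(pos_items, user_history, neg_num, max_item_num, seed=0):
    """Draw `neg_num` random negatives per positive item.

    Items are assumed to be indexed 1..max_item_num (0 left for padding).
    Candidates in `user_history` (a set) are rejected; negatives for the same
    positive are drawn with replacement, so duplicates are possible.
    """
    if all(i in user_history for i in range(1, max_item_num + 1)):
        raise ValueError("user has interacted with every item; cannot sample negatives")
    rng = random.Random(seed)
    neg_items = []
    for _ in pos_items:
        negs = []
        while len(negs) < neg_num:
            candidate = rng.randint(1, max_item_num)
            if candidate not in user_history:
                negs.append(candidate)
        neg_items.append(negs)
    return {"pos_item": list(pos_items), "neg_item": neg_items}
```

Rejection sampling stays cheap as long as each user has interacted with only a small fraction of the catalogue, which is the usual situation.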
Sequential recommendation models additionally need the users' click sequences. For the four datasets above, RecLearn provides data-loading methods:
```python
# general recommendation model
train_data = ml.load_data(train_path, neg_num, max_item_num)
# sequence recommendation model, and use the user feature.
train_data = ml.load_seq_data(train_path, "train", seq_len, neg_num, max_item_num, contain_user=True)
```
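
For sequential models, the `click_seq` entries are typically fixed-length windows of the items a user clicked before the target item. The sketch below shows one common construction, assuming left padding with `0` when the history is shorter than `seq_len`; RecLearn's exact padding and windowing may differ, and `build_click_seq` is a hypothetical helper.

```python
def build_click_seq(items, seq_len, pad=0):
    """Return (click_seq, pos_item) pairs for every target position in `items`.

    click_seq holds the `seq_len` items preceding the target, left-padded with
    `pad` (0 assumed to be the padding index) when the history is shorter.
    """
    samples = []
    for t in range(1, len(items)):
        history = items[max(0, t - seq_len):t]
        click_seq = [pad] * (seq_len - len(history)) + history
        samples.append((click_seq, items[t]))
    return samples
```

For a time-ordered history `[5, 8, 2, 9]` and `seq_len=3`, the last sample pairs the window `[5, 8, 2]` with the target `9`.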
**3. Specify hyperparameters**
Each model requires its own hyperparameters. Taking `BPR` as an example:
```python
model_params = {
    'user_num': max_user_num + 1,
    'item_num': max_item_num + 1,
    'embed_dim': FLAGS.embed_dim,
    'use_l2norm': FLAGS.use_l2norm,
    'embed_reg': FLAGS.embed_reg
}
```
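
To show what `embed_dim`, `use_l2norm` and `embed_reg` control, here is a NumPy sketch of the standard BPR objective on a batch of (user, positive, negative) embeddings. It is not RecLearn's implementation. The reading of `use_l2norm` as "L2-normalize embeddings before scoring" and of `embed_reg` as an L2 penalty weight are assumptions based on the parameter names. The `+ 1` on `user_num` and `item_num` above is needed because an embedding table of size `n` only accepts indices `0..n-1`.

```python
import numpy as np


def bpr_loss(user_emb, pos_emb, neg_emb, use_l2norm=False, embed_reg=0.0):
    """Standard BPR objective for a batch of (user, positive, negative) triples.

    All inputs have shape (batch, embed_dim). use_l2norm: normalize embeddings
    before scoring (assumed meaning). embed_reg: weight of the L2 penalty on
    the embeddings used in the batch.
    """
    if use_l2norm:
        user_emb, pos_emb, neg_emb = (
            e / np.linalg.norm(e, axis=-1, keepdims=True) for e in (user_emb, pos_emb, neg_emb)
        )
    pos_score = np.sum(user_emb * pos_emb, axis=-1)
    neg_score = np.sum(user_emb * neg_emb, axis=-1)
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed stably with logaddexp
    ranking_loss = np.mean(np.logaddexp(0.0, -(pos_score - neg_score)))
    reg = embed_reg * sum(np.sum(e ** 2) for e in (user_emb, pos_emb, neg_emb))
    return ranking_loss + reg
```

The loss falls as the positive item's score rises above the negative's, which is exactly the pairwise ranking assumption behind BPR.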
**4. Build and compile the model**
Choose or build the model you need, then compile it. Taking `BPR` as an example:
```python
from tensorflow.keras.optimizers import Adam
from reclearn.models.matching import BPR

model = BPR(**model_params)
model.compile(optimizer=Adam(learning_rate=FLAGS.learning_rate))
```
To inspect the model's structure, call `summary` after compiling:
```python
model.summary()
```
**5. Train and evaluate**
```python
from time import time
from reclearn.evaluator import eval_pos_neg

for epoch in range(1, epochs + 1):
    t1 = time()
    model.fit(
        x=train_data,
        epochs=1,
        validation_data=val_data,
        batch_size=batch_size
    )
    t2 = time()
    eval_dict = eval_pos_neg(model, test_data, ['hr', 'mrr', 'ndcg'], k, batch_size)
    print('Iteration %d Fit [%.1f s], Evaluate [%.1f s]: HR = %.4f, MRR = %.4f, NDCG = %.4f'
          % (epoch, t2 - t1, time() - t2, eval_dict['hr'], eval_dict['mrr'], eval_dict['ndcg']))
```
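
The reported HR@k, MRR@k and NDCG@k come from ranking each test positive against its sampled negatives. The NumPy sketch below uses the standard definitions for the one-positive-plus-N-negatives setting. It is not the source of `eval_pos_neg`; in particular, counting ties in the positive's favour (only strictly higher negatives push it down) is an assumption.

```python
import numpy as np


def hr_mrr_ndcg_at_k(pos_scores, neg_scores, k):
    """Metrics when each test case has one positive and N sampled negatives.

    pos_scores: shape (batch,); neg_scores: shape (batch, N).
    rank = number of negatives scored strictly higher than the positive (0-based).
    """
    pos_scores = np.asarray(pos_scores, dtype=float)
    neg_scores = np.asarray(neg_scores, dtype=float)
    rank = np.sum(neg_scores > pos_scores[:, None], axis=1)
    hit = rank < k
    hr = np.mean(hit)
    mrr = np.mean(np.where(hit, 1.0 / (rank + 1), 0.0))
    ndcg = np.mean(np.where(hit, 1.0 / np.log2(rank + 2), 0.0))
    return {"hr": hr, "mrr": mrr, "ndcg": ndcg}
```

With a single relevant item, NDCG reduces to `1 / log2(rank + 2)` because the ideal DCG is 1.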
### Ranking
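For the Criteo dataset, two data-processing approaches are used: training the model on a portion of the data, or splitting the dataset so that the full dataset can be used for training. For the first approach, see `example/train_small_criteo_demo.py`. For the second, see `example/r_deepfm_demo.py`; the steps are as follows:
**1. Split the dataset**
Call `reclearn.data.datasets.criteo.get_split_file_path(parent_path, dataset_path, sample_num)` to split the dataset. `sample_num` sets the number of samples in each subset, and all subsets are saved in the dataset's directory. If the split has already been done and the subset directory has not changed, the existing subsets can be read directly; alternatively, pass `parent_path`.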
```python
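from reclearn.data.datasets.criteo import get_split_file_path

sample_num = 4600000  # number of samples per subset
# `file` is the path to the raw Criteo data file
split_file_list = get_split_file_path(dataset_path=file, sample_num=sample_num)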
```
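
The splitting step amounts to cutting one large text file into consecutive chunks of `sample_num` lines. A minimal standard-library sketch follows. The helper name `split_file_by_lines` and the `part_000.txt` naming scheme are hypothetical; how `get_split_file_path` names and locates its files may differ.

```python
import os
import tempfile


def split_file_by_lines(dataset_path, out_dir, sample_num):
    """Write consecutive chunks of `sample_num` lines from `dataset_path` into out_dir.

    Returns the chunk paths in order; the last chunk may be shorter.
    """
    if sample_num <= 0:
        raise ValueError("sample_num must be positive")
    os.makedirs(out_dir, exist_ok=True)
    paths, out, count = [], None, 0
    with open(dataset_path, "r") as src:
        for line in src:
            if count % sample_num == 0:  # start a new chunk
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"part_{len(paths):03d}.txt")
                out = open(path, "w")
                paths.append(path)
            out.write(line)
            count += 1
    if out is not None:
        out.close()
    return paths


if __name__ == "__main__":
    # Demo on a tiny temporary file: 10 lines split into chunks of 4.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "train.txt")
        with open(src, "w") as f:
            f.writelines(f"{i}\n" for i in range(10))
        for path in split_file_by_lines(src, os.path.join(tmp, "split"), 4):
            with open(path) as f:
                print(os.path.basename(path), len(f.readlines()))
```

Streaming line by line keeps memory use constant, which is the point of splitting a dataset as large as Criteo.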
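**2. Build the feature map**
After splitting, map every feature over the entire dataset (static Embedding layers need their vocabulary sizes in advance), and bucketize dense features to turn them into discrete ones. Call `get_fea_map(fea_map_path, split_file_list)`; the resulting mapping is saved as `fea_map.pkl`. If this step has already been done, pass the `fea_map_path` argument to load the existing map.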
```python
# If you want to make feature map.
fea_map = get_fea_map(split_file_list=split_file_list)
# Or if you want to load feature map.
# fea_map = get_fea_map(fea_map_path='data/criteo/split/fea_map.pkl')
```
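
The idea behind the feature map can be sketched as: every distinct categorical value receives an integer id, and each dense feature receives bucket boundaries so that its values become bucket ids. The sketch below assumes rows as dicts, reserves id `0` for missing or unseen values, and uses quantile boundaries. The bucketing rule RecLearn actually applies may differ, and `build_fea_map` and `encode` are hypothetical helpers.

```python
import numpy as np


def build_fea_map(rows, sparse_cols, dense_cols, num_bins=10):
    """Build a feature map from a list of dict rows (hypothetical format).

    Sparse columns: each distinct non-empty value gets an id starting at 1.
    Dense columns: quantile bucket boundaries (duplicates removed).
    """
    fea_map = {}
    for col in sparse_cols:
        values = sorted({row[col] for row in rows if row.get(col) not in (None, "")})
        fea_map[col] = {v: i + 1 for i, v in enumerate(values)}
    for col in dense_cols:
        values = np.array([float(row[col]) for row in rows if row.get(col) not in (None, "")])
        qs = np.linspace(0, 1, num_bins + 1)[1:-1]
        fea_map[col] = np.unique(np.quantile(values, qs))
    return fea_map


def encode(row, fea_map, sparse_cols, dense_cols):
    """Turn one raw row into integer ids; 0 means missing or unseen."""
    ids = {}
    for col in sparse_cols:
        ids[col] = fea_map[col].get(row.get(col), 0)
    for col in dense_cols:
        value = row.get(col)
        if value in (None, ""):
            ids[col] = 0
        else:
            ids[col] = int(np.digitize(float(value), fea_map[col])) + 1
    return ids
```

Under this scheme, the embedding size for a sparse column is `len(fea_map[col]) + 1`, and for a dense column `len(fea_map[col]) + 2`.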
**3. Load the test set**
Use the last subset as the test set.
```python
feature_columns, test_data = create_criteo_dataset(split_file_list[-1], fea_map)
```
**4. Build the model**
```python
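from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras.metrics import AUC
from tensorflow.keras.optimizers import Adam
from reclearn.models.ranking import FM

model = FM(feature_columns=feature_columns, **model_params)
model.summary()
model.compile(loss=binary_crossentropy, optimizer=Adam(learning_rate=learning_rate),
              metrics=[AUC()])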
```
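
To illustrate what the `FM` model computes, here is a NumPy sketch of the Factorization Machine prediction for one feature vector, using Rendle's identity that reduces the pairwise interaction term from O(kn²) to O(kn). It is a reference sketch of the FM formula, not RecLearn's layer code; a naive double loop is included to check the identity. Applying a sigmoid to the returned logit gives the predicted click probability.

```python
import numpy as np


def fm_score(x, w0, w, V):
    """Factorization Machine logit for one feature vector x.

    x: (n,) feature values, w0: bias, w: (n,) linear weights, V: (n, k) factors.
    Pairwise term via the O(kn) identity:
        sum_{i<j} <v_i, v_j> x_i x_j
            = 0.5 * sum_f [(sum_i V[i, f] x_i)^2 - sum_i V[i, f]^2 x_i^2]
    """
    linear = w0 + x @ w
    xv = x @ V                  # (k,): sum_i V[i, f] * x_i
    x2v2 = (x ** 2) @ (V ** 2)  # (k,): sum_i V[i, f]^2 * x_i^2
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2)
    return linear + pairwise


def fm_score_naive(x, w0, w, V):
    """Same quantity with the explicit O(k n^2) double loop, for checking."""
    n = len(x)
    pairwise = sum(V[i] @ V[j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return w0 + x @ w + pairwise
```

With one-hot encoded Criteo features, `x` is a 0/1 vector, so the pairwise term sums `<v_i, v_j>` over the active features only.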
**5. Train on each subset and evaluate**
```python
for file in split_file_list[:-1]:
    print("load %s" % file)
    _, train_data = create_criteo_dataset(file, fea_map)
    # Fit on the current subset
    model.fit(
        x=train_data[0],
        y=train_data[1],
        epochs=1,
        batch_size=batch_size,
        validation_split=0.1
    )
    # Evaluate on the held-out test subset
    print('test AUC: %f' % model.evaluate(x=test_data[0], y=test_data[1], batch_size=batch_size)[1])
```
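
The ranking results below report Log Loss and AUC. For checking predictions offline, both can be computed exactly with NumPy as sketched here. Keras's `AUC` metric approximates the curve with a fixed number of thresholds (200 by default), so small differences from the exact value are expected. The helper names are illustrative, and the pairwise AUC is only suitable for small samples.

```python
import numpy as np


def log_loss(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


def roc_auc(y_true, y_pred):
    """Exact ROC AUC: probability that a random positive outscores a random
    negative, with ties counted as 0.5. O(P * N) memory, small checks only."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred, dtype=float)
    pos = y_pred[y_true == 1]
    neg = y_pred[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))
```

For labels `[1, 0, 1, 0]` and predictions `[0.9, 0.1, 0.4, 0.6]`, three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75.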
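## Experimental Results
RecLearn's experimental setup differs from that of some of the original papers, so the results may deviate from the published numbers. For details, see [experiment](./docs/experiment.md).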
### Matching
| Model | ml-1m HR@10 | ml-1m MRR@10 | ml-1m NDCG@10 | Beauty HR@10 | Beauty MRR@10 | Beauty NDCG@10 | STEAM HR@10 | STEAM MRR@10 | STEAM NDCG@10 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| BPR | 0.5768 | 0.2392 | 0.3016 | 0.3708 | 0.2108 | 0.2485 | 0.7728 | 0.4220 | 0.5054 |
| NCF | 0.5834 | 0.2219 | 0.3060 | 0.5448 | 0.2831 | 0.3451 | 0.7768 | 0.4273 | 0.5103 |
| DSSM | 0.5498 | 0.2148 | 0.2929 | - | - | - | - | - | - |
| YoutubeDNN | 0.6737 | 0.3414 | 0.4201 | - | - | - | - | - | - |
| GRU4Rec | 0.7969 | 0.4698 | 0.5483 | 0.5211 | 0.2724 | 0.3312 | 0.8501 | 0.5486 | 0.6209 |
| Caser | 0.7916 | 0.4450 | 0.5280 | 0.5487 | 0.2884 | 0.3501 | 0.8275 | 0.5064 | 0.5832 |
| SASRec | 0.8103 | 0.4812 | 0.5605 | 0.5230 | 0.2781 | 0.3355 | 0.8606 | 0.5669 | 0.6374 |
| AttRec | 0.7873 | 0.4578 | 0.5363 | 0.4995 | 0.2695 | 0.3229 | - | - | - |
| FISSA | 0.8106 | 0.4953 | 0.5713 | 0.5431 | 0.2851 | 0.3462 | 0.8635 | 0.5682 | 0.6391 |
### Ranking
| Model | Criteo (5M samples) Log Loss | Criteo (5M samples) AUC | Criteo Log Loss | Criteo AUC |
| :---: | :---: | :---: | :---: | :---: |
| FM | 0.4765 | 0.7783 | 0.4762 | 0.7875 |
| FFM | - | - | - | - |
| WDL | 0.4684 | 0.7822 | 0.4692 | 0.7930 |
| Deep Crossing | 0.4670 | 0.7826 | 0.4693 | 0.7935 |
| PNN | - | 0.7847 | - | - |
| DCN | - | 0.7823 | 0.4691 | 0.7929 |
| NFM | 0.4773 | 0.7762 | 0.4723 | 0.7889 |
| AFM | 0.4819 | 0.7808 | 0.4692 | 0.7871 |
| DeepFM | - | 0.7828 | 0.4650 | 0.8007 |
| xDeepFM | 0.4690 | 0.7839 | 0.4696 | 0.7919 |
## Reproduced Papers
### Matching Models (Top-K Recommendation)
| Paper\|Model | Published | Author |
| :----------------------------------------------------------: | :----------: | :------------: |
| BPR: Bayesian Personalized Ranking from Implicit Feedback\|**MF-BPR** | UAI, 2009 | Steffen Rendle |
| Neural Collaborative Filtering\|**NCF** | WWW, 2017 | Xiangnan He |
| Learning Deep Structured Semantic Models for Web Search using Clickthrough Data\|**DSSM** | CIKM, 2013 | Po-Sen Huang |
| Deep Neural Networks for YouTube Recommendations\| **YoutubeDNN** | RecSys, 2016 | Paul Covington |
| Session-based Recommendations with Recurrent Neural Networks\|**GRU4Rec** | ICLR, 2016 | Balázs Hidasi |
| Self-Attentive Sequential Recommendation\|**SASRec** | ICDM, 2018 | UCSD |
| Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding\|**Caser** | WSDM, 2018 | Jiaxi Tang |
| Next Item Recommendation with Self-Attentive Metric Learning\|**AttRec** | AAAI, 2019 | Shuai Zhang |
| FISSA: Fusing Item Similarity Models with Self-Attention Networks for Sequential Recommendation\|**FISSA** | RecSys, 2020 | Jing Lin |
### Ranking Models (CTR Prediction)
| Paper\|Model | Published | Author |
| :----------------------------------------------------------: | :----------: | :----------------------------------------------------------: |
| Factorization Machines\|**FM** | ICDM, 2010 | Steffen Rendle |
| Field-aware Factorization Machines for CTR Prediction\|**FFM** | RecSys, 2016 | Criteo Research |
| Wide & Deep Learning for Recommender Systems\|**WDL** | DLRS, 2016 | Google Inc. |
| Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features\|**Deep Crossing** | KDD, 2016 | Microsoft Research |
| Product-based Neural Networks for User Response Prediction\|**PNN** | ICDM, 2016 | Shanghai Jiao Tong University |
| Deep & Cross Network for Ad Click Predictions\|**DCN** | ADKDD, 2017 | Stanford University\|Google Inc. |
| Neural Factorization Machines for Sparse Predictive Analytics\|**NFM** | SIGIR, 2017 | Xiangnan He |
| Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks\|**AFM** | IJCAI, 2017 | Zhejiang University\|National University of Singapore |
| DeepFM: A Factorization-Machine based Neural Network for CTR Prediction\|**DeepFM** | IJCAI, 2017 | Harbin Institute of Technology\|Noah’s Ark Research Lab, Huawei |
| xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems\|**xDeepFM** | KDD, 2018 | University of Science and Technology of China |
| Deep Interest Network for Click-Through Rate Prediction\|**DIN** | KDD, 2018 | Alibaba Group |
## Discussion
If you have any suggestions or questions about the project, please open an `Issue`.