# Multitask-Recommendation-Library
**Repository Path**: liu-xiuzhen/Multitask-Recommendation-Library
## Basic Information
- **Project Name**: Multitask-Recommendation-Library
- **Description**: This project provides PyTorch implementations of multi-task recommendation models and common datasets. Currently, 7 multi-task recommendation models have been implemented to enable fair comparison.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 1
- **Forks**: 0
- **Created**: 2024-11-30
- **Last Updated**: 2024-12-31
## Categories & Tags
**Categories**: Uncategorized
**Tags**: multi-task recommendation models, PyTorch dataset implementations, fair comparison
## README
# Multi-task Recommendation in PyTorch
[MIT License](https://opensource.org/licenses/MIT) [Awesome](https://awesome.re)

-------------------------------------------------------------------------------
## Introduction
MTReclib provides PyTorch implementations of multi-task recommendation models and common datasets. Currently, we have implemented 7 multi-task recommendation models to enable fair comparison and to support the development of multi-task recommendation algorithms. The currently supported algorithms include:
* SingleTask: Train one separate model for each task.
* Shared-Bottom: A traditional multi-task model with a shared bottom network and task-specific towers.
* OMoE: [Adaptive Mixtures of Local Experts](https://ieeexplore.ieee.org/abstract/document/6797059) (Neural Computation 1991)
* MMoE: [Modeling Task Relationships in Multi-task Learning with Multi-Gate Mixture-of-Experts](https://dl.acm.org/doi/pdf/10.1145/3219819.3220007) (KDD 2018)
* PLE: [Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations](https://dl.acm.org/doi/pdf/10.1145/3383313.3412236?casa_token=8fchWD8CHc0AAAAA:2cyP8EwkhIUlSFPRpfCGHahTddki0OEjDxfbUFMkXY5fU0FNtkvRzmYloJtLowFmL1en88FRFY4Q) (RecSys 2020 best paper)
* AITM: [Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising](https://dl.acm.org/doi/pdf/10.1145/3447548.3467071?casa_token=5YtVOYjJClUAAAAA:eVczwdynmE9dwoyElCG4da9fC5gsRiyX6zKt0_mIJF1K8NkU-SlNkGmpAu0c0EHbM3hBUe3zZc-o) (KDD 2021)
* MetaHeac: [Learning to Expand Audience via Meta Hybrid Experts and Critics for Recommendation and Advertising](https://easezyc.github.io/data/kdd21_metaheac.pdf) (KDD 2021)
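Several of the models above (OMoE, MMoE, PLE, MetaHeac) are built around shared experts mixed by gating networks. As an illustration of that structure, here is a minimal, hypothetical PyTorch sketch of an MMoE-style forward pass; layer sizes and names are placeholders, not the code in this repository:

```python
import torch
import torch.nn as nn

class MMoE(nn.Module):
    """Sketch of multi-gate mixture-of-experts: shared experts,
    one softmax gate per task, one prediction tower per task."""

    def __init__(self, input_dim, expert_num=8, task_num=2, hidden_dim=16):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
            for _ in range(expert_num)
        )
        # one gate per task: mixes expert outputs with task-specific weights
        self.gates = nn.ModuleList(
            nn.Linear(input_dim, expert_num) for _ in range(task_num)
        )
        # one tower per task: maps the mixed representation to a probability
        self.towers = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())
            for _ in range(task_num)
        )

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, H)
        preds = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=1).unsqueeze(-1)            # (B, E, 1)
            mixed = (w * expert_out).sum(dim=1)                        # (B, H)
            preds.append(tower(mixed).squeeze(-1))                     # (B,)
        return preds  # one prediction per task, e.g. [CTR, CTCVR]
```

The key difference from Shared-Bottom is the per-task gate: each task learns its own weighting over the shared experts instead of consuming one shared representation.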
## Datasets
* AliExpressDataset: A dataset gathered from real-world traffic logs of the search system in AliExpress. It is collected from 5 countries: Russia, Spain, France, the Netherlands, and the United States, and can therefore be used as 5 multi-task datasets. [Original_dataset](https://tianchi.aliyun.com/dataset/dataDetail?dataId=74690) [Processed_dataset Google Drive](https://drive.google.com/drive/folders/1F0TqvMJvv-2pIeOKUw9deEtUxyYqXK6Y?usp=sharing) [Processed_dataset Baidu Netdisk](https://pan.baidu.com/s/1AfXoJSshjW-PILXZ6O19FA?pwd=4u0r)
> For the processed dataset, put the archive directly in `./data/` and unpack it. For the original dataset, put it in `./data/` and run `python preprocess.py --dataset_name NL`.
## Requirements
* Python 3.6
* PyTorch > 1.10
* pandas
* numpy
* tqdm
## Run
Parameter configuration:
- `dataset_name`: one of `['AliExpress_NL', 'AliExpress_ES', 'AliExpress_FR', 'AliExpress_US']`, default: `AliExpress_NL`
- `dataset_path`: default: `./data`
- `model_name`: one of `['singletask', 'sharedbottom', 'omoe', 'mmoe', 'ple', 'aitm', 'metaheac']`, default: `metaheac`
- `epoch`: the number of training epochs, default: `50`
- `task_num`: the number of tasks, default: `2` (CTR & CTCVR)
- `expert_num`: the number of experts for `['omoe', 'mmoe', 'ple', 'metaheac']`, default: `8`
- `learning_rate`: default: `0.001`
- `batch_size`: default: `2048`
- `weight_decay`: default: `1e-6`
- `device`: the device to run the code on, default: `cuda:0`
- `save_dir`: the folder for saved model parameters, default: `chkpt`
You can run a model with:
```shell
python main.py --model_name metaheac --expert_num 8 --dataset_name AliExpress_NL
```
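To reproduce the full comparison, every model can be run on every dataset with a simple loop. The sketch below is a dry run that only prints each command (remove the `echo` to actually launch training); the flag names follow the parameter list above:

```shell
# Print the training command for every (model, dataset) pair.
for dataset in AliExpress_NL AliExpress_ES AliExpress_FR AliExpress_US; do
  for model in singletask sharedbottom omoe mmoe ple aitm metaheac; do
    echo "python main.py --model_name $model --dataset_name $dataset --expert_num 8"
  done
done
```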
## Results
> For fair comparison, the learning rate is 0.001, the embedding dimension is 128, and the mini-batch size is 2048 for all models. We report the mean AUC and Logloss over five random runs. The best results are in boldface.
**AliExpress (Netherlands, NL) & AliExpress (Spain, ES)**

| Methods | CTR AUC (NL) | CTR Logloss (NL) | CTCVR AUC (NL) | CTCVR Logloss (NL) | CTR AUC (ES) | CTR Logloss (ES) | CTCVR AUC (ES) | CTCVR Logloss (ES) |
|---|---|---|---|---|---|---|---|---|
| SingleTask | 0.7222 | 0.1085 | 0.8590 | 0.00609 | 0.7266 | 0.1207 | 0.8855 | 0.00456 |
| Shared-Bottom | 0.7228 | 0.1083 | 0.8511 | 0.00620 | 0.7287 | 0.1204 | 0.8866 | 0.00452 |
| OMoE | 0.7254 | 0.1081 | 0.8611 | 0.00614 | 0.7253 | 0.1209 | 0.8859 | 0.00452 |
| MMoE | 0.7234 | 0.1080 | 0.8606 | 0.00607 | 0.7285 | 0.1205 | 0.8898 | **0.00450** |
| PLE | **0.7292** | 0.1088 | 0.8591 | 0.00631 | 0.7273 | 0.1223 | **0.8913** | 0.00461 |
| AITM | 0.7240 | 0.1078 | 0.8577 | 0.00611 | 0.7290 | **0.1203** | 0.8885 | 0.00451 |
| MetaHeac | 0.7263 | **0.1077** | **0.8615** | **0.00606** | **0.7299** | **0.1203** | 0.8883 | **0.00450** |
**AliExpress (France, FR) & AliExpress (United States, US)**

| Methods | CTR AUC (FR) | CTR Logloss (FR) | CTCVR AUC (FR) | CTCVR Logloss (FR) | CTR AUC (US) | CTR Logloss (US) | CTCVR AUC (US) | CTCVR Logloss (US) |
|---|---|---|---|---|---|---|---|---|
| SingleTask | 0.7259 | **0.1002** | 0.8737 | 0.00435 | 0.7061 | 0.1004 | 0.8637 | 0.00381 |
| Shared-Bottom | 0.7245 | 0.1004 | 0.8700 | 0.00439 | 0.7029 | 0.1008 | 0.8698 | 0.00381 |
| OMoE | 0.7257 | 0.1006 | 0.8781 | 0.00432 | 0.7049 | 0.1007 | 0.8701 | 0.00381 |
| MMoE | 0.7216 | 0.1010 | 0.8811 | 0.00431 | 0.7043 | 0.1006 | **0.8758** | **0.00377** |
| PLE | **0.7276** | 0.1014 | 0.8805 | 0.00451 | **0.7138** | **0.0992** | 0.8675 | 0.00403 |
| AITM | 0.7236 | 0.1005 | 0.8763 | 0.00431 | 0.7048 | 0.1004 | 0.8730 | **0.00377** |
| MetaHeac | 0.7249 | 0.1005 | **0.8813** | **0.00429** | 0.7089 | 0.1001 | 0.8743 | 0.00378 |
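The AUC and Logloss figures above are standard binary-classification metrics computed per task. As an illustration (not this repository's evaluation code), here is a minimal NumPy sketch of both; the AUC version uses the rank-sum formulation and assumes distinct scores:

```python
import numpy as np

def auc(labels, scores):
    """Probability that a random positive scores higher than a random
    negative, via the rank-sum (Mann-Whitney) formulation."""
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(scores)
    ranks = np.empty(len(order))
    ranks[order] = np.arange(1, len(order) + 1)  # ranks 1..n
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def logloss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy, with clipping for numerical safety."""
    labels = np.asarray(labels, dtype=float)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))
```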
## File Structure
```
.
├── main.py
├── README.md
├── models
│   ├── layers.py
│   ├── aitm.py
│   ├── omoe.py
│   ├── mmoe.py
│   ├── metaheac.py
│   ├── ple.py
│   ├── singletask.py
│   └── sharedbottom.py
└── data
    ├── preprocess.py      # Preprocess the original data
    ├── AliExpress_NL      # AliExpressDataset from the Netherlands
    │   ├── train.csv
    │   └── test.csv
    ├── AliExpress_ES      # AliExpressDataset from Spain
    ├── AliExpress_FR      # AliExpressDataset from France
    └── AliExpress_US      # AliExpressDataset from the United States
```
## Contact
If you have any problems with this library, please create an issue or send us an email at:
* zhuyc0204@gmail.com
## Reference
If you use this repository, please cite the following papers:
```
@inproceedings{zhu2021learning,
title={Learning to Expand Audience via Meta Hybrid Experts and Critics for Recommendation and Advertising},
author={Zhu, Yongchun and Liu, Yudan and Xie, Ruobing and Zhuang, Fuzhen and Hao, Xiaobo and Ge, Kaikai and Zhang, Xu and Lin, Leyu and Cao, Juan},
booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
pages={4005--4013},
year={2021}
}
```
```
@inproceedings{xi2021modeling,
title={Modeling the sequential dependence among audience multi-step conversions with multi-task learning in targeted display advertising},
author={Xi, Dongbo and Chen, Zhen and Yan, Peng and Zhang, Yinger and Zhu, Yongchun and Zhuang, Fuzhen and Chen, Yu},
booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
pages={3745--3755},
year={2021}
}
```
Some model implementations and utility functions refer to these nice repositories:
- [pytorch-fm](https://github.com/rixwew/pytorch-fm): This package provides a PyTorch implementation of factorization machine models and common datasets in CTR prediction.
- [MetaHeac](https://github.com/easezyc/MetaHeac): This is an official implementation for Learning to Expand Audience via Meta Hybrid Experts and Critics for Recommendation and Advertising.