With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features and are not well studied or understood. In this paper, we develop a state-of-the-art deep learning recommendation model (DLRM) and provide its implementation in both PyTorch and Caffe2 frameworks. In addition, we design a specialized parallelization scheme utilizing model parallelism on the embedding tables to mitigate memory constraints while exploiting data parallelism to scale-out compute from the fully-connected layers. We compare DLRM against existing recommendation models and characterize its performance on the Big Basin AI platform, demonstrating its usefulness as a benchmark for future algorithmic experimentation and system co-design.
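The hybrid parallelization described above can be illustrated with a small, framework-free sketch. It is not the code used in this repository: `shard_tables` and `split_batch` are hypothetical helpers, round-robin placement is only one possible policy, and treating 2048 as the global batch size is an assumption. The 26 tables correspond to the 26 categorical features of the Criteo data, and 8 matches the number of processes in the multi-card command.

```python
from typing import Dict, List


def shard_tables(num_tables: int, world_size: int) -> Dict[int, List[int]]:
    """Model parallelism: each embedding table lives on exactly one rank."""
    placement: Dict[int, List[int]] = {rank: [] for rank in range(world_size)}
    for table_id in range(num_tables):
        # Round-robin placement (an assumed policy, chosen for simplicity).
        placement[table_id % world_size].append(table_id)
    return placement


def split_batch(batch_size: int, world_size: int) -> List[range]:
    """Data parallelism: each rank runs the MLPs on its own slice of the batch."""
    per_rank = batch_size // world_size
    return [range(r * per_rank, (r + 1) * per_rank) for r in range(world_size)]


placement = shard_tables(num_tables=26, world_size=8)
slices = split_batch(batch_size=2048, world_size=8)

for rank in range(8):
    print(f"rank {rank}: tables {placement[rank]}, "
          f"samples {slices[rank].start}..{slices[rank].stop - 1}")
```

In a real run, each rank would look up embeddings for the whole batch in the tables it owns, and an all-to-all exchange would then deliver the vectors to the rank that owns the matching batch slice before the fully-connected layers run.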
pip3 install -r requirements.txt && python3 ./setup.py install
The Criteo_Terabyte dataset consists of 23 days of data. Because it is very large, this example uses only 3 days of data.
# Check gzip version
gzip -V
# If the gzip version is older than 1.6, build and install gzip 1.6 from source
wget https://ftp.gnu.org/gnu/gzip/gzip-1.6.tar.gz
tar -xzf gzip-1.6.tar.gz
cd gzip-1.6
./configure && make install
cd ../
rm -rf gzip-1.6.tar.gz gzip-1.6/
# Download data
cd dlrm/data/
bash download_and_preprocess.sh
After the steps above, the following files are available in "/home/datasets/recommendation/Criteo_Terabyte/": terabyte_processed_test.bin, terabyte_processed_train.bin, and terabyte_processed_val.bin.
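Before starting a long training run, it can help to confirm that preprocessing produced all three files. The following is a minimal sketch, not part of this repository: `missing_files` is a hypothetical helper, and only the file names and the directory are taken from the text above.

```python
from pathlib import Path
from typing import List

# File names as listed in the preprocessing step above.
EXPECTED_FILES = (
    "terabyte_processed_train.bin",
    "terabyte_processed_val.bin",
    "terabyte_processed_test.bin",
)


def missing_files(dataset_dir: str) -> List[str]:
    """Return the expected files that are not present in dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("/home/datasets/recommendation/Criteo_Terabyte")
    print("missing:", ", ".join(missing) if missing else "none")
```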
python3 -u scripts/train.py --model_config dlrm/config/official_config.json --dataset /home/datasets/recommendation/Criteo_Terabyte --lr 0.1 --warmup_steps 2750 --decay_end_lr 0 --decay_steps 27772 --decay_start_step 49315 --batch_size 2048 --epochs 5 |& tee 1card.txt
python3 -u -m torch.distributed.launch --nproc_per_node=8 --use_env scripts/dist_train.py --model_config dlrm/config/official_config.json --dataset /home/datasets/recommendation/Criteo_Terabyte --lr 0.1 --warmup_steps 2750 --decay_end_lr 0 --decay_steps 27772 --decay_start_step 49315 --batch_size 2048 --epochs 5 |& tee 8cards.txt
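The learning-rate flags in the commands above suggest a schedule with a warmup phase, a constant phase, and a decay phase. The sketch below shows one plausible reading of those flags. The schedule shape (linear warmup, then a constant rate, then polynomial decay with power 2) is an assumption that has not been checked against `scripts/train.py`, and `lr_at` and `power` are hypothetical names; only the numeric values come from the commands.

```python
def lr_at(step: int,
          base_lr: float = 0.1,          # --lr
          warmup_steps: int = 2750,      # --warmup_steps
          decay_start_step: int = 49315, # --decay_start_step
          decay_steps: int = 27772,      # --decay_steps
          decay_end_lr: float = 0.0,     # --decay_end_lr
          power: float = 2.0) -> float:  # assumed decay exponent, not a CLI flag
    """Learning rate at a given step under the assumed three-phase schedule."""
    if step < warmup_steps:
        # Phase 1: linear warmup from 0 to base_lr.
        return base_lr * step / warmup_steps
    if step < decay_start_step:
        # Phase 2: constant learning rate.
        return base_lr
    decay_end_step = decay_start_step + decay_steps
    if step >= decay_end_step:
        # After the decay window the rate stays at its final value.
        return decay_end_lr
    # Phase 3: polynomial decay from base_lr down to decay_end_lr.
    remaining = (decay_end_step - step) / decay_steps
    return decay_end_lr + (base_lr - decay_end_lr) * remaining ** power


for s in (0, 1375, 2750, 49315, 63201, 77087):
    print(s, lr_at(s))
```

Under these assumptions the rate reaches 0.1 at step 2750, holds there until step 49315, and falls to 0 at step 77087 (49315 + 27772).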
GPUs | FPS | AUC |
---|---|---|
1x1 | 196958 | N/A |
1x8 | 346555 | 0.75 |
Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power (W) | Scalability | Memory utilization (G) | Stability |
---|---|---|---|---|---|---|---|
AUC: 0.75 | SDK V2.2, bs: 2048, 8x, AMP | 793486 | 0.75 | 60*8 | 0.97 | 3.7*8 | 1 |