Contents

Wide&Deep Description

The Wide&Deep model is a classic model in the recommendation and click-through-rate prediction area. This is an implementation of Wide&Deep as described in the Wide & Deep Learning for Recommender Systems paper.

Model Architecture

The Wide&Deep model jointly trains a wide linear model and a deep neural network, combining the benefits of memorization and generalization for recommender systems.

Currently we support host-device mode with multi-dimensional partition parallelism for the embedding table, as well as parameter server mode. We also implement a cache mode for huge embedding tables, developed in cooperation with Noah's Ark Lab (ScaleFreeCTR).
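As a rough illustration of this joint structure, a minimal Wide&Deep cell might look like the sketch below, assuming the default hyper-parameters (field_size=39, vocab_size=200000, emb_dim=80, deep_layer_dim=[1024,512,256,128]); the actual implementation lives in src/wide_and_deep.py:

```python
# Hedged sketch of the Wide&Deep joint architecture, not the repo's code.
import mindspore.nn as nn
import mindspore.ops as ops

class WideDeepSketch(nn.Cell):
    def __init__(self, field_size=39, vocab_size=200000, emb_dim=80):
        super().__init__()
        # Wide part: one scalar weight per sparse id, i.e. a linear model
        # that memorizes feature co-occurrences.
        self.wide_w = nn.Embedding(vocab_size, 1)
        # Deep part: dense embeddings fed through an MLP for generalization.
        self.deep_emb = nn.Embedding(vocab_size, emb_dim)
        self.mlp = nn.SequentialCell([
            nn.Dense(field_size * emb_dim, 1024, activation='relu'),
            nn.Dense(1024, 512, activation='relu'),
            nn.Dense(512, 256, activation='relu'),
            nn.Dense(256, 128, activation='relu'),
            nn.Dense(128, 1)])

    def construct(self, ids):
        # ids: (batch, field_size) int32 feature ids
        wide_logit = self.wide_w(ids).sum(axis=1).squeeze(-1)
        deep_in = self.deep_emb(ids).reshape((ids.shape[0], -1))
        deep_logit = self.mlp(deep_in).squeeze(-1)
        # Joint training: the two logits are summed before the sigmoid loss.
        return ops.sigmoid(wide_logit + deep_logit)
```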

Dataset

Environment Requirements

Quick Start

  1. Clone the Code
git clone https://gitee.com/mindspore/models.git
cd models/official/recommend/Wide_and_Deep
  2. Download the Dataset

Please refer to [1] to obtain the download link.

mkdir -p data/origin_data && cd data/origin_data
wget DATA_LINK
tar -zxvf dac.tar.gz
  3. Use this script to preprocess the data. This may take about one hour; the generated MindRecord data is placed under data/mindrecord.
python src/preprocess_data.py  --data_path=./data/ --dense_dim=13 --slot_dim=26 --threshold=100 --train_line_count=45840617 --skip_id_convert=0
  4. Start Training

Once the dataset is ready, the model can be trained and evaluated on a single device (Ascend) with the following command:

python train_and_eval.py --data_path=./data/mindrecord --dataset_type=mindrecord --device_target=Ascend

On the Ascend910B platform (Ascend910B1, Ascend910B2, Ascend910B3, Ascend910B4), evaluation while training is not supported. Train the model as follows:

python train.py --data_path=./data/mindrecord --dataset_type=mindrecord --device_target=Ascend

To evaluate the model, run the following command:

python eval.py  --data_path=./data/mindrecord --dataset_type=mindrecord --device_target=Ascend --ckpt_path=./ckpt/widedeep_train-15_2582.ckpt
  • Running on ModelArts (if you want to run on ModelArts, please check the official ModelArts documentation; you can start training as follows)

    # Train 8p on ModelArts
    # (1) Perform a or b.
    #       a. Set "enable_modelarts=True" on default_config.yaml file.
    #          Set "run_distribute=True" on default_config.yaml file.
    #          Set "data_path=/cache/data/criteo_mindrecord/" on default_config.yaml file.
    #          Set other parameters on default_config.yaml file you need.
    #       b. Add "enable_modelarts=True" on the website UI interface.
    #          Add "run_distribute=True" on the website UI interface.
    #          Add "dataset_path=/cache/data/criteo_mindrecord/" on the website UI interface.
    #          Add other parameters on the website UI interface.
    # (2) Upload a zip dataset to the S3 bucket. (You could also upload the original dataset, but it can be slow.)
    # (3) Set the code directory to "/path/wide_and_deep" on the website UI interface.
    # (4) Set the startup file to "train.py" on the website UI interface.
    # (5) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
    # (6) Create your job.
    #
    # Train 1p on ModelArts
    # (1) Perform a or b.
    #       a. Set "enable_modelarts=True" on default_config.yaml file.
    #          Set "dataset_path='/cache/data/criteo_mindrecord/'" on default_config.yaml file.
    #          Set other parameters on default_config.yaml file you need.
    #       b. Add "enable_modelarts=True" on the website UI interface.
    #          Add "dataset_path=/cache/data/criteo_mindrecord/" on the website UI interface.
    #          Add other parameters on the website UI interface.
    # (2) Upload a zip dataset to the S3 bucket. (You could also upload the original dataset, but it can be slow.)
    # (3) Set the code directory to "/path/wide_and_deep" on the website UI interface.
    # (4) Set the startup file to "train.py" on the website UI interface.
    # (5) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
    # (6) Create your job.
    #
    # Eval 1p on ModelArts
    # (1) Perform a or b.
    #       a. Set "enable_modelarts=True" on default_config.yaml file.
    #          Set "ckpt_file='/cache/checkpoint_path/model.ckpt'" on default_config.yaml file.
    #          Set "checkpoint_url='s3://dir_to_trained_ckpt/'" on default_config.yaml file.
    #          Set "dataset_path='/cache/data/criteo_mindrecord/'" on default_config.yaml file.
    #          Set other parameters on default_config.yaml file you need.
    #       b. Add "enable_modelarts=True" on the website UI interface.
    #          Add "ckpt_file=/cache/checkpoint_path/model.ckpt" on the website UI interface.
    #          Add "checkpoint_url=s3://dir_to_trained_ckpt/" on the website UI interface.
    #          Add "dataset_path=/cache/data/criteo_mindrecord/" on the website UI interface.
    #          Add other parameters on the website UI interface.
    # (2) Upload a zip dataset to the S3 bucket. (You could also upload the original dataset, but it can be slow.)
    # (3) Set the code directory to "/path/wide_and_deep" on the website UI interface.
    # (4) Set the startup file to "eval.py" on the website UI interface.
    # (5) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
    # (6) Create your job.
    
  • Export on ModelArts (if you want to run on ModelArts, please check the official ModelArts documentation; you can start exporting as follows)

  1. Export MindIR on ModelArts; the export steps are as follows:

    # (1) Perform a or b.
    #       a. Set "enable_modelarts=True" on base_config.yaml file.
    #          Set "file_name='wide_and_deep'" on base_config.yaml file.
    #          Set "file_format='MINDIR'" on base_config.yaml file.
    #          Set "checkpoint_url='/The path of checkpoint in S3/'" on beta_config.yaml file.
    #          Set "ckpt_file='/cache/checkpoint_path/model.ckpt'" on base_config.yaml file.
    #          Set other parameters on base_config.yaml file you need.
    #       b. Add "enable_modelarts=True" on the website UI interface.
    #          Add "file_name='wide_and_deep'" on the website UI interface.
    #          Add "file_format='MINDIR'" on the website UI interface.
    #          Add "checkpoint_url='/The path of checkpoint in S3/'" on the website UI interface.
    #          Add "ckpt_file='/cache/checkpoint_path/model.ckpt'" on the website UI interface.
    #          Add other parameters on the website UI interface.
    # (2) Upload or copy your trained model to S3 bucket.
    # (3) Set the code directory to "/path/wide_and_deep" on the website UI interface.
    # (4) Set the startup file to "export.py" on the website UI interface.
    # (5) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
    # (6) Create your job.
    

Script Description

Script and Sample Code

└── wide_and_deep
    ├── eval.py
    ├── README.md
    ├── script
    │   ├── cluster_32p.json
    │   ├── common.sh
    │   ├── deploy_cluster.sh
    │   ├── run_auto_parallel_train_cluster.sh
    │   ├── run_auto_parallel_train.sh
    │   ├── run_multigpu_train.sh
    │   ├── run_multinpu_train.sh
    │   ├── run_parameter_server_train_cluster.sh
    │   ├── run_parameter_server_train.sh
    │   ├── run_standalone_train_for_gpu.sh
    │   └── start_cluster.sh
    ├── src
    │   ├── callbacks.py
    │   ├── datasets.py
    │   ├── generate_synthetic_data.py
    │   ├── __init__.py
    │   ├── metrics.py
    │   ├── preprocess_data.py
    │   ├── process_data.py
    │   ├── wide_and_deep.py
    │   └── model_utils
    │       ├── config.py                         # Processing configuration parameters
    │       ├── device_adapter.py                 # Get cloud ID
    │       ├── local_adapter.py                  # Get local ID
    │       └── moxing_adapter.py                 # Parameter processing
    ├── default_config.yaml                       # Training parameter profile
    ├── train_and_eval_auto_parallel.py
    ├── train_and_eval_distribute.py
    ├── train_and_eval_parameter_server.py
    ├── train_and_eval.py
    ├── train.py
    └── export.py

Script Parameters

Training Script Parameters

The parameters are the same for train.py, train_and_eval.py, train_and_eval_distribute.py, and train_and_eval_auto_parallel.py.

usage: train.py [-h] [--device_target {Ascend,GPU}] [--data_path DATA_PATH]
                [--epochs EPOCHS] [--full_batch FULL_BATCH]
                [--batch_size BATCH_SIZE] [--eval_batch_size EVAL_BATCH_SIZE]
                [--field_size FIELD_SIZE] [--vocab_size VOCAB_SIZE]
                [--emb_dim EMB_DIM]
                [--deep_layer_dim DEEP_LAYER_DIM [DEEP_LAYER_DIM ...]]
                [--deep_layer_act DEEP_LAYER_ACT] [--keep_prob KEEP_PROB]
                [--dropout_flag DROPOUT_FLAG] [--output_path OUTPUT_PATH]
                [--ckpt_path CKPT_PATH] [--eval_file_name EVAL_FILE_NAME]
                [--loss_file_name LOSS_FILE_NAME]
                [--host_device_mix HOST_DEVICE_MIX]
                [--dataset_type DATASET_TYPE]
                [--parameter_server PARAMETER_SERVER]

optional arguments:
  --device_target {Ascend,GPU}        Device where the code will run. (Default:Ascend)
  --data_path DATA_PATH               This should be set to the same directory given to the
                                      data_download's data_dir argument
  --epochs EPOCHS                     Total train epochs. (Default:15)
  --full_batch FULL_BATCH             Enable loading the full batch. (Default:False)
  --batch_size BATCH_SIZE             Training batch size.(Default:16000)
  --eval_batch_size                   Eval batch size.(Default:16000)
  --field_size                        The number of features.(Default:39)
  --vocab_size                        The vocabulary size of the dataset.(Default:200000)
  --emb_dim                           The dense embedding dimension of sparse feature.(Default:80)
  --deep_layer_dim                    The dimension of all deep layers.(Default:[1024,512,256,128])
  --deep_layer_act                    The activation function of all deep layers.(Default:'relu')
  --keep_prob                         The keep rate in dropout layer.(Default:1.0)
  --dropout_flag                      Enable dropout.(Default:0)
  --output_path                       Deprecated
  --ckpt_path                         The location of the checkpoint file. If the checkpoint file
                                      is a slice of weight, multiple checkpoint files need to be
                                      transferred. Use ';' to separate them and sort them in sequence
                                      like "./checkpoints/0.ckpt;./checkpoints/1.ckpt".
                                      (Default:./checkpoints/)
  --eval_file_name                    Eval output file.(Default:eval.log)
  --loss_file_name                    Loss output file.(Default:loss.log)
  --host_device_mix                   Enable host device mode or not.(Default:0)
  --dataset_type                      The data type of the training files, chosen from tfrecord/mindrecord/hd5.(Default:tfrecord)
  --parameter_server                  Enable parameter server or not.(Default:0)
  --vocab_cache_size                  Size of the embedding cache; a nonzero value enables cache mode.(Default:0)

Preprocess Script Parameters

usage: generate_synthetic_data.py [-h] [--output_file OUTPUT_FILE]
                                  [--label_dim LABEL_DIM]
                                  [--number_examples NUMBER_EXAMPLES]
                                  [--dense_dim DENSE_DIM]
                                  [--slot_dim SLOT_DIM]
                                  [--vocabulary_size VOCABULARY_SIZE]
                                  [--random_slot_values RANDOM_SLOT_VALUES]
optional arguments:
  --output_file                        The output path of the generated file.(Default: ./train.txt)
  --label_dim                          The label category. (Default:2)
  --number_examples                    The row numbers of the generated file. (Default:4000000)
  --dense_dim                          The number of the continuous features.(Default:13)
  --slot_dim                           The number of the category features.(Default:26)
  --vocabulary_size                    The vocabulary size of the total dataset.(Default:400000000)
  --random_slot_values                 0 or 1. If 1, the ids are generated randomly. If 0, the id is set to row_index mod part_size, where part_size is the vocab size for each slot.
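To make the two modes concrete, the per-slot id assignment can be sketched as below; the per-slot base offset is an illustrative assumption, not necessarily the script's exact logic:

```python
import random

def slot_id(row_index, slot, part_size, random_slot_values):
    # Hedged sketch of the two id-generation modes for one sparse slot.
    base = slot * part_size                        # assumed per-slot id range
    if random_slot_values:
        return base + random.randrange(part_size)  # random id within the slot
    return base + row_index % part_size            # deterministic id
```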
usage: preprocess_data.py [-h]
                          [--data_path DATA_PATH] [--dense_dim DENSE_DIM]
                          [--slot_dim SLOT_DIM] [--threshold THRESHOLD]
                          [--train_line_count TRAIN_LINE_COUNT]
                          [--skip_id_convert {0,1}]

  --data_path                         The path of the data file.
  --dense_dim                         The number of your continuous fields.(default: 13)
  --slot_dim                          The number of your sparse fields, also called categorical features.(default: 26)
  --threshold                         Category values whose frequency is below this value are regarded as OOV. This reduces the vocab size.(default: 100)
  --train_line_count                  The number of examples in your dataset.
  --skip_id_convert                   0 or 1. If set to 1, the code will skip the id conversion and use the original id as the final id.(default: 0)
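The threshold option works roughly as sketched below: category values whose frequency falls below it all share one out-of-vocabulary (OOV) id, shrinking the vocabulary. This is a simplified illustration, not the exact code in src/preprocess_data.py:

```python
# Simplified sketch of the frequency-threshold id conversion; the field
# layout (label, `dense_dim` dense, `slot_dim` sparse, tab-separated)
# follows the Criteo format.
from collections import Counter

def build_vocab(lines, dense_dim=13, slot_dim=26, threshold=100):
    counters = [Counter() for _ in range(slot_dim)]
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        sparse = fields[1 + dense_dim:1 + dense_dim + slot_dim]
        for i, value in enumerate(sparse):
            counters[i][value] += 1
    vocabs, next_id = [], 1                # id 0 is reserved for OOV
    for counter in counters:
        mapping = {}
        for value, freq in counter.items():
            if freq >= threshold:
                mapping[value] = next_id   # frequent values get their own id
                next_id += 1
        vocabs.append(mapping)             # infrequent values fall back to 0
    return vocabs
```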

Dataset Preparation

Process the Real World Data

  1. Download the dataset and place the raw data under a certain path, such as ./data/origin_data:
mkdir -p data/origin_data && cd data/origin_data
wget DATA_LINK
tar -zxvf dac.tar.gz

Please refer to [1] to obtain the download link.

  2. Use this script to preprocess the data:
python src/preprocess_data.py  --data_path=./data/ --dense_dim=13 --slot_dim=26 --threshold=100 --train_line_count=45840617 --skip_id_convert=0

Generate and Process the Synthetic Data

  1. The following command will generate 40 million lines of click data in the format of

"label\tdense_feature[0]\tdense_feature[1]...\tsparse_feature[0]\tsparse_feature[1]...".

mkdir -p syn_data/origin_data
python src/generate_synthetic_data.py --output_file=syn_data/origin_data/train.txt --number_examples=40000000 --dense_dim=13 --slot_dim=51 --vocabulary_size=2000000000 --random_slot_values=0
  2. Preprocess the generated data
python src/preprocess_data.py --data_path=./syn_data/  --dense_dim=13 --slot_dim=51 --threshold=0 --train_line_count=40000000 --skip_id_convert=1
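For reference, one line of the generated file can be parsed as sketched below (matching the format string above, with dense_dim=13 and slot_dim=51 as in the commands):

```python
# Hedged sketch: parse one "label\tdense...\tsparse..." line of the
# synthetic data into its label, dense, and sparse parts.
def parse_line(line, dense_dim=13, slot_dim=51):
    fields = line.rstrip('\n').split('\t')
    label = int(fields[0])
    dense = [float(x) for x in fields[1:1 + dense_dim]]
    sparse = fields[1 + dense_dim:1 + dense_dim + slot_dim]
    return label, dense, sparse
```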

Training Process

SingleDevice

To train and evaluate the model, run the following command:

python train_and_eval.py --data_path=./data/mindrecord --dataset_type=mindrecord --device_target=Ascend

SingleDevice For Cache Mode

To train and evaluate the model, run the following command:

python train_and_eval.py --data_path=./data/mindrecord --dataset_type=mindrecord --device_target=Ascend --sparse=True --vocab_size=200000 --vocab_cache_size=160000

Distribute Training

To train the model with data-parallel distributed training, run the following command:

# configure environment path before training
bash run_multinpu_train.sh RANK_SIZE EPOCHS DATASET RANK_TABLE_FILE

To train the model with model-parallel training, run the following command:

# configure environment path before training
bash run_auto_parallel_train.sh RANK_SIZE EPOCHS DATASET RANK_TABLE_FILE

To train the model on clusters, run the following commands:

# deploy wide&deep script in clusters
# CLUSTER_CONFIG is a json file, the sample is in script/.
# EXECUTE_PATH is the scripts path after the deploy.
bash deploy_cluster.sh CLUSTER_CONFIG_PATH EXECUTE_PATH

# enter EXECUTE_PATH, and execute start_cluster.sh as follows.
# MODE: "host_device_mix"
bash start_cluster.sh CLUSTER_CONFIG_PATH EPOCH_SIZE VOCAB_SIZE EMB_DIM
                      DATASET ENV_SH RANK_TABLE_FILE MODE

Parameter Server

To train and evaluate the model in parameter server mode, run the following command:

# SERVER_NUM is the number of parameter servers for this task.
# SCHED_HOST is the IP address of scheduler.
# SCHED_PORT is the port of scheduler.
# The number of workers is the same as RANK_SIZE.
bash run_parameter_server_train.sh RANK_SIZE EPOCHS DATASET RANK_TABLE_FILE SERVER_NUM SCHED_HOST SCHED_PORT

Parameter Server training does not support PyNative mode.

Evaluation Process

To evaluate the model, run the following command:

python eval.py --data_path=./data/mindrecord --dataset_type=mindrecord --device_target=Ascend --ckpt_path=./ckpt/widedeep_train-15_2582.ckpt
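Evaluation reports AUC over the model's sigmoid outputs. A hedged standalone sketch of the computation is below; the repo's src/metrics.py implements the metric actually used during training and evaluation:

```python
# Hedged sketch of AUC over accumulated eval batches; the dummy arrays
# stand in for real model outputs and click labels.
import numpy as np
from sklearn.metrics import roc_auc_score

preds = [np.array([0.9, 0.2, 0.7])]    # per-batch sigmoid outputs
labels = [np.array([1, 0, 1])]         # per-batch ground-truth clicks
auc = roc_auc_score(np.concatenate(labels), np.concatenate(preds))
print('auc :', auc)
```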

Inference Process

Before inference, please refer to the MindSpore Inference with C++ Deployment Guide to set the environment variables.

Export MindIR

python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --device_target [DEVICE_TARGET] --file_format [FILE_FORMAT]

The ckpt_file parameter is required, and FILE_FORMAT must be one of ["AIR", "MINDIR"].
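Internally, export follows the standard MindSpore flow. A minimal hedged sketch, with a trivial stand-in network instead of the repo's Wide&Deep net:

```python
# Minimal sketch of the MindSpore export flow; nn.Dense stands in for the
# real Wide&Deep eval network that export.py builds.
import numpy as np
import mindspore as ms
import mindspore.nn as nn

net = nn.Dense(39, 1)                        # stand-in network
# Load trained weights first, e.g.:
# param_dict = ms.load_checkpoint('./ckpt/widedeep_train-15_2582.ckpt')
# ms.load_param_into_net(net, param_dict)
dummy_input = ms.Tensor(np.zeros((16000, 39), np.float32))
ms.export(net, dummy_input, file_name='wide_and_deep', file_format='MINDIR')
```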

Infer

Before performing inference, the MindIR file must be exported by the export.py script. We only provide an example of inference using the MINDIR model.

bash run_infer_cpp.sh [MINDIR_PATH] [DATASET_PATH] [DATA_TYPE] [NEED_PREPROCESS] [DEVICE_TYPE] [DEVICE_ID]
  • DATA_TYPE means the dataset type; its value is one of ['tfrecord', 'mindrecord', 'hd5'].
  • NEED_PREPROCESS indicates whether preprocessing is needed; its value is 'y' or 'n'.
  • DEVICE_ID is optional; the default value is 0.

Result

The inference result is saved in the current path; you can find results like the following in the acc.log file.

================================================================================ auc : 0.8080494136248402

Model Description

Performance

Training Performance

| Parameters               | Single Ascend               | Single GPU                  | Data-Parallel-8P            | Host-Device-mode-8P         |
| ------------------------ | --------------------------- | --------------------------- | --------------------------- | --------------------------- |
| Resource                 | Ascend 910; OS Euler2.8     | Tesla V100-PCIE 32G         | Ascend 910; OS Euler2.8     | Ascend 910; OS Euler2.8     |
| Uploaded Date            | 07/05/2021 (month/day/year) | 07/05/2021 (month/day/year) | 07/05/2021 (month/day/year) | 07/05/2021 (month/day/year) |
| MindSpore Version        | 1.3.0                       | 1.3.0                       | 1.3.0                       | 1.3.0                       |
| Dataset                  | [1]                         | [1]                         | [1]                         | [1]                         |
| Training Parameters      | Epoch=15, batch_size=16000  | Epoch=15, batch_size=16000  | Epoch=15, batch_size=16000  | Epoch=15, batch_size=16000  |
| Optimizer                | FTRL, Adam                  | FTRL, Adam                  | FTRL, Adam                  | FTRL, Adam                  |
| Loss Function            | SigmoidCrossEntropy         | SigmoidCrossEntropy         | SigmoidCrossEntropy         | SigmoidCrossEntropy         |
| AUC Score                | 0.80937                     | 0.80971                     | 0.80862                     | 0.80834                     |
| Speed                    | 20.906 ms/step              | 24.465 ms/step              | 27.388 ms/step              | 236.506 ms/step             |
| Loss                     | wide: 0.433, deep: 0.444    | wide: 0.444, deep: 0.456    | wide: 0.437, deep: 0.448    | wide: 0.444, deep: 0.444    |
| Params (M)               | 75.84                       | 75.84                       | 75.84                       | 75.84                       |
| Checkpoint for inference | 233MB (.ckpt file)          | 230MB (.ckpt file)          | 233MB (.ckpt file)          | 233MB (.ckpt file)          |

All executable scripts can be found here.

Note: The GPU results were tested under the master version. The parameter server mode of the Wide&Deep model is still under development.

Evaluation Performance

| Parameters        | Wide&Deep                   |
| ----------------- | --------------------------- |
| Resource          | Ascend 910; OS Euler2.8     |
| Uploaded Date     | 07/05/2021 (month/day/year) |
| MindSpore Version | 1.3.0                       |
| Dataset           | [1]                         |
| Batch Size        | 16000                       |
| Outputs           | AUC                         |
| Accuracy          | AUC=0.809                   |

Ultimate performance experience

Since v1.1.1, MindSpore supports the NUMA binding feature for better performance. The numa library needs to be installed:

  • ubuntu : sudo apt-get install libnuma-dev
  • centos/euleros : sudo yum install numactl-devel

v1.1.1 supports enabling the NUMA binding feature through the config interface:

import mindspore.dataset as de
de.config.set_numa_enable(True)

v1.2.0 additionally supports enabling the NUMA binding feature through an environment variable:

export DATASET_ENABLE_NUMA=True

Description of Random Situation

There are three random situations:

  • Shuffle of the dataset.
  • Initialization of some model weights.
  • Dropout operations.
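For reproducible runs, the relevant seeds can be fixed explicitly; a minimal example:

```python
# ms.set_seed covers weight initialization and random ops; dataset
# shuffling is seeded separately through the dataset config.
import mindspore as ms
import mindspore.dataset as de

ms.set_seed(1)
de.config.set_seed(1)
```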

ModelZoo Homepage

Please check the official homepage.
