Humans generate responses relying on semantic and functional dependencies, including coreference relations, among dialogue elements and their context. In this paper, we investigate matching a response with its multi-turn context using dependency information based entirely on attention. Our solution is inspired by the recently proposed Transformer in machine translation (Vaswani et al., 2017), and we extend the attention mechanism in two ways. First, we construct representations of text segments at different granularities solely with stacked self-attention. Second, we try to extract the truly matched segment pairs with attention across the context and response. We jointly introduce those two kinds of attention in one uniform neural network. Experiments on two large-scale multi-turn response selection tasks show that our proposed model significantly outperforms the state-of-the-art models.
Paper: Zhou, Xiangyang, et al. "Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018.
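For intuition, both kinds of attention described above (self-attention within a text segment and cross-attention between context and response) can be built on the same scaled dot-product attentive module. The following is only an illustrative NumPy sketch of that idea, not the repository's implementation; the shapes follow the default max_turn_len=50 and emb_size=200.

```python
import numpy as np

def dot_product_attention(query, key, value):
    """Scaled dot-product attention: (len_q, d), (len_k, d), (len_k, d) -> (len_q, d)."""
    logits = query @ key.T / np.sqrt(query.shape[-1])          # (len_q, len_k)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)              # softmax over keys
    return weights @ value

utterance = np.random.rand(50, 200)   # one context turn: max_turn_len x emb_size
response = np.random.rand(50, 200)    # one response candidate

# Self-attention: a segment attends to itself to build multi-grained representations.
self_att = dot_product_attention(utterance, utterance, utterance)

# Cross-attention: the response attends to a context utterance to extract matched pairs.
cross_att = dot_product_attention(response, utterance, utterance)
```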
Datasets used to train DAM: Ubuntu Corpus and Douban Corpus
Both the Ubuntu and Douban training sets contain 0.5 million multi-turn contexts, and each context has one positive response and one negative response. Both the validation and test sets of the Ubuntu Corpus have 50k contexts, where each context is provided with one positive response and nine negative replies. The validation set of the Douban Corpus contains 50k instances, and the test set contains 10k instances.
├── data
├── douban
└── ubuntu
Use data2mindrecord.py to preprocess the dataset as follows.
├─ PATH_TO_OUTPUT_MINDRECORD
├─ douban
│ ├─ data_train.mindrecord
│ ├─ data_train.mindrecord.db
│ ├─ data_val.mindrecord
│ ├─ data_val.mindrecord.db
│ ├─ data_test.mindrecord
│ └─ data_test.mindrecord.db
└─ ubuntu
├─ data_train.mindrecord
├─ data_train.mindrecord.db
├─ data_val.mindrecord
├─ data_val.mindrecord.db
├─ data_test.mindrecord
└─ data_test.mindrecord.db
python data2mindrecord.py --device_id=[DEVICE_ID] \
--data_name=ubuntu \
--data_root=[DATA_ROOT] \
--raw_data=data.pkl \
--mode=train \
--print_data=True
python data2mindrecord.py --device_id=[DEVICE_ID] \
--data_name=ubuntu \
--data_root=[DATA_ROOT] \
--raw_data=data.pkl \
--mode=val \
--print_data=True
python data2mindrecord.py --device_id=[DEVICE_ID] \
--data_name=ubuntu \
--data_root=[DATA_ROOT] \
--raw_data=data.pkl \
--mode=test \
--print_data=True
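Internally, data2mindrecord.py writes the preprocessed samples from data.pkl into MindRecord files via mindspore.mindrecord.FileWriter. The snippet below is only a simplified sketch of that pattern; the field names (context, response, label) and shapes are assumptions based on the default max_turn_num=9 and max_turn_len=50, not the script's actual schema.

```python
import numpy as np
from mindspore.mindrecord import FileWriter

# Hypothetical schema: padded context turns, one response candidate, binary label.
schema = {
    "context": {"type": "int32", "shape": [9, 50]},   # max_turn_num x max_turn_len
    "response": {"type": "int32", "shape": [50]},     # max_turn_len
    "label": {"type": "int32"},
}

writer = FileWriter(file_name="data_train.mindrecord", shard_num=1)
writer.add_schema(schema, "dam_schema")

# One toy sample; the real script converts every instance in data.pkl.
sample = {
    "context": np.zeros((9, 50), dtype=np.int32),
    "response": np.zeros(50, dtype=np.int32),
    "label": 1,
}
writer.write_raw_data([sample])
writer.commit()
```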
├─ dam
├─ README.md # descriptions about DAM
├─ requirements.txt
├─ data2mindrecord.py # convert dataset to mindrecord
├─ scripts
│ ├─ run_distribute_train.sh # launch distributed Ascend training (8 pcs)
│ ├─ run_train.sh # launch Ascend training (1 pcs)
│ └─ run_eval.sh # launch Ascend evaluation
├─ src
│ ├─ __init__.py # init file
│ ├─ callback.py # define callback function
│ ├─ config.py # config file
│ ├─ douban_evaluation.py # evaluation function of Douban data
│ ├─ dynamic_lr.py # generate the learning rate schedule
│ ├─ ubuntu_evaluation.py # evaluation function of Ubuntu data
│ ├─ layers.py # network module of DAM
│ ├─ metric.py # metrics for model evaluation
│ ├─ net.py # DAM network
│ └─ utils.py # network utils file
├─ train.py # train DAM script
├─ eval.py # evaluate script
└─ export.py # export mindir script
Parameters for both training and evaluation can be set in config.py. Major hyper-parameters are as follows:
"seed": 1 # random seed
"parallel": False # Whether to use parallel mode for training.
"do_eval": True # infer while training
"max_turn_num": 9 # Maximum number of utterances in context.
"max_turn_len": 50 # Maximum length of setences in turns.
"stack_num": 5 # The number of stacked attentive modules in network.
"attention_type": "dot" # attention type
"is_emb_init":False # Whether to use a pre-trained embedding file.
"vocab_size": 434512 # The size of vocabulary. --172130 for douban data--
"emb_size": 200 # The dimension of word embedding.
"channel1_dim": 32 # he channels' number of the 1st conv3d layer's output. --16 for douban data--
"channel2_dim": 16 # The channels' number of the 2nd conv3d layer's output.
"is_mask": True # use mask
"is_layer_norm": True # use layer normal
"is_positional": False # use positional code
"drop_attention": None # attention module use dropout
"batch_size": 256 # Batch size for training.
"eval_batch_size": 200 # Batch size for testing.
"learning_rate": 1e-3 # Learning rate used to train.
"decay_rate": 0.9 # learning rate decay rate
"decay_steps": 405 # learning rate decay step
"loss_scale": 1 # loss scale
"epoch_size": 2 # Number of pass for training.
"modelArts"': False # whether training on modelArts
Launch distributed training on ModelArts (if you want to run on ModelArts, please check the official ModelArts documentation).
ModelArts parameter settings:
--train_url=/PATH/TO/OUTPUT_DIR \
--data_url=/PATH/TO/MINDRECORD \
--modelArts_mode=True \
--model_name=DAM_ubuntu \
--epoch_size=2 \
--batch_size=256 \
--learning_rate=0.001 \
--decay_steps=405 \
bash scripts/run_train.sh [DEVICE_ID] [MODEL_NAME] [BATCH_SIZE] [EPOCH_SIZE] [LEARNING_RATE] [DECAY_STEPS] [DATA_ROOT] [OUTPUT_PATH]
bash scripts/run_distribute_train.sh [RANK_SIZE] [RANK_TABLE_FILE] [MODEL_NAME] [BATCH_SIZE] [EPOCH_SIZE] [LEARNING_RATE] [DECAY_STEPS] [DATA_ROOT]
The python command above will run in the background; you can view the results in the file eval.log.
After training, you will get checkpoint files under the ./output/ubuntu/ folder by default. The loss values are saved in the loss.log file.
# training result (8p) - ubuntu
step time 197.62039184570312
step time 197.71790504455566
epoch: 1 step: 3853 global_step: 3853, loss is 0.21962138
epoch: 1 step: 3853 global_step: 3853, loss is 0.21994969
epoch: 1 step: 3853 global_step: 3853, loss is 0.32234603
epoch: 1 step: 3853 global_step: 3853, loss is 0.37376451
epoch: 1 step: 3853 global_step: 3853, loss is 0.5122621
epoch: 1 step: 3853 global_step: 3853, loss is 0.20732686
step time 197.07393646240234
step time 197.10779190063477
step time 197.42536544799805
step time 197.4952220916748
step time 197.47066497802734
epoch: 1 step: 3854 global_step: 3854, loss is 0.2575438
epoch: 1 step: 3854 global_step: 3854, loss is 0.29517844
epoch: 1 step: 3854 global_step: 3854, loss is 0.17604485
epoch: 1 step: 3854 global_step: 3854, loss is 0.22759959
epoch: 1 step: 3854 global_step: 3854, loss is 0.43964553
step time 197.8461742401123
step time 198.20117950439453
step time 198.29559326171875
epoch: 1 step: 3854 global_step: 3854, loss is 0.2520399
epoch: 1 step: 3854 global_step: 3854, loss is 0.3967452
epoch: 1 step: 3854 global_step: 3854, loss is 0.3976175
Evaluation on the dataset when running on Ascend.
Before running the command below, please check the checkpoint path used for evaluation.
bash scripts/run_eval.sh [DEVICE_ID] [MODEL_NAME] [EVAL_BATCH_SIZE] [DATA_ROOT] [CKPT_PATH] [CKPT_NAME] [OUTPUT_PATH]
The results are as follows:
Ubuntu: R2@1/R10@1/R10@2/R10@5 0.937/0.765/0.870/0.968
Douban: MAP/MRR/P@1/R10@1/R10@2/R10@5 0.550/0.601/0.427/0.254/0.410/0.757
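For reference, Rn@k (e.g. R10@1) is the recall of the true response among the top-k ranked candidates when the model scores n candidates per context. A minimal sketch of this metric follows; it is not the code in src/ubuntu_evaluation.py or src/douban_evaluation.py.

```python
def recall_at_k(scores, labels, k):
    """R_n@k for one context: fraction of positive candidates ranked in the top k."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    return hits / sum(labels)

# Example: 10 candidates, the single true response gets the highest score -> R10@1 = 1.0
scores = [0.9, 0.1, 0.2, 0.3, 0.05, 0.4, 0.15, 0.25, 0.35, 0.05]
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(recall_at_k(scores, labels, k=1))
```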
python export.py --model_name [MODEL_NAME] --ckpt_path [CKPT_PATH] --ckpt_name [CKPT_NAME] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT] --batch_size [BATCH_SIZE]
MODEL_NAME can be one of ["DAM_ubuntu", "DAM_douban"].
EXPORT_FORMAT can be one of ["AIR", "MINDIR"].
BATCH_SIZE defaults to 200 for ubuntu and 256 for douban.
Before inference, please refer to MindSpore Inference with C++ Deployment Guide to set environment variables.
Before performing inference, the model file must be exported by the export script in the Ascend 910 environment.
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [MODEL_NAME] [DATA_FILE_PATH] [EVAL_BATCH_SIZE] [NEED_PREPROCESS] [DEVICE_ID](optional)
DEVICE_ID is optional; the default value is 0. Inference results are saved in the current path, and you can find results like the following in the acc.log file.
Ubuntu: accuracy: (0.93736, 0.75458, 0.87044, 0.96802)
Douban: accuracy: (0.5479, 0.5967, 0.4227, 0.2504, 0.4167, 0.7552)
Parameters | DAM | DAM |
---|---|---|
Resource | Ascend 910 * 8; CPU 2.60GHz, 192 cores; 755 GB memory | Ascend 910 * 8; CPU 2.60GHz, 192 cores; 755 GB memory |
MindSpore Version | 1.3.0 | 1.3.0 |
Dataset | Ubuntu | Douban |
Training Parameters | epoch=2, batch_size = 256, learning_rate=1e-3, decay_steps=405 | epoch=2, batch_size = 256, learning_rate=1e-3, decay_steps=405 |
Optimizer | Adam | Adam |
Loss Function | SigmoidCrossEntropyWithLogits | SigmoidCrossEntropyWithLogits |
Outputs | score | score |
Accuracy | 0.937/0.765/0.870/0.968 (Ubuntu) | 0.550/0.601/0.427/0.254/0.410/0.757 (Douban) |
Speed | 197.425 ms/step (8pcs) | 197.425 ms/step (8pcs) |
Total time | 4h52min (8pcs) | 13h04min (8pcs) |
Checkpoint | 1010.56 M (.ckpt file) | 410 M (.ckpt file) |
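The SigmoidCrossEntropyWithLogits loss in the table above treats response selection as binary classification on the matching score. Below is a minimal, numerically stable sketch of the standard formulation of that loss, not the MindSpore operator itself.

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(logits, labels):
    """max(x, 0) - x * y + log(1 + exp(-|x|)), element-wise."""
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

# Example: one positive candidate scored 2.0 and one negative scored -1.5.
print(sigmoid_cross_entropy_with_logits(np.array([2.0, -1.5]), np.array([1.0, 0.0])))
```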
The random seed is set in train.py.
Please visit the official website homepage.