diff --git a/research/cv/TCN/README.md b/research/cv/TCN/README.md
index c9a4aae5444a7b8a268b30969f10a529309f0c5c..076f09f0fbc2ff3de99ab206b4e3a3d1b959625e 100644
--- a/research/cv/TCN/README.md
+++ b/research/cv/TCN/README.md
@@ -1,292 +1,303 @@
-# 目录
+# Table of contents

-- [目录](#目录)
-    - [TCN描述](#tcn描述)
-    - [模型架构](#模型架构)
-    - [数据集](#数据集)
-    - [环境要求](#环境要求)
-    - [快速入门](#快速入门)
-    - [脚本说明](#脚本说明)
-        - [脚本及样例代码](#脚本及样例代码)
-        - [脚本参数](#脚本参数)
-        - [训练过程](#训练过程)
-            - [训练](#训练)
-        - [评估过程](#评估过程)
-            - [评估](#评估)
-        - [推理过程](#推理过程)
-            - [导出MindIR](#导出mindir)
-            - [在Ascend310执行推理](#在ascend310执行推理)
-    - [模型描述](#模型描述)
-        - [性能](#性能)
-    - [随机情况说明](#随机情况说明)
-    - [ModelZoo主页](#modelzoo主页)
+- [Table of contents](#table-of-contents)
+    - [TCN description](#tcn-description)
+    - [Model architecture](#model-architecture)
+    - [Dataset](#dataset)
+    - [Environment requirements](#environment-requirements)
+    - [Quick start](#quick-start)
+    - [Script description](#script-description)
+        - [Script and sample code](#script-and-sample-code)
+        - [Script parameters](#script-parameters)
+        - [Training process](#training-process)
+            - [Training](#training)
+        - [Evaluation process](#evaluation-process)
+            - [Evaluation](#evaluation)
+        - [Inference process](#inference-process)
+            - [Export MindIR](#export-mindir)
+            - [Perform inference on Ascend310](#perform-inference-on-ascend310)
+    - [Model description](#model-description)
+        - [Performance](#performance)
+    - [Description of randomness](#description-of-randomness)
+    - [ModelZoo homepage](#modelzoo-homepage)

-## TCN描述
+## TCN description

-TCN是一种特殊的卷积神经网络——时序卷积网络(Temporal convolutional network, TCN),于2018年被提出。相较于经典的时序模型RNN结构,TCN模型拥有较高的并行性、更加灵活的感受野,稳定的梯度和更小的内存消耗等优点,在多个时序问题上表现优异。
+TCN (Temporal Convolutional Network) is a special type of convolutional neural network for sequence modeling that was proposed in 2018. Compared with classic recurrent (RNN) sequence models, TCN offers higher parallelism, a more flexible receptive field, stable gradients, and lower memory consumption, and it performs well on many sequence modeling tasks.

-[论文](https://arxiv.org/pdf/1803.01271.pdf)An Empirical Evaluation of Generic Convolutional and Recurrent Networks
+[Paper](https://arxiv.org/pdf/1803.01271.pdf): An Empirical Evaluation of Generic Convolutional and Recurrent Networks
 for Sequence Modeling

-## 模型架构
+## Model architecture

-## 数据集
+## Dataset

-数据集:[Permuted MNIST]()
+Dataset: [Permuted MNIST]()

-- 数据集大小:52.4M,共10个类,6万张 28*28图像
-    - 训练集:6万张图像
-    - 测试集:5万张图像
-- 数据格式:二进制文件
-    - 注:数据在dataset.py中处理。
+- Dataset size: 52.4M, 10 classes, 70,000 28×28 images
+    - Training set: 60,000 images
+    - Test set: 10,000 images
+- Data format: binary files
+    - Note: the data is processed in dataset.py.

-- 目录结构如下:
+- The directory structure is as follows:

 ```bash
 └─data
   └─MNIST
     ├─test
-    │   ├─t10k-images.idx3-ubyte
-    │   └─t10k-labels.idx1-ubyte
+    │   ├─t10k-images.idx3-ubyte
+    │   └─t10k-labels.idx1-ubyte
     └─train
       ├─train-images.idx3-ubyte
       └─train-labels.idx1-ubyte
 ```

-数据集:Adding Problem
+Dataset: Adding Problem

-- 数据集描述:
+- Dataset description:

-  在该任务中,每个输入由深度为2的长度T序列组成,所有值在维度1的[0,1]中随机选择。第二个维度由除两个元素外的所有零组成,这两个元素用1标记。目标是将第二维度标记为1的两个随机值相加。我们可以把它看作是计算二维的点积。
-  简单预测总和为1时,MSE应为0.1767左右。
+  In this task, each input is a sequence of length T with depth 2. The values in the first dimension are chosen uniformly at random from [0, 1]. The second dimension is all zeros except for two elements, which are marked with 1. The goal is to add the two random values whose positions are marked with 1 in the second dimension, which can be viewed as computing the dot product of the two dimensions.
+  Simply predicting a sum of 1 should give an MSE of about 0.1767.

-  注:因为TCN的感受野取决于网络的深度和滤波器的大小,我们需要确保我们使用的模型能够覆盖序列长度T。
+  Note: because the receptive field of a TCN depends on the network depth and the filter size, we must make sure that the model covers the full sequence length T.

-- 数据处理:
+- Data processing:

-  可以使用create_datasetAP.py文件用于生成训练集和测试集,并以bin文件格式保存在`../data/AddProb`目录下。
+  The create_datasetAP.py script generates the training and test sets and saves them as .bin files in the `../data/AddProb` directory.

-  文件datasetAP.py用于读取已经生成的测试集和训练集。
+  The datasetAP.py script reads the generated training and test sets.

-## 环境要求
+## Environment requirements

-- 硬件(Ascend/GPU)
-    - 准备Ascend或GPU处理器搭建硬件环境。
-- 框架
+- Hardware (Ascend/GPU)
+    - Prepare an Ascend or GPU processor to set up the hardware environment.
+- Framework
     - [MindSpore](https://www.mindspore.cn/install)
-- 如需查看详情,请参见如下资源:
-    - [MindSpore教程](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
-    - [MindSpore Python API](https://www.mindspore.cn/docs/api/zh-CN/master/index.html)
+- For more details, see the following resources:
+    - [MindSpore Tutorials](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
+    - [MindSpore Python API](https://www.mindspore.cn/docs/zh-CN/master/index.html)

-## 快速入门
+## Quick start

-通过官方网站安装MindSpore后,您可以按照如下步骤进行训练和评估:
+After installing MindSpore from the official website, you can train and evaluate the model as follows:

-```python
-# 进入脚本目录,训练TCN
-bash
-run_train_ascend.sh [permuted_mnist | adding_problem] [DATA_PATH] [TEST_PATH] [CKPT_PATH]
+```bash
+# Go to the scripts directory and train TCN on Ascend
+bash run_train_ascend.sh [permuted_mnist | adding_problem] [DATA_PATH] [TEST_PATH] [CKPT_PATH]
 # example: bash run_train_ascend.sh permuted_mnist ../data/MNIST/train ../data/MNIST/test ../checkpoint_path

-# 进入脚本目录,评估TCN
-bash
-run_eval_ascend.sh [permuted_mnist | adding_problem] [DATA_PATH] [CKPT_FILE]
+# Go to the scripts directory and train TCN on GPU (standalone) with the default parameters
+bash run_train_standalone_gpu.sh [permuted_mnist | adding_problem] [DATA_PATH] [TEST_PATH] [CKPT_PATH]
+
+# Go to the scripts directory and evaluate TCN on Ascend
+bash run_eval_ascend.sh [permuted_mnist | adding_problem] [DATA_PATH] [CKPT_FILE]
 # example: bash run_eval_ascend.sh permuted_mnist ../data/MNIST/test ../checkpoint_tcn-30_937.ckpt
-```
-- 在 ModelArts 进行训练 (如果你想在modelarts上运行,可以参考以下文档 [modelarts](https://support.huaweicloud.com/modelarts/))
+
+# Go to the scripts directory and evaluate TCN on GPU
+bash run_eval_gpu.sh [permuted_mnist | adding_problem] [DATA_PATH] [CKPT_FILE]
+```
+
+- Train on ModelArts (if you want to run on ModelArts, refer to the [ModelArts documentation](https://support.huaweicloud.com/modelarts/))

   ```bash
-  # 在 ModelArts 上使用单卡训练permuted_mnist数据集
-  # (1) 执行a或者b
-  #       a. 在 default_config.yaml 文件中设置 "enable_modelarts=True"
-  #          在 default_config.yaml 文件中设置 "data_path='/cache/data'"
-  #          在 default_config.yaml 文件中设置 "train_data_path='/cache/data/MNIST/train'"
-  #          在 default_config.yaml 文件中设置 "test_data_path='/cache/data/MNIST/test'"
-  #          在 default_config.yaml 文件中设置 "ckpt_path='/cache/train'"
-  #          (可选)在 default_config.yaml 文件中设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
-  #          在 default_config.yaml 文件中设置 其他参数
-  #       b. 在网页上设置 "enable_modelarts=True"
-  #          在网页上设置 "train_data_path='/cache/data/MNIST/train'"
-  #          在网页上设置 "test_data_path='/cache/data/MNIST/test'"
-  #          在网页上设置 "data_path='/cache/data'"
-  #          在网页上设置 "ckpt_path='/cache/train'"
-  #          (可选)在网页上设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
-  #          在网页上设置 其他参数
-  # (2) 准备模型代码
-  # (3) 如果选择微调您的模型,上传你的预训练模型到 S3 桶上
-  # (4) 上传原始 MNIST 数据集到 S3 桶上
-  # (5) 在网页上设置你的代码路径为 "/path/tcn"
-  # (6) 在网页上设置启动文件为 "train.py"
-  # (7) 在网页上设置"训练数据集"、"训练输出文件路径"、"作业日志路径"等
-  # (8) 创建训练作业
+  # Train on the permuted_mnist dataset with a single device on ModelArts
+  # (1) Do either a or b.
+  #       a. Set "enable_modelarts=True" in the default_config.yaml file
+  #          Set "data_path='/cache/data'" in the default_config.yaml file
+  #          Set "train_data_path='/cache/data/MNIST/train'" in the default_config.yaml file
+  #          Set "test_data_path='/cache/data/MNIST/test'" in the default_config.yaml file
+  #          Set "ckpt_path='/cache/train'" in the default_config.yaml file
+  #          (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file
+  #          Set other parameters in the default_config.yaml file
+  #       b. Set "enable_modelarts=True" on the web page
+  #          Set "train_data_path='/cache/data/MNIST/train'" on the web page
+  #          Set "test_data_path='/cache/data/MNIST/test'" on the web page
+  #          Set "data_path='/cache/data'" on the web page
+  #          Set "ckpt_path='/cache/train'" on the web page
+  #          (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the web page
+  #          Set other parameters on the web page
+  # (2) Prepare the model code
+  # (3) If you want to fine-tune your model, upload your pre-trained model to the S3 bucket
+  # (4) Upload the original MNIST dataset to the S3 bucket
+  # (5) Set your code path to "/path/tcn" on the web page
+  # (6) Set the startup file to "train.py" on the web page
+  # (7) Set the "training dataset", "training output file path", "job log path", etc. on the web page
+  # (8) Create the training job
   #
-  # 在 ModelArts 上使用单卡验证permuted_mnist数据集
-  # (1) 执行a或者b
-  #       a. 在 default_config.yaml 文件中设置 "enable_modelarts=True"
-  #          在 default_config.yaml 文件中设置 "data_path='/cache/data'"
-  #          在 default_config.yaml 文件中设置 "test_data_path='/cache/data/MNIST/test'"
-  #          在 default_config.yaml 文件中设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
-  #          在 default_config.yaml 文件中设置 "ckpt_file='/cache/checkpoint_path/checkpoint_tcn-30_937.ckpt'"
-  #          在 default_config.yaml 文件中设置 其他参数
-  #       b. 在网页上设置 "enable_modelarts=True"
-  #          在网页上设置 "data_path='/cache/data'"
-  #          在网页上设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
-  #          在网页上设置 "ckpt_file='/cache/checkpoint_path/checkpoint_tcn-30_937.ckpt'"
-  #          在网页上设置 其他参数
-  # (2) 准备模型代码
-  # (3) 上传你训练好的模型到 S3 桶上
-  # (4) 上传原始 MNIST 数据集到 S3 桶上
-  # (5) 在网页上设置你的代码路径为 "/path/tcn"
-  # (6) 在网页上设置启动文件为 "eval.py"
-  # (7) 在网页上设置"训练数据集"、"训练输出文件路径"、"作业日志路径"等
-  # (8) 创建训练作业
-  ```
+  # Evaluate on the permuted_mnist dataset with a single device on ModelArts
+  # (1) Do either a or b.
+  #       a. Set "enable_modelarts=True" in the default_config.yaml file
+  #          Set "data_path='/cache/data'" in the default_config.yaml file
+  #          Set "test_data_path='/cache/data/MNIST/test'" in the default_config.yaml file
+  #          Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the default_config.yaml file
+  #          Set "ckpt_file='/cache/checkpoint_path/checkpoint_tcn-30_937.ckpt'" in the default_config.yaml file
+  #          Set other parameters in the default_config.yaml file
+  #       b. Set "enable_modelarts=True" on the web page
+  #          Set "data_path='/cache/data'" on the web page
+  #          Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the web page
+  #          Set "ckpt_file='/cache/checkpoint_path/checkpoint_tcn-30_937.ckpt'" on the web page
+  #          Set other parameters on the web page
+  # (2) Prepare the model code
+  # (3) Upload your trained model to the S3 bucket
+  # (4) Upload the original MNIST dataset to the S3 bucket
+  # (5) Set your code path to "/path/tcn" on the web page
+  # (6) Set the startup file to "eval.py" on the web page
+  # (7) Set the "training dataset", "training output file path", "job log path", etc. on the web page
+  # (8) Create the training job
+  ```

   ```bash
-  # 在 ModelArts 上使用单卡训练adding_problem数据集
-  # (1) 执行a或者b
-  #       a. 在 config_addingproblem.yaml 文件中设置 "enable_modelarts=True"
-  #          在 config_addingproblem.yaml 文件中设置 "data_path='/cache/data'"
-  #          在 config_addingproblem.yaml 文件中设置 "train_data_path='/cache/data/AddProb/train'"
-  #          在 config_addingproblem.yaml 文件中设置 "test_data_path='/cache/data/AddProb/test'"
-  #          在 config_addingproblem.yaml 文件中设置 "ckpt_path='/cache/train'"
-  #          (可选)在 config_addingproblem.yaml 文件中设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
-  #          在 config_addingproblem.yaml 文件中设置 其他参数
-  #       b. 在网页上设置 "enable_modelarts=True"
-  #          在网页上设置 "train_data_path='/cache/data/AddProb/train'"
-  #          在网页上设置 "test_data_path='/cache/data/AddProb/test'"
-  #          在网页上设置 "data_path='/cache/data'"
-  #          在网页上设置 "ckpt_path='/cache/train'"
-  #          (可选)在网页上设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
-  #          在网页上设置 其他参数
-  # (2) 准备模型代码
-  # (3) 如果选择微调您的模型,上传你的预训练模型到 S3 桶上
-  # (4) 生成原始 AddingProblem 数据集到 S3 桶上
-  # (5) 在网页上设置你的代码路径为 "/path/tcn"
-  # (6) 在网页上设置启动文件为 "train.py"
-  # (7) 在网页上设置config_path为"../../config_addingproblem.yaml"
-  # (8) 在网页上设置"训练数据集"、"训练输出文件路径"、"作业日志路径"等
-  # (9) 创建训练作业
+  # Train on the adding_problem dataset with a single device on ModelArts
+  # (1) Do either a or b.
+  #       a. Set "enable_modelarts=True" in the config_addingproblem.yaml file
+  #          Set "data_path='/cache/data'" in the config_addingproblem.yaml file
+  #          Set "train_data_path='/cache/data/AddProb/train'" in the config_addingproblem.yaml file
+  #          Set "test_data_path='/cache/data/AddProb/test'" in the config_addingproblem.yaml file
+  #          Set "ckpt_path='/cache/train'" in the config_addingproblem.yaml file
+  #          (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the config_addingproblem.yaml file
+  #          Set other parameters in the config_addingproblem.yaml file
+  #       b. Set "enable_modelarts=True" on the web page
+  #          Set "train_data_path='/cache/data/AddProb/train'" on the web page
+  #          Set "test_data_path='/cache/data/AddProb/test'" on the web page
+  #          Set "data_path='/cache/data'" on the web page
+  #          Set "ckpt_path='/cache/train'" on the web page
+  #          (Optional) Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the web page
+  #          Set other parameters on the web page
+  # (2) Prepare the model code
+  # (3) If you want to fine-tune your model, upload your pre-trained model to the S3 bucket
+  # (4) Generate the AddingProblem dataset and upload it to the S3 bucket
+  # (5) Set your code path to "/path/tcn" on the web page
+  # (6) Set the startup file to "train.py" on the web page
+  # (7) Set config_path to "../../config_addingproblem.yaml" on the web page
+  # (8) Set the "training dataset", "training output file path", "job log path", etc. on the web page
+  # (9) Create the training job
   #
-  # 在 ModelArts 上使用单卡验证adding_problem数据集
-  # (1) 执行a或者b
-  #       a. 在 config_addingproblem.yaml 文件中设置 "enable_modelarts=True"
-  #          在 config_addingproblem.yaml 文件中设置 "data_path='/cache/data'"
-  #          在 config_addingproblem.yaml 文件中设置 "test_data_path='/cache/data/MNIST/test'"
-  #          在 config_addingproblem.yaml 文件中设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
-  #          在 config_addingproblem.yaml 文件中设置 "ckpt_file='/cache/checkpoint_path/checkpoint_tcn-25_1563.ckpt'"
-  #          在 config_addingproblem.yaml 文件中设置 其他参数
-  #       b. 在网页上设置 "enable_modelarts=True"
-  #          在网页上设置 "data_path='/cache/data'"
-  #          在网页上设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
-  #          在网页上设置 "ckpt_file='/cache/checkpoint_path/checkpoint_tcn-25_1563.ckpt'"
-  #          在网页上设置 其他参数
-  # (2) 准备模型代码
-  # (3) 上传你训练好的模型到 S3 桶上
-  # (4) 生成原始 AddingProblem 数据集到 S3 桶上
-  # (5) 在网页上设置你的代码路径为 "/path/TCN"
-  # (6) 在网页上设置启动文件为 "eval.py"
-  # (7) 在网页上设置config_path为"../../config_addingproblem.yaml"
-  # (8) 在网页上设置"训练数据集"、"训练输出文件路径"、"作业日志路径"等
-  # (9) 创建训练作业
-  ```
-
-## 脚本说明
-
-### 脚本及样例代码
+  # Evaluate on the adding_problem dataset with a single device on ModelArts
+  # (1) Do either a or b.
+  #       a. Set "enable_modelarts=True" in the config_addingproblem.yaml file
+  #          Set "data_path='/cache/data'" in the config_addingproblem.yaml file
+  #          Set "test_data_path='/cache/data/AddProb/test'" in the config_addingproblem.yaml file
+  #          Set "checkpoint_url='s3://dir_to_your_pretrained/'" in the config_addingproblem.yaml file
+  #          Set "ckpt_file='/cache/checkpoint_path/checkpoint_tcn-25_1563.ckpt'" in the config_addingproblem.yaml file
+  #          Set other parameters in the config_addingproblem.yaml file
+  #       b. Set "enable_modelarts=True" on the web page
+  #          Set "data_path='/cache/data'" on the web page
+  #          Set "checkpoint_url='s3://dir_to_your_pretrained/'" on the web page
+  #          Set "ckpt_file='/cache/checkpoint_path/checkpoint_tcn-25_1563.ckpt'" on the web page
+  #          Set other parameters on the web page
+  # (2) Prepare the model code
+  # (3) Upload your trained model to the S3 bucket
+  # (4) Generate the AddingProblem dataset and upload it to the S3 bucket
+  # (5) Set your code path to "/path/TCN" on the web page
+  # (6) Set the startup file to "eval.py" on the web page
+  # (7) Set config_path to "../../config_addingproblem.yaml" on the web page
+  # (8) Set the "training dataset", "training output file path", "job log path", etc. on the web page
+  # (9) Create the training job
+  ```
+
+## Script description
+
+### Script and sample code

 ```bash
 ├── TCN
-    ├── README.md                    // TCN相关说明
-    ├── ascend310                    // 实现310推理源代码
+    ├── README.md                        // TCN documentation
+    ├── ascend310                        // Source code for Ascend 310 inference
     ├── scripts
-    │   ├──run_train_ascend.sh      // 在Ascend中训练的脚本
-    │   ├──run_eval_ascend.sh       // 在Ascend中评估的脚本
-    │   ├──run_infer_310.sh         // 在310中进行离线推理的脚本
+    │   ├──run_train_ascend.sh           // Training script for Ascend
+    │   ├──run_train_gpu.sh              // Training script for GPU
+    │   ├──run_train_standalone.sh       // Standalone GPU training with the default settings
+    │   ├──run_eval_ascend.sh            // Evaluation script for Ascend
+    │   ├──run_eval_gpu.sh               // Evaluation script for GPU
+    │   ├──run_eval_standalone.sh        // Standalone GPU evaluation with the default parameters
+    │   ├──run_infer_310.sh              // Offline inference script for Ascend 310
     ├── src
-    │   ├──dataset.py                // 创建MNIST数据集
-    │   ├──create_datasetAP.py       // 生成AddingProblem数据集
-    │   ├──datasetAP.py              // 读取AddingProblem数据集
-    │   ├──TCN.py                    // TCN主要架构
-    │   ├──model.py                  // 为了适应MNIST数据的模型
-    │   ├──metric.py                 // 自定义模型评价标准
-    │   ├──weight_norm.py            // 权重归一化
-    │   ├──loss.py                   // 损失
-    │   ├──lr_generator.py           // 动态学习率
-    │   └──model_utils
-    │       ├──config.py             // 训练配置
-    │       ├──device_adapter.py     // 获取云上id
-    │       ├──local_adapter.py      // 获取本地id
-    │       └──moxing_adapter.py     // 参数处理
-    ├── default_config.yaml          // MNIST数据集训练参数配置文件
-    ├── config_addingproblem.yaml    // AddingProblem数据集训练参数配置文件
-    ├── train.py                     // 训练脚本
-    ├── eval.py                      // 评估脚本
-    ├── export.py                    // 导出脚本
-    ├── postprocess.py               // 310推理后处理脚本
-    ├── preprocess.py                // 310推理前处理脚本
-    ├── requirements.txt             // 所需要的python库
-```
-
-### 脚本参数
+    │   ├──dataset.py                    // Creates the MNIST dataset
+    │   ├──create_datasetAP.py           // Generates the AddingProblem dataset
+    │   ├──datasetAP.py                  // Reads the AddingProblem dataset
+    │   ├──TCN.py                        // Main TCN architecture
+    │   ├──model.py                      // Model adapted to the MNIST data
+    │   ├──metric.py                     // Custom evaluation metric
+    │   ├──weight_norm.py                // Weight normalization
+    │   ├──loss.py                       // Loss function
+    │   ├──lr_generator.py               // Dynamic learning rate
+    │   └──model_utils
+    │       ├──config.py                 // Training configuration
+    │       ├──device_adapter.py         // Gets the device ID on the cloud
+    │       ├──local_adapter.py          // Gets the local device ID
+    │       └──moxing_adapter.py         // Parameter processing
+    ├── default_config.yaml              // Training configuration for the MNIST dataset
+    ├── config_addingproblem.yaml        // Training configuration for the AddingProblem dataset
+    ├── train.py                         // Training script
+    ├── eval.py                          // Evaluation script
+    ├── export.py                        // Export script
+    ├── postprocess.py                   // Post-processing script for Ascend 310 inference
+    ├── preprocess.py                    // Pre-processing script for Ascend 310 inference
+    ├── requirements.txt                 // Required Python libraries
+```
+
+### Script parameters

 ```bash
-train.py和config.py中主要参数如下:
---enable_modelarts:允许云上适配。
---train_data_path:到训练的路径。
---test_data_path:到评估的路径。
---output_path:保存checkpoint文件的路径。
---load_path:加载checkpoint文件。
---checkpoint_path:训练后保存的检查点文件的绝对完整路径。
---data_path:数据集所在路径。
---device_target:实现代码的设备。
---epoch_size:总训练轮次。
---epoch_change:学习率发生变化的轮次。
---batch_size:训练批次大小。
---image_height:图像高度作为模型输入(仅在permuted mnist数据集)。
---image_width:图像宽度作为模型输入(仅在permuted mnist数据集)。
---dataset_name:数据集名称。可选值为"permuted_mnist"和"adding_problem"。
---channel_size:输入通道数。
---num_classes:类别个数。
---lr:学习率。
---batch_train:运行训练集的batch size。
---batch_test:运行测试集的batch size。
---dropout:dropout大小。
---kernel_size:卷积核大小。
---level:TCN的层数。
---nhid:每一层的节点数目。
---save_checkpoint_steps:间隔多少个step保存checkpoint文件。
---keep_checkpoint_max:最多保存checkpoint文件的数目。
---N_train:adding problem数据集训练集大小(仅在adding problem数据集)。
---N_test:adding problem数据集测试集大小(仅在adding problem数据集) 。
---seq_length:adding problem数据的序列长度(仅在adding problem数据集)。
---device_id:运行设备id。
---file_name:保存MINDIR文件的名称。
---file_format:默认为"MINDIR”。
-
-```
-
-### 训练过程
-
-#### 训练
-
-- Ascend处理器环境运行permuted_mnist数据集
+The main parameters in train.py and config.py are as follows:
+--enable_modelarts: enables adaptation for running on ModelArts (cloud).
+--train_data_path: path to the training set.
+--test_data_path: path to the evaluation set.
+--output_path: path where checkpoint files are saved.
+--load_path: path of the checkpoint file to load.
+--checkpoint_path: absolute full path of the checkpoint file saved after training.
+--data_path: path where the dataset is located.
+--device_target: target device on which the code runs.
+--epoch_size: total number of training epochs.
+--epoch_change: epoch at which the learning rate changes.
+--batch_size: training batch size.
+--image_height: image height of the model input (permuted MNIST dataset only).
+--image_width: image width of the model input (permuted MNIST dataset only).
+--dataset_name: dataset name. Valid values are "permuted_mnist" and "adding_problem".
+--channel_size: number of input channels.
+--num_classes: number of classes.
+--lr: learning rate.
+--batch_train: batch size used when running the training set.
+--batch_test: batch size used when running the test set.
+--dropout: dropout rate.
+--kernel_size: convolution kernel size.
+--level: number of TCN levels.
+--nhid: number of hidden units in each level.
+--save_checkpoint_steps: interval, in steps, at which checkpoint files are saved.
+--keep_checkpoint_max: maximum number of checkpoint files to keep.
+--N_train: size of the adding problem training set (adding problem dataset only).
+--N_test: size of the adding problem test set (adding problem dataset only).
+--seq_length: sequence length of the adding problem data (adding problem dataset only).
+--device_id: ID of the device to run on.
+--file_name: name of the exported MINDIR file.
+--file_format: export file format; defaults to "MINDIR".
+
+```
+
+### Training process
+
+#### Training
+
+- Running on the permuted_mnist dataset in an Ascend processor environment

 ```bash
-python train.py --config_path ../../default_config.yaml --train_data_path data/MNIST/train --test_data_path data/MNIST/test --ckpt_path checkpoint_path > log 2>&1 &
-# 或进入脚本目录,执行脚本
+python train.py --config_path ../../default_config.yaml --train_data_path data/MNIST/train --test_data_path data/MNIST/test --ckpt_path checkpoint_path > log 2>&1 &
+# Or go to the scripts directory and run the script
 bash run_train_ascend.sh permuted_mnist ../data/MNIST/train ../data/MNIST/test ../checkpoint_path
 ```

-训练结果
+Training results:

 ```bash
 ============== Starting Training ==============
 epoch: 1 step: 937, loss is 0.41428128
 epoch time: 59074.480 ms, per step time: 63.046 ms
 {'Accuracy': 0.9264823717948718}
@@ -300,20 +311,20 @@ epoch time: 27978.896 ms, per step time: 29.860 ms
 epoch: 30 step: 937, loss is 0.3095672
 epoch time: 27985.821 ms, per step time: 29.867 ms
 {'Accuracy': 0.9745592948717948}
 ```

-- Ascend处理器环境运行adding_problem数据集
+- Running on the adding_problem dataset in an Ascend processor environment

 ```bash
-python train.py --config_path ../../config_addingproblem.yaml --train_data_path data/AddProb/train --test_data_path data/AddProb/test --ckpt_path checkpoint_path > log 2>&1 &
-# 或进入脚本目录,执行脚本
+python train.py --config_path ../../config_addingproblem.yaml --train_data_path data/AddProb/train --test_data_path data/AddProb/test --ckpt_path checkpoint_path > log 2>&1 &
+# Or go to the scripts directory and run the script
 bash run_train_ascend.sh adding_problem ../data/AddProb/train ../data/AddProb/test ../checkpoint_path
 ```

-训练结果
+Training results:

 ```bash
 ============== Starting Training ==============
 epoch: 1 step: 1563, loss is 0.0007961707
 epoch time: 60970.954 ms, per step time: 39.009 ms
 {'Accuracy': Tensor(shape=[], dtype=Float32, value= 0.00389967)}
@@ -327,101 +338,101 @@ epoch time: 26905.874 ms, per step time: 17.214 ms
 epoch: 25 step: 1563, loss is 2.3555269e-05
 epoch time: 26909.343 ms, per step time: 17.216 ms
 {'Accuracy': Tensor(shape=[], dtype=Float32, value= 1.7229e-05)}
 ```

-### 评估过程
+### Evaluation process

-#### 评估
+#### Evaluation

-在运行以下命令之前,请检查用于评估的检查点路径。
+Before running the following commands, check the path of the checkpoint used for evaluation.

-- Ascend处理器环境运行permuted_mnist数据集
+- Running on the permuted_mnist dataset in an Ascend processor environment

   ```bash
-  python eval.py --config_path ../../default_config.yaml --test_data_path ../data/MNIST/test --ckpt_file checkpoint_path/checkpoint_tcn-30_937.ckpt > eval.log 2>&1 &
-  #或进入脚本目录,执行脚本
+  python eval.py --config_path ../../default_config.yaml --test_data_path ../data/MNIST/test --ckpt_file checkpoint_path/checkpoint_tcn-30_937.ckpt > eval.log 2>&1 &
+  # Or go to the scripts directory and run the script
   bash run_eval_ascend.sh permuted_mnist ../data/MNIST/test ../checkpoint_path/checkpoint_tcn-30_937.ckpt
   ```

-  可通过"eval.log”文件查看结果。
+  You can view the results in the "eval.log" file.

   ```text
   ============== Starting Testing ==============
   ============== {'Accuracy': 0.9746594551282052} ==============
   ```

-- Ascend处理器环境运行adding_problem数据集
+- Running on the adding_problem dataset in an Ascend processor environment

   ```bash
-  python eval.py --config_path ../../config_addingproblem.yaml --test_data_path /home/data/AddProb/test --ckpt_file checkpoint_add/checkpoint_tcn-25_1563.ckpt > eval.log 2>&1 &
-  #或进入脚本目录,执行脚本
+  python eval.py --config_path ../../config_addingproblem.yaml --test_data_path /home/data/AddProb/test --ckpt_file checkpoint_add/checkpoint_tcn-25_1563.ckpt > eval.log 2>&1 &
+  # Or go to the scripts directory and run the script
   bash run_eval_ascend.sh adding_problem ../data/AddProb/test ../checkpoint_path/checkpoint_tcn-25_1563.ckpt
   ```

-  可通过"eval.log”文件查看结果。
+  You can view the results in the "eval.log" file.

   ```text
   ============== Starting Testing ==============
   ============== {'Accuracy': 1.7229e-05} ==============
   ```

-### 在Ascend310执行推理
+### Inference process

-在执行推理前,mindir文件必须通过`export.py`脚本导出。以下展示了使用minir模型执行推理的示例。
+Before performing inference, the MindIR file must be exported with the `export.py` script. The following shows an example of performing inference with the MindIR model.

-### 导出MindIR
+#### Export MindIR

 ```shell
-python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_FILE]
+python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_FILE]
 ```

-### 在Ascend310执行推理
+#### Perform inference on Ascend310

 ```shell
 # Ascend310 inference
 bash run_infer_310.sh [MINDIR_PATH] [DATASET PATH] [NEED_PREPROCESS] [DEVICE_ID]
-```
-
-## 模型描述
-
-### 性能
-
-在Permuted MNIST数据集上训练TCN模型:
-| 参数 | TCN(permuted_mnist) | TCN(permuted_mnist) |
-| ------------- | :-------------------------------------------------------- | --------------------------------------------------------- |
-| 模型版本 | TCN | TCN |
-| 资源 | Ascend 910;CPU 2.60GHz,192核;内存 755GB;系统 Euler2.8 |GPU NV SMX2 V100-32G |
-| 上传日期 | 2021-11-26 | 2021-11-26 |
-| MindSpore版本 | 1.3.0 | 1.8.0(Pytorch) |
-| 数据集 | permuted_mnist | permuted_mnist |
-| 训练参数 | epoch=30, steps=973, batch_size = 64, lr=0.003 | epoch=30, steps=973, batch_size = 64, lr=0.003 |
-| 优化器 | Adam(weight_decay=1e-4) | Adam(weight_decay=1e-4) |
-| 损失函数 | NLLLoss | NLLLoss |
-| 输出 | 类别概率 | 类别概率 |
-| 精度 | 0.9745 | 0.972 |
-| 速度 | 1卡:20.3 毫秒/步 |1卡:21.5 毫秒/步 |
-| 调优检查点 | 895KB(.ckpt 文件) | 297KB(.pkl文件) |
-
-在Adding Problem数据集上训练TCN模型:
-| 参数 | TCN(adding_problem) | TCN(adding_problem) |
-| ------------- | :-------------------------------------------------------- | --------------------------------------------------------- |
-| 模型版本 | TCN | TCN |
-| 资源 | Ascend 910;CPU 2.60GHz,192核;内存 755GB;系统 Euler2.8 |GPU NV SMX2 V100-32G |
-| 上传日期 | 2021-11-26 | 2021-11-26 |
-| MindSpore版本 | 1.3.0 | 1.8.0(Pytorch) |
-| 数据集 | adding_problem | adding_problem |
-| 训练参数 | epoch=25, steps=1563, batch_size = 32, lr=0.004 | epoch=25, steps=1563, batch_size = 32, lr=0.004 |
-| 优化器 | Adam | Adam |
-| 损失函数 | MSELos | MSELos |
-| 输出 | 概率 | 概率 |
-| 精度 | 1.7229e-05(loss) | 5.8e-05(loss) |
-| 速度 | 1卡:11.6 毫秒/步 | 1卡:17.1毫秒/步 |
-| 调优检查点 | 978KB(.ckpt 文件) |471KB(.pkl文件) |
-
-## 随机情况说明
-
-在dataset.py和create_datasetAP.py中设置了随机种子
-
-## ModelZoo主页
-
-请浏览官网[主页](https://gitee.com/mindspore/models)。
+```
+
+## Model description
+
+### Performance
+
+Training the TCN model on the Permuted MNIST dataset:
+
+| Parameters | TCN (permuted_mnist) | TCN (permuted_mnist) |
+| ------------- | :-------------------------------------------------------- | --------------------------------------------------------- |
+| Model version | TCN | TCN |
+| Resources | Ascend 910; CPU 2.60 GHz, 192 cores; RAM 755 GB; OS Euler2.8 | GPU NV SXM2 V100-32G |
+| Upload date | 2021-11-26 | 2021-11-26 |
+| Framework version | MindSpore 1.3.0 | PyTorch 1.8.0 |
+| Dataset | permuted_mnist | permuted_mnist |
+| Training parameters | epoch=30, steps=937, batch_size=64, lr=0.003 | epoch=30, steps=937, batch_size=64, lr=0.003 |
+| Optimizer | Adam (weight_decay=1e-4) | Adam (weight_decay=1e-4) |
+| Loss function | NLLLoss | NLLLoss |
+| Output | Class probabilities | Class probabilities |
+| Accuracy | 0.9745 | 0.972 |
+| Speed | 1 device: 20.3 ms/step | 1 device: 21.5 ms/step |
+| Checkpoint for fine-tuning | 895 KB (.ckpt file) | 297 KB (.pkl file) |
+
+Training the TCN model on the Adding Problem dataset:
+
+| Parameters | TCN (adding_problem) | TCN (adding_problem) |
+| ------------- | :-------------------------------------------------------- | --------------------------------------------------------- |
+| Model version | TCN | TCN |
+| Resources | Ascend 910; CPU 2.60 GHz, 192 cores; RAM 755 GB; OS Euler2.8 | GPU NV SXM2 V100-32G |
+| Upload date | 2021-11-26 | 2021-11-26 |
+| Framework version | MindSpore 1.3.0 | PyTorch 1.8.0 |
+| Dataset | adding_problem | adding_problem |
+| Training parameters | epoch=25, steps=1563, batch_size=32, lr=0.004 | epoch=25, steps=1563, batch_size=32, lr=0.004 |
+| Optimizer | Adam | Adam |
+| Loss function | MSELoss | MSELoss |
+| Output | Predicted sum | Predicted sum |
+| Accuracy | 1.7229e-05 (loss) | 5.8e-05 (loss) |
+| Speed | 1 device: 11.6 ms/step | 1 device: 17.1 ms/step |
+| Checkpoint for fine-tuning | 978 KB (.ckpt file) | 471 KB (.pkl file) |
+
+## Description of randomness
+
+Random seeds are set in dataset.py and create_datasetAP.py.
+
+## ModelZoo homepage
+
+Please visit the official [homepage](https://gitee.com/mindspore/models).
\ No newline at end of file
diff --git a/research/cv/TCN/README_CN.md b/research/cv/TCN/README_CN.md
new file mode 100644
index 0000000000000000000000000000000000000000..897529c57e4c42ced170cde7641d539b592cb68f
--- /dev/null
+++ b/research/cv/TCN/README_CN.md
@@ -0,0 +1,435 @@
+# 目录
+
+
+
+- [目录](#目录)
+    - [TCN描述](#tcn描述)
+    - [模型架构](#模型架构)
+    - [数据集](#数据集)
+    - [环境要求](#环境要求)
+    - [快速入门](#快速入门)
+    - [脚本说明](#脚本说明)
+        - [脚本及样例代码](#脚本及样例代码)
+        - [脚本参数](#脚本参数)
+        - [训练过程](#训练过程)
+            - [训练](#训练)
+        - [评估过程](#评估过程)
+            - [评估](#评估)
+        - [推理过程](#推理过程)
+            - [导出MindIR](#导出mindir)
+            - [在Ascend310执行推理](#在ascend310执行推理)
+    - [模型描述](#模型描述)
+        - [性能](#性能)
+    - [随机情况说明](#随机情况说明)
+    - [ModelZoo主页](#modelzoo主页)
+
+
+
+## TCN描述
+
+TCN是一种特殊的卷积神经网络——时序卷积网络(Temporal convolutional network, TCN),于2018年被提出。相较于经典的时序模型RNN结构,TCN模型拥有较高的并行性、更加灵活的感受野,稳定的梯度和更小的内存消耗等优点,在多个时序问题上表现优异。
+
+[论文](https://arxiv.org/pdf/1803.01271.pdf)An Empirical Evaluation of Generic Convolutional and Recurrent Networks
+for Sequence Modeling
+
+## 模型架构
+
+## 数据集
+
+数据集:[Permuted MNIST]()
+
+- 数据集大小:52.4M,共10个类,7万张 28*28图像
+    - 训练集:6万张图像
+    - 测试集:1万张图像
+- 数据格式:二进制文件
+    - 注:数据在dataset.py中处理。
+
+- 目录结构如下:
+
+```bash
+└─data
+  └─MNIST
+    ├─test
+    │   ├─t10k-images.idx3-ubyte
+    │   └─t10k-labels.idx1-ubyte
+    └─train
+      ├─train-images.idx3-ubyte
+      └─train-labels.idx1-ubyte
+```
+
+数据集:Adding Problem
+
+- 数据集描述:
+
+  在该任务中,每个输入由深度为2的长度T序列组成,所有值在维度1的[0,1]中随机选择。第二个维度由除两个元素外的所有零组成,这两个元素用1标记。目标是将第二维度标记为1的两个随机值相加。我们可以把它看作是计算二维的点积。
+  简单预测总和为1时,MSE应为0.1767左右。
+
+  注:因为TCN的感受野取决于网络的深度和滤波器的大小,我们需要确保我们使用的模型能够覆盖序列长度T。
+
+- 数据处理:
+
+  可以使用create_datasetAP.py文件用于生成训练集和测试集,并以bin文件格式保存在`../data/AddProb`目录下。
+
+  文件datasetAP.py用于读取已经生成的测试集和训练集。
+
+## 环境要求
+
+- 硬件(Ascend/GPU)
+    - 准备Ascend或GPU处理器搭建硬件环境。
+- 框架
+    - [MindSpore](https://www.mindspore.cn/install)
+- 如需查看详情,请参见如下资源:
+    - [MindSpore教程](https://www.mindspore.cn/tutorials/zh-CN/master/index.html)
+    - [MindSpore Python API](https://www.mindspore.cn/docs/zh-CN/master/index.html)
+
+## 快速入门
+
+通过官方网站安装MindSpore后,您可以按照如下步骤进行训练和评估:
+
+```bash
+# 进入脚本目录,训练TCN
+bash run_train_ascend.sh [permuted_mnist | adding_problem] [DATA_PATH] [TEST_PATH] [CKPT_PATH]
+# example: bash run_train_ascend.sh permuted_mnist ../data/MNIST/train ../data/MNIST/test ../checkpoint_path
+
+# 进入脚本目录,用标准参数训练TCN
+bash run_train_standalone_gpu.sh [permuted_mnist | adding_problem] [DATA_PATH] [TEST_PATH] [CKPT_PATH]
+
+# 进入脚本目录,评估TCN
+bash run_eval_ascend.sh [permuted_mnist | adding_problem] [DATA_PATH] [CKPT_FILE]
+# example: bash run_eval_ascend.sh permuted_mnist ../data/MNIST/test ../checkpoint_tcn-30_937.ckpt
+
+# 进入脚本目录,评估TCN (GPU)
+bash run_eval_gpu.sh [permuted_mnist | adding_problem] [DATA_PATH] [CKPT_FILE]
+```
+
+- 在 ModelArts 进行训练 (如果你想在modelarts上运行,可以参考以下文档 [modelarts](https://support.huaweicloud.com/modelarts/))
+
+  ```bash
+  # 在 ModelArts 上使用单卡训练permuted_mnist数据集
+  # (1) 执行a或者b
+  #       a. 在 default_config.yaml 文件中设置 "enable_modelarts=True"
+  #          在 default_config.yaml 文件中设置 "data_path='/cache/data'"
+  #          在 default_config.yaml 文件中设置 "train_data_path='/cache/data/MNIST/train'"
+  #          在 default_config.yaml 文件中设置 "test_data_path='/cache/data/MNIST/test'"
+  #          在 default_config.yaml 文件中设置 "ckpt_path='/cache/train'"
+  #          (可选)在 default_config.yaml 文件中设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
+  #          在 default_config.yaml 文件中设置 其他参数
+  #       b. 在网页上设置 "enable_modelarts=True"
+  #          在网页上设置 "train_data_path='/cache/data/MNIST/train'"
+  #          在网页上设置 "test_data_path='/cache/data/MNIST/test'"
+  #          在网页上设置 "data_path='/cache/data'"
+  #          在网页上设置 "ckpt_path='/cache/train'"
+  #          (可选)在网页上设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
+  #          在网页上设置 其他参数
+  # (2) 准备模型代码
+  # (3) 如果选择微调您的模型,上传你的预训练模型到 S3 桶上
+  # (4) 上传原始 MNIST 数据集到 S3 桶上
+  # (5) 在网页上设置你的代码路径为 "/path/tcn"
+  # (6) 在网页上设置启动文件为 "train.py"
+  # (7) 在网页上设置"训练数据集"、"训练输出文件路径"、"作业日志路径"等
+  # (8) 创建训练作业
+  #
+  # 在 ModelArts 上使用单卡验证permuted_mnist数据集
+  # (1) 执行a或者b
+  #       a. 在 default_config.yaml 文件中设置 "enable_modelarts=True"
+  #          在 default_config.yaml 文件中设置 "data_path='/cache/data'"
+  #          在 default_config.yaml 文件中设置 "test_data_path='/cache/data/MNIST/test'"
+  #          在 default_config.yaml 文件中设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
+  #          在 default_config.yaml 文件中设置 "ckpt_file='/cache/checkpoint_path/checkpoint_tcn-30_937.ckpt'"
+  #          在 default_config.yaml 文件中设置 其他参数
+  #       b. 在网页上设置 "enable_modelarts=True"
+  #          在网页上设置 "data_path='/cache/data'"
+  #          在网页上设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
+  #          在网页上设置 "ckpt_file='/cache/checkpoint_path/checkpoint_tcn-30_937.ckpt'"
+  #          在网页上设置 其他参数
+  # (2) 准备模型代码
+  # (3) 上传你训练好的模型到 S3 桶上
+  # (4) 上传原始 MNIST 数据集到 S3 桶上
+  # (5) 在网页上设置你的代码路径为 "/path/tcn"
+  # (6) 在网页上设置启动文件为 "eval.py"
+  # (7) 在网页上设置"训练数据集"、"训练输出文件路径"、"作业日志路径"等
+  # (8) 创建训练作业
+  ```
+
+  ```bash
+  # 在 ModelArts 上使用单卡训练adding_problem数据集
+  # (1) 执行a或者b
+  #       a. 在 config_addingproblem.yaml 文件中设置 "enable_modelarts=True"
+  #          在 config_addingproblem.yaml 文件中设置 "data_path='/cache/data'"
+  #          在 config_addingproblem.yaml 文件中设置 "train_data_path='/cache/data/AddProb/train'"
+  #          在 config_addingproblem.yaml 文件中设置 "test_data_path='/cache/data/AddProb/test'"
+  #          在 config_addingproblem.yaml 文件中设置 "ckpt_path='/cache/train'"
+  #          (可选)在 config_addingproblem.yaml 文件中设置 "checkpoint_url='s3://dir_to_your_pretrained/'"
+  #          在 config_addingproblem.yaml 文件中设置 其他参数
+  #       b.
在网页上设置 "enable_modelarts=True" + # 在网页上设置 "train_data_path='/cache/data/AddProb/train'" + # 在网页上设置 "test_data_path='/cache/data/AddProb/test'" + # 在网页上设置 "data_path='/cache/data'" + # 在网页上设置 "ckpt_path='/cache/train'" + # (可选)在网页上设置 "checkpoint_url='s3://dir_to_your_pretrained/'" + # 在网页上设置 其他参数 + # (2) 准备模型代码 + # (3) 如果选择微调您的模型,上传你的预训练模型到 S3 桶上 + # (4) 生成原始 AddingProblem 数据集到 S3 桶上 + # (5) 在网页上设置你的代码路径为 "/path/tcn" + # (6) 在网页上设置启动文件为 "train.py" + # (7) 在网页上设置config_path为"../../config_addingproblem.yaml" + # (8) 在网页上设置"训练数据集"、"训练输出文件路径"、"作业日志路径"等 + # (9) 创建训练作业 + # + # 在 ModelArts 上使用单卡验证adding_problem数据集 + # (1) 执行a或者b + # a. 在 config_addingproblem.yaml 文件中设置 "enable_modelarts=True" + # 在 config_addingproblem.yaml 文件中设置 "data_path='/cache/data'" + # 在 config_addingproblem.yaml 文件中设置 "test_data_path='/cache/data/AddProb/test'" + # 在 config_addingproblem.yaml 文件中设置 "checkpoint_url='s3://dir_to_your_pretrained/'" + # 在 config_addingproblem.yaml 文件中设置 "ckpt_file='/cache/checkpoint_path/checkpoint_tcn-25_1563.ckpt'" + # 在 config_addingproblem.yaml 文件中设置 其他参数 + # b. 
在网页上设置 "enable_modelarts=True" + # 在网页上设置 "data_path='/cache/data'" + # 在网页上设置 "checkpoint_url='s3://dir_to_your_pretrained/'" + # 在网页上设置 "ckpt_file='/cache/checkpoint_path/checkpoint_tcn-25_1563.ckpt'" + # 在网页上设置 其他参数 + # (2) 准备模型代码 + # (3) 上传你训练好的模型到 S3 桶上 + # (4) 生成原始 AddingProblem 数据集到 S3 桶上 + # (5) 在网页上设置你的代码路径为 "/path/TCN" + # (6) 在网页上设置启动文件为 "eval.py" + # (7) 在网页上设置config_path为"../../config_addingproblem.yaml" + # (8) 在网页上设置"训练数据集"、"训练输出文件路径"、"作业日志路径"等 + # (9) 创建训练作业 + ``` + +## 脚本说明 + +### 脚本及样例代码 + +```bash + ├── TCN + ├── README.md // TCN相关说明 + ├── ascend310 // 实现310推理源代码 + ├── scripts + │ ├──run_train_ascend.sh // 在Ascend中训练的脚本 + │ ├──run_train_gpu.sh // 在GPU中训练的脚本 + │ ├──run_train_standalone.sh // 使用默认设置进行 GPU 训练 + │ ├──run_eval_ascend.sh // 在Ascend中评估的脚本 + │ ├──run_eval_gpu.sh // 在GPU中评估的脚本 + │ ├──run_eval_standalone.sh // 脚本在 GPU 中使用标准参数进行评估 + │ ├──run_infer_310.sh // 在310中进行离线推理的脚本 + ├── src + │ ├──dataset.py // 创建MNIST数据集 + │ ├──create_datasetAP.py // 生成AddingProblem数据集 + │ ├──datasetAP.py // 读取AddingProblem数据集 + │ ├──TCN.py // TCN主要架构 + │ ├──model.py // 为了适应MNIST数据的模型 + │ ├──metric.py // 自定义模型评价标准 + │ ├──weight_norm.py // 权重归一化 + │ ├──loss.py // 损失 + │ ├──lr_generator.py // 动态学习率 + │ └──model_utils + │ ├──config.py // 训练配置 + │ ├──device_adapter.py // 获取云上id + │ ├──local_adapter.py // 获取本地id + │ └──moxing_adapter.py // 参数处理 + ├── default_config.yaml // MNIST数据集训练参数配置文件 + ├── config_addingproblem.yaml // AddingProblem数据集训练参数配置文件 + ├── train.py // 训练脚本 + ├── eval.py // 评估脚本 + ├── export.py // 导出脚本 + ├── postprocess.py // 310推理后处理脚本 + ├── preprocess.py // 310推理前处理脚本 + ├── requirements.txt // 所需要的python库 +``` + +### 脚本参数 + +```bash +train.py和config.py中主要参数如下: +--enable_modelarts:允许云上适配。 +--train_data_path:到训练的路径。 +--test_data_path:到评估的路径。 +--output_path:保存checkpoint文件的路径。 +--load_path:加载checkpoint文件。 +--checkpoint_path:训练后保存的检查点文件的绝对完整路径。 +--data_path:数据集所在路径。 +--device_target:实现代码的设备。 +--epoch_size:总训练轮次。 +--epoch_change:学习率发生变化的轮次。 
+--batch_size:训练批次大小。 +--image_height:图像高度作为模型输入(仅在permuted mnist数据集)。 +--image_width:图像宽度作为模型输入(仅在permuted mnist数据集)。 +--dataset_name:数据集名称。可选值为"permuted_mnist"和"adding_problem"。 +--channel_size:输入通道数。 +--num_classes:类别个数。 +--lr:学习率。 +--batch_train:运行训练集的batch size。 +--batch_test:运行测试集的batch size。 +--dropout:dropout大小。 +--kernel_size:卷积核大小。 +--level:TCN的层数。 +--nhid:每一层的节点数目。 +--save_checkpoint_steps:间隔多少个step保存checkpoint文件。 +--keep_checkpoint_max:最多保存checkpoint文件的数目。 +--N_train:adding problem数据集训练集大小(仅在adding problem数据集)。 +--N_test:adding problem数据集测试集大小(仅在adding problem数据集) 。 +--seq_length:adding problem数据的序列长度(仅在adding problem数据集)。 +--device_id:运行设备id。 +--file_name:保存MINDIR文件的名称。 +--file_format:默认为"MINDIR”。 + +``` + +### 训练过程 + +#### 训练 + +- Ascend处理器环境运行permuted_mnist数据集 + +```bash +python train.py --config_path ../../default_config.yaml --train_data_path data/MNIST/train --test_data_path data/MNIST/test --ckpt_path checkpoint_path > log 2>&1 & +# 或进入脚本目录,执行脚本 +bash run_train_ascend.sh permuted_mnist ../data/MNIST/train ../data/MNIST/test ../checkpoint_path +``` + +训练结果 + +```bash +============== Starting Training ============== +epoch: 1 step: 937, loss is 0.41428128 +epoch time: 59074.480 ms, per step time: 63.046 ms +{'Accuracy': 0.9264823717948718} +epoch: 2 step: 937, loss is 0.22595052 +epoch time: 27975.444 ms, per step time: 29.856 ms +{'Accuracy': 0.9477163461538461} +... 
+epoch: 29 step: 937, loss is 0.015848802 +epoch time: 27978.896 ms, per step time: 29.860 ms +{'Accuracy': 0.9740584935897436} +epoch: 30 step: 937, loss is 0.3095672 +epoch time: 27985.821 ms, per step time: 29.867 ms +{'Accuracy': 0.9745592948717948} +``` + +- Ascend处理器环境运行adding_problem数据集 + +```bash +python train.py --config_path ../../config_addingproblem.yaml --train_data_path data/AddProb/train --test_data_path data/AddProb/test --ckpt_path checkpoint_path > log 2>&1 & +# 或进入脚本目录,执行脚本 +bash run_train_ascend.sh adding_problem ../data/AddProb/train ../data/AddProb/test ../checkpoint_path +``` + +训练结果 + +```bash +============== Starting Training ============== +epoch: 1 step: 1563, loss is 0.0007961707 +epoch time: 60970.954 ms, per step time: 39.009 ms +{'Accuracy': Tensor(shape=[], dtype=Float32, value= 0.00389967)} +epoch: 2 step: 1563, loss is 0.0012416712 +epoch time: 27022.374 ms, per step time: 17.289 ms +{'Accuracy': Tensor(shape=[], dtype=Float32, value= 0.00148527)} +... +epoch: 24 step: 1563, loss is 1.0663439e-05 +epoch time: 26905.874 ms, per step time: 17.214 ms +{'Accuracy': Tensor(shape=[], dtype=Float32, value= 3.18133e-05)} +epoch: 25 step: 1563, loss is 2.3555269e-05 +epoch time: 26909.343 ms, per step time: 17.216 ms +{'Accuracy': Tensor(shape=[], dtype=Float32, value= 1.7229e-05)} +``` + +### 评估过程 + +#### 评估 + +在运行以下命令之前,请检查用于评估的检查点路径。 + +- Ascend处理器环境运行permuted_mnist数据集 + + ```bash + python eval.py --config_path ../../default_config.yaml --test_data_path ../data/MNIST/test --ckpt_file checkpoint_path/checkpoint_tcn-30_937.ckpt > eval.log 2>&1 & + #或进入脚本目录,执行脚本 + bash run_eval_ascend.sh permuted_mnist ../data/MNIST/test ../checkpoint_path/checkpoint_tcn-30_937.ckpt + ``` + + 可通过"eval.log”文件查看结果。 + + ```text + ============== Starting Testing ============== + ============== {'Accuracy': 0.9746594551282052} ============== + ``` + +- Ascend处理器环境运行adding_problem数据集 + + ```bash + python eval.py --config_path ../../config_addingproblem.yaml 
--test_data_path /home/data/AddProb/test --ckpt_file checkpoint_add/checkpoint_tcn-25_1563.ckpt > eval.log 2>&1 & + # 或进入脚本目录,执行脚本 + bash run_eval_ascend.sh adding_problem ../data/AddProb/test ../checkpoint_path/checkpoint_tcn-25_1563.ckpt + ``` + + 可通过“eval.log”文件查看结果。 + + ```text + ============== Starting Testing ============== + ============== {'Accuracy': 1.7229e-05} ============== + ``` + +### 推理过程 + +在执行推理前,mindir文件必须通过`export.py`脚本导出。以下展示了使用MindIR模型执行推理的示例。 + +### 导出MindIR + +```shell +python export.py --config_path [CONFIG_PATH] --ckpt_file [CKPT_FILE] +``` + +### 在Ascend310执行推理 + +```shell +# Ascend310 inference +bash run_infer_310.sh [MINDIR_PATH] [DATASET PATH] [NEED_PREPROCESS] [DEVICE_ID] +``` + +## 模型描述 + +### 性能 + +在Permuted MNIST数据集上训练TCN模型: +| 参数 | TCN(permuted_mnist) | TCN(permuted_mnist) | +| ------------- | :-------------------------------------------------------- | --------------------------------------------------------- | +| 模型版本 | TCN | TCN | +| 资源 | Ascend 910;CPU 2.60GHz,192核;内存 755GB;系统 Euler2.8 | GPU NV SXM2 V100-32G | +| 上传日期 | 2021-11-26 | 2021-11-26 | +| 框架版本 | MindSpore 1.3.0 | PyTorch 1.8.0 | +| 数据集 | permuted_mnist | permuted_mnist | +| 训练参数 | epoch=30, steps=937, batch_size = 64, lr=0.003 | epoch=30, steps=937, batch_size = 64, lr=0.003 | +| 优化器 | Adam(weight_decay=1e-4) | Adam(weight_decay=1e-4) | +| 损失函数 | NLLLoss | NLLLoss | +| 输出 | 类别概率 | 类别概率 | +| 精度 | 0.9745 | 0.972 | +| 速度 | 1卡:20.3 毫秒/步 | 1卡:21.5 毫秒/步 | +| 调优检查点 | 895KB(.ckpt 文件) | 297KB(.pkl文件) | + +在Adding Problem数据集上训练TCN模型: +| 参数 | TCN(adding_problem) | TCN(adding_problem) | +| ------------- | :-------------------------------------------------------- | --------------------------------------------------------- | +| 模型版本 | TCN | TCN | +| 资源 | Ascend 910;CPU 2.60GHz,192核;内存 755GB;系统 Euler2.8 | GPU NV SXM2 V100-32G | +| 上传日期 | 2021-11-26 | 2021-11-26 | +| 框架版本 | MindSpore 1.3.0 | PyTorch 1.8.0 | +| 数据集 | adding_problem | adding_problem | +| 训练参数 | epoch=25, steps=1563, 
batch_size = 32, lr=0.004 | epoch=25, steps=1563, batch_size = 32, lr=0.004 | +| 优化器 | Adam | Adam | +| 损失函数 | MSELoss | MSELoss | +| 输出 | 两数之和(回归值) | 两数之和(回归值) | +| 精度 | 1.7229e-05(loss) | 5.8e-05(loss) | +| 速度 | 1卡:11.6 毫秒/步 | 1卡:17.1 毫秒/步 | +| 调优检查点 | 978KB(.ckpt 文件) | 471KB(.pkl文件) | + +## 随机情况说明 + +在dataset.py和create_datasetAP.py中设置了随机种子。 + +## ModelZoo主页 + +请浏览官网[主页](https://gitee.com/mindspore/models)。 \ No newline at end of file diff --git a/research/cv/TCN/config_addingproblem.yaml b/research/cv/TCN/config_addingproblem.yaml index 3fc7eb50dec10a68fa28eaec0653f1f774a07840..179ec8502b6ab00a7bad6c2d3bccd64719ad6f9d 100644 --- a/research/cv/TCN/config_addingproblem.yaml +++ b/research/cv/TCN/config_addingproblem.yaml @@ -5,16 +5,16 @@ data_url: "" train_url: "" checkpoint_url: "" # Path for local -data_path: "/cache/data/AddProb" -train_data_path: "/cache/data/AddProb/train" -test_data_path: "/cache/data/AddProb/test" -output_path: "/cache/train" -load_path: "/cache/checkpoint_path" -device_target: Ascend +data_path: "./data/AddProb" +train_data_path: "./data/AddProb/train" +test_data_path: "./data/AddProb/test" +output_path: "./output" +load_path: "./checkpoints" +device_target: GPU enable_profiling: False -ckpt_path: './checkpoint_path/' -ckpt_file: '/cache/checkpoint_path/checkpoint_tcn-30_937.ckpt' +ckpt_path: "./checkpoints" +ckpt_file: '' # ============================================================================== # Training options channel_size: 2 @@ -64,6 +64,6 @@ file_name: 'output file name.' file_format: 'file format' result_path: "result files path."
--- -device_target: 'Ascend' +device_target: 'GPU' file_format: 'MINDIR' diff --git a/research/cv/TCN/default_config.yaml b/research/cv/TCN/default_config.yaml index 8520c52449c5208c8cb2e764fff8ecffda182c73..ba037943724fb4ad32197f6454697c4eee75f514 100644 --- a/research/cv/TCN/default_config.yaml +++ b/research/cv/TCN/default_config.yaml @@ -5,24 +5,24 @@ data_url: "" train_url: "" checkpoint_url: "" # Path for local -data_path: "/cache/data/MNIST" -train_data_path: "/cache/data/MNIST/train" -test_data_path: "/cache/data/MNIST/test" -output_path: "/cache/train" -load_path: "/cache/checkpoint_path" -device_target: Ascend +data_path: "./data/MNIST" +train_data_path: "./data/MNIST/train" +test_data_path: "./data/MNIST/test" +output_path: "./output" +load_path: "./checkpoints" +device_target: GPU enable_profiling: False -ckpt_path: './checkpoint_path/' -ckpt_file: '/cache/checkpoint_path/checkpoint_tcn-30_937.ckpt' +ckpt_path: './checkpoints' +ckpt_file: '' # ============================================================================== # Training options channel_size: 1 num_classes: 10 -lr: 0.003 +lr: 0.001 epoch_size: 30 epoch_change: 10 -batch_size: 64 +batch_size: 128 buffer_size: 1000 weight_decay: 0.0001 image_height: 28 @@ -65,6 +65,6 @@ file_format: 'file format' result_path: "result files path." img_path: "image file path." --- -device_target: 'Ascend' +device_target: 'GPU' file_format: 'MINDIR' diff --git a/research/cv/TCN/infer/convert/air2om.sh b/research/cv/TCN/infer/convert/air2om.sh new file mode 100644 index 0000000000000000000000000000000000000000..097818dedc8abf107c8cbcd85448c5beaadec968 --- /dev/null +++ b/research/cv/TCN/infer/convert/air2om.sh @@ -0,0 +1,25 @@ +#!/usr/bin/env bash + +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +model_path=$1 +output_model_name=$2 + +atc --model=$model_path \ + --framework=1 \ + --output=$output_model_name \ + --input_format=NCHW \ + --soc_version=Ascend310 \ + --output_type=FP32 \ No newline at end of file diff --git a/research/cv/TCN/infer/data/config/TCN.cfg b/research/cv/TCN/infer/data/config/TCN.cfg new file mode 100644 index 0000000000000000000000000000000000000000..288ff347f141617e8bc7d0eb3330548d52d97620 --- /dev/null +++ b/research/cv/TCN/infer/data/config/TCN.cfg @@ -0,0 +1,3 @@ +CLASS_NUM=10 +SOFTMAX=false +TOP_K=1 diff --git a/research/cv/TCN/infer/data/config/TCN.pipeline b/research/cv/TCN/infer/data/config/TCN.pipeline new file mode 100644 index 0000000000000000000000000000000000000000..7502c986f1626ce74599a2e9cd6e8548db90c14e --- /dev/null +++ b/research/cv/TCN/infer/data/config/TCN.pipeline @@ -0,0 +1,45 @@ +{ + "TCN": { + "stream_config": { + "deviceId": "0" + }, + "appsrc0": { + "props": { + "blocksize": "409600" + }, + "factory": "appsrc", + "next": "mxpi_tensorinfer0" + }, + "mxpi_tensorinfer0": { + "props": { + "dataSource": "appsrc0", + "modelPath": "../data/model/TCN.om" + }, + "factory": "mxpi_tensorinfer", + "next": "mxpi_classpostprocessor0" + }, + "mxpi_classpostprocessor0": { + "props": { + "dataSource": "mxpi_tensorinfer0", + "postProcessConfigPath": "../data/config/TCN.cfg", + "labelPath": "../data/config/minst_labels.names", + "postProcessLibPath": "libresnet50postprocess.so" + }, + "factory": "mxpi_classpostprocessor", + "next": "mxpi_dataserialize0" + }, + "mxpi_dataserialize0": { + "props": { + "outputDataKeys": 
"mxpi_classpostprocessor0" + }, + "factory": "mxpi_dataserialize", + "next": "appsink0" + }, + "appsink0": { + "props": { + "blocksize": "4096000" + }, + "factory": "appsink" + } + } +} diff --git a/research/cv/TCN/infer/data/config/minst_labels.names b/research/cv/TCN/infer/data/config/minst_labels.names new file mode 100644 index 0000000000000000000000000000000000000000..f55b5c9eef39f00a9a4d8a58764fdc365b60f081 --- /dev/null +++ b/research/cv/TCN/infer/data/config/minst_labels.names @@ -0,0 +1,10 @@ +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 \ No newline at end of file diff --git a/research/cv/TCN/infer/data/preprocess/export_bin_file.py b/research/cv/TCN/infer/data/preprocess/export_bin_file.py new file mode 100644 index 0000000000000000000000000000000000000000..6ed1c435845a2e44555ec68e43fa9238f798732a --- /dev/null +++ b/research/cv/TCN/infer/data/preprocess/export_bin_file.py @@ -0,0 +1,62 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" +export mnist dataset to bin. 
+""" +import os +import argparse +from src.dataset import create_dataset + +def parse_args(): + parser = argparse.ArgumentParser(description='MNIST to bin') + parser.add_argument('--device_target', type=str, default="Ascend", + choices=['Ascend', 'GPU'], + help='device where the code will be implemented (default: Ascend)') + parser.add_argument('--dataset_dir', type=str, default='', help='dataset path') + parser.add_argument('--save_dir', type=str, default='', help='path to save bin file') + parser.add_argument('--batch_size', type=int, default=1, help='batch size for bin') + args_, _ = parser.parse_known_args() + return args_ + +if __name__ == "__main__": + args = parse_args() + os.environ["RANK_SIZE"] = '1' + os.environ["RANK_ID"] = '0' + device_id = int(os.getenv('DEVICE_ID')) if os.getenv('DEVICE_ID') else 0 + # context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target, device_id=device_id) + mnist_path = os.path.join(args.dataset_dir, 'test') + batch_size = args.batch_size + save_dir = os.path.join(args.save_dir, 'mnist_infer_data') + folder = os.path.join(save_dir, 'mnist_bs_' + str(batch_size) + '_bin') + if not os.path.exists(folder): + os.makedirs(folder) + ds = create_dataset(mnist_path, batch_size) + iter_num = 0 + label_file = os.path.join(save_dir, './mnist_bs_' + str(batch_size) + '_label.txt') + with open(label_file, 'w') as f: + for data in ds.create_dict_iterator(): + image = data['image'] + label = data['label'] + file_name = "mnist_" + str(iter_num) + ".bin" + file_path = folder + "/" + file_name + image.asnumpy().tofile(file_path) + f.write(file_name) + for i in label: + f.write(',' + str(i)) + f.write('\n') + iter_num += 1 + print("=====iter_num:{}=====".format(iter_num)) + print("=====image_data:{}=====".format(image)) + print("=====label_data:{}=====".format(label)) diff --git a/research/cv/TCN/infer/docker_start_infer.sh b/research/cv/TCN/infer/docker_start_infer.sh new file mode 100644 index 
0000000000000000000000000000000000000000..2678ff3f94b2b0be1bb20af554f3787f58b70aef --- /dev/null +++ b/research/cv/TCN/infer/docker_start_infer.sh @@ -0,0 +1,49 @@ +#!/usr/bin/env bash + +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +docker_image=$1 +model_dir=$2 + + +function show_help() { + echo "Usage: docker_start_infer.sh docker_image model_dir" +} + +function param_check() { + if [ -z "${docker_image}" ]; then + echo "please input docker_image" + show_help + exit 1 + fi + + if [ -z "${model_dir}" ]; then + echo "please input model_dir" + show_help + exit 1 + fi +} + +param_check + +docker run -it -u root \ + --device=/dev/davinci0 \ + --device=/dev/davinci_manager \ + --device=/dev/devmm_svm \ + --device=/dev/hisi_hdc \ + -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \ + -v ${model_dir}:${model_dir} \ + ${docker_image} \ + /bin/bash diff --git a/research/cv/TCN/infer/mxbase/CMakeLists.txt b/research/cv/TCN/infer/mxbase/CMakeLists.txt new file mode 100644 index 0000000000000000000000000000000000000000..61f29d466094812bc47d4ce1a4b22502da064e4b --- /dev/null +++ b/research/cv/TCN/infer/mxbase/CMakeLists.txt @@ -0,0 +1,35 @@ +cmake_minimum_required(VERSION 3.5.2) +project(TCN) +add_definitions(-D_GLIBCXX_USE_CXX11_ABI=0) + + +set(TARGET_MAIN TCN) + +set(ACL_LIB_PATH $ENV{ASCEND_HOME}/ascend-toolkit/latest/acllib) + +include_directories(${CMAKE_CURRENT_BINARY_DIR}) + 
+include_directories($ENV{MX_SDK_HOME}/include) +include_directories($ENV{MX_SDK_HOME}/opensource/include) +include_directories($ENV{MX_SDK_HOME}/opensource/include/opencv4) +include_directories($ENV{MX_SDK_HOME}/opensource/include/gstreamer-1.0) +include_directories($ENV{MX_SDK_HOME}/opensource/include/glib-2.0) +include_directories($ENV{MX_SDK_HOME}/opensource/lib/glib-2.0/include) + +link_directories($ENV{MX_SDK_HOME}/lib) +link_directories($ENV{MX_SDK_HOME}/opensource/lib/) + + +add_compile_options(-std=c++11 -fPIC -fstack-protector-all -pie -Wno-deprecated-declarations) +add_compile_options("-DPLUGIN_NAME=${PLUGIN_NAME}") +add_compile_options("-Dgoogle=mindxsdk_private") + +add_definitions(-DENABLE_DVPP_INTERFACE) + +include_directories(${ACL_LIB_PATH}/include) +link_directories(${ACL_LIB_PATH}/lib64/) + + + +add_executable(${TARGET_MAIN} src/main.cpp src/TCN.cpp) +target_link_libraries(${TARGET_MAIN} ${TARGET_LIBRARY} glog cpprest mxbase libascendcl.so) diff --git a/research/cv/TCN/infer/mxbase/build.sh b/research/cv/TCN/infer/mxbase/build.sh new file mode 100644 index 0000000000000000000000000000000000000000..71bc7df1a4c7a83288f5423af5740a2679b0302f --- /dev/null +++ b/research/cv/TCN/infer/mxbase/build.sh @@ -0,0 +1,48 @@ +#!/bin/bash + +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +export ASCEND_VERSION=ascend-toolkit/latest +export ARCH_PATTERN=. 
+export LD_LIBRARY_PATH=${MX_SDK_HOME}/lib/modelpostprocessors:${LD_LIBRARY_PATH} + +mkdir -p build +cd build || exit + +function make_plugin() { + if ! cmake ..; + then + echo "cmake failed." + return 1 + fi + + if ! (make); + then + echo "make failed." + return 1 + fi + + return 0 +} + +if make_plugin; +then + echo "INFO: Build successfully." +else + echo "ERROR: Build failed." +fi + +cd - || exit + diff --git a/research/cv/TCN/infer/mxbase/src/TCN.cpp b/research/cv/TCN/infer/mxbase/src/TCN.cpp new file mode 100644 index 0000000000000000000000000000000000000000..78b1af2698280c4bda9c0db8935334a3aa25ed60 --- /dev/null +++ b/research/cv/TCN/infer/mxbase/src/TCN.cpp @@ -0,0 +1,187 @@ +/* + * Copyright 2022 Huawei Technologies Co., Ltd + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#include "TCN.h" +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "acl/acl.h" +#include "MxBase/DeviceManager/DeviceManager.h" +#include "MxBase/Log/Log.h" +namespace { + const int FLOAT_SIZE = 4; + const int CLASS_NUM = 10; +} + +APP_ERROR TCN::Init(const InitParam &initParam) { + deviceId_ = initParam.deviceId; + APP_ERROR ret = MxBase::DeviceManager::GetInstance()->InitDevices(); + if (ret != APP_ERR_OK) { + LogError << "Init devices failed, ret=" << ret << "."; + return ret; + } + ret = MxBase::TensorContext::GetInstance()->SetContext(initParam.deviceId); + if (ret != APP_ERR_OK) { + LogError << "Set context failed, ret=" << ret << "."; + return ret; + } + model_TCN = std::make_shared(); + ret = model_TCN->Init(initParam.modelPath, modelDesc_); + if (ret != APP_ERR_OK) { + LogError << "ModelInferenceProcessor init failed, ret=" << ret << "."; + return ret; + } + + return APP_ERR_OK; +} + +APP_ERROR TCN::DeInit() { + model_TCN->DeInit(); + MxBase::DeviceManager::GetInstance()->DestroyDevices(); + return APP_ERR_OK; +} + + +APP_ERROR TCN::ReadBin(const std::string &path, std::vector> &dataset) { + std::ifstream inFile(path, std::ios::binary); + float data[28*28]; + inFile.read(reinterpret_cast(&data), sizeof(data)); + std::vector temp(data, data+sizeof(data) / sizeof(data[0])); + dataset.push_back(temp); + + return APP_ERR_OK; +} + + +APP_ERROR TCN::VectorToTensorBase(const std::vector> &input, + MxBase::TensorBase &tensorBase) { + uint32_t dataSize = 1*1*784; + float *metaFeatureData = new float[dataSize]; + uint32_t idx = 0; + for (size_t bs = 0; bs < input.size(); bs++) { + for (size_t c = 0; c < input[bs].size(); c++) { + metaFeatureData[idx++] = input[bs][c]; + } + } + MxBase::MemoryData memoryDataDst(dataSize * FLOAT_SIZE, MxBase::MemoryData::MEMORY_DEVICE, deviceId_); + MxBase::MemoryData memoryDataSrc(reinterpret_cast(metaFeatureData), dataSize * FLOAT_SIZE, + 
MxBase::MemoryData::MEMORY_HOST_MALLOC); + APP_ERROR ret = MxBase::MemoryHelper::MxbsMallocAndCopy(memoryDataDst, memoryDataSrc); + delete[] metaFeatureData; + if (ret != APP_ERR_OK) { + LogError << GetError(ret) << "Memory malloc failed."; + return ret; + } + + std::vector shape = {1, 1, 784}; + tensorBase = MxBase::TensorBase(memoryDataDst, false, shape, MxBase::TENSOR_DTYPE_FLOAT32); + + return APP_ERR_OK; +} + + +APP_ERROR TCN::Inference(const std::vector &inputs, + std::vector &outputs) { + auto dtypes = model_TCN->GetOutputDataType(); + for (size_t i = 0; i < modelDesc_.outputTensors.size(); ++i) { + std::vector shape = {}; + for (size_t j = 0; j < modelDesc_.outputTensors[i].tensorDims.size(); ++j) { + shape.push_back((uint32_t)modelDesc_.outputTensors[i].tensorDims[j]); + } + MxBase::TensorBase tensor(shape, dtypes[i], MxBase::MemoryData::MemoryType::MEMORY_DEVICE, deviceId_); + APP_ERROR ret = MxBase::TensorBase::TensorBaseMalloc(tensor); + if (ret != APP_ERR_OK) { + LogError << "TensorBaseMalloc failed, ret=" << ret << "."; + return ret; + } + outputs.push_back(tensor); + } + MxBase::DynamicInfo dynamicInfo = {}; + dynamicInfo.dynamicType = MxBase::DynamicType::STATIC_BATCH; + auto startTime = std::chrono::high_resolution_clock::now(); + APP_ERROR ret = model_TCN->ModelInference(inputs, outputs, dynamicInfo); + auto endTime = std::chrono::high_resolution_clock::now(); + double costMs = std::chrono::duration(endTime - startTime).count(); + inferCostTimeMilliSec += costMs; + if (ret != APP_ERR_OK) { + LogError << "ModelInference TCN failed, ret=" << ret << "."; + return ret; + } + return APP_ERR_OK; +} + +APP_ERROR TCN::Process(const std::string &image_path, const InitParam &initParam, std::vector &outputs) { + std::vector inputs = {}; + std::vector outputs_tb = {}; + std::string infer_result_path = "./infer_results.txt"; + + std::vector> image_data; + APP_ERROR ret = ReadBin(image_path, image_data); + if (ret != APP_ERR_OK) { + 
LogError << "ReadBin failed, ret=" << ret << ".";
+ return ret; + } + + MxBase::TensorBase tensorBase; + APP_ERROR ret1 = VectorToTensorBase(image_data, tensorBase); + if (ret1 != APP_ERR_OK) { + LogError << "ToTensorBase failed, ret=" << ret1 << "."; + return ret1; + } + inputs.push_back(tensorBase); + auto startTime = std::chrono::high_resolution_clock::now(); + APP_ERROR ret3 = Inference(inputs, outputs_tb); + + auto endTime = std::chrono::high_resolution_clock::now(); + double costMs = std::chrono::duration(endTime - startTime).count(); + inferCostTimeMilliSec += costMs; + if (ret3 != APP_ERR_OK) { + LogError << "Inference failed, ret=" << ret3 << "."; + return ret3; + } + + if (!outputs_tb[0].IsHost()) { + outputs_tb[0].ToHost(); + } + float *value = reinterpret_cast(outputs_tb[0].GetBuffer()); + + float tmp_max = -99999; + int idx_max = -1; + for (int i = 0; i < CLASS_NUM; i++) { + if (value[i] > tmp_max) { + idx_max = i; + tmp_max = value[i]; + } + } + std::ofstream outfile(infer_result_path, std::ios::app); + + if (outfile.fail()) { + LogError << "Failed to open result file: " << infer_result_path; + return APP_ERR_COMM_FAILURE; + } + outfile << idx_max << "\n"; + outfile.close(); + outputs.push_back(idx_max); + return APP_ERR_OK; +} diff --git a/research/cv/TCN/infer/mxbase/src/TCN.h b/research/cv/TCN/infer/mxbase/src/TCN.h new file mode 100644 index 0000000000000000000000000000000000000000..8dec9d6628200ff0124a41b166b3d16f66d71565 --- /dev/null +++ b/research/cv/TCN/infer/mxbase/src/TCN.h @@ -0,0 +1,54 @@ +/* + * Copyright 2022 Huawei Technologies Co., Ltd + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef MXBASE_TCN_H +#define MXBASE_TCN_H +#include +#include +#include +#include "acl/acl.h" +#include "MxBase/DvppWrapper/DvppWrapper.h" +#include "MxBase/ModelInfer/ModelInferenceProcessor.h" +#include "MxBase/Tensor/TensorContext/TensorContext.h" +#include "MxBase/CV/Core/DataType.h" + +struct InitParam { + uint32_t deviceId; + bool checkTensor; + std::string modelPath; +}; + +class TCN { + public: + APP_ERROR Init(const InitParam &initParam); + APP_ERROR DeInit(); + APP_ERROR VectorToTensorBase(const std::vector> &input, MxBase::TensorBase &tensorBase); + APP_ERROR Inference(const std::vector &inputs, std::vector &outputs); + APP_ERROR Process(const std::string &image_path, const InitParam &initParam, std::vector &outputs); + APP_ERROR ReadBin(const std::string &path, std::vector> &dataset); + // get infer time + double GetInferCostMilliSec() const {return inferCostTimeMilliSec;} + + + private: + std::shared_ptr model_TCN; + MxBase::ModelDesc modelDesc_; + uint32_t deviceId_ = 0; + // infer time + double inferCostTimeMilliSec = 0.0; +}; + +#endif diff --git a/research/cv/TCN/infer/mxbase/src/main.cpp b/research/cv/TCN/infer/mxbase/src/main.cpp new file mode 100644 index 0000000000000000000000000000000000000000..bae8f45ec1219072ec109c296cde54341fcf2784 --- /dev/null +++ b/research/cv/TCN/infer/mxbase/src/main.cpp @@ -0,0 +1,128 @@ +/* + * Copyright 2022 Huawei Technologies Co., Ltd + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include "MxBase/Log/Log.h" +#include "TCN.h" + +namespace { + const uint32_t CLASS_NUM = 10; + const uint32_t BATCH_SIZE = 1; + const std::string resFileName = "../results/eval_mxbase.log"; +} // namespace + +void SplitString(const std::string &s, std::vector *v, const std::string &c) { + std::string::size_type pos1, pos2; + pos2 = s.find(c); + pos1 = 0; + while (std::string::npos != pos2) { + v->push_back(s.substr(pos1, pos2 - pos1)); + + pos1 = pos2 + c.size(); + pos2 = s.find(c, pos1); + } + + if (pos1 != s.length()) { + v->push_back(s.substr(pos1)); + } +} + +APP_ERROR ReadImagesPath(const std::string &path, std::vector *imagesPath, std::vector *imageslabel) { + std::ifstream inFile; + inFile.open(path, std::ios_base::in); + std::string line; + // Check images path file validity + if (inFile.fail()) { + LogError << "Failed to open annotation file: " << path; + return APP_ERR_COMM_OPEN_FAIL; + } + std::vector vectorStr_path; + std::string splitStr_path = ","; + // Each line has the form: file_name,label + while (std::getline(inFile, line)) { + vectorStr_path.clear(); + + SplitString(line, &vectorStr_path, splitStr_path); + // Skip blank or malformed lines instead of indexing past the end of the vector. + if (vectorStr_path.size() < 2) { + LogError << "Skipping malformed annotation line: " << line; + continue; + } + std::string str_path = vectorStr_path[0]; + std::string str_label = vectorStr_path[1]; + imagesPath->push_back(str_path); + // MNIST labels are single digits (0-9). + int label = str_label[0] - '0'; + imageslabel->push_back(label); + } + + inFile.close(); + return APP_ERR_OK; +} + +int main(int argc, char* argv[]) { + InitParam initParam = {}; + initParam.deviceId = 0; + initParam.checkTensor = true; + initParam.modelPath = "../data/model/TCN.om"; + std::string dataPath = "../data/mnist_infer_data/mnist_bs_1_bin/"; + std::string annoPath = "../data/mnist_infer_data/mnist_bs_1_label.txt"; + + auto model_TCN = std::make_shared(); + APP_ERROR ret = model_TCN->Init(initParam); + if (ret != APP_ERR_OK) { + LogError << "TCN init failed, ret=" << ret << "."; + return ret;
} + + std::vector imagesPath; + std::vector imageslabel; + ret = ReadImagesPath(annoPath, &imagesPath, &imageslabel); + if (ret != APP_ERR_OK) { + model_TCN->DeInit(); + return ret; + } + + int img_size = imagesPath.size(); + std::vector outputs; + for (int i = 0; i < img_size; i++) { + ret = model_TCN->Process(dataPath + imagesPath[i], initParam, outputs); + if (ret != APP_ERR_OK) { + LogError << "TCN process failed, ret=" << ret << "."; + model_TCN->DeInit(); + return ret; + } + } + float cor = 0; + for (int i = 0; i < img_size; i++) { + int label_now = imageslabel[i]; + if (label_now == outputs[i]) { + cor++; + } + } + + model_TCN->DeInit(); + + double total_time = model_TCN->GetInferCostMilliSec() / 1000; + LogInfo<< "total num: "<< img_size<< ",acc total: "<< static_cast(cor/img_size); + LogInfo<< "inference total cost time: "<< total_time<< ", FPS: "<< img_size/total_time; + + std::ofstream outfile(resFileName); + if (outfile.fail()) { + LogError << "Failed to open result file: " << resFileName; + return APP_ERR_COMM_FAILURE; + } + outfile << "total num: "<< img_size<< ",acc total: "<< static_cast(cor/img_size)<< "\n"; + outfile << "inference total cost time(s): "<< total_time<< ", FPS: "<< img_size/total_time; + outfile.close(); + + return APP_ERR_OK; +} diff --git a/research/cv/TCN/infer/sdk/main.py b/research/cv/TCN/infer/sdk/main.py new file mode 100644 index 0000000000000000000000000000000000000000..dcc282118199f81cb4ade6b99f7a76ea87c73bc4 --- /dev/null +++ b/research/cv/TCN/infer/sdk/main.py @@ -0,0 +1,141 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +""" main.py """ +import argparse +import os +import json +from StreamManagerApi import StreamManagerApi +from StreamManagerApi import MxDataInput, InProtobufVector, MxProtobufIn +import MxpiDataType_pb2 as MxpiDataType +import numpy as np + + +# Input tensor shape: [batch, channels, 28 * 28 flattened pixels] +shape = [1, 1, 784] + + +def parse_args(parsers): + """ + Parse commandline arguments. + """ + parsers.add_argument('--images_txt_path', type=str, + default="../data/mnist_infer_data/mnist_bs_1_label.txt", + help='path to the label file (one "file_name,label" pair per line)') + return parsers + + +def read_file_list(input_file): + """ + :param input_file: path to the label file; each line is "file_name,label", e.g. + 1.bin,0 + 2.bin,2 + ... + :return image path list, label list + """ + image_file = [] + labels_l = [] + if not os.path.exists(input_file): + raise FileNotFoundError('input file does not exist: {}'.format(input_file)) + with open(input_file, "r") as fs: + for line in fs.readlines(): + line = line.strip('\n').split(',') + file_name = line[0] + label = line[1] + image_file.append(file_name) + labels_l.append(label) + return image_file, labels_l + + + +if __name__ == '__main__': + parser = argparse.ArgumentParser( + description='Om TCN Inference') + parser = parse_args(parser) + args, _ = parser.parse_known_args() + # init stream manager + stream_manager = StreamManagerApi() + ret = stream_manager.InitManager() + if ret != 0: + print("Failed to init Stream manager, ret=%s" % str(ret)) + exit() + + # create streams by pipeline config file + with open("../data/config/TCN.pipeline", 'rb') as f: + pipeline = f.read() + ret = stream_manager.CreateMultipleStreams(pipeline) + if ret != 0: + print("Failed to create Stream, ret=%s" % str(ret)) + exit() + + # Create the output directories + + res_dir_name = 'result' + if not os.path.exists(res_dir_name): + os.makedirs(res_dir_name) + + if not os.path.exists("../results"): + os.makedirs("../results") + + file_list, label_list = read_file_list(args.images_txt_path) + + img_size = len(file_list) + results = [] + + for idx, file in enumerate(file_list): + image_path = os.path.join(args.images_txt_path.replace('label.txt', 'bin'), file) + + # Construct the input of the stream + data_input = MxDataInput() + with open(image_path, 'rb') as f: + data = f.read() + data_input.data = data + tensorPackageList1 = MxpiDataType.MxpiTensorPackageList() + tensorPackage1 = tensorPackageList1.tensorPackageVec.add() + tensorVec1 = tensorPackage1.tensorVec.add() + tensorVec1.deviceId = 0 + tensorVec1.memType = 0 + for t in shape: + tensorVec1.tensorShape.append(t) + tensorVec1.dataStr = data_input.data + tensorVec1.tensorDataSize = len(data) + protobufVec1 = InProtobufVector() + protobuf1 = MxProtobufIn() +
protobuf1.key = b'appsrc0' + protobuf1.type = b'MxTools.MxpiTensorPackageList' + protobuf1.protobuf = tensorPackageList1.SerializeToString() + protobufVec1.push_back(protobuf1) + + unique_id = stream_manager.SendProtobuf(b'TCN', b'appsrc0', protobufVec1) + + # Obtain the inference result by specifying streamName and uniqueId. + infer_result = stream_manager.GetResult(b'TCN', 0) + if infer_result.errorCode != 0: + print("GetResultWithUniqueId error. errorCode=%d, errorMsg=%s" % ( + infer_result.errorCode, infer_result.data.decode())) + exit() + + res = json.loads(infer_result.data.decode())['MxpiClass'][0]['className'] + + results.append(res) + + results = np.array(results) + labels = np.array(label_list) + np.savetxt("./result/infer_results.txt", results, fmt='%s') + + # destroy streams + stream_manager.DestroyAllStreams() + acc = (results == labels).sum() / img_size + print('total acc:', acc) + + with open("../results/eval_sdk.log", 'w') as f: + f.write('Eval size: {} \n'.format(img_size)) + f.write('total acc: {} \n'.format(acc)) diff --git a/research/cv/TCN/infer/sdk/prec/eval_sdk.py b/research/cv/TCN/infer/sdk/prec/eval_sdk.py new file mode 100644 index 0000000000000000000000000000000000000000..7a90983c2e97a7ee2f09369465301e0f77aa81ef --- /dev/null +++ b/research/cv/TCN/infer/sdk/prec/eval_sdk.py @@ -0,0 +1,49 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +""" eval_sdk.py """ +import os +import numpy as np + + +def read_file_list(input_f): + """ + :param input_f: path to the label file; each line is "file_name,label", e.g. + 1.bin,0 + 2.bin,2 + ... + :return image path list, label list + """ + image_f = [] + labels_l = [] + if not os.path.exists(input_f): + raise FileNotFoundError('input file does not exist: {}'.format(input_f)) + with open(input_f, "r") as fs: + for line in fs.readlines(): + line = line.strip('\n').split(',') + file_name = line[0] + label = int(line[1]) + image_f.append(file_name) + labels_l.append(label) + return image_f, labels_l + + +images_txt_path = "../data/mnist_infer_data/mnist_bs_1_label.txt" + +file_list, label_list = read_file_list(images_txt_path) +img_size = len(file_list) +labels = np.array(label_list) + +results = np.loadtxt('result/infer_results.txt') +acc = (results == labels).sum() / img_size +print('total acc:', acc) diff --git a/research/cv/TCN/infer/sdk/run.sh b/research/cv/TCN/infer/sdk/run.sh new file mode 100644 index 0000000000000000000000000000000000000000..00dcdb1da7b99aeec76890409844f453f7327605 --- /dev/null +++ b/research/cv/TCN/infer/sdk/run.sh @@ -0,0 +1,27 @@ +#!/bin/bash + +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ +set -e + +# Simple log helper functions +info() { echo -e "\033[1;34m[INFO ][MxStream] $1\033[1;37m" ; } +warn() { echo >&2 -e "\033[1;31m[WARN ][MxStream] $1\033[1;37m" ; } + +#to set PYTHONPATH, import the StreamManagerApi.py +export PYTHONPATH=$PYTHONPATH:${MX_SDK_HOME}/python + +python3 main.py +exit 0 \ No newline at end of file diff --git a/research/cv/TCN/modelarts/train_modelarts.py b/research/cv/TCN/modelarts/train_modelarts.py new file mode 100644 index 0000000000000000000000000000000000000000..9830ea431f2a45ab8037c9966a1d34b5f7e8e15c --- /dev/null +++ b/research/cv/TCN/modelarts/train_modelarts.py @@ -0,0 +1,94 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================ +""" +######################## train TCN net ######################## +""" + +import numpy as np +import mindspore +import mindspore.nn as nn +from mindspore import context, Tensor, export +from mindspore.common import set_seed +from mindspore.nn.metrics import Accuracy +from mindspore.train import Model +from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor + +from src.dataset import create_dataset +from src.datasetAP import create_datasetAP +from src.eval_call_back import EvalCallBack +from src.loss import NLLLoss +from src.lr_generator import get_lr +from src.metric import MyLoss +from src.model import TCN +from src.model_utils.config import config +from src.model_utils.moxing_adapter import moxing_wrapper + +set_seed(0) + +def modelarts_pre_process(): + pass + + +context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target) + + +@moxing_wrapper(pre_process=modelarts_pre_process) +def train_net(): + """train net""" + if config.dataset_name == 'permuted_mnist': + train_dataset = create_dataset(config.data_url + '/train', config.batch_size) + test_dataset = create_dataset(config.data_url + '/test', config.batch_size) + elif config.dataset_name == 'adding_problem': + train_dataset = create_datasetAP(config.data_url, config.N_train, config.seq_length, 'train', + config.batch_train) + test_dataset = create_datasetAP(config.data_url, config.N_test, config.seq_length, 'test', + config.batch_test) + + net = TCN(config.channel_size, config.num_classes, [config.nhid] * config.level, config.kernel_size, config.dropout + , config.dataset_name) + lr = Tensor(get_lr(config, train_dataset.get_dataset_size()), dtype=mindspore.float32) + net_opt = nn.Adam(net.trainable_params(), learning_rate=lr, weight_decay=config.weight_decay) + + if config.dataset_name == 'permuted_mnist': + net_loss = NLLLoss(reduction='mean') + model = Model(net, 
net_loss, net_opt, metrics={"Accuracy": Accuracy()}) + elif config.dataset_name == 'adding_problem': + net_loss = nn.MSELoss() + model = Model(net, net_loss, net_opt, metrics={"Accuracy": MyLoss()}) + + time_cb = TimeMonitor(data_size=train_dataset.get_dataset_size()) + config_ck = CheckpointConfig(save_checkpoint_steps=config.save_checkpoint_steps, + keep_checkpoint_max=config.keep_checkpoint_max) + ckpoint_cb = ModelCheckpoint(prefix="checkpoint_tcn", directory=config.train_url, config=config_ck) + eval_per_epoch = 1 + epoch_per_eval = {"epoch": [], "acc": []} + eval_cb = EvalCallBack(model, test_dataset, eval_per_epoch, epoch_per_eval) + + print("============== Starting Training ==============") + model.train(config.epoch_size, train_dataset, callbacks=[time_cb, ckpoint_cb, LossMonitor(), eval_cb], + dataset_sink_mode=config.dataset_sink_mode) + + print("============ Starting Exporting ===================") + print("============ ckpt_file: " + config.ckpt_file) + # export network + if config.dataset_name == 'permuted_mnist': + inputs = Tensor(np.ones([1, config.channel_size, config.image_height * config.image_width]), + mindspore.float32) + elif config.dataset_name == 'adding_problem': + inputs = Tensor(np.ones([1, config.channel_size, config.seq_length]), mindspore.float32) + export(net, inputs, file_name=config.train_url + '/' + config.file_name, file_format='AIR') + +if __name__ == "__main__": + train_net() diff --git a/research/cv/TCN/requirements.txt b/research/cv/TCN/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..3e76856c4a176faee7fcc882675e30dc9eedd53e --- /dev/null +++ b/research/cv/TCN/requirements.txt @@ -0,0 +1,3 @@ +mindspore +numpy +pyyaml \ No newline at end of file diff --git a/research/cv/TCN/scripts/docker_start.sh b/research/cv/TCN/scripts/docker_start.sh new file mode 100644 index 0000000000000000000000000000000000000000..6b452b3d3f4b1596501ed63d6047052717115c0f --- /dev/null +++ 
b/research/cv/TCN/scripts/docker_start.sh @@ -0,0 +1,38 @@ +#!/bin/bash +# Copyright (c) 2022. Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +docker_image=$1 +data_dir=$2 +model_dir=$3 + +docker run -it -u root --ipc=host \ + --device=/dev/davinci0 \ + --device=/dev/davinci1 \ + --device=/dev/davinci2 \ + --device=/dev/davinci3 \ + --device=/dev/davinci4 \ + --device=/dev/davinci5 \ + --device=/dev/davinci6 \ + --device=/dev/davinci7 \ + --device=/dev/davinci_manager \ + --device=/dev/devmm_svm \ + --device=/dev/hisi_hdc \ + --privileged \ + -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \ + -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons \ + -v ${data_dir}:${data_dir} \ + -v ${model_dir}:${model_dir} \ + -v /root/ascend/log:/root/ascend/log ${docker_image} /bin/bash \ No newline at end of file diff --git a/research/cv/TCN/scripts/run_eval_gpu.sh b/research/cv/TCN/scripts/run_eval_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..5fbec5f88e1299b344308115781f235e68374913 --- /dev/null +++ b/research/cv/TCN/scripts/run_eval_gpu.sh @@ -0,0 +1,44 @@ +#!/bin/bash +# Copyright 2021 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +# A simple usage example follows; more parameters can be set in the config file. +if [ $# != 3 ] +then + echo "Usage: bash run_eval_gpu.sh [permuted_mnist|adding_problem] [DATA_PATH] [CKPT_FILE]" +exit 1 +fi + +export DATASET_NAME=$1 +export DATA_PATH=$2 +export CKPT_FILE=$3 + +BASE_PATH=$(cd ./"`dirname $0`" || exit; pwd) + +if [ $# -ge 1 ]; then + if [ $1 == 'adding_problem' ]; then + CONFIG_FILE="../../config_addingproblem.yaml" + elif [ $1 == 'permuted_mnist' ]; then + CONFIG_FILE="../../default_config.yaml" + else + echo "Unrecognized dataset name: $1 (expected permuted_mnist or adding_problem)" + exit 1 + fi +else + CONFIG_FILE="../../default_config.yaml" +fi + +python -s ${BASE_PATH}/../eval.py --config_path=$CONFIG_FILE --test_data_path=$DATA_PATH --device_target="GPU" --ckpt_file=$CKPT_FILE > eval.log 2>&1 & + diff --git a/research/cv/TCN/scripts/run_infer_310.sh b/research/cv/TCN/scripts/run_infer_310.sh index 0dec181d2b4d8e4197aba4f5effda0fb5c812b09..227a913a5ef0c872501e044dcb94eb0a72d2c034 100644 --- a/research/cv/TCN/scripts/run_infer_310.sh +++ b/research/cv/TCN/scripts/run_infer_310.sh @@ -67,7 +67,6 @@ if [ -d ${ASCEND_HOME}/ascend-toolkit ]; then export PYTHONPATH=$ASCEND_HOME/fwkacllib/python/site-packages:${TBE_IMPL_PATH}:$ASCEND_HOME/ascend-toolkit/latest/fwkacllib/python/site-packages:$PYTHONPATH export ASCEND_OPP_PATH=$ASCEND_HOME/ascend-toolkit/latest/opp else - export ASCEND_HOME=/usr/local/Ascend/latest/ export
PATH=$ASCEND_HOME/fwkacllib/bin:$ASCEND_HOME/fwkacllib/ccec_compiler/bin:$ASCEND_HOME/atc/ccec_compiler/bin:$ASCEND_HOME/atc/bin:$PATH export LD_LIBRARY_PATH=$ASCEND_HOME/fwkacllib/lib64:/usr/local/lib:$ASCEND_HOME/atc/lib64:$ASCEND_HOME/acllib/lib64:$ASCEND_HOME/driver/lib64:$ASCEND_HOME/add-ons:$LD_LIBRARY_PATH export PYTHONPATH=$ASCEND_HOME/fwkacllib/python/site-packages:$ASCEND_HOME/atc/python/site-packages:$PYTHONPATH @@ -114,7 +113,7 @@ function infer() function cal_acc() { - python3.7 ../postprocess.py --dataset_name=$dataset_name --result_path=./result_Files --label_path=./preprocess_Result/label &> acc.log + python ../postprocess.py --dataset_name=$dataset_name --result_path=./result_Files --label_path=./preprocess_Result/label &> acc.log } diff --git a/research/cv/TCN/scripts/run_standalone_train_gpu.sh b/research/cv/TCN/scripts/run_standalone_train_gpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..e18cdb3dae5f3ebb502a1bd86ee7845c7c5c18fd --- /dev/null +++ b/research/cv/TCN/scripts/run_standalone_train_gpu.sh @@ -0,0 +1 @@ +python train.py --config_path ../../default_config.yaml --train_data_path data/MNIST/train --test_data_path data/MNIST/test --ckpt_path checkpoints > log 2>&1 & \ No newline at end of file diff --git a/research/cv/TCN/src/create_datasetAP.py b/research/cv/TCN/src/create_datasetAP.py index 98020e42234c1bd32d5eec25a4145e8a7aad2f0b..5ca17aee5a71d42e17852e0a6459546f340e727a 100644 --- a/research/cv/TCN/src/create_datasetAP.py +++ b/research/cv/TCN/src/create_datasetAP.py @@ -33,11 +33,11 @@ def create_dataset(N, seq_length, mode='train'): X_mask[i, 0, positions[1]] = 1 Y[i, 0] = X_num[i, 0, positions[0]] + X_num[i, 0, positions[1]] X = np.concatenate((X_num, X_mask), axis=1) - X_path = os.path.join('../data/AddProb/train', 'traindata.bin') - Y_path = os.path.join('../data/AddProb/train', 'trainlabel.bin') + X_path = os.path.join('../data/Preprocessed/train', 'traindata.bin') + Y_path = 
os.path.join('../data/Preprocessed/train', 'trainlabel.bin') if mode == 'test': - X_path = os.path.join('../data/AddProb/test', 'testdata.bin') - Y_path = os.path.join('../data/AddProb/test', 'testlabel.bin') + X_path = os.path.join('../data/Preprocessed/test', 'testdata.bin') + Y_path = os.path.join('../data/Preprocessed/test', 'testlabel.bin') X.tofile(X_path) Y.tofile(Y_path) diff --git a/research/cv/TCN/src/dataset.py b/research/cv/TCN/src/dataset.py index f68fe3e6db672c622c2fe4e58b20dc6a794efce6..413923a1abfdd0361ca51fb16d50ee71462ee891 100644 --- a/research/cv/TCN/src/dataset.py +++ b/research/cv/TCN/src/dataset.py @@ -1,4 +1,4 @@ -# Copyright 2021 Huawei Technologies Co., Ltd +# Copyright 2021-2022 Huawei Technologies Co., Ltd # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,8 +17,8 @@ """ import numpy as np import mindspore.dataset as ds -import mindspore.dataset.transforms.c_transforms as C -import mindspore.dataset.vision.c_transforms as CV +import mindspore.dataset.transforms as C +import mindspore.dataset.vision as CV from mindspore import dtype as mstype np.random.seed(0) diff --git a/research/cv/TCN/src/eval_call_back.py b/research/cv/TCN/src/eval_call_back.py index ee0ddfa888e1510f23321af637cf0044bf044d83..ca93382db39490b9a05e2f2ca7a886a402121cf0 100644 --- a/research/cv/TCN/src/eval_call_back.py +++ b/research/cv/TCN/src/eval_call_back.py @@ -34,4 +34,4 @@ class EvalCallBack(Callback): acc = self.model.eval(self.eval_dataset, dataset_sink_mode=False) self.epoch_per_eval["epoch"].append(cur_epoch) self.epoch_per_eval["acc"].append(acc["Accuracy"]) - print(acc) + print("Eval accuracy:", acc) diff --git a/research/cv/TCN/train.py b/research/cv/TCN/train.py index 2fafa0e0b227ec3ca172c57c3109f7b726742cdf..53746bab8a8bbeb0c81f381bfc16ca2b13ed853b 100644 --- a/research/cv/TCN/train.py +++ b/research/cv/TCN/train.py @@ -39,7 +39,6 @@ set_seed(0) def 
modelarts_pre_process(): pass - context.set_context(mode=context.GRAPH_MODE, device_target=config.device_target)