From 9b15fe5bafe0d949c99b5faa7050033f3c1ca208 Mon Sep 17 00:00:00 2001 From: zhangyi Date: Tue, 12 Apr 2022 10:07:41 +0800 Subject: [PATCH] modify the links in English files --- .../docs/source_en/quick_start/quick_start.md | 2 +- .../docs/source_en/troubleshooting_guide.md | 2 +- .../docs/source_en/use/converter_train.md | 4 +- .../migrate_3rd_scripts_mindconverter.md | 4 +- .../source_en/performance_tuning_guide.md | 2 +- .../source_zh_cn/accuracy_optimization.md | 2 +- .../design/enable_graph_kernel_fusion.md | 2 +- .../source_en/faq/data_processing.md | 7 +- .../mindspore/source_en/faq/feature_advice.md | 6 +- .../source_en/faq/network_compilation.md | 2 + .../source_en/faq/operators_compile.md | 5 +- .../migration_guide/neural_network_debug.md | 2 +- .../source_en/migration_guide/preparation.md | 2 +- .../source_en/migration_guide/sample_code.md | 4 +- .../migration_guide/use_third_party_op.md | 2 +- .../note/static_graph_syntax_support.md | 2 +- docs/mindspore/source_en/numpy.ipynb | 4 + docs/reinforcement/docs/source_en/dqn.md | 2 +- .../experts/source_en/dataset/augment.md | 4 +- tutorials/experts/source_en/dataset/cache.md | 6 +- tutorials/experts/source_en/dataset/eager.md | 2 +- .../experts/source_en/dataset/optimize.ipynb | 20 +- .../experts/source_en/debug/auto_tune.md | 2 +- .../experts/source_en/debug/custom_debug.md | 2 +- .../source_en/debug/dataset_autotune.md | 2 +- tutorials/experts/source_en/debug/dump.md | 10 +- tutorials/experts/source_en/debug/mindir.md | 6 +- .../experts/source_en/debug/op_compilation.md | 2 +- .../experts/source_en/infer/ascend_310_air.md | 4 +- .../source_en/infer/ascend_310_mindir.md | 4 +- .../source_en/infer/ascend_910_mindir.md | 2 +- .../experts/source_en/infer/cpu_gpu_mindir.md | 4 +- .../experts/source_en/infer/inference.md | 8 +- .../experts/source_en/operation/op_ascend.md | 2 +- .../source_en/operation/op_classification.md | 2 +- .../experts/source_en/operation/op_cpu.md | 2 +- .../experts/source_en/operation/op_custom.md | 2 +- .../experts/source_en/operation/op_gpu.md | 2 +- .../source_en/operation/op_overload.md | 2 +- .../cv_resnet50_second_order_optimizer.md | 4 +- .../source_en/others/gradient_accumulation.md | 2 +- .../source_en/others/mixed_precision.md | 4 +- .../parallel/distributed_inference.md | 8 +- .../source_en/parallel/introduction.md | 2 +- .../experts/source_en/parallel/save_load.md | 2 +- .../source_en/parallel/train_ascend.md | 12 +- .../experts/source_en/parallel/train_gpu.md | 2 +- tutorials/source_en/advanced/index.rst | 4 + .../source_en/advanced/train/save_model.md | 292 ++++++++++++++++++ tutorials/source_en/index.rst | 7 + 50 files changed, 398 insertions(+), 87 deletions(-) create mode 100644 tutorials/source_en/advanced/index.rst create mode 100644 tutorials/source_en/advanced/train/save_model.md diff --git a/docs/lite/docs/source_en/quick_start/quick_start.md b/docs/lite/docs/source_en/quick_start/quick_start.md index d65d0e7cb2..7b545f3428 100644 --- a/docs/lite/docs/source_en/quick_start/quick_start.md +++ b/docs/lite/docs/source_en/quick_start/quick_start.md @@ -30,7 +30,7 @@ In addition, you can use the preset model to perform transfer learning to implem ## Converting a Model -After you retrain a model provided by MindSpore, export the model in the [.mindir format](https://www.mindspore.cn/docs/programming_guide/en/master/save_model.html#export-mindir-model). 
Use the MindSpore Lite [model conversion tool](https://www.mindspore.cn/lite/docs/en/master/use/converter_tool.html) to convert the .mindir format to a .ms model.
+After you retrain a model provided by MindSpore, export the model in the [.mindir format](https://www.mindspore.cn/tutorials/en/master/advanced/train/save.html). Use the MindSpore Lite [model conversion tool](https://www.mindspore.cn/lite/docs/en/master/use/converter_tool.html) to convert the .mindir format to a .ms model.

Take the mobilenetv2 model as an example. Execute the following script to convert a model into a MindSpore Lite model for on-device inference.

diff --git a/docs/lite/docs/source_en/troubleshooting_guide.md b/docs/lite/docs/source_en/troubleshooting_guide.md
index c2aa4d11ac..bf28c8810d 100644
--- a/docs/lite/docs/source_en/troubleshooting_guide.md
+++ b/docs/lite/docs/source_en/troubleshooting_guide.md
@@ -4,7 +4,7 @@

## Overview

-If you encounter an issue when using MindSpore Lite, you can view logs first. In most scenarios, you can locate the issue based on the error information reported in logs. You can set the environment variable [GLOG_v](https://mindspore.cn/docs/programming_guide/en/master/custom_debugging_info.html#log-related-environment-variables-and-configurations) to adjust the log level to print more debug logs. The following describes how to locate and rectify common faults.
+If you encounter an issue when using MindSpore Lite, you can view the logs first. In most scenarios, you can locate the issue based on the error information reported in the logs. You can set the environment variable [GLOG_v](https://www.mindspore.cn/tutorials/experts/en/master/debug/custom_debug.html#log-related-environment-variables-and-configurations) to adjust the log level and print more debug logs. The following describes how to locate and rectify common faults.

> 1. The log line number may vary in different versions. In the following example, the line number in the error log information is represented by "**".
> 2. Only common information is listed in the example logs. Other information related to specific scenarios is displayed as "****".
diff --git a/docs/lite/docs/source_en/use/converter_train.md b/docs/lite/docs/source_en/use/converter_train.md
index 7cca68bd12..a3521fd9ba 100644
--- a/docs/lite/docs/source_en/use/converter_train.md
+++ b/docs/lite/docs/source_en/use/converter_train.md
@@ -8,7 +8,7 @@

Creating your MindSpore Lite (Train on Device) model is a two-step procedure:

-- In the first step the model is defined and the layers that should be trained must be declared. This is being done on the server, using a MindSpore-based [Python code](https://www.mindspore.cn/docs/programming_guide/en/master/save_model.html#export-mindir-model). The model is then exported into a protobuf format, which is called MINDIR.
+- In the first step, the model is defined and the layers that should be trained must be declared. This is done on the server, using MindSpore-based [Python code](https://www.mindspore.cn/tutorials/en/master/advanced/train/save.html). The model is then exported into a protobuf format, which is called MINDIR.
- In the second step, this `.mindir` model is converted into a `.ms` format that can be loaded onto an embedded device and trained using the MindSpore Lite framework. The converted `.ms` models can be used for both training and inference.
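A minimal sketch of the first (export) step, assuming a toy stand-in network; the real model definition and the layers declared as trainable depend on the user's task:

```python
import numpy as np
import mindspore as ms
from mindspore import nn, Tensor, export

# Hypothetical stand-in for the user's trainable model.
class Net(nn.Cell):
    def __init__(self):
        super().__init__()
        self.fc = nn.Dense(32, 10)

    def construct(self, x):
        return self.fc(x)

net = Net()
# A dummy input fixes the shape and dtype the exported model expects.
inputs = Tensor(np.ones([1, 32]), ms.float32)
# Writes net.mindir, the protobuf MINDIR file consumed by the converter.
export(net, inputs, file_name="net", file_format="MINDIR")
```

The resulting `net.mindir` is the input to the conversion step described in the Linux environment section below.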
## Linux Environment

@@ -60,4 +60,4 @@ If the command executes successfully, the `model.ms` target file will be obtained.

CONVERTER RESULT SUCCESS:0
```

-If running the conversion command is failed, an errorcode will be output. \ No newline at end of file
+If the conversion command fails, an error code is output.
diff --git a/docs/mindinsight/docs/source_en/migrate_3rd_scripts_mindconverter.md b/docs/mindinsight/docs/source_en/migrate_3rd_scripts_mindconverter.md
index cc42355b77..ce620dc7b0 100644
--- a/docs/mindinsight/docs/source_en/migrate_3rd_scripts_mindconverter.md
+++ b/docs/mindinsight/docs/source_en/migrate_3rd_scripts_mindconverter.md
@@ -228,7 +228,7 @@ Notes:

### Step 2: Migrate the data processing

-For a built-in dataset, please query [API mapping](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html) for migration. For a customized dataset and data augmentation, self implementation is recommended. For more data processing migration, please refer to [the programming guidance](https://www.mindspore.cn/docs/programming_guide/en/master/dataset_sample.html).
+For a built-in dataset, please query [API mapping](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html) for migration. For a customized dataset and data augmentation, self-implementation is recommended. For more on data processing migration, please refer to [the programming guidance](https://www.mindspore.cn/tutorials/experts/en/master/dataset/dataset_sample.html).

Source codes with PyTorch framework are as follows:

@@ -284,7 +284,7 @@ dataset = GeneratorDataset(generator, column_names=['data', 'label']).batch(BATC

### Step 3: Migrate the model training

-The loss function(`loss_fn`) can be migrated by querying [API mapping](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html) or user's implementation. For more loss function migration, please refer to [the programming guidance](https://www.mindspore.cn/docs/programming_guide/en/master/loss.html).
+The loss function (`loss_fn`) can be migrated by querying [API mapping](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html) or implemented by the user.

The optimizer(`optimizer`) can be migrated by querying [API mapping](https://www.mindspore.cn/docs/en/master/note/api_mapping/pytorch_api_mapping.html) or user's implementation. For more optimizer migration, please refer to [the programming guidance](https://www.mindspore.cn/docs/en/master/migration_guide/optim.html).

diff --git a/docs/mindinsight/docs/source_en/performance_tuning_guide.md b/docs/mindinsight/docs/source_en/performance_tuning_guide.md
index c43ce9c0bb..54f38607d6 100644
--- a/docs/mindinsight/docs/source_en/performance_tuning_guide.md
+++ b/docs/mindinsight/docs/source_en/performance_tuning_guide.md
@@ -185,4 +185,4 @@ Ideally, the time consumed by each stage should be basically the same, otherwise

Step 3: Observe the communication time (including the receive operator only) in the cluster step trace page

-This indicator reflects the time that the current stage is waiting to receive data from other stages. Theoretically, when the time consumed by each stage is basically the same, there will be no long time-consuming phenomenon in this period. Therefore, after users see this indicator, they can first analyze it according to step 2 to confirm whether there is a problem of excessive time difference between stage. 
If there is no problem mentioned in step 2, it means that the time consumption is normal and users don't need to pay attention. Users can also go to the `Timeline' page to check the execution sequence of the Receive operator, and analyze the rationality of the time consumption of the operator in combination with their respective networks. \ No newline at end of file
+This indicator reflects the time that the current stage waits to receive data from other stages. Theoretically, when the time consumed by each stage is basically the same, there will be no long waits in this period. Therefore, after users see this indicator, they can first analyze it according to step 2 to confirm whether there is a problem of excessive time difference between stages. If there is no problem mentioned in step 2, the time consumption is normal and users don't need to pay attention to it. Users can also go to the `Timeline` page to check the execution sequence of the Receive operator, and analyze whether the operator's time consumption is reasonable in combination with their respective networks.
diff --git a/docs/mindinsight/docs/source_zh_cn/accuracy_optimization.md b/docs/mindinsight/docs/source_zh_cn/accuracy_optimization.md
index f13ee40435..183d709b3a 100644
--- a/docs/mindinsight/docs/source_zh_cn/accuracy_optimization.md
+++ b/docs/mindinsight/docs/source_zh_cn/accuracy_optimization.md
@@ -602,7 +602,7 @@ Xie, Z., Sato, I., & Sugiyama, M. (2020). A Diffusion Theory For Deep Learning D

对数据进行标准化、归一化、通道转换等操作,在图片数据处理上,增加随机视野图片,随机旋转度图片等,另外数据混洗、batch和数据倍增等操作,可参考[数据处理](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/dataset.html)、[数据增强](https://www.mindspore.cn/tutorials/zh-CN/master/advanced/dataset.html)和[自动数据增强](https://www.mindspore.cn/tutorials/experts/zh-CN/master/dataset/augment.html)。

-> 如何将数据增强增强操作应用到自定义数据集中,可以参考[mindspore.dataset.GeneratorDataset.map](https://www.mindspore.cn/docs/zh-CN/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html#mindspore.dataset.GeneratorDataset.map)算子。
+如何将数据增强操作应用到自定义数据集中,可以参考[mindspore.dataset.GeneratorDataset.map](https://www.mindspore.cn/docs/zh-CN/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html#mindspore.dataset.GeneratorDataset.map)算子。

### 超参问题处理

diff --git a/docs/mindspore/source_en/design/enable_graph_kernel_fusion.md b/docs/mindspore/source_en/design/enable_graph_kernel_fusion.md
index dcb3268eba..e1ff62fb70 100644
--- a/docs/mindspore/source_en/design/enable_graph_kernel_fusion.md
+++ b/docs/mindspore/source_en/design/enable_graph_kernel_fusion.md
@@ -2,7 +2,7 @@

`Ascend` `GPU` `CPU` `Model Optimization`

-
+

## Introduction

diff --git a/docs/mindspore/source_en/faq/data_processing.md b/docs/mindspore/source_en/faq/data_processing.md
index fb44bac2c8..45d6a11bdb 100644
--- a/docs/mindspore/source_en/faq/data_processing.md
+++ b/docs/mindspore/source_en/faq/data_processing.md
@@ -34,7 +34,7 @@ A: You can refer to the following steps to reduce CPU consumption (mainly due to
-**Q:  Why there is no difference between the parameter `shuffle` in `GeneratorDataset`, and `shuffle=True` and `shuffle=False` when the task is run? **

+**Q: Why is there no difference between setting the parameter `shuffle` in `GeneratorDataset` to `shuffle=True` and to `shuffle=False` when the task is run?**

A: If `shuffle` is enabled, the input `Dataset` must support random access (for example, the user-defined `Dataset` has the `__getitem__` method). If data is returned in `yield` mode in the user-defined `Dataset`, random access is not supported. For details, see section [Loading Dataset Overview](https://www.mindspore.cn/docs/en/master/faq/data_processing.html#id5) in the tutorial.

@@ -158,8 +158,6 @@ A: [build_seg_data.py](https://gitee.com/mindspore/models/blob/master/official/c

[GeneratorDataset example](https://www.mindspore.cn/docs/en/master/faq/data_processing.html#loading-user-defined-dataset)

-[GeneratorDataset API description](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html#mindspore.dataset.GeneratorDataset)
-
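As a minimal sketch of the random-access requirement described in the answer above (class and column names are illustrative):

```python
import numpy as np
import mindspore.dataset as ds

# Random access: __getitem__ and __len__ make shuffle=True effective.
class AccessibleDataset:
    def __init__(self):
        self._data = np.arange(10).astype(np.int32)

    def __getitem__(self, index):
        return (self._data[index],)

    def __len__(self):
        return len(self._data)

dataset = ds.GeneratorDataset(AccessibleDataset(), column_names=["data"], shuffle=True)
for item in dataset.create_tuple_iterator():
    print(item)
```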
**Q: When MindSpore performs multi-device training on the Ascend hardware platform, how does the user-defined dataset transfer data to different chips?**

ds.GeneratorDataset(..., num_shards=8, shard_id=7, ...)

A: The data schema can be defined as follows: `cv_schema_json = {"label": {"type": "int32", "shape": [-1]}, "data": {"type": "bytes"}}`

Note: A label is an array of the numpy type, where label values 1, 1, 0, 1, 0, 1 are stored. These label values correspond to the same data, that is, the binary value of the same image.

-For details, see [Converting Dataset to MindRecord](https://www.mindspore.cn/docs/programming_guide/en/master/convert_dataset.html#id3).
+For details, see [Converting Dataset to MindRecord](https://www.mindspore.cn/tutorials/experts/en/master/dataset/convert_dataset.html#id3).
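A hedged sketch of writing one record with the schema above, assuming an illustrative file name and placeholder image bytes:

```python
import numpy as np
from mindspore.mindrecord import FileWriter

# The schema from the answer above: an int32 label array plus raw image bytes.
cv_schema_json = {"label": {"type": "int32", "shape": [-1]},
                  "data": {"type": "bytes"}}

writer = FileWriter(file_name="demo.mindrecord", shard_num=1)
writer.add_schema(cv_schema_json, "multi_label_schema")
# One sample: several labels stored against the same (placeholder) image bytes.
sample = {"label": np.array([1, 1, 0, 1, 0, 1], dtype=np.int32),
          "data": bytes("placeholder image bytes", "utf-8")}
writer.write_raw_data([sample])
writer.commit()
```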
@@ -234,6 +232,7 @@ A: Firstly, above error refers to failed sending data to the device through the

**Q: Can the py_transforms and c_transforms operators be used together? If yes, how should I use them?**

A: To ensure high performance, you are not advised to use the py_transforms and c_transforms operators together. For details, see [Image Data Processing and Enhancement](https://www.mindspore.cn/tutorials/en/master/advanced/dataset.html#usage-instructions). However, if the main consideration is to streamline the process, the performance can be compromised more or less. If you cannot use all the c_transforms operators, that is, certain c_transforms operators are not available, the py_transforms operators can be used instead. In this case, the two operators are used together.
+
Note that the c_transforms operator usually outputs numpy array, and the py_transforms operator outputs PIL Image. For details, check the operator description. The common method to use them together is as follows:

- c_transforms operator + ToPIL operator + py_transforms operator + ToTensor operator
diff --git a/docs/mindspore/source_en/faq/feature_advice.md b/docs/mindspore/source_en/faq/feature_advice.md
index 19171332b5..b0e67e4776 100644
--- a/docs/mindspore/source_en/faq/feature_advice.md
+++ b/docs/mindspore/source_en/faq/feature_advice.md
@@ -90,7 +90,7 @@ A: MindSpore provides pluggable device management interface, so that developer c

**Q: What is the relationship between MindSpore and ModelArts? Can we use MindSpore in ModelArts?**

-A: ModelArts is Huawei public cloud online training and inference platform, and MindSpore is Huawei deep learning framework, which can be found in [MindSpore official website tutorial](https://www.mindspore.cn/docs/programming_guide/zh-CN/master/use_on_the_cloud.html). The tutorial shows how users can use ModelArts to train ModelsSpore models in detail.
+A: ModelArts is Huawei's public cloud online training and inference platform, and MindSpore is Huawei's deep learning framework; see the [MindSpore official website tutorial](https://www.mindspore.cn/tutorials/experts/zh-CN/master/use_on_the_cloud.html), which shows in detail how users can use ModelArts to train MindSpore models.
@@ -144,7 +144,7 @@ A: The TensorFlow's object detection Pipeline API belongs to the TensorFlow's Mo **Q: How do I perform transfer learning in PyNative mode?** -A: PyNative mode is compatible with transfer learning. For more tutorial information, see [Code for Loading a Pre-Trained Model](https://www.mindspore.cn/docs/programming_guide/en/master/cv_mobilenetv2_fine_tune.html#code-for-loading-a-pre-trained-model). +A: PyNative mode is compatible with transfer learning. For more tutorial information, see [Code for Loading a Pre-Trained Model](https://www.mindspore.cn/tutorials/experts/en/master/cv_mobilenetv2_fine_tune.html#code-for-loading-a-pre-trained-model).
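A minimal sketch of the transfer-learning pattern mentioned above, assuming an already-constructed backbone `net` (e.g. MobileNetV2 from ModelZoo), a local checkpoint path, and that the new classification head's parameter names contain "head" — all illustrative:

```python
from mindspore import load_checkpoint, load_param_into_net

# Load pre-trained weights into the assumed backbone `net`.
param_dict = load_checkpoint("mobilenetv2.ckpt")  # hypothetical checkpoint path
load_param_into_net(net, param_dict)

# Freeze the backbone so only the new head is trained.
for param in net.trainable_params():
    if "head" not in param.name:  # naming convention assumed
        param.requires_grad = False
```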
@@ -158,4 +158,4 @@

The combination of MindSpore and Ascend is overlapping, and this part of the mod

**Q: What is the relationship between Ascend and NPU?**

-A: NPU refers to a dedicated processor for neural network algorithms. Different companies have different NPU architectures. Ascend is an NPU processor based on the DaVinci architecture developed by Huawei. \ No newline at end of file
+A: NPU refers to a dedicated processor for neural network algorithms. Different companies have different NPU architectures. Ascend is an NPU processor based on the DaVinci architecture developed by Huawei.
diff --git a/docs/mindspore/source_en/faq/network_compilation.md b/docs/mindspore/source_en/faq/network_compilation.md
index f1ce08a4bb..d1eb4cde6f 100644
--- a/docs/mindspore/source_en/faq/network_compilation.md
+++ b/docs/mindspore/source_en/faq/network_compilation.md
@@ -317,5 +317,7 @@ If you encounter problems like this one, please remove the use of tensor (bool).

**Q: What can I do when encountering an error "The 'setitem' operation does not support the type [List[List[Int642],Int643], Slice[Int64 : Int64 : kMetaTypeNone], Tuple[Int64*3]]"?**

A: The MindSpore static graph mode needs to translate the assign operation as the MindSpore operation.
+
This assign is implemented by the [HyperMap](https://www.mindspore.cn/tutorials/experts/en/master/operation/op_overload.html#multitypefuncgraph) in MindSpore. The type is not registered in the HyperMap. Since type inference is an indispensable part of MindSpore, when the front-end compiler expands this assignment operation into a concrete type, it finds that the type is not registered and reports an error. In general, the supported types are prompted below the error message.
+
Users can consider replacing the operation with other operators, or extending the current HyperMap in the MindSpore source code through [operation overload](https://www.mindspore.cn/tutorials/experts/en/master/operation/op_overload.html#multitypefuncgraph) to cover the type that MindSpore does not yet support. \ No newline at end of file
diff --git a/docs/mindspore/source_en/faq/operators_compile.md b/docs/mindspore/source_en/faq/operators_compile.md
index eccf71afc9..908289029d 100644
--- a/docs/mindspore/source_en/faq/operators_compile.md
+++ b/docs/mindspore/source_en/faq/operators_compile.md
@@ -71,7 +71,7 @@ A: TBE (Tensor Boost Engine) operator is Huawei's self-developed Ascend operator

**Q: Has MindSpore implemented the anti-pooling operation similar to `nn.MaxUnpool2d`?**

-A: Currently, MindSpore does not provide anti-pooling APIs but you can customize the operator to implement the operation. For details, refer to [Customize Operators](https://www.mindspore.cn/docs/programming_guide/en/master/custom_operator.html).
+A: Currently, MindSpore does not provide anti-pooling APIs but you can customize the operator to implement the operation. For details, refer to [Customize Operators](https://www.mindspore.cn/tutorials/experts/en/master/operation/custom_operator.html).
@@ -99,4 +99,4 @@ A: The `Ascend` backend operators can be divided into AI CORE operators and AI C

2. If the `AI CPU` candidate operator information is not empty, or the candidate operator information of `AI CORE` and `AI CPU` are both not empty, it may be that the given input data type was not in the candidate list and was filtered out in the selection stage. Try to modify the input data type of the operator according to the candidate list.

You can select a proper mode and writing method to complete the training by referring to the [official website tutorial](https://www.mindspore.cn/tutorials/experts/en/master/debug/debug_in_pynative_mode.html).
diff --git a/docs/mindspore/source_en/migration_guide/neural_network_debug.md b/docs/mindspore/source_en/migration_guide/neural_network_debug.md
index cea7463576..3a1431e90a 100644
--- a/docs/mindspore/source_en/migration_guide/neural_network_debug.md
+++ b/docs/mindspore/source_en/migration_guide/neural_network_debug.md
@@ -133,7 +133,7 @@ If the loss errors are large, the problem locating can be done by using followin

When the training is finished, metrics can be used to evaluate the training results. MindSpore provides various metrics for evaluation, such as: `accuracy`, `loss`, `precision`, `recall`, `F1`, etc.

-- [Reasoning With Training](https://www.mindspore.cn/docs/programming_guide/en/master/evaluate_the_model_during_training.html)
+- [Inference While Training](https://www.mindspore.cn/tutorials/experts/en/master/evaluate_the_model_during_training.html)

Inference can be performed at training time by defining a CallBack function for inference.

diff --git a/docs/mindspore/source_en/migration_guide/preparation.md b/docs/mindspore/source_en/migration_guide/preparation.md
index cdfbafbad9..4b964641d2 100644
--- a/docs/mindspore/source_en/migration_guide/preparation.md
+++ b/docs/mindspore/source_en/migration_guide/preparation.md
@@ -50,4 +50,4 @@ Users can read [MindSpore Tutorial](https://www.mindspore.cn/tutorials/experts/e

### Training on the Cloud

-ModelArts is a one-stop development platform for AI developers provided by HUAWEI Cloud, which contains Ascend resource pool. Users can experience MindSpore in this platform and read related document [MindSpore use_on_the_cloud](https://www.mindspore.cn/docs/programming_guide/en/master/use_on_the_cloud.html) and [AI Platform ModelArts](https://support.huaweicloud.com/intl/en-us/wtsnew-modelarts/index.html).
+ModelArts is a one-stop development platform for AI developers provided by HUAWEI Cloud, which contains an Ascend resource pool. Users can experience MindSpore on this platform and read the related documents [MindSpore use_on_the_cloud](https://www.mindspore.cn/tutorials/experts/en/master/infer/use_on_the_cloud.html) and [AI Platform ModelArts](https://support.huaweicloud.com/intl/en-us/wtsnew-modelarts/index.html).
diff --git a/docs/mindspore/source_en/migration_guide/sample_code.md b/docs/mindspore/source_en/migration_guide/sample_code.md
index 85eb9b825e..007076cf56 100644
--- a/docs/mindspore/source_en/migration_guide/sample_code.md
+++ b/docs/mindspore/source_en/migration_guide/sample_code.md
@@ -117,7 +117,7 @@ To understand the implementation of a neural network, it is necessary to know th

3. data processing (e.g. common data slicing, shuffle, data augmentation, etc.).
4. data distribution (distribution of data in batch_size units, distributed training involves multi-machine distribution).
-In the process of reading and parsing data, MindSpore provides a more friendly data format - [MindRecord](https://www.mindspore.cn/docs/programming_guide/en/master/convert_dataset.html). Users can convert the dataset in regular format to MindSpore data format, i.e. MindRecord, so that it can be easily loaded into MindSpore for training. At the same time, MindSpore is optimized for performance in some scenarios, and better performance can be obtained by using the MindRecord data format. +In the process of reading and parsing data, MindSpore provides a more friendly data format - [MindRecord](https://www.mindspore.cn/tutorials/experts/en/master/dataset/convert_dataset.html). Users can convert the dataset in regular format to MindSpore data format, i.e. MindRecord, so that it can be easily loaded into MindSpore for training. At the same time, MindSpore is optimized for performance in some scenarios, and better performance can be obtained by using the MindRecord data format. Data processing is usually the most time-consuming phase of data preparation, and most of the operations on data are included in this step, such as Resize, Rescale, Crop, etc. in CV-like networks. MindSpore provides a set of common data processing integration interfaces, which can be called directly by users without implementing them. These integration interfaces not only improve the user-friendliness, but also improve the performance of data preprocessing and reduce the time consumption of data preparation during training. For details, please refer to the [Data Preprocessing Tutorial](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html). @@ -669,7 +669,7 @@ Note: For codes in other files in the directory, refer to MindSpore ModelZoo's [ ### Distributed Training -Distributed training has no impact on the network structure compared to stand-alone training, and can be done by modifying the stand-alone script by calling the distributed training interface provided by MindSpore, as described in [Distributed Training Tutorial](https://www.mindspore.cn/docs/programming_guide/en/master/distributed_training.html). +Distributed training has no impact on the network structure compared to stand-alone training, and can be done by modifying the stand-alone script by calling the distributed training interface provided by MindSpore, as described in [Distributed Training Tutorial](https://www.mindspore.cn/docs/en/master/design/distributed_training_design.html). #### ResNet50 Migration Example diff --git a/docs/mindspore/source_en/migration_guide/use_third_party_op.md b/docs/mindspore/source_en/migration_guide/use_third_party_op.md index 16c691b1c9..1f76f80bca 100644 --- a/docs/mindspore/source_en/migration_guide/use_third_party_op.md +++ b/docs/mindspore/source_en/migration_guide/use_third_party_op.md @@ -216,4 +216,4 @@ op.add_prim_attr("primitive_target", "CPU") > 1. Compile so with cppextension requires a compiler version that meets the tool's needs, and check for the presence of gcc/clang/nvcc. > 2. Compile so with cppextension will generate a build folder in the script path, which stores so. The script will copy so to outside of build, but cppextension will skip compilation if it finds that there is already so in build, so if it is a newly compiled so, remember to empty the so under the build. -> 3. The following tests is based on PyTorch 1.9.1,cuda11.1,python3.7. The download link:. 
The cuda version supported by PyTorch Aten needs to be consistent with the local cuda version, and whether other versions are supported needs to be explored by the user. \ No newline at end of file
+> 3. The following test is based on PyTorch 1.9.1, CUDA 11.1, and Python 3.7. The download link:. The CUDA version supported by PyTorch Aten needs to be consistent with the local CUDA version; whether other versions are supported needs to be explored by the user.
diff --git a/docs/mindspore/source_en/note/static_graph_syntax_support.md b/docs/mindspore/source_en/note/static_graph_syntax_support.md
index b9f1478a0c..e9c8d795af 100644
--- a/docs/mindspore/source_en/note/static_graph_syntax_support.md
+++ b/docs/mindspore/source_en/note/static_graph_syntax_support.md
@@ -395,7 +395,7 @@ For details about the defined `Cell`, click

+For details about the definition of `Parameter`:

## Primaries

diff --git a/docs/mindspore/source_en/numpy.ipynb b/docs/mindspore/source_en/numpy.ipynb
index f47e4239c2..7e0814382c 100644
--- a/docs/mindspore/source_en/numpy.ipynb
+++ b/docs/mindspore/source_en/numpy.ipynb
@@ -12,7 +12,7 @@
{
 "cell_type": "markdown",
 "source": [
-"[![Run in ModelArts](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_modelarts_en.png)](https://authoring-modelarts-cnnorth4.huaweicloud.com/console/lab?share-url-b64=aHR0cHM6Ly9taW5kc3BvcmUtd2Vic2l0ZS5vYnMuY24tbm9ydGgtNC5teWh1YXdlaWNsb3VkLmNvbS9ub3RlYm9vay9tYXN0ZXIvcHJvZ3JhbW1pbmdfZ3VpZGUvZW4vbWluZHNwb3JlX251bXB5LmlweW5i&imageid=65f636a0-56cf-49df-b941-7d2a07ba8c8c) [![Download Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_notebook_en.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/master/en/mindspore_numpy.ipynb) [![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.png)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/numpy.ipynb)"
+"[![Run in ModelArts](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_modelarts_en.png)](https://authoring-modelarts-cnnorth4.huaweicloud.com/console/lab?share-url-b64=aHR0cHM6Ly9taW5kc3BvcmUtd2Vic2l0ZS5vYnMuY24tbm9ydGgtNC5teWh1YXdlaWNsb3VkLmNvbS9ub3RlYm9vay9tYXN0ZXIvcHJvZ3JhbW1pbmdfZ3VpZGUvZW4vbWluZHNwb3JlX251bXB5LmlweW5i&imageid=65f636a0-56cf-49df-b941-7d2a07ba8c8c) [![Download Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_notebook_en.png)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/numpy.ipynb) [![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.png)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/numpy.ipynb)"
],
"metadata": {}
},
diff --git a/docs/reinforcement/docs/source_en/dqn.md b/docs/reinforcement/docs/source_en/dqn.md
index 15a99a6f86..38bf433e73 100644
--- a/docs/reinforcement/docs/source_en/dqn.md
+++ b/docs/reinforcement/docs/source_en/dqn.md
@@ -88,7 +88,7 @@ from mindspore import context
context.set_context(mode=context.GRAPH_MODE)
```

-The `GRAPH_MODE` enables functions and methods that are annotated with `@ms_function` to be compiled into the [MindSpore computational graph](https://www.mindspore.cn/docs/programming_guide/en/master/api_structure.html) for auto-parallelisation and acceleration. In this tutorial, we use this feature to implement an efficient `DQNTrainer` class.
+The `GRAPH_MODE` enables functions and methods that are annotated with `@ms_function` to be compiled into the [MindSpore computational graph](https://www.mindspore.cn/tutorials/experts/en/master/api_structure.html) for auto-parallelisation and acceleration. In this tutorial, we use this feature to implement an efficient `DQNTrainer` class.

### Defining the DQNTrainer class

diff --git a/tutorials/experts/source_en/dataset/augment.md b/tutorials/experts/source_en/dataset/augment.md
index 48a8e8cb6e..ca266ea7f0 100644
--- a/tutorials/experts/source_en/dataset/augment.md
+++ b/tutorials/experts/source_en/dataset/augment.md
@@ -2,7 +2,7 @@

`Ascend` `GPU` `CPU` `Data Preparation`

-
+

## Overview

@@ -10,7 +10,7 @@ MindSpore not only allows you to customize data augmentation, but also provides

Auto augmentation can be implemented based on probability or callback parameters.

-> For a complete example, see [Application of Auto Augmentation](https://www.mindspore.cn/tutorials/experts/en/master/dataset/enable_auto_augmentation.html).
+> For a complete example, see [Application of Auto Augmentation](https://www.mindspore.cn/tutorials/experts/en/master/dataset/augment.html).

## Probability Based Auto Augmentation

diff --git a/tutorials/experts/source_en/dataset/cache.md b/tutorials/experts/source_en/dataset/cache.md
index 6b036bcc8e..891d5d4a51 100644
--- a/tutorials/experts/source_en/dataset/cache.md
+++ b/tutorials/experts/source_en/dataset/cache.md
@@ -28,7 +28,7 @@ Currently, the cache service supports only single-node cache. That is, the clien

> You are advised to cache image data in `decode` + `resize` + `cache` mode. The data processed by `decode` can be directly cached only in single-node single-device mode.

-> For a complete example, see [Application of Single-Node Tensor Cache](https://www.mindspore.cn/tutorials/experts/en/master/dataset/enable_cache.html).
+> For a complete example, see [Application of Single-Node Tensor Cache](https://www.mindspore.cn/tutorials/experts/en/master/dataset/cache.html).

## Basic Cache Usage

@@ -186,7 +186,7 @@ Currently, the cache service supports only single-node cache. That is, the clien

Note that you need to create a cache instance for each of the two examples according to step 4, and use the created `test_cache` as the `cache` parameter in the dataset loading operator or map operator.

CIFAR-10 dataset is used in the following two examples. 
Before running the sample, download and store the CIFAR-10 dataset by referring to [Loading Dataset](https://www.mindspore.cn/tutorials/experts/en/master/dataset/cache.html#cifar-10-100). ```text ./datasets/cifar-10-batches-bin @@ -407,7 +407,7 @@ During the single-node multi-device distributed training, the cache operator all 4. Create and apply a cache instance. - CIFAR-10 dataset is used in the following example. Before running the sample, download and store the CIFAR-10 dataset by referring to [Loading Dataset](https://www.mindspore.cn/tutorials/experts/en/master/dataset/dataset_loading.html#cifar-10-100). The directory structure is as follows: + CIFAR-10 dataset is used in the following example. Before running the sample, download and store the CIFAR-10 dataset by referring to [Loading Dataset](https://www.mindspore.cn/tutorials/experts/en/master/dataset/cache.html#cifar-10-100). The directory structure is as follows: ```text ├─cache.sh diff --git a/tutorials/experts/source_en/dataset/eager.md b/tutorials/experts/source_en/dataset/eager.md index 83d3d192b1..c6e071cd8e 100644 --- a/tutorials/experts/source_en/dataset/eager.md +++ b/tutorials/experts/source_en/dataset/eager.md @@ -37,7 +37,7 @@ MindSpore currently supports executing various data augmentations in `Eager mode - Submodule c_transforms, a general-purpose data enhancement operator based on C++. - Submodule py_transforms, a general-purpose data augmentation operator based on Python. -Note: In chapters [Image Processing and Enhancement](https://www.mindspore.cn/tutorials/experts/en/master/dataset/augmentation.html), [Text Processing and Enhancement](https://www.mindspore.cn/tutorials/experts/en/master/dataset/tokenizer.html), all data enhancement operators can be executed in Eager mode. +Note: In chapters [Image Processing and Enhancement](https://www.mindspore.cn/tutorials/experts/en/master/dataset/eager.html), [Text Processing and Enhancement](https://www.mindspore.cn/tutorials/experts/en/master/dataset/eager.html), all data enhancement operators can be executed in Eager mode. 
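A hedged sketch of the Eager-mode usage referred to in the note above: each transform object is called directly on a single image, with no dataset pipeline. The fake image stands in for a decoded picture:

```python
import numpy as np
import mindspore.dataset.vision.c_transforms as C

# A fake HWC uint8 image stands in for a decoded picture.
img = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)

# Eager mode: call the transform instances directly, one after another.
img = C.Resize(size=(256, 256))(img)
img = C.CenterCrop(size=224)(img)
print(img.shape)  # expected: (224, 224, 3)
```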
## example diff --git a/tutorials/experts/source_en/dataset/optimize.ipynb b/tutorials/experts/source_en/dataset/optimize.ipynb index e8cffd7796..4c1b7ee447 100644 --- a/tutorials/experts/source_en/dataset/optimize.ipynb +++ b/tutorials/experts/source_en/dataset/optimize.ipynb @@ -7,7 +7,7 @@ "\n", "`Ascend` `GPU` `CPU` `Data Preparation`\n", "\n", - "[![Run in ModelArts](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_modelarts_en.png)](https://authoring-modelarts-cnnorth4.huaweicloud.com/console/lab?share-url-b64=aHR0cHM6Ly9taW5kc3BvcmUtd2Vic2l0ZS5vYnMuY24tbm9ydGgtNC5teWh1YXdlaWNsb3VkLmNvbS9ub3RlYm9vay9tYXN0ZXIvcHJvZ3JhbW1pbmdfZ3VpZGUvZW4vbWluZHNwb3JlX29wdGltaXplX2RhdGFfcHJvY2Vzc2luZy5pcHluYg==&imageid=65f636a0-56cf-49df-b941-7d2a07ba8c8c) [![Download Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_notebook_en.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/tutorials/experts/en/mindspore_optimize_data_processing.ipynb) [![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.png)](https://gitee.com/mindspore/docs/blob/master/tutorials/experts/source_en/optimize_data_processing.ipynb)" + "[![Run in ModelArts](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_modelarts_en.png)](https://authoring-modelarts-cnnorth4.huaweicloud.com/console/lab?share-url-b64=aHR0cHM6Ly9taW5kc3BvcmUtd2Vic2l0ZS5vYnMuY24tbm9ydGgtNC5teWh1YXdlaWNsb3VkLmNvbS9ub3RlYm9vay9tYXN0ZXIvcHJvZ3JhbW1pbmdfZ3VpZGUvZW4vbWluZHNwb3JlX29wdGltaXplX2RhdGFfcHJvY2Vzc2luZy5pcHluYg==&imageid=65f636a0-56cf-49df-b941-7d2a07ba8c8c) [![Download Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_notebook_en.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/master/tutorials/experts/source_en/dataset/optimize.ipynb) [![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.png)](https://gitee.com/mindspore/docs/blob/master/tutorials/experts/source_en/dataset/optimize.ipynb)" ], "metadata": {} }, @@ -23,7 +23,7 @@ { "cell_type": "markdown", "source": [ - "![pipeline](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/experts/source_en/images/pipeline.png)" + "![pipeline](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/experts/source_en/dataset/images/pipeline_mode_en.png)" ], "metadata": {} }, @@ -201,7 +201,7 @@ { "cell_type": "markdown", "source": [ - "![data-loading-performance-scheme](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/experts/source_en/images/data_loading_performance_scheme.png)" + "![data-loading-performance-scheme](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/experts/source_en/dataset/images/data_loading_performance_scheme.png)" ], "metadata": {} }, @@ -211,7 +211,7 @@ "Suggestions on data loading performance optimization are as follows:\n", "\n", "- Built-in loading operators are preferred for supported dataset formats. For details, see [Built-in Loading Operators](https://www.mindspore.cn/docs/en/master/api_python/mindspore.dataset.html), if the performance cannot meet the requirements, use the multi-thread concurrency solution. 
For details, see [Multi-thread Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html#multi-thread-optimization-solution).\n",
"- For a dataset format that is not supported, convert the format to the MindSpore data format and then use the `MindDataset` class to load the dataset (please refer to the [API](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.MindDataset.html) for detailed use). Please refer to [Converting Dataset to MindRecord](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html); if the performance cannot meet the requirements, use the multi-thread concurrency solution. For details, see [Multi-thread Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html#multi-thread-optimization-solution).\n",
"- For dataset formats that are not supported, the user-defined `GeneratorDataset` class is preferred for implementing fast algorithm verification (please refer to the [API](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html) for detailed use); if the performance cannot meet the requirements, the multi-process concurrency solution can be used. For details, see [Multi-process Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html#multi-process-optimization-solution).\n",
"\n",
"### Code Example\n",
@@ -368,7 +368,7 @@
"source": [
"## Optimizing the Shuffle Performance\n",
"\n",
"The shuffle operation is used to shuffle ordered datasets or repeated datasets. MindSpore provides the `shuffle` function for users. A larger value of `buffer_size` indicates a higher shuffling degree, consuming more time and computing resources. This API allows users to shuffle the data at any time during the entire pipeline process. Please refer to [shuffle](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html#shuffle). 
However, because the underlying implementation methods are different, the performance of this method is not as good as that of setting the `shuffle` parameter to directly shuffle data by referring to the [Built-in Loading Operators](https://www.mindspore.cn/docs/en/master/api_python/mindspore.dataset.html).\n",
"\n",
"### Performance Optimization Solution"
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"![shuffle-performance-scheme](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/experts/source_en/dataset/images/shuffle_performance_scheme.png)"
],
"metadata": {}
},
@@ -520,7 +520,7 @@
"- Use the built-in Python operator (`py_transforms` module) to perform data augmentation.\n",
"- Users can define Python functions as needed to perform data augmentation.\n",
"\n",
"Please refer to [Data Augmentation](https://www.mindspore.cn/tutorials/experts/en/master/dataset/optimize.html). The performance varies according to the underlying implementation methods.\n",
"\n",
"| Module | Underlying API | Description |\n",
"| :----: | :----: | :----: |\n",
@@ -534,7 +534,7 @@
{
"cell_type": "markdown",
"source": [
"![data-enhancement-performance-scheme](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/experts/source_en/dataset/images/enhancement_performance_scheme.png)"
],
"metadata": {}
},
@@ -736,7 +736,7 @@
{
"cell_type": "markdown",
"source": [
"![compose](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/experts/source_en/dataset/images/compose.png)"
],
"metadata": {}
},
@@ -747,7 +747,7 @@
"\n",
"Some fusion operators are provided to aggregate the functions of two or more operators into one operator. For details, see [Augmentation Operators](https://www.mindspore.cn/docs/en/master/api_python/mindspore.dataset.vision.html). Compared with the pipelines of their components, such fusion operators provide better performance. 
As shown in the figure:\n", "\n", - "![operator-fusion](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/experts/source_en/images/operator_fusion.png)\n", + "![operator-fusion](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/experts/source_en/dataset/images/operator_fusion.png)\n", "\n", "### Operating System Optimization Solution\n", "\n", diff --git a/tutorials/experts/source_en/debug/auto_tune.md b/tutorials/experts/source_en/debug/auto_tune.md index 60439c9ea5..5031c6d1af 100644 --- a/tutorials/experts/source_en/debug/auto_tune.md +++ b/tutorials/experts/source_en/debug/auto_tune.md @@ -2,7 +2,7 @@ `Ascend` `Model Optimization` -   +   ## Overview diff --git a/tutorials/experts/source_en/debug/custom_debug.md b/tutorials/experts/source_en/debug/custom_debug.md index ed0aba7c73..e9276156d1 100644 --- a/tutorials/experts/source_en/debug/custom_debug.md +++ b/tutorials/experts/source_en/debug/custom_debug.md @@ -2,7 +2,7 @@ `Ascend` `Model Optimization` - + ## Overview diff --git a/tutorials/experts/source_en/debug/dataset_autotune.md b/tutorials/experts/source_en/debug/dataset_autotune.md index d7ff043d52..8b11ce5ebf 100644 --- a/tutorials/experts/source_en/debug/dataset_autotune.md +++ b/tutorials/experts/source_en/debug/dataset_autotune.md @@ -2,7 +2,7 @@ `Ascend` `GPU` `Data Preparation` - + ## Overview diff --git a/tutorials/experts/source_en/debug/dump.md b/tutorials/experts/source_en/debug/dump.md index d89d7ea21b..9223dc7121 100644 --- a/tutorials/experts/source_en/debug/dump.md +++ b/tutorials/experts/source_en/debug/dump.md @@ -2,13 +2,13 @@ `Ascend` `GPU` `CPU` `Model Optimization` - + ## Overview The input and output of the operator can be saved for debugging through the data dump when the training result deviates from the expectation. -- For the dynamic graph mode, MindSpore provides native Python execution capabilities. Users can view and record the corresponding input and output during the running of the network script. For details, see [Use PyNative Mode to Debug](https://www.mindspore.cn/tutorials/experts/en/master/debug/debug_in_pynative_mode.html). +- For the dynamic graph mode, MindSpore provides native Python execution capabilities. Users can view and record the corresponding input and output during the running of the network script. For details, see [Use PyNative Mode to Debug](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html). - For the static graph mode, MindSpore provides the Dump function to save the graph and the input and output data of the operator during model training to a disk file. @@ -49,7 +49,7 @@ If MindInsight is not installed, you need to analyze the data through the follow 1. Analysis of static graph operator results. - Through the IR diagram obtained by the Dump function, you can understand the mapping relationship between the script code and the execution operator (for details, see [MindSpore IR Introduction](https://www.mindspore.cn/tutorials/experts/en/master/design/mindir.html#overview)). Combining the input and output data of the execution operator, it is possible to analyze possible overflow, gradient explosion and disappearance during the training process, and backtrack to the code that may have problems in the script. 
+ Through the IR diagram obtained by the Dump function, you can understand the mapping relationship between the script code and the execution operator (for details, see [MindSpore IR Introduction](https://www.mindspore.cn/mindinsight/docs/en/master/debugger_offline.html)). Combining the input and output data of the execution operator, it is possible to analyze possible overflow, gradient explosion and disappearance during the training process, and backtrack to the code that may have problems in the script. 2. Analysis of the feature map. @@ -105,7 +105,7 @@ The configuration files required for different modes and the data format of dump - `iteration`: Specify the iterations to dump, type is string. Use "|" to separate the step data of different intervals to be saved. For example, "0 | 5-8 | 100-120" represents dump the data of the 1st, 6th to 9th, and 101st to 121st steps. If iteration set to "all", data of every iteration will be dumped. - `saved_data`: Specify what data is to be dumped, type is string. Use "tensor" to dump tensor data, use "statistic" to dump tensor statistics, use "full" to dump both tensor data and statistics. Default setting is "tensor". Synchronous statistics dump is only supported on GPU, using "statistic" or "full" on CPU or Ascend will result in exception. - `input_output`: 0: dump input and output of kernel, 1:dump input of kernel, 2:dump output of kernel. This configuration parameter only supports Ascend and CPU, and GPU can only dump the output of operator. - - `kernels`: List of operator names. Turn on the IR save switch `context.set_context(save_graphs=True)` and execute the network to obtain the operator name from the generated `trace_code_graph_{graph_id}`IR file. For details, please refer to [Saving IR](https://www.mindspore.cn/tutorials/experts/en/master/design/mindir.html#saving-ir). + - `kernels`: List of operator names. Turn on the IR save switch `context.set_context(save_graphs=True)` and execute the network to obtain the operator name from the generated `trace_code_graph_{graph_id}`IR file. For details, please refer to [Saving IR](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html#saving-ir). - `support_device`: Supported devices, default setting is `[0,1,2,3,4,5,6,7]`. You can specify specific device ids to dump specific device data. This configuration parameter is invalid on the CPU, because there is no concept of device on the CPU, but it is still need to reserve this parameter in the json file. - `enable`: When set to true, enable Synchronous Dump. When set to false, asynchronous dump will be used on Ascend and synchronous dump will still be used on GPU. - `trans_flag`: Enable trans flag. Transform the device data format into NCHW. If it is `True`, the data will be saved in the 4D format (NCHW) format on the Host side; if it is `False`, the data format on the Device side will be retained. This configuration parameter is invalid on the CPU, because there is no format conversion on the CPU, but it is still need to reserve this parameter in the json file. @@ -407,7 +407,7 @@ Large networks (such as Bert Large) will cause memory overflow when using synchr - `iteration`: Specify the iterations to dump, type is string. Use "|" to separate the step data of different intervals to be saved. For example, "0 | 5-8 | 100-120" represents dump the data of the 1st, 6th to 9th, and 101st to 121st steps. If iteration set to "all", data of every iteration will be dumped. - `saved_data`: Specify what data is to be dumped, type is string. 
Use "tensor" to dump tensor data, use "statistic" to dump tensor statistics, use "full" to dump both tensor data and statistics. Default setting is "tensor". Asynchronous statistics dump is only supported when `file_format` is set to `npy`, using "statistic" or "full" when `file_format` is set to `bin` will result in exception. - `input_output`: When set to 0, it means to Dump the operator's input and output; setting it to 1 means to Dump the operator's input; setting it to 2 means to Dump the output of the operator. - - `kernels`: List of operator names. Turn on the IR save switch `context.set_context(save_graphs=True)` and execute the network to obtain the operator name from the generated `trace_code_graph_{graph_id}`IR file. `kernels` only supports TBE operator, AiCPU operator and communication operator. The data of communication operation input operator will be dumped if `kernels` is set to the name of communication operator. For details, please refer to [Saving IR](https://www.mindspore.cn/tutorials/experts/en/master/design/mindir.html#saving-ir). + - `kernels`: List of operator names. Turn on the IR save switch `context.set_context(save_graphs=True)` and execute the network to obtain the operator name from the generated `trace_code_graph_{graph_id}`IR file. `kernels` only supports TBE operator, AiCPU operator and communication operator. The data of communication operation input operator will be dumped if `kernels` is set to the name of communication operator. For details, please refer to [Saving IR](https://www.mindspore.cn/tutorials/experts/en/master/debug/dump.html#saving-ir). - `support_device`: Supported devices, default setting is `[0,1,2,3,4,5,6,7]`. You can specify specific device ids to dump specific device data. - `enable`: Enable Asynchronous Dump. If synchronous dump and asynchronous dump are enabled at the same time, only synchronous dump will take effect. - `op_debug_mode`: Reserved field, set to 0. diff --git a/tutorials/experts/source_en/debug/mindir.md b/tutorials/experts/source_en/debug/mindir.md index 9ee840fae6..f40c65a45f 100644 --- a/tutorials/experts/source_en/debug/mindir.md +++ b/tutorials/experts/source_en/debug/mindir.md @@ -2,7 +2,7 @@ `Ascend` `GPU` `CPU` `Model Optimization` - + ## Overview @@ -172,7 +172,7 @@ Line 5 to 6 are the input list, which is in the format of `%para[No.]_[name] : < Line 8 tells us the number of subgraph parsed by the network. There are 3 graphs in this IR. Line 42 is the entry graph `1_construct_wrapper.21`. Line 32 is graph `3_func.23`, parsed from the `func(x, y)` in the source script. Line 12 is graph `2_construct.22`, parsed from the function `construct`. Taking graph `2_construct.22` as an example, Line 10 to 28 indicate the graph structure, which contains several nodes, namely, `CNode`. In this example, there are `Sub`, `Add`, `Mul`. They are defined in the function `__init__`. Line 19 calls a graph by `call @3_func.23`. It indicates calling the graph `func(x, y)` to execute a division operation. -The ]`CNode`]() information format is as follows: including the node name, attribute, input node, the specs of the inputs and outputs, and source code parsing call stack. The ANF graph is a unidirectional acyclic graph. So, the connection between nodes is displayed only based on the input relationship. The corresponding source code reflects the relationship between the `CNode` and the script source code. For example, line 15 is parsed from `a = self.sub(x, 1)`. 
+The `CNode` information format is as follows: including the node name, attribute, input node, the specs of the inputs and outputs, and source code parsing call stack. The ANF graph is a unidirectional acyclic graph. So, the connection between nodes is displayed only based on the input relationship. The corresponding source code reflects the relationship between the `CNode` and the script source code. For example, line 15 is parsed from `a = self.sub(x, 1)`.

```text
%[No.]([debug_name]) = [op_name]([arg], ...) primitive_attrs: {[key]: [value], ...}
```

@@ -262,7 +262,7 @@

Line 23 to 32 indicates the graph structure, which contains several nodes, namely, `CNode`. In this example, there are `Sub`, `Add`, `Mul`. They are defined in the function `__init__`.

Line 34 to 39 shows the execution order of the `CNode` from graph `2_construct.22`, corresponding to the order of code execution. The information format is: `No.: belonging graph:node name{[0]: the first input, [1]: the second input, ...}`. For `CNode`, the first input indicates how to compute for this `CNode`. Line 28 indicates the number of graphs. Here is 3.

The [CNode](https://www.mindspore.cn/tutorials/experts/en/master/debug/mindir.html#syntax) information format is as follows: including the node name, attribute, input node, output information, format and the corresponding source code.

```text
%[No,] : [outputs' Spec] = [op_name]{[prim_type]}[attr0, attr1, ...](arg0, arg1, ...) #(inputs' Spec)#[scope]
```

diff --git a/tutorials/experts/source_en/debug/op_compilation.md b/tutorials/experts/source_en/debug/op_compilation.md
index e373b55c52..08e7f32082 100644
--- a/tutorials/experts/source_en/debug/op_compilation.md
+++ b/tutorials/experts/source_en/debug/op_compilation.md
@@ -2,7 +2,7 @@

`Ascend` `Model Optimization`

-
+

## Overview

diff --git a/tutorials/experts/source_en/infer/ascend_310_air.md b/tutorials/experts/source_en/infer/ascend_310_air.md
index 83e3aab942..7724279c5c 100644
--- a/tutorials/experts/source_en/infer/ascend_310_air.md
+++ b/tutorials/experts/source_en/infer/ascend_310_air.md
@@ -2,7 +2,7 @@

`Ascend` `Inference Application`

-
+

## Overview

@@ -102,7 +102,7 @@ Create a directory to store the inference code project, for example, `/home/HwHi

## Exporting the AIR Model

-Train the target network on the Ascend 910 AI Processor, save it as a checkpoint file, and export the model file in AIR format through the network and checkpoint file. For details about the export process, see [Export AIR Model](https://www.mindspore.cn/tutorials/experts/en/master/infer/save_model.html#export-air-model).
+Train the target network on the Ascend 910 AI Processor, save it as a checkpoint file, and export the model file in AIR format through the network and checkpoint file. For details about the export process, see [Export AIR Model](https://www.mindspore.cn/tutorials/experts/en/master/infer/ascend_310_air.html#export-air-model).

> The [resnet50_export.air](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com:443/sample_resources/acl_resnet50_sample/resnet50_export.air) is a sample AIR file exported using the ResNet-50 model.
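A hedged sketch of the AIR export step, assuming the `resnet50` constructor from the ModelZoo scripts is on the PYTHONPATH and a trained checkpoint is available locally (both names illustrative):

```python
import numpy as np
import mindspore as ms
from mindspore import Tensor, export, load_checkpoint, load_param_into_net
from resnet import resnet50  # hypothetical ModelZoo script on the PYTHONPATH

net = resnet50(class_num=1001)
load_param_into_net(net, load_checkpoint("resnet50.ckpt"))  # illustrative path

# NCHW dummy input matching the shape the exported AIR model will serve.
inputs = Tensor(np.ones([1, 3, 224, 224]), ms.float32)
export(net, inputs, file_name="resnet50_export", file_format="AIR")
```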
diff --git a/tutorials/experts/source_en/infer/ascend_310_mindir.md b/tutorials/experts/source_en/infer/ascend_310_mindir.md index 6ec4a6ca57..4aa1fe29a0 100644 --- a/tutorials/experts/source_en/infer/ascend_310_mindir.md +++ b/tutorials/experts/source_en/infer/ascend_310_mindir.md @@ -2,7 +2,7 @@ `Ascend` `Inference Application` - + ## Overview @@ -22,7 +22,7 @@ Refer to [Installation Guide](https://www.mindspore.cn/install/en) to install As ## Exporting the MindIR Model -Train the target network on the CPU/GPU/Ascend 910 AI Processor, save it as a checkpoint file, and export the model file in MindIR format through the network and checkpoint file. For details about the export process, see [Export MindIR Model](https://www.mindspore.cn/tutorials/experts/en/master/infer/save_model.html#export-mindir-model). +Train the target network on the CPU/GPU/Ascend 910 AI Processor, save it as a checkpoint file, and export the model file in MindIR format through the network and checkpoint file. For details about the export process, see [Export MindIR Model](https://www.mindspore.cn/tutorials/experts/en/master/infer/ascend_310_mindir.html#export-mindir-model). > The [resnet50_imagenet.mindir](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/sample_resources/ascend310_resnet50_preprocess_sample/resnet50_imagenet.mindir) is a sample MindIR file exported using the ResNet-50 model, whose BatchSize is 1. We also provide a ResNet-50 MindIR with data preprocess [resnet50_imagenet_preprocess.mindir](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/sample_resources/ascend310_resnet50_preprocess_sample/resnet50_imagenet_preprocess.mindir). diff --git a/tutorials/experts/source_en/infer/ascend_910_mindir.md b/tutorials/experts/source_en/infer/ascend_910_mindir.md index 55d8d599e8..3f0499bb97 100644 --- a/tutorials/experts/source_en/infer/ascend_910_mindir.md +++ b/tutorials/experts/source_en/infer/ascend_910_mindir.md @@ -2,7 +2,7 @@ `Ascend` `Inference Application` - + ## Overview diff --git a/tutorials/experts/source_en/infer/cpu_gpu_mindir.md b/tutorials/experts/source_en/infer/cpu_gpu_mindir.md index ebfa10e542..aa102f1e8a 100644 --- a/tutorials/experts/source_en/infer/cpu_gpu_mindir.md +++ b/tutorials/experts/source_en/infer/cpu_gpu_mindir.md @@ -2,7 +2,7 @@ `GPU` `Inference Application` - + ## Use C++ Interface to Load a MindIR File for Inferencing @@ -181,6 +181,6 @@ It is recommended that export the MindIR model with fp32 precision mode before d ## Inference Using an ONNX File -1. Generate a model in ONNX format on the training platform. For details, see [Export ONNX Model](https://www.mindspore.cn/tutorials/experts/en/master/infer/save_model.html#export-onnx-model). +1. Generate a model in ONNX format on the training platform. For details, see [Export ONNX Model](https://www.mindspore.cn/tutorials/experts/en/master/infer/cpu_gpu_mindir.html#export-onnx-model). 2. Perform inference on a GPU by referring to the runtime or SDK document. For example, use TensorRT to perform inference on the NVIDIA GPU. For details, see [TensorRT backend for ONNX](https://github.com/onnx/onnx-tensorrt). diff --git a/tutorials/experts/source_en/infer/inference.md b/tutorials/experts/source_en/infer/inference.md index 6ead9fffd1..d91323ec15 100644 --- a/tutorials/experts/source_en/infer/inference.md +++ b/tutorials/experts/source_en/infer/inference.md @@ -2,7 +2,7 @@ `Ascend` `GPU` `CPU` `Inference Application` - + MindSpore can execute inference tasks on different hardware platforms based on trained models. 
@@ -35,11 +35,11 @@ Inference can be classified into the following two modes based on the applicatio 1. Local inference - Load a checkpoint file generated during network training and call the `model.predict` API for inference and validation. For details, see [Online Inference with Checkpoint](https://www.mindspore.cn/tutorials/experts/en/master/infer/online_inference.html). + Load a checkpoint file generated during network training and call the `model.predict` API for inference and validation. For details, see [Online Inference with Checkpoint](https://www.mindspore.cn/tutorials/experts/en/master/infer/inference.html). 2. Cross-platform inference - Use a network definition and a checkpoint file, call the `export` API to export a model file, and perform inference on different platforms. Currently, MindIR, ONNX, and AIR (on only Ascend AI Processors) models can be exported. For details, see [Saving Models](https://www.mindspore.cn/tutorials/experts/en/master/infer/save_model.html). + Use a network definition and a checkpoint file, call the `export` API to export a model file, and perform inference on different platforms. Currently, MindIR, ONNX, and AIR (on only Ascend AI Processors) models can be exported. For details, see [Saving Models](https://www.mindspore.cn/tutorials/experts/en/master/infer/inference.html). ## Introduction to MindIR @@ -56,4 +56,4 @@ MindSpore defines logical network structures and operator attributes through a u 2. Application Scenarios - Use a network definition and a checkpoint file to export a MindIR model file, and then execute inference based on different requirements, for example, [Inference Using the MindIR Model on Ascend 310 AI Processors](https://www.mindspore.cn/tutorials/experts/en/master/infer/multi_platform_inference_ascend_310_mindir.html), [MindSpore Serving-based Inference Service Deployment](https://www.mindspore.cn/serving/docs/en/master/serving_example.html), and [Inference on Devices](https://www.mindspore.cn/lite/docs/en/master/index.html). + Use a network definition and a checkpoint file to export a MindIR model file, and then execute inference based on different requirements, for example, [Inference Using the MindIR Model on Ascend 310 AI Processors](https://www.mindspore.cn/tutorials/experts/en/master/infer/ascend_310_mindir.html), [MindSpore Serving-based Inference Service Deployment](https://www.mindspore.cn/serving/docs/en/master/serving_example.html), and [Inference on Devices](https://www.mindspore.cn/lite/docs/en/master/index.html). 
diff --git a/tutorials/experts/source_en/operation/op_ascend.md b/tutorials/experts/source_en/operation/op_ascend.md index 81d0d259c3..97c1b9fe76 100644 --- a/tutorials/experts/source_en/operation/op_ascend.md +++ b/tutorials/experts/source_en/operation/op_ascend.md @@ -2,7 +2,7 @@ `Ascend` `Model Development` - + ## Overview diff --git a/tutorials/experts/source_en/operation/op_classification.md b/tutorials/experts/source_en/operation/op_classification.md index a62842c7a2..21316c774b 100644 --- a/tutorials/experts/source_en/operation/op_classification.md +++ b/tutorials/experts/source_en/operation/op_classification.md @@ -2,7 +2,7 @@ `Ascend` `GPU` `CPU` `Beginner` - + ## Overview diff --git a/tutorials/experts/source_en/operation/op_cpu.md b/tutorials/experts/source_en/operation/op_cpu.md index e04d393a54..1c9a576cac 100644 --- a/tutorials/experts/source_en/operation/op_cpu.md +++ b/tutorials/experts/source_en/operation/op_cpu.md @@ -4,7 +4,7 @@ Translator: [JuLyAi](https://gitee.com/julyai) `CPU` `Model Development` - + ## Overview diff --git a/tutorials/experts/source_en/operation/op_custom.md b/tutorials/experts/source_en/operation/op_custom.md index 080f159f0c..4b93d570f9 100644 --- a/tutorials/experts/source_en/operation/op_custom.md +++ b/tutorials/experts/source_en/operation/op_custom.md @@ -2,7 +2,7 @@ `Ascend` `GPU` `CPU` `Model Development` - + ## Overview diff --git a/tutorials/experts/source_en/operation/op_gpu.md b/tutorials/experts/source_en/operation/op_gpu.md index 040a8dc365..8edc7e3236 100644 --- a/tutorials/experts/source_en/operation/op_gpu.md +++ b/tutorials/experts/source_en/operation/op_gpu.md @@ -4,7 +4,7 @@ Translator: [Leon_02](https://gitee.com/Leon_02) `GPU` `Model Development` - + ## Overview diff --git a/tutorials/experts/source_en/operation/op_overload.md b/tutorials/experts/source_en/operation/op_overload.md index 3bbfb40cc8..c7bab00577 100644 --- a/tutorials/experts/source_en/operation/op_overload.md +++ b/tutorials/experts/source_en/operation/op_overload.md @@ -2,7 +2,7 @@ `Ascend` `GPU` `CPU` `Model Development` - + ## Overview diff --git a/tutorials/experts/source_en/others/cv_resnet50_second_order_optimizer.md b/tutorials/experts/source_en/others/cv_resnet50_second_order_optimizer.md index 987c41b0da..a0f2dfdaf1 100644 --- a/tutorials/experts/source_en/others/cv_resnet50_second_order_optimizer.md +++ b/tutorials/experts/source_en/others/cv_resnet50_second_order_optimizer.md @@ -2,7 +2,7 @@ `Ascend` `GPU` `Function Extension` `Whole Process` -   +   ## Overview @@ -169,7 +169,7 @@ def create_dataset2(dataset_path, do_train, repeat_num=1, batch_size=32, target= return data_set ``` -> MindSpore supports multiple data processing and augmentation operations. These operations are usually used in combination. For details, see [Data Processing](https://www.mindspore.cn/docs/programming_guide/en/master/dataset_sample.html). +> MindSpore supports multiple data processing and augmentation operations. These operations are usually used in combination. For details, see [Data Processing](https://www.mindspore.cn/tutorials/experts/en/master/others/cv_resnet50_second_order_optimizer.html). 
## Defining the Network diff --git a/tutorials/experts/source_en/others/gradient_accumulation.md b/tutorials/experts/source_en/others/gradient_accumulation.md index 709590fb7b..31fbb00e71 100644 --- a/tutorials/experts/source_en/others/gradient_accumulation.md +++ b/tutorials/experts/source_en/others/gradient_accumulation.md @@ -2,7 +2,7 @@ `GPU` `Model Optimization` - + ## Overview diff --git a/tutorials/experts/source_en/others/mixed_precision.md b/tutorials/experts/source_en/others/mixed_precision.md index 0e46aee9f0..fe5ced6b23 100644 --- a/tutorials/experts/source_en/others/mixed_precision.md +++ b/tutorials/experts/source_en/others/mixed_precision.md @@ -2,7 +2,7 @@ `Ascend` `GPU` `Model Optimization` - + ## Overview @@ -87,7 +87,7 @@ To use the automatic mixed-precision, you need to call the `Model` API to transf 2. Define a network: This step is the same as that for defining a common network (no new configuration is required). -3. Create a dataset: For details, see [Quick Start of Dataset](https://www.mindspore.cn/tutorials/experts/en/master/others/dataset_sample.html). +3. Create a dataset: For details, see [Quick Start of Dataset](https://www.mindspore.cn/tutorials/experts/en/master/others/mixed_precision.html). 4. Use the `Model` API to encapsulate the network model, optimizer, and loss function, and set the `amp_level` parameter. For details, see [MindSpore API](https://www.mindspore.cn/docs/en/master/api_python/mindspore.html#mindspore.Model). In this step, MindSpore automatically selects an appropriate operator to convert FP32 to FP16. diff --git a/tutorials/experts/source_en/parallel/distributed_inference.md b/tutorials/experts/source_en/parallel/distributed_inference.md index c005fe6ce0..6f2b4b464d 100644 --- a/tutorials/experts/source_en/parallel/distributed_inference.md +++ b/tutorials/experts/source_en/parallel/distributed_inference.md @@ -22,7 +22,7 @@ The process of distributed inference is as follows: > - In the distributed Inference scenario, during the training phase, the `integrated_save` of `CheckpointConfig` interface should be set to `False`, which means that each device only saves the slice of model instead of the full model. > - `parallel_mode` of `set_auto_parallel_context` interface should be set to `auto_parallel` or `semi_auto_parallel`. > - In addition, you need to specify `strategy_ckpt_save_file` to indicate the path of the strategy file. - > - If pipeline distributed inference is used, then the pipeline parallel training also must be used. And the `device_num` and `pipeline_stages` used for pipeline training and inference must be the same. While applying pipeline inference, `micro_size` is 1 and there is no need to use `PipelineCell`. The pipeline distributed training tutorial can be referred the link: . + > - If pipeline distributed inference is used, then the pipeline parallel training also must be used. And the `device_num` and `pipeline_stages` used for pipeline training and inference must be the same. While applying pipeline inference, `micro_size` is 1 and there is no need to use `PipelineCell`. The pipeline distributed training tutorial can be referred the link: . 2. Set context and infer predication strategy according to the predication data. @@ -72,7 +72,7 @@ For Multi-card training and distributed reasoning, it is necessary to export Min First, you need to prepare checkpoint files and training strategy files. -The checkpoint file is generated during the training process. 
For specific usage of checkpoint, please refer to: [checkpoint usage](https://www.mindspore.cn/tutorials/experts/en/master/parallel/save_model.html#checkpoint). +The checkpoint file is generated during the training process. For specific usage of checkpoint, please refer to: [checkpoint usage](https://www.mindspore.cn/tutorials/experts/en/master/parallel/distributed_inference.html#checkpoint). The training strategy file needs to be generated by setting the context during training. The context configuration items are as follows: `context.set_auto_parallel_context(strategy_ckpt_save_file='train_strategy.ckpt')` @@ -87,7 +87,7 @@ Then, use the method of loading distributed checkpoints to load the previously t code show as below: `load_distributed_checkpoint(model, ckpt_file_list, predict_strategy)` -For the specific usage of `load_distributed_checkpoint`, please refer to: [Distributed Inference](https://www.mindspore.cn/tutorials/experts/en/master/parallel/multi_platform_inference_ascend_910.html#distributed-inference-with-multi-devices). +For the specific usage of `load_distributed_checkpoint`, please refer to: [Distributed Inference](https://www.mindspore.cn/tutorials/experts/en/master/parallel/distributed_inference.html#distributed-inference-with-multi-devices). Finally, you can export the MindIR file in the distributed reasoning scenario. @@ -109,7 +109,7 @@ load_distributed_checkpoint(model, ckpt_file_list, predict_strategy) export(net, Tensor(input), file_name='net', file_format='MINDIR') ``` -In the case of multi-card training and single-card inference, the usage of exporting MindIR is the same as that of single machine. For the usage of loading checkpoint, please refer to: [Distributed Inference](https://www.mindspore.cn/tutorials/experts/en/master/parallel/multi_platform_inference_ascend_910.html#ascend-910-ai). +In the case of multi-card training and single-card inference, the usage of exporting MindIR is the same as that of single machine. For the usage of loading checkpoint, please refer to: [Distributed Inference](https://www.mindspore.cn/tutorials/experts/en/master/parallel/distributed_inference.html#ascend-910-ai). > Distributed scene export MindIR file sample code: > diff --git a/tutorials/experts/source_en/parallel/introduction.md b/tutorials/experts/source_en/parallel/introduction.md index 1dfa28b479..d0581485f7 100644 --- a/tutorials/experts/source_en/parallel/introduction.md +++ b/tutorials/experts/source_en/parallel/introduction.md @@ -2,4 +2,4 @@ No English version available right now, welcome to contribute. - + diff --git a/tutorials/experts/source_en/parallel/save_load.md b/tutorials/experts/source_en/parallel/save_load.md index 3cdac740ad..5dfaaf2946 100644 --- a/tutorials/experts/source_en/parallel/save_load.md +++ b/tutorials/experts/source_en/parallel/save_load.md @@ -2,7 +2,7 @@ `Ascend` `GPU` `Distributed Parallel` `Model Export` `Model Loading` - + ## Overview diff --git a/tutorials/experts/source_en/parallel/train_ascend.md b/tutorials/experts/source_en/parallel/train_ascend.md index 7058b67043..aaf14cb872 100644 --- a/tutorials/experts/source_en/parallel/train_ascend.md +++ b/tutorials/experts/source_en/parallel/train_ascend.md @@ -2,7 +2,7 @@ `Ascend` `Distributed Parallel` `Whole Process` - + ## Overview @@ -294,7 +294,7 @@ The `Momentum` optimizer is used as the parameter update tool. 
The definition is - `gradients_mean`: During backward computation, the framework collects gradients of parameters in data parallel mode across multiple hosts, obtains the global gradient value, and transfers the global gradient value to the optimizer for update. The default value is `False`, which indicates that the `AllReduce.Sum` operation is applied. The value `True` indicates that the `AllReduce.Mean` operation is applied. - You are advised to set `device_num` and `global_rank` to their default values. The framework calls the HCCL API to obtain the values. -> More about the distributed training configurations please refer to the [programming guide](https://www.mindspore.cn/tutorials/experts/en/master/parallel/auto_parallel.html). +> More about the distributed training configurations please refer to the [programming guide](https://www.mindspore.cn/tutorials/experts/en/master/parallel/train_ascend.html). If multiple network cases exist in the script, call `context.reset_auto_parallel_context` to restore all parameters to default values before executing the next case. @@ -593,7 +593,7 @@ param_dict = load_checkpoint(pretrain_ckpt_path) load_param_into_net(net, param_dict) ``` -For checkpoint configuration policy and saving method, please refer to [Saving and Loading Model Parameters](https://www.mindspore.cn/tutorials/experts/en/master/parallel/save_model.html#checkpoint-configuration-policies). +For checkpoint configuration policy and saving method, please refer to [Saving and Loading Model Parameters](https://www.mindspore.cn/tutorials/experts/en/master/parallel/train_ascend.html#checkpoint-configuration-policies). By default, sliced parameters would be merged before saving automatocally. However, considering large-scaled networks, a large size checkpoint file will be difficult to be transferred and loaded. So every device can save sliced parameters separately by setting `integrated_save` as `False` in `CheckpointConfig`. If the shard strategies of retraining or inference are different with that of training, the special loading way is needed. @@ -621,7 +621,7 @@ load_distributed_checkpoint(model.train_network, ckpt_file_list, layout_dict) model.train(2, dataset) ``` -> Distributed inference could be referred to [Distributed inference](https://www.mindspore.cn/tutorials/experts/en/master/parallel/multi_platform_inference_ascend_910.html#id1). +> Distributed inference could be referred to [Distributed inference](https://www.mindspore.cn/tutorials/experts/en/master/parallel/train_ascend.html#id1). ### Data Parallel Mode @@ -680,8 +680,8 @@ to: ckpt_config = CheckpointConfig(keep_checkpoint_max=1, integrated_save=False) ``` -It should be noted that if users choose this checkpoint saving policy, users need to save and load the segmented checkpoint for subsequent reasoning or retraining. Specific usage can refer to [Integrating the Saved Checkpoint Files](https://www.mindspore.cn/tutorials/experts/en/master/parallel/save_load_model_hybrid_parallel.html#integrating-the-saved-checkpoint-files). +It should be noted that if users choose this checkpoint saving policy, users need to save and load the segmented checkpoint for subsequent reasoning or retraining. Specific usage can refer to [Integrating the Saved Checkpoint Files](https://www.mindspore.cn/tutorials/experts/en/master/parallel/train_ascend.html#integrating-the-saved-checkpoint-files). 
### Hybrid Parallel Mode -For model parameter saving and loading in Hybrid Parallel Mode, please refer to [Saving and Loading Model Parameters in the Hybrid Parallel Scenario](https://www.mindspore.cn/tutorials/experts/en/master/parallel/save_load_model_hybrid_parallel.html). +For model parameter saving and loading in Hybrid Parallel Mode, please refer to [Saving and Loading Model Parameters in the Hybrid Parallel Scenario](https://www.mindspore.cn/tutorials/experts/en/master/parallel/train_ascend.html). diff --git a/tutorials/experts/source_en/parallel/train_gpu.md b/tutorials/experts/source_en/parallel/train_gpu.md index 447d9cf0f4..952eb8ce6e 100644 --- a/tutorials/experts/source_en/parallel/train_gpu.md +++ b/tutorials/experts/source_en/parallel/train_gpu.md @@ -2,7 +2,7 @@ `GPU` `Distributed Parallel` `Whole Process` - + ## Overview diff --git a/tutorials/source_en/advanced/index.rst b/tutorials/source_en/advanced/index.rst new file mode 100644 index 0000000000..a998828c9d --- /dev/null +++ b/tutorials/source_en/advanced/index.rst @@ -0,0 +1,4 @@ +.. toctree:: + :maxdepth: 1 + + train/save_model \ No newline at end of file diff --git a/tutorials/source_en/advanced/train/save_model.md b/tutorials/source_en/advanced/train/save_model.md new file mode 100644 index 0000000000..90d861786f --- /dev/null +++ b/tutorials/source_en/advanced/train/save_model.md @@ -0,0 +1,292 @@ +# Saving and Exporting Models + +`Ascend` `GPU` `CPU` `Model Export` + + + +## Overview + +During model training, you can add CheckPoints to save model parameters for inference and retraining after interruption. If you want to perform inference on different hardware platforms, you need to generate corresponding MindIR, AIR and ONNX format files based on the network and CheckPoint format files. + +- **CheckPoint**: The Protocol Buffers mechanism is adopted, which stores all the parameter values in the network. It is generally used to resume training after a training task is interrupted, or in a Fine Tune task after training. +- **MindIR**: MindSpore IR, is a kind of functional IR based on graph representation of MindSpore, which defines the extensible graph structure and the IR representation of the operator, and stores the network structure and weight parameter values. It eliminates model differences between different backends and is generally used to perform inference tasks across hardware platforms, such as performing inference on the Ascend 910 trained model on the Ascend 310, GPU, and MindSpore Lite side. +- **AIR**: Ascend Intermediate Representation, is an open file format defined by Huawei for machine learning, and stores network structure and weight parameter values, which can better adapt to Ascend AI processors. It is generally used to perform inference tasks on Ascend 310. +- **ONNX**: Open Neural Network Exchange, is an open file format designed for machine learning, storing both network structure and weight parameter values. Typically used for model migration between different frameworks or for use on the Inference Engine (TensorRT). + +The following uses examples to describe how to save MindSpore CheckPoint files, and how to export MindIR, AIR and ONNX files. + +## Saving the models + +The [Save and Load section](https://mindspore.cn/tutorials/zh-CN/master/beginner/save_load.html) of the beginner tutorials describes how to save model parameters directly using `save_checkpoint` and using the Callback mechanism to save model parameters during training. 
This section further describes how to save model parameters during training, and how to use `save_checkpoint` to save model parameters directly.
+
+### Saving the model during training
+
+To save model parameters during training, MindSpore provides two saving strategies: an iteration policy and a time policy, which are set by creating a `CheckpointConfig` object. The two policies cannot be used at the same time: the iteration policy takes precedence over the time policy, and when both are set, only the iteration policy takes effect. When a policy parameter is explicitly set to None, that policy is disabled. In addition, when an exception occurs during training, MindSpore provides a breakpoint continuation function, that is, the system automatically saves the CheckPoint file at the moment the exception occurs.
+
+1. Iteration policy
+
+`CheckpointConfig` can be configured according to the number of iterations. The parameters of the iteration policy are as follows:
+
+- `save_checkpoint_steps`: indicates the step interval at which CheckPoint files are saved, with a default value of 1.
+- `keep_checkpoint_max`: indicates how many CheckPoint files to keep at most, with a default value of 5.
+
+```python
+from mindspore.train.callback import CheckpointConfig
+
+# Save one CheckPoint file every 32 steps, and keep up to 10 CheckPoint files
+config_ck = CheckpointConfig(save_checkpoint_steps=32, keep_checkpoint_max=10)
+```
+
+If a script using the iteration policy ends normally, the CheckPoint file of the last step is saved by default.
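+
+A `CheckpointConfig` object takes effect only after it is wrapped in a `ModelCheckpoint` callback and passed to training. The following is a minimal sketch of that wiring; the `model`, `train_dataset` and `epoch_size` objects are assumed to be defined by the surrounding training script, and the `prefix` and `directory` values are illustrative only.
+
+```python
+from mindspore.train.callback import ModelCheckpoint, CheckpointConfig
+
+config_ck = CheckpointConfig(save_checkpoint_steps=32, keep_checkpoint_max=10)
+# Files are named "<prefix>-<epoch>_<step>.ckpt" and written to `directory`
+ckpt_cb = ModelCheckpoint(prefix="resnet50", directory="./checkpoint", config=config_ck)
+# The callback saves parameters while training runs
+model.train(epoch_size, train_dataset, callbacks=[ckpt_cb])
+```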
+
+2. Time policy
+
+`CheckpointConfig` can also be configured according to the training duration. The parameters of the time policy are as follows:
+
+- `save_checkpoint_seconds`: indicates the interval, in seconds, at which a CheckPoint file is saved, with a default value of 0.
+- `keep_checkpoint_per_n_minutes`: indicates the interval, in minutes, at which a CheckPoint file is kept, with a default value of 0.
+
+```python
+from mindspore.train.callback import CheckpointConfig
+
+# Save a CheckPoint file every 30 seconds and keep one CheckPoint file every 3 minutes
+config_ck = CheckpointConfig(save_checkpoint_seconds=30, keep_checkpoint_per_n_minutes=3)
+```
+
+The `save_checkpoint_seconds` parameter cannot be used together with the `save_checkpoint_steps` parameter. If both are set, `save_checkpoint_seconds` is invalid.
+
+3. Breakpoint continuation
+
+MindSpore provides a breakpoint continuation function: when the user turns it on and an exception occurs during training, MindSpore automatically saves a CheckPoint file (an end-of-life CheckPoint) at the moment the exception occurs. The function is controlled by the `exception_save` parameter (bool type) of `CheckpointConfig`: it is turned on when set to True and off when set to False, and the default is False. The end-of-life CheckPoint saved by this function does not affect the CheckPoint files saved in the normal process, and its naming mechanism and save path are consistent with the normal settings; the only difference is that '_breakpoint' is appended to the end of the CheckPoint file name to distinguish it. Its usage is as follows:
+
+```python
+from mindspore.train.callback import ModelCheckpoint, CheckpointConfig
+
+# Turn on the breakpoint continuation function
+config_ck = CheckpointConfig(save_checkpoint_steps=32, keep_checkpoint_max=10, exception_save=True)
+```
+
+If an exception occurs during training, the end-of-life CheckPoint is saved automatically. For example, if an exception occurs at the 10th step of the 10th epoch, the saved end-of-life CheckPoint file is as follows.
+
+```text
+resnet50-10_10_breakpoint.ckpt  # The end-of-life CheckPoint file name is marked with '_breakpoint' to distinguish it from normal-process CheckPoint files.
+```
+
+### Saving models with `save_checkpoint`
+
+You can use the `save_checkpoint` function to save network weights to a CheckPoint file directly. Its common parameters are as follows:
+
+- `save_obj`: Cell object or data list.
+- `ckpt_file_name`: checkpoint file name. If the file already exists, the original file will be overwritten.
+- `integrated_save`: whether to merge and save split Tensors in parallel scenarios. The default value is True.
+- `async_save`: whether to save the checkpoint file asynchronously. The default value is False.
+- `append_dict`: additional information that needs to be saved. The key of the dict must be of type str, and the value must be of type float or bool. The default value is None.
+
+1. `save_obj` parameter
+
+The [Save and Load section](https://mindspore.cn/tutorials/en/master/beginner/save_load.html) of the beginner tutorials describes how to save model parameters directly using `save_checkpoint` when `save_obj` is a Cell object. Here's how to save model parameters when you pass in a data list: each element of the list is of dictionary type, such as [{"name": param_name, "data": param_data}, ...], where the type of `param_name` must be str and the type of `param_data` must be Parameter or Tensor. An example is shown below:
+
+```python
+from mindspore import save_checkpoint, Tensor
+from mindspore import dtype as mstype
+
+save_list = [{"name": "lr", "data": Tensor(0.01, mstype.float32)}, {"name": "train_epoch", "data": Tensor(20, mstype.int32)}]
+save_checkpoint(save_list, "hyper_param.ckpt")
+```
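+
+The values saved this way can be read back with `load_checkpoint`, which returns a dictionary keyed by the saved names. The following small sketch assumes the hyper_param.ckpt file produced above:
+
+```python
+from mindspore import load_checkpoint
+
+# load_checkpoint returns a dict of {name: Parameter}
+param_dict = load_checkpoint("hyper_param.ckpt")
+# Convert the stored values back to Python scalars
+lr = float(param_dict["lr"].asnumpy())
+train_epoch = int(param_dict["train_epoch"].asnumpy())
+print(lr, train_epoch)
+```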
+
+2. `integrated_save` parameter
+
+Indicates whether parameters are merged before saving; the default is True. In the model parallel scenario, a Tensor is split across the programs running on different cards. If `integrated_save` is set to True, these split Tensors are merged and saved in every checkpoint file, so that each checkpoint file holds the complete training parameters.
+
+```python
+save_checkpoint(net, "resnet50-2_32.ckpt", integrated_save=True)
+```
+
+3. `async_save` parameter
+
+Indicates whether the asynchronous save function is enabled; the default is False. If set to True, a separate thread is used to write the checkpoint file, allowing the training and saving tasks to run in parallel and reducing the total script run time when training large-scale networks.
+
+```python
+save_checkpoint(net, "resnet50-2_32.ckpt", async_save=True)
+```
+
+4. `append_dict` parameter
+
+Additional information to be saved, of type dict. Currently only basic types, such as int, float and bool, are supported.
+
+```python
+save_dict = {"epoch_num": 2, "lr": 0.01}
+# In addition to the parameters in net, the information in save_dict is also saved in the ckpt file
+save_checkpoint(net, "resnet50-2_32.ckpt", append_dict=save_dict)
+```
+
+## Transfer Learning
+
+In the transfer learning scenario, when using a pre-trained model for training, the model parameters in the CheckPoint file cannot be used directly; they need to be modified according to the actual situation to suit the current network model. This section describes how to remove the fully connected layer parameters from a pre-trained ResNet-50 model.
+
+First download the [pre-trained model of Resnet50](https://download.mindspore.cn/vision/classification/resnet50_224.ckpt), which is trained on the ImageNet dataset with the `resnet50` model in MindSpore Vision.
+
+The pre-trained model is loaded using the `load_checkpoint` interface, which returns a dict: the keys are the parameter names of each network layer (type str), and the values are the parameter values of those layers (type Parameter).
+
+In the following example, since the number of classification classes of the ResNet-50 pre-trained model is 1000, while the `resnet50` network defined in the example has 2 classes, the fully connected layer parameters in the pre-trained model need to be deleted.
+
+```python
+from mindvision.classification.models import resnet50
+from mindspore import load_checkpoint, load_param_into_net
+from mindvision.dataset import DownLoad
+
+# Download the pre-trained model for Resnet50
+dl = DownLoad()
+dl.download_url('https://download.mindspore.cn/vision/classification/resnet50_224.ckpt')
+# Define a resnet50 network with 2 classification classes
+resnet = resnet50(2)
+# Model parameters are saved to param_dict
+param_dict = load_checkpoint("resnet50_224.ckpt")
+
+# Get a list of parameter names for the fully connected layer
+param_filter = [x.name for x in resnet.head.get_parameters()]
+
+def filter_ckpt_parameter(origin_dict, param_filter):
+    """Delete elements whose names contain the param_filter parameter names from origin_dict"""
+    for key in list(origin_dict.keys()):  # Get all parameter names in the checkpoint
+        for name in param_filter:  # Iterate over the parameter names to be deleted
+            if name in key:
+                print("Delete parameter from checkpoint:", key)
+                del origin_dict[key]
+                break
+
+# Delete the fully connected layer parameters
+filter_ckpt_parameter(param_dict, param_filter)
+
+# Load the filtered parameters into the network
+load_param_into_net(resnet, param_dict)
+```
+
+```text
+Delete parameter from checkpoint: head.dense.weight
+Delete parameter from checkpoint: head.dense.bias
+```
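+
+After loading, the backbone weights come from the checkpoint, while the deleted head parameters keep their fresh random initialization and are trained from scratch. As a quick sanity check, a short sketch reusing the `resnet` object from the example above lists those head parameters:
+
+```python
+# These parameters were filtered out of param_dict, so they remain randomly initialized
+for param in resnet.head.get_parameters():
+    print(param.name, param.shape)
+```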
+
+## Model Export
+
+MindSpore's `export` interface can export a network model as a file in a specified format for inference on other hardware platforms. The main parameters of `export` are as follows:
+
+- `net`: MindSpore network structure.
+- `inputs`: the input of the network; the supported input type is Tensor. When there are multiple inputs, they need to be passed in together, such as `export(network, Tensor(input1), Tensor(input2), file_name='network', file_format='MINDIR')`.
+- `file_name`: the file name of the exported model. If `file_name` does not contain the corresponding suffix (such as .mindir), the system automatically adds the suffix to the file name according to `file_format`.
+- `file_format`: MindSpore currently supports exporting models in "AIR", "ONNX" and "MINDIR" formats.
+
+The following describes how to use `export` to generate MindIR, AIR and ONNX format files from the resnet50 network and its CheckPoint file.
+
+### Export MindIR Model
+
+If you want to perform inference across platforms or hardware (Ascend AI processor, MindSpore on-device, GPU, etc.), you can generate a MindIR format model file through the network definition and CheckPoint. The MindIR format file can be applied to MindSpore Lite; currently, it supports inference networks based on static graph mode. The following example uses the `resnet50` model in MindSpore Vision and the resnet50_224.ckpt file trained on the ImageNet dataset to export a MindIR format file.
+
+```python
+import numpy as np
+from mindspore import Tensor, export, load_checkpoint
+from mindvision.classification.models import resnet50
+
+resnet = resnet50(1000)
+load_checkpoint("resnet50_224.ckpt", net=resnet)
+
+input_np = np.random.uniform(0.0, 1.0, size=[1, 3, 224, 224]).astype(np.float32)
+
+# Export the file resnet50_224.mindir to the current folder
+export(resnet, Tensor(input_np), file_name='resnet50_224', file_format='MINDIR')
+```
+
+If you wish to save the data preprocessing operations into the MindIR file and use them to perform inference, you can pass the Dataset object into the export method:
+
+```python
+import mindspore.dataset as ds
+import mindspore.dataset.vision.c_transforms as C
+from mindspore import export, load_checkpoint
+from mindvision.classification.models import resnet50
+from mindvision.dataset import DownLoad
+
+def create_dataset_for_resnet(path):
+    """Create a dataset"""
+    data_set = ds.ImageFolderDataset(path)
+    mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
+    std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
+    data_set = data_set.map(operations=[C.Decode(), C.Resize(256), C.CenterCrop(224),
+                                        C.Normalize(mean=mean, std=std), C.HWC2CHW()], input_columns="image")
+    data_set = data_set.batch(1)
+    return data_set
+
+dataset_url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/beginner/DogCroissants.zip"
+path = "./datasets"
+# Download and extract the dataset
+dl = DownLoad()
+dl.download_and_extract_archive(url=dataset_url, download_path=path)
+# Load the dataset
+path = "./datasets/DogCroissants/val/"
+de_dataset = create_dataset_for_resnet(path)
+# Define the network
+resnet = resnet50()
+
+# Load the pre-trained model parameters into the network
+load_checkpoint("resnet50_224.ckpt", net=resnet)
+# Export a MindIR file with preprocessing information
+export(resnet, de_dataset, file_name='resnet50_224', file_format='MINDIR')
+```
+
+> - If `file_name` does not contain the ".mindir" suffix, the system will automatically add the ".mindir" suffix to it.
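+
+To sanity-check an exported MindIR file, you can load it back with `load` and wrap it in `nn.GraphCell` for inference. The following is a minimal sketch, assuming the resnet50_224.mindir file exported above; note that `nn.GraphCell` only supports graph mode.
+
+```python
+import numpy as np
+import mindspore.nn as nn
+from mindspore import Tensor, context, load
+
+# nn.GraphCell requires graph mode
+context.set_context(mode=context.GRAPH_MODE)
+
+# Load the exported MindIR file and wrap it as a Cell
+graph = load("resnet50_224.mindir")
+net = nn.GraphCell(graph)
+
+input_np = np.random.uniform(0.0, 1.0, size=[1, 3, 224, 224]).astype(np.float32)
+output = net(Tensor(input_np))
+print(output.shape)
+```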
+
+To avoid the limitation of Protocol Buffers on the file size, when the exported model parameters exceed 1G in size, the framework saves the network structure and the parameters separately by default.
+
+- The name of the network structure file ends with the user-specified prefix plus _graph.mindir.
+- In the same-level directory, there will be a folder named with the user-specified prefix plus _variables, which stores the network parameters. When a parameter's data size exceeds 1T, the parameters are split into additional files named data_1, data_2, and so on.
+
+Taking the above code as an example, if the model's parameter size exceeds 1G, the generated directory structure is as follows:
+
+```text
+├── resnet50_224_graph.mindir
+└── resnet50_224_variables
+    ├── data_1
+    ├── data_2
+    └── data_3
+```
+
+### Export AIR Model
+
+If you want to perform inference on the Ascend AI processor, you can also generate the corresponding AIR format model file through the network definition and CheckPoint. The following example uses the `resnet50` model in MindSpore Vision and the resnet50_224.ckpt file trained on the ImageNet dataset to export an AIR format file for the Ascend AI processor.
+
+```python
+import numpy as np
+from mindspore import Tensor, export, load_checkpoint
+from mindvision.classification.models import resnet50
+
+resnet = resnet50()
+# Load parameters into the network
+load_checkpoint("resnet50_224.ckpt", net=resnet)
+# Network input
+input_np = np.random.uniform(0.0, 1.0, size=[1, 3, 224, 224]).astype(np.float32)
+# Save the resnet50_224.air file to the current directory
+export(resnet, Tensor(input_np), file_name='resnet50_224', file_format='AIR')
+```
+
+If `file_name` does not contain the ".air" suffix, the system will automatically add the ".air" suffix to it.
+
+### Export ONNX Model
+
+When you have a CheckPoint file, if you want to do inference on the Ascend AI processor, GPU, or CPU, you need to generate an ONNX model based on the network and the CheckPoint. The following example uses the `resnet50` model in MindSpore Vision and the resnet50_224.ckpt file trained on the ImageNet dataset to export an ONNX format file.
+
+```python
+import numpy as np
+from mindspore import Tensor, export, load_checkpoint
+from mindvision.classification.models import resnet50
+
+resnet = resnet50()
+load_checkpoint("resnet50_224.ckpt", net=resnet)
+
+input_np = np.random.uniform(0.0, 1.0, size=[1, 3, 224, 224]).astype(np.float32)
+
+# Save the resnet50_224.onnx file to the current directory
+export(resnet, Tensor(input_np), file_name='resnet50_224', file_format='ONNX')
+```
+
+> - If `file_name` does not contain the ".onnx" suffix, the system will automatically add the ".onnx" suffix to it.
+> - Currently, only the ONNX format export of ResNet series networks, YOLOV3, YOLOV4 and BERT is supported.
diff --git a/tutorials/source_en/index.rst b/tutorials/source_en/index.rst
index 4d6a8e712e..bcbae15785 100644
--- a/tutorials/source_en/index.rst
+++ b/tutorials/source_en/index.rst
@@ -20,3 +20,10 @@ MindSpore Tutorial
    beginner/train
    beginner/save_load
    beginner/infer
+
+.. toctree::
+   :glob:
+   :maxdepth: 1
+   :caption: Advanced
+
+   advanced/train
\ No newline at end of file
-- 
Gitee