To support model development and performance debugging in MindSpore, an easy-to-use profiling tool is required. It should intuitively display the performance information of each dimension of a network model, provide users with abundant and easy-to-use profiling functions, and help users quickly locate network performance bottlenecks.
The Profiler architecture design is introduced from three aspects: the overall context in which Profiler interacts with other components; the internal structure of Profiler, including its modules and layers; and the interactive calling relationships between modules.
Profiler is a part of the MindSpore debugging and optimization tool. The following figure shows the tool context.
Figure 1 Context relationship
As shown in the preceding figure, the interaction between the Profiler and other components is as follows:
In the training script, MindSpore Profiler is called to send a command to the ada module (on Ascend devices) or the CUPTI module (on GPU devices) to start performance data collection. ada or CUPTI then generates the original performance data.
MindSpore Profiler, called in the user script, parses the original data and writes the intermediate results to the specified folder.
MindInsight Profiler reads the intermediate data and provides the visualized Profiler function for users.
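The three interactions above form a simple pipeline: a collector writes raw data, Profiler parses it into intermediate files, and a viewer reads those files. A minimal sketch of that flow is shown below; all names, file formats, and fields here are illustrative stand-ins, not the real ada/CUPTI output or MindInsight interfaces.

```python
import json
import tempfile
from pathlib import Path

def collect_raw(out_dir: Path) -> Path:
    """Stand-in for ada/CUPTI: dump raw per-task records to a file."""
    raw = out_dir / "raw_perf.log"
    raw.write_text("task=0 stream=3 start=100 end=180\n"
                   "task=1 stream=3 start=180 end=260\n")
    return raw

def parse_to_intermediate(raw: Path, out_dir: Path) -> Path:
    """Stand-in for Profiler parsing: raw log -> user-readable intermediate file."""
    rows = []
    for line in raw.read_text().splitlines():
        fields = dict(kv.split("=") for kv in line.split())
        fields["duration"] = int(fields["end"]) - int(fields["start"])
        rows.append(fields)
    inter = out_dir / "op_intermediate.json"
    inter.write_text(json.dumps(rows))
    return inter

def visualize(inter: Path) -> list:
    """Stand-in for MindInsight: read the intermediate results for display."""
    return json.loads(inter.read_text())

out_dir = Path(tempfile.mkdtemp())
data = visualize(parse_to_intermediate(collect_raw(out_dir), out_dir))
```

The key point of the design is that each stage communicates only through files, so the three components can evolve independently.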
Modules are classified into the following layers:
Figure 2 Relationships between modules at different layers
Module functions are as follows:
Users can use the API or the RESTful interface to complete the internal module interaction process. The following uses the API as an example:
Figure 3 Module interaction
The interaction process of each module is as follows:
ProfilerAPI calls the control function of the lower-layer Controller, which controls the lower-layer collection module to collect performance information. The collection module (ada on Ascend or CUPTI on GPU) receives the commands and independently collects performance data.
After the training is complete, users call the analysis API of ProfilerAPI.
The analysis API of ProfilerAPI uses the Parser module to parse performance data and generate intermediate results, calls the Analyser module to analyze the results, and returns various information to users.
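The analysis path described above (parse first, then analyse) can be sketched as a minimal class structure. The class and method names below are simplified illustrations of the design, not the real MindSpore implementation:

```python
class Parser:
    """Illustrative: turn raw (op, start, end) records into intermediate rows."""
    def parse(self, raw_records):
        return [{"op": op, "time": end - start} for op, start, end in raw_records]

class Analyser:
    """Illustrative: summarize intermediate rows for the user."""
    def analyse(self, rows):
        return {"op_count": len(rows),
                "total_time": sum(r["time"] for r in rows)}

class ProfilerAPI:
    """Sketch of the analysis entry point: delegate to Parser, then Analyser."""
    def __init__(self):
        self._parser = Parser()
        self._analyser = Analyser()

    def analyse(self, raw_records):
        rows = self._parser.parse(raw_records)   # step 1: parse raw data
        return self._analyser.analyse(rows)      # step 2: analyse intermediate rows

report = ProfilerAPI().analyse([("MatMul", 0, 5), ("ReLU", 5, 7)])
```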
ProfilerAPI provides an entry API in the training script for users to start performance collection and analyze performance data. ProfilerAPI delivers commands through Controller to control the startup of ada/CUPTI.
ProfilerAPI belongs to the API layer of upper-layer application and is integrated by the training script. The function is divided into two parts:
Before training, call the bottom-layer Controller API to deliver a command to start a profiling task.
After training, call the bottom-layer Controller API to deliver commands to stop the profiling task, then call the Parser and Analyser APIs to parse the data files and generate result data such as operator performance statistics and training trace statistics.
Controller provides an API for the upper layer, calls API of the lower-layer performance collection module, and delivers commands for starting and stopping performance collection.
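A minimal sketch of this command-relay role is shown below. The collector here is a fake stand-in for ada/CUPTI, and the command strings are assumptions for illustration:

```python
class FakeCollector:
    """Stand-in for the collection backend (ada on Ascend / CUPTI on GPU)."""
    def __init__(self):
        self.commands = []

    def handle(self, cmd):
        # The real backend would start/stop hardware-level collection here.
        self.commands.append(cmd)

class Controller:
    """Sketch: expose start/stop to the upper layer, relay commands downward."""
    def __init__(self, collector):
        self._collector = collector

    def start(self):
        self._collector.handle("START")

    def stop(self):
        self._collector.handle("STOP")

backend = FakeCollector()
ctrl = Controller(backend)
ctrl.start()
ctrl.stop()
```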
The generated original performance data includes:
Ascend:

- hwts.log.data.45.dev.profiler_default_tag file: stores operator execution information, including the start and end of a task and the stream ID.
- DATA_PREPROCESS.dev.AICPU file: specifies the AI CPU operator execution time at each stage.
- Framework.host.task_desc_info file: stores the mapping between operator IDs and operator names, and the input and output information of each operator.
- training_trace.46.dev.profiler_default_tag file: stores the start and end time of each step, and the time of the step interval, forward and backward propagation, and step tail.

GPU:

- step_trace_profiling_0.txt file: stores the start and end operator of each step.

Parser is a module for parsing original performance data, which is collected on the device and cannot be directly understood by users. Parser parses, combines, and converts the data to generate intermediate results that can be understood by users and analyzed by upper layers.
Figure 4 Parser module
As shown in the preceding figure, there are HWTS Parser, AI CPU Parser, Framework Parser, and Training Trace Parser modules. Each module parses one type of original data to generate an intermediate file that users can read.
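The common shape of these parser modules can be sketched as one base class with a subclass per raw file type. The file formats and field names below are simplified illustrations, not the real raw formats:

```python
from abc import ABC, abstractmethod

class BaseParser(ABC):
    """Sketch: each parser turns one raw file format into intermediate rows."""
    @abstractmethod
    def parse(self, lines):
        ...

class HWTSParser(BaseParser):
    """Illustrative task-record format: 'task stream start end'."""
    def parse(self, lines):
        out = []
        for line in lines:
            task, stream, start, end = line.split()
            out.append({"task": task, "stream": stream,
                        "duration": int(end) - int(start)})
        return out

class StepTraceParser(BaseParser):
    """Illustrative step-trace format: 'step start end'."""
    def parse(self, lines):
        out = []
        for line in lines:
            step, start, end = line.split()
            out.append({"step": step, "time": int(end) - int(start)})
        return out

task_rows = HWTSParser().parse(["7 3 100 180"])
step_rows = StepTraceParser().parse(["1 0 50"])
```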
Ascend:

- hwts.log.data.45.dev.profiler_default_tag file: parsed to obtain task-based statistics of the device, such as the start and end of each task and the stream ID, which are used to compute the operator execution time.
- DATA_PREPROCESS.dev.AICPU file: parsed to obtain the AI CPU operator execution time at each stage.
- Framework.host.task_desc_info file: parsed to obtain the mapping between AI Core operators and tasks, and key operator information.
- training_trace.46.dev.profiler_default_tag file: parsed to analyze the time at each training stage.

GPU:

- step_trace_profiling_0.txt file: parsed to analyze the time at each training stage.

Analyser is used to filter, sort, query, and page the intermediate results generated at the parsing stage.
This module reads the intermediate files generated by Parser, provides a general API for upper-layer data analysis, and returns the analyzed data to the upper layer for display. The various intermediate files share certain commonalities that can be abstracted. Therefore, the following figure shows the design of the Analyser class.
Figure 5 Analyser class
As shown in the preceding figure, multiple Analysers are implemented for the different contents to be queried. Filter, sorting, and pagination conditions can be defined for each Analyser. Each Analyser knows which intermediate files are required to merge, filter, and sort data. Analyser is associated with Parser only through the intermediate files generated by Parser; no function calls are made between them. In this way, Analyser is decoupled from Parser.
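The shared filter/sort/pagination behaviour can be sketched as a single base class; each concrete Analyser would only decide which intermediate file(s) to load. The method signature and field names below are assumptions for illustration:

```python
class Analyser:
    """Sketch of the shared query behaviour over intermediate rows."""
    def __init__(self, rows):
        # In the real design, rows would be loaded from Parser's
        # intermediate files; here they are passed in directly.
        self._rows = rows

    def query(self, condition=None, sort_key=None, reverse=True,
              page=0, page_size=10):
        rows = [r for r in self._rows if condition is None or condition(r)]
        if sort_key is not None:
            rows.sort(key=lambda r: r[sort_key], reverse=reverse)
        start = page * page_size
        return rows[start:start + page_size]

ops = Analyser([{"op": "MatMul", "time": 9},
                {"op": "ReLU", "time": 2},
                {"op": "Conv2D", "time": 7}])
top2 = ops.query(sort_key="time", page_size=2)   # two most expensive operators
```

Because the base class owns the query mechanics, adding a new Analyser is mostly a matter of pointing it at a new intermediate file.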
Currently, there are two types of Analysers for operator information.
To hide the internal implementation of Analyser and facilitate calling, the simple factory mode is used to obtain the specified Analyser through AnalyserFactory.
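A minimal sketch of that simple factory is shown below. The analyser type names and class names are illustrative assumptions, not the real registered types:

```python
class AicoreTypeAnalyser:
    """Illustrative concrete Analyser (name is an assumption)."""
    name = "aicore_type"

class AicoreDetailAnalyser:
    """Illustrative concrete Analyser (name is an assumption)."""
    name = "aicore_detail"

class AnalyserFactory:
    """Simple factory: map an analyser type string to its class,
    hiding the concrete classes from callers."""
    _registry = {
        "aicore_type": AicoreTypeAnalyser,
        "aicore_detail": AicoreDetailAnalyser,
    }

    @classmethod
    def instance(cls, analyser_type):
        try:
            return cls._registry[analyser_type]()
        except KeyError:
            raise ValueError(f"unknown analyser type: {analyser_type}")

analyser = AnalyserFactory.instance("aicore_type")
```

Callers depend only on the factory and the type string, so new Analysers can be added by extending the registry.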
Proposer is a Profiler performance optimization suggestion module. Proposer calls the Analyser module to obtain performance data, analyzes the performance data based on optimization rules, and displays optimization suggestions for users through the UI and API.
Modules are classified into the following layers:
Figure 6 Proposer module
As shown in the preceding figure:
You can call the query API of the Analyser object to obtain information, such as the top N operators sorted by execution time and the time information of each training trace stage.
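The flow described above can be sketched as a Proposer that queries an Analyser and applies a rule. The analyser data, rule, and threshold below are all illustrative assumptions, not the real optimization rules:

```python
class StepTraceAnalyser:
    """Stand-in analyser exposing a query API (data is illustrative)."""
    def query(self):
        # Fractions of step time spent in each training trace stage.
        return {"iteration_interval": 0.42, "fp_and_bp": 0.30, "tail": 0.05}

class Proposer:
    """Sketch: turn analyser results into optimization suggestions
    via simple threshold rules."""
    INTERVAL_THRESHOLD = 0.3   # illustrative threshold, not the real rule

    def __init__(self, analyser):
        self._analyser = analyser

    def propose(self):
        stats = self._analyser.query()
        total = sum(stats.values())
        tips = []
        if stats["iteration_interval"] / total > self.INTERVAL_THRESHOLD:
            tips.append("Iteration interval is long; check the data pipeline.")
        return tips

tips = Proposer(StepTraceAnalyser()).propose()
```

Since Proposer only consumes the Analyser query API, new rules can be added without touching the collection or parsing layers.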
The following figure shows the module class design:
Figure 7 Proposer class
As shown in the preceding figure: