diff --git a/ada/doc/ada_install.md b/ada/doc/ada_install.md
new file mode 100644
index 0000000000000000000000000000000000000000..9a7d86146961aa26b981cf92d1e7e108a181cffb
--- /dev/null
+++ b/ada/doc/ada_install.md
@@ -0,0 +1,59 @@
+# Installing run packages with ada
+
+## Caution!! If ada was installed in a venv
+
+If you installed ada with virtualenv, be aware that you must not run ada directly inside the venv:
+
+```shell
+(venv) $ ada ...
+```
+
+This is a pitfall of the current run-package installation: if a run package is installed from inside a venv, the Python packages it ships get installed under the venv's path.
+However, when a training/inference script executes, it looks for those Python packages under `/usr/local/Ascend`, i.e. the run package's install directory, so the script fails to find them.
+
+Therefore, if ada was installed in a venv, the best practice is to invoke ada as follows:
+
+Create a file named ada in `/usr/local/bin`, or in any other directory on your PATH, with the following content:
+
+```shell
+#! /bin/bash
+/path/to/venv/bin/python -m ada_cmd "$@"
+```
+
+Leave the venv and run ada normally; this script is what actually executes, which avoids the problem described above.
+
+## Installing newest packages
+
+On a debugging environment, install the four newest packages atc, acllib, fwkacllib and opp with:
+
+```shell
+# ada -n atc,acllib,fwkacllib,opp -i
+```
+
+After this command, `ada` looks up the newest packages on CI, downloads them into the current directory, and installs them. The output looks like this:
+
+```shell
+
+```
+
+## Installing self-compiled packages
+
+`ada` can also install packages you compiled yourself. For example, if you built the atc and opp packages and want to install them:
+
+```shell
+# ada -c 20211009_162917537_If6be636,atc,opp -i
+```
+
+Note that the option selecting the packages is now `-c`, and that the first of the comma-separated arguments is the name of the self-compiled build, which can be obtained in either of two ways:
+
+**Method 1: from the compiled package's page**
+
+Open the CI page of the compiled package; the following part of the URL is the name:
+
+![](res/compile_package_page.PNG)
+
+**Method 2: inferred from the build time**
+
+The CI build-history page shows a build time; this time concatenated with the first 8 characters of the change-id gives the name:
+
+![](res/compile_package_history.PNG)
diff --git a/ada/doc/ada_pa.md b/ada/doc/ada_pa.md
new file mode 100644
index 0000000000000000000000000000000000000000..d0612cb075f7a62d6d2aa2463b6b18f34255a373
--- /dev/null
+++ b/ada/doc/ada_pa.md
@@ -0,0 +1,96 @@
+# Parsing profiling files
+
+## Enabling GE profiling
+
+Enabling GE profiling takes two steps.
+
+1. Environment variable: set the following variable to enable dumping to the console:
+
+```bash
+export GE_PROFILING_TO_STD_OUT=1
+```
+
+2. Enable regular profiling
+
+Once the environment variable is configured, **enable profiling the way the official documentation describes**. Examples for several scenarios follow:
+
+**PyTorch training**: add the following code to your script:
+
+```python
+with torch.npu.profile('./'):
+    # Code inside this scope is profiled; when the scope exits, GE prints
+    # the profiling data to the standard output
+    ...
+```
+
+**TensorFlow training**: add the following options:
+
+```python
+custom_op.parameter_map["profiling_mode"].b = True
+custom_op.parameter_map["profiling_options"].s = tf.compat.as_bytes('{"output":"../profiling", "storage_limit":"200MB","training_trace":"on","l2":"on","hccl":"on","task_trace":"on","aicpu":"on","fp_point":"","bp_point":"","aic_metrics":"PipeUtilization","msproftx":"on"}')
+```
+
+**If you use the [msame](https://gitee.com/ascend/tools/tree/master/msame) inference tool**, enable profiling with `--profiler true`:
+
+```bash
+./msame.x86 ...many options... --profiler true
+```
+
+TODO: add instructions for the ACL inference scenario
+
+***Caution!!! Once the environment variable from step 1 is set, the regular profiling function is disabled and only the profiling described in this document takes effect!!
+To use the official Ascend profiling, unset the environment variable from step 1.***
+
+The GE profiling results are printed to the standard output:
+
+```bash
+...other output
+
+...more profiling output
+Profiling dump end
+...other output
+```
+
+To avoid slowing down execution, GE does not print profiling information to stdout in real time.
+While profiling is enabled, GE buffers the profiling data in memory; the data is printed to stdout **when profiling is turned off**.
+Taking the PyTorch scenario as an example, every `with torch.npu.profile('./')` statement in the script produces one such block of output on stdout after the `with` exits.
+
+## Parsing the profiling output
+
+After saving the stdout content printed in the previous step, parse it with the `ada-pa` command:
+
+```shell
+$ ada-pa /path/to/out_file.txt
+Dump to file /path/to/out_file_tracing_0.json
+Dump to ...
+```
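+
+The result files can be sanity-checked straight from Python before opening them in Chrome. A minimal sketch, assuming the parse above produced a file named `out_file_tracing_0.json` (the naming rule is described below) containing a list of Chrome trace-event objects:
+
+```python
+import json
+
+# Load the parsed tracing result; each entry is a trace event with "name" and "dur" (us) fields
+with open('/path/to/out_file_tracing_0.json', 'r') as f:
+    events = json.load(f)
+
+# Print the five longest events, e.g. to spot an unexpectedly slow Tiling or KernelLaunch
+for event in sorted(events, key=lambda e: e.get('dur', 0), reverse=True)[:5]:
+    print(event['name'], event.get('dur', 0))
+```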
+
+Once parsing completes, the result files are written next to the source file by default. Result files are named `file_name_<result-type>_<index>.<ext>`, where:
+
+* result-type: the result type; currently tracing, summary and op_stat are produced, and new result types may be added later
+* index: which dump the result belongs to, i.e. which `with` statement in the script, counted from 0
+* ext: the file extension, which differs per result type
+
+The tracing file can be opened in `chrome://tracing`; it looks like this:
+
+![](res/ada_pa_tracing.PNG)
+
+The summary file can be opened in Excel; it looks like this:
+
+![](res/ada_pa_summary.PNG)
+
+The meanings of the header fields are:
+
+* event: the event type
+* count: how many times the event occurred
+* avg(us): the event's average duration, in us
+* total(us): the event's total duration, in us
+* w-percent: the event's weighted share of time, computed as the event's total duration divided by the total duration of the `OpExecute` event
+
+The op statistic file can be opened in Excel; it looks like this:
+
+![](res/ada_pa_op_stat.PNG)
+
+The meanings of the header fields are:
+
+* name: node name
+* event: the event type
+* duration(us): the duration, in us
\ No newline at end of file
diff --git a/ada/doc/ada_pa_dev.md b/ada/doc/ada_pa_dev.md
new file mode 100644
index 0000000000000000000000000000000000000000..281d52100baff2678155ae21ba817cb5855a2cd2
--- /dev/null
+++ b/ada/doc/ada_pa_dev.md
@@ -0,0 +1,107 @@
+# Developer documentation
+
+## Overall flow
+
+ada-pa takes a GE profiling log file and parses it into a series of analysis reports, including tracing, summary, and so on. Overall, ada-pa works in two big steps:
+
+```mermaid
+flowchart LR
+    log[original log file]-->ana((Analyse))-->pd[ProfilingData]-->r1(("tracing report"))-->ret1["tracing.json"]
+    pd-->r2(("summary report"))-->ret2["summary.csv"]
+    pd-->rn(("xxx report"))-->retn["xxx result"]
+```
+
+The first step, analyse, parses the raw log file into an intermediate Python object; based on this easy-to-manipulate Python object, the various reporters can then run their analyses and produce the reports they need.
+
+### analyse
+
+analyse parses the log file and produces **one or more** ProfilingData objects. It understands GE profiling's data format, parses it, drops invalid data, and stores the parsed data in ProfilingData. ProfilingData mainly holds two kinds of content:
+
+* records: all raw records; the definition of a raw record is given below
+
+* event_records: data derived from records. The difference between an event_record and a record is that an event_record contains a start and an end timestamp, while a record has only one timestamp plus an et field that says whether the record marks a start or an end
+
+For example, the raw log file describes an event like this:
+
+```textile
+10000 122080 [GatherV2] [ConstPrepare] Start
+50000 122080 [GatherV2] [ConstPrepare] End
+```
+
+The columns are, in order: timestamp, thread id (tid), node_name, event, and et (Start or End).
+
+The log above is parsed into two records; each record has the following structure:
+
+```python
+class Record:
+    def __init__(self):
+        self.node_name: str = None
+        self.event: str = None
+        self.et: str = None
+        self.tid: int = None
+        self.timestamp: int = None
+        self.iteration: str = None
+        self.device: str = DEV_HOST_CPU
+```
+
+The two records above together describe one event, which starts at timestamp 10000 and runs until 50000, lasting 40000 ns.
+
+Those two records can therefore be expressed as a single event_record. An event_record no longer has et and timestamp; they are replaced by start and end, so its structure is:
+
+```python
+class EventRecord(Record):
+    def __init__(self):
+        self.node_name: str = None
+        self.event: str = None
+        self.tid: int = None
+        self.iteration: str = None
+        self.device: str = DEV_HOST_CPU
+        self.start: int = None
+        self.end: int = None
+```
+
+Most reporters only need to look at events, so doing this pairing once in the analyser greatly simplifies each reporter's parsing work.
+
+### report
+
+A reporter takes a list of ProfilingData (pds for short), analyses it, and outputs whatever content it wants. Because the same pds are passed to several reporters, a reporter must not modify them, to prevent reporters from affecting each other.
+
+Every reporter is a class whose constructor takes pds and which provides a report function.
+
+```python
+class Reporter:
+    def __init__(self, pds: [ProfilingData]):
+        # Typically the constructor just saves pds; everything else is up to the reporter
+        pass
+
+
+    def report(self, path):
+        """The report function writes the analysed data to files under path.
+        The path argument is a directory name/prefix; the reporter generates its own
+        specific name after the prefix, so the final directory + file name has the form:
+        directory/prefix_<result-type>_<index>.<ext>
+        """
+        pass
+```
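+
+As a concrete illustration of this contract, here is a minimal reporter sketch that only counts event occurrences. The class name and output file name are invented for the example; `event_records` and `event` are the ProfilingData fields described above:
+
+```python
+from collections import Counter
+
+
+class EventCountReporter:
+    """Example reporter: counts how many times each event type occurs."""
+    def __init__(self, pds):
+        # Save pds; they are shared with other reporters and must not be modified
+        self._pds = pds
+
+
+    def report(self, path):
+        counts = Counter()
+        for pd in self._pds:
+            for rec in pd.event_records:
+                counts[rec.event] += 1
+        # Append this reporter's own result-type/index/ext to the prefix, per the naming rule
+        with open("{}_event_count_0.csv".format(path), "w") as f:
+            for event, count in counts.most_common():
+                f.write("{},{}\n".format(event, count))
+```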
+
+## TODO
+
+* op_statistic performs poorly, mostly because confirm_all_events_node_name is slow; needs optimization
+* The analyzer cannot yet detect the anomalous case where two time ranges on the same thread cross each other (crossing means that for two events A and B: B-start < A-start < B-end < A-end)
+* When events of the same type are nested, the summary counts both the inner and the outer occurrences, which may be inaccurate; see [the nested-event time accounting problem](#the-nested-event-time-accounting-problem)
+* For ops with an attached atomic op, the atomic op's tiling is rolled up into the outer tiling: the outer op's KernelLaunch contains two child KernelLaunches, the first for the atomic op and the second for the op itself. Both are currently counted as the outer op's KernelLaunch
+
+## Known issues in detail
+
+### The nested-event time accounting problem
+
+When events like the following occur:
+
+```textile
+|---------transdata Tiling---------|
+|--None Tiling--|--Atomic Tiling --|
+```
+
+the scenario is: Transdata is a multi-task op. From the subgraph's point of view, execution has entered the Transdata node's tiling; from the node's point of view, two different tasks must be launched, each tiled separately: the tiling of Transdata itself and the tiling of the additionally launched atomic op.
+
+In the tiling above, for example, the None tiling (i.e. Transdata's own tiling) takes 10 us, the Atomic tiling takes 10 us, and the outer tiling takes 20 us in total. When computing the tiling summary, the correct result is 2 tilings, 10 us each, 20 us in total. The current statistics, however, come out as 3 tilings, 40 us in total, 13.3 us each on average.
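+
+To make the miscount concrete, here is a small sketch of the naive aggregation over the hypothetical durations above (this mirrors what the summary currently computes, not what it should):
+
+```python
+# Nested tiling events as (start, end) pairs in us; the outer 20 us event contains the two inner ones
+tiling_events = [(0, 20), (0, 10), (10, 20)]
+
+durations = [end - start for start, end in tiling_events]
+count = len(durations)   # 3, although only 2 tilings actually ran
+total = sum(durations)   # 40 us, double-counting the nested 20 us
+avg = total / count      # ~13.3 us, matching the wrong summary above
+print(count, total, round(avg, 1))
+```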
diff --git a/ada/onetime/ot_gen_from_pt_profiling.py b/ada/onetime/ot_gen_from_pt_profiling.py
new file mode 100644
index 0000000000000000000000000000000000000000..4525b9fa31890ef735d9781a762a0876562d564f
--- /dev/null
+++ b/ada/onetime/ot_gen_from_pt_profiling.py
@@ -0,0 +1,36 @@
+import json
+import logging
+from collections import OrderedDict
+
+
+def main_npu(path):
+    with open(path, 'r') as f:
+        records = json.load(f)
+
+    for rec in records:
+        if not rec.get('name', '').startswith("aclopCompileAndExecute: "):
+            continue
+        op_type = rec['name'].replace("aclopCompileAndExecute: ", '')
+        time = rec.get('dur', None)
+        if time is None:
+            logging.warning("Skipping op {}: record contains no duration".format(op_type))
+            continue
+        print("{}, {}".format(op_type, time))
+
+
+class GpuProfiling:
+    def __init__(self):
+        self.records: OrderedDict = OrderedDict()
+
+
+    def merge(self, rec):
+        # Group record durations by op name (best-effort completion: the GPU path of this one-off script was left unfinished)
+        name = rec.get('name', '')
+        self.records.setdefault(name, []).append(rec.get('dur', 0))
+
+
+def main_gpu(path):
+    with open(path, 'r') as f:
+        records = json.load(f)
+    prof = GpuProfiling()
+    for rec in records:
+        prof.merge(rec)
+    return prof
+
+
+if __name__ == "__main__":
+    main_npu(r'E:\Downloads\espace\step90_batchsize512_npu -gpu-aclge.json')
\ No newline at end of file
diff --git a/ada/requirements.txt b/ada/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..549f8fd8df2a1abe9ece9fbfc57571171a93c7b5
--- /dev/null
+++ b/ada/requirements.txt
@@ -0,0 +1,14 @@
+bcrypt==3.2.0
+certifi==2021.5.30
+cffi==1.14.6
+charset-normalizer==2.0.6
+cryptography==35.0.0
+docopt==0.6.2
+hdfs==2.6.0
+idna==3.2
+paramiko==2.7.2
+pycparser==2.20
+PyNaCl==1.4.0
+requests==2.26.0
+six==1.16.0
+urllib3==1.26.7
\ No newline at end of file
diff --git a/ada/setup.py b/ada/setup.py
new file mode 100644
index 0000000000000000000000000000000000000000..ffc49d8700a964f6f26ed84aecdfe397194c85bd
--- /dev/null
+++ b/ada/setup.py
@@ -0,0 +1,37 @@
+import setuptools
+from ada import VERSION as ada_version
+
+with open("README.md", "r", encoding="utf-8") as fh:
+    long_description = fh.read()
+
+
+setuptools.setup(
+    name="ada",
+    version=ada_version,
+    author="shengnan",
+    author_email="titan.sheng@huawei.com",
+    description="Ascend debugging assistant",
+    long_description=long_description,
+    long_description_content_type="text/markdown",
+    url="https://codehub-y.huawei.com/s00538840/ada",
+    project_urls={
+        "Bug Tracker": "https://codehub-y.huawei.com/s00538840/ada/issues",
+    },
+    classifiers=[
+        "Programming Language :: Python :: 3",
+        "License :: OSI Approved :: MIT License",
+        "Operating System :: OS Independent",
+    ],
+    py_modules=['ada_cmd', 'ada_prof_cmd'],
+    packages=setuptools.find_packages(include=["ada", "ada.*"]),
+    python_requires=">=3.6",
+    install_requires=['hdfs>=2.6.0', ],
+    entry_points={
+        'console_scripts': [
+            'ada=ada_cmd:main',
+            'ada-pa=ada_prof_cmd:main'
+        ]
+    }
+)
+
+# python setup.py sdist bdist_wheel
\ No newline at end of file
diff --git a/ada/tests/ada_pa_analyzer_unittest.py b/ada/tests/ada_pa_analyzer_unittest.py
new file mode 100644
index 0000000000000000000000000000000000000000..853cd244c26e17c338ab10ac5ed21feabc680aed
--- /dev/null
+++ b/ada/tests/ada_pa_analyzer_unittest.py
@@ -0,0 +1,67 @@
+import unittest
+from ada_pa_base_unittest import *
+from ada.pdav2 import ProfilingDataAnalyzer, ProfilingData
+
+
+def find_events_by_type(pd: ProfilingData, event_type: str):
+    events = []
+    for event in pd.event_records:
+        if event.event == event_type:
+            events.append(event)
+    return events
+
+
+class AdaPaAnalyzerUt(AdaPaBaseUt):
+    def test_analyse_none_in_none(self):
+        """Tests this scenario:
+        |----Transdata execute----------------------------------------|
+        |---some steps---||---None Tiling----------------------|-others--|
+                          |-None Tiling-||-AtomicClean Tiling--|
+        """
+        log_file = get_test_file("sn4.log")
+        analyzer = ProfilingDataAnalyzer(log_file)
+        pds = analyzer.read_in_profiling_file()
+
+        self.assertEqual(len(pds), 1)
+        tiling_events = find_events_by_type(pds[0], "[Tiling]")
+        self.assertEqual(len(tiling_events), 3)
+        tiling_events.sort(key=lambda rec: rec.start)
+        self.assertEqual(tiling_events[0].start, 1643116796438455862)
+        self.assertEqual(tiling_events[0].end, 1643116796438470737)
+        self.assertEqual(tiling_events[1].start, 1643116796438456862)
+        self.assertEqual(tiling_events[1].end, 1643116796438457737)
+        self.assertEqual(tiling_events[2].start, 1643116796438458862)
+        self.assertEqual(tiling_events[2].end, 1643116796438459737)
+        self.assertIsNone(tiling_events[0].node_name)
+        self.assertIsNone(tiling_events[1].node_name)
+        self.assertEqual(tiling_events[2].node_name, "[AtomicClean]")
+
+
+    def test_analyse_top_in_top(self):
+        """Tests this scenario:
+        |----Transdata execute---------------------------------------------|
+        |---some steps---||---Transdata Tiling----------------------|-others--|
+                          |-Transdata Tiling-||-AtomicClean Tiling--|
+        """
+        log_file = get_test_file("sn5.log")
+        analyzer = ProfilingDataAnalyzer(log_file)
+        pds = analyzer.read_in_profiling_file()
+
+        self.assertEqual(len(pds), 1)
+        tiling_events = find_events_by_type(pds[0], "[Tiling]")
+        self.assertEqual(len(tiling_events), 3)
+        tiling_events.sort(key=lambda rec: rec.start)
+        self.assertEqual(tiling_events[0].start, 1643116796438455862)
+        self.assertEqual(tiling_events[0].end, 1643116796438470737)
+        self.assertEqual(tiling_events[1].start, 1643116796438456862)
+        self.assertEqual(tiling_events[1].end, 1643116796438457737)
+        self.assertEqual(tiling_events[2].start, 1643116796438458862)
+        self.assertEqual(tiling_events[2].end, 1643116796438459737)
+
+        self.assertEqual(tiling_events[0].node_name, "[TransData]")
+        self.assertEqual(tiling_events[1].node_name, "[TransData]")
+        self.assertEqual(tiling_events[2].node_name, "[AtomicClean]")
+
+
+if __name__ == '__main__':
+    unittest.main()
\ No newline at end of file
diff --git a/ada/tests/ada_pa_base_unittest.py b/ada/tests/ada_pa_base_unittest.py
new file mode 100644
index 0000000000000000000000000000000000000000..ee4900b2bce1f9cdc0884c518c566ee60c57fa0f
--- /dev/null
+++ b/ada/tests/ada_pa_base_unittest.py
@@ -0,0 +1,28 @@
+import unittest
+import os
+import path_tools
+import errno
+import json
+import csv
+
+
+def get_test_file(name):
+    log_file = os.path.join(path_tools.get_test_path("ada_pa"), name)
+    if not os.path.isfile(log_file):
+        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), log_file)
+    return log_file
+
+
+class AdaPaBaseUt(unittest.TestCase):
+    @classmethod
+    def setUpClass(cls):
+        path_tools.clear_test_path("ada_pa")
+        test_path = path_tools.get_test_path("ada_pa")
+        data_path = os.path.join(path_tools.get_test_root_path(), "ada_perf_data", "data")
+        path_tools.copy_all_files_to(data_path, test_path)
+
+
+    @classmethod
+    def tearDownClass(cls):
+        path_tools.clear_test_path("ada_pa")
\ No newline at end of file
diff --git a/ada/tests/ada_pa_unittest.py b/ada/tests/ada_pa_unittest.py
new file mode 100644
index 0000000000000000000000000000000000000000..dc2bfc5e72c36557944cbd0562ed16bcb6130f89
--- /dev/null
+++ b/ada/tests/ada_pa_unittest.py
@@ -0,0 +1,193 @@
+import unittest
+from unittest.mock import patch
+import ada_prof_cmd
+import sys
+from ada_pa_base_unittest import *
+from collections import defaultdict
+
+
+def load_tracing_file(name, index):
+    with open(get_test_file("{}_tracing_{}.json".format(name, index)), 'r') as f:
+        return json.load(f)
+
+
+def load_summary_file(name, index):
+    with open(get_test_file("{}_summary_{}.csv".format(name, index)), 'r') as f:
+        reader = csv.DictReader(f)
+        return [row for row in reader]
+
+
+def load_op_statistics_file(name, index):
+    with open(get_test_file("{}_op_stat_{}.csv".format(name, index)), 'r') as f:
+        reader = csv.DictReader(f)
+        return [row for row in reader]
+
+
+class TracingChecker(unittest.TestCase):
+    def __init__(self, tracing):
+        super().__init__()
+        self._tracing = tracing
+
+
+    def assert_events_num(self, num):
+        self.assertEqual(len(self._tracing), num)
+
+
+    def assert_events_dur(self, event_name, durs):
+        tracing_durs = []
+        for event in self._tracing:
+            if event["name"] == event_name:
+                tracing_durs.append(event["dur"])
+        self.assertEqual(len(tracing_durs), len(durs))
+        for i in range(len(durs)):
+            self.assertAlmostEqual(tracing_durs[i], durs[i])
+
+
+class SummaryChecker(unittest.TestCase):
+    def __init__(self, summary):
+        super().__init__()
+        self._summary = {}
+        for data in summary:
+            self._summary[data['event']] = data
+
+
+    def assert_event_dur(self, event_name, dur):
+        self.assertTrue(event_name in self._summary)
+        self.assertAlmostEqual(float(self._summary[event_name]['avg(us)']), dur)
+
+
+    def assert_event_count(self, event_name, count):
+        self.assertTrue(event_name in self._summary)
+        self.assertEqual(int(self._summary[event_name]['count']), count)
+
+
+    def w_percent_check(self, standard_event="[AclCompileAndExecute]"):
+        if standard_event not in self._summary:
+            return
+        standard_dur = float(self._summary[standard_event]['total(us)'])
+        for data in self._summary.values():
+            self.assertAlmostEqual(float(data['total(us)']) / standard_dur, float(data['w-percent']))
+
+
+class OpStatChecker(unittest.TestCase):
+    def __init__(self, statistics):
+        super().__init__()
+        self._stat = defaultdict(list)
+        for row in statistics:
+            self._stat[OpStatChecker.get_id(row['name'], row['event'])].append(float(row['duration(us)']))
+
+
+    @staticmethod
+    def get_id(node_name, event_name):
+        return "{}@{}".format(node_name, event_name)
+
+
+    def assert_durs(self, node_name, event_name, durations):
+        key = OpStatChecker.get_id(node_name, event_name)
+        self.assertTrue(key in self._stat)
+        self.assertEqual(len(self._stat[key]), len(durations))
+        for i in range(len(durations)):
+            self.assertAlmostEqual(self._stat[key][i], durations[i])
+
+
+class AdaPaUt(AdaPaBaseUt):
+    def assert_result_file_exists(self, prefix, num):
+        for i in range(num):
+            path = get_test_file("{}_summary_{}.csv".format(prefix, i))
+            self.assertTrue(os.path.isfile(path))
+
+            path = get_test_file("{}_tracing_{}.json".format(prefix, i))
+            self.assertTrue(os.path.isfile(path))
+
+            path = get_test_file("{}_op_stat_{}.csv".format(prefix, i))
+            self.assertTrue(os.path.isfile(path))
+
+
+    def assert_events_num(self, tracing, num):
+        self.assertEqual(len(tracing), num)
+
+
+    def assert_events_dur(self, tracing, event_name, durs):
+        tracing_durs = []
+        for event in tracing:
+            if event["name"] == event_name:
+                tracing_durs.append(event["dur"])
+        self.assertEqual(len(tracing_durs), len(durs))
+        for i in range(len(durs)):
+            self.assertAlmostEqual(tracing_durs[i], durs[i])
+
+
+    def test_analyse_single_prof_data(self):
+        log_file = get_test_file("sn0.log")
+        with patch.object(sys, 'argv', ['ada-pa', log_file]):
+            ada_prof_cmd.main()
+        self.assert_result_file_exists("sn0", 1)
+        tracing = load_tracing_file("sn0", 0)
+        tracing_checker = TracingChecker(tracing)
+        tracing_checker.assert_events_num(12)
+        tracing_checker.assert_events_dur("[GatherV2]@[ConstPrepare]", [12.36])
+        tracing_checker.assert_events_dur("[UpdateShape]", [2.55, 1.67])
+        tracing_checker.assert_events_dur("[rtKernelLaunch]", [18.53, 21.19])
+
+        summary = load_summary_file("sn0", 0)
+        summary_checker = SummaryChecker(summary)
+        summary_checker.assert_event_dur("[ConstPrepare]", 10.475)
+        summary_checker.assert_event_dur("[UpdateShape]", 2.11)
+        summary_checker.assert_event_dur("[rtKernelLaunch]", 19.86)
+        summary_checker.assert_event_count("[ConstPrepare]", 2)
+        summary_checker.assert_event_count("[UpdateShape]", 2)
+        summary_checker.assert_event_count("[rtKernelLaunch]", 2)
+        summary_checker.w_percent_check("[OpExecute]")
+
+        stat = load_op_statistics_file("sn0", 0)
+        stat_checker = OpStatChecker(stat)
+        stat_checker.assert_durs("[GatherV2]", "[ConstPrepare]", [12.36])
+        stat_checker.assert_durs("[GatherV2]", "[UpdateShape]", [2.55])
+        stat_checker.assert_durs("[GatherV2]", "[Tiling]", [20.32])
+        stat_checker.assert_durs("[Mul]", "[Tiling]", [20.12])
+
+
+    def test_analyse_single_prof_data_aclopcompileandexecute(self):
+        log_file = get_test_file("sn4.log")
+        with patch.object(sys, 'argv', ['ada-pa', log_file]):
+            ada_prof_cmd.main()
+        self.assert_result_file_exists("sn4", 1)
+
+        summary = load_summary_file("sn4", 0)
+        summary_checker = SummaryChecker(summary)
+        summary_checker.w_percent_check()
+
+
+    def test_analyse_single_prof_for_op_stat(self):
+        log_file = get_test_file("sn3.log")
+        with patch.object(sys, 'argv', ['ada-pa', log_file]):
+            ada_prof_cmd.main()
+        self.assert_result_file_exists("sn3", 1)
+
+        stat = load_op_statistics_file("sn3", 0)
+        stat_checker = OpStatChecker(stat)
+        stat_checker.assert_durs("[Transpose]", "[AclMatchStaticOpModel]", [1.594])
+        stat_checker.assert_durs("[trans_TransData_85]", "[InferShape]", [12.891])
+        stat_checker.assert_durs("[Identity]", "[Tiling]", [18.289])
+
+
+    def test_analyse_single_prof_data_large(self):
+        log_file = get_test_file("sn2.log")
+        with patch.object(sys, 'argv', ['ada-pa', log_file]):
+            ada_prof_cmd.main()
+        self.assert_result_file_exists("sn2", 1)
+        tracing = load_tracing_file("sn2", 0)
+        self.assert_events_num(tracing, 24862)
+        self.assert_events_dur(tracing, "[GatherV2]@[AclCompileAndExecute]", [114.09, 69.826, 43.023, 40.215])
+        self.assert_events_dur(tracing, "[MatMul]@[Tiling]", [35.266, 24.663, 22.501])
+
+
+    def test_analyse_multiple_prof_data(self):
get_test_file("sn1.log") + with patch.object(sys, 'argv', ['ada-pa' , log_file]): + ada_prof_cmd.main() + self.assert_result_file_exists("sn1", 2) + + +if __name__ == '__main__': + unittest.main() diff --git a/ada/tests/ada_perf_data/reporter/xurui_reporter.py b/ada/tests/ada_perf_data/reporter/xurui_reporter.py new file mode 100644 index 0000000000000000000000000000000000000000..bc8e8181056a191d9c0069c40ed06a1a679db7cc --- /dev/null +++ b/ada/tests/ada_perf_data/reporter/xurui_reporter.py @@ -0,0 +1,191 @@ +from ada.pdav2 import Record, ProfilingData, get_name +from collections import OrderedDict +from typing import List +import csv + + +STANDARD_EVENT = "[OpExecute]" + + +class SERecord: + """Start-End Record""" + def __init__(self): + self.node_name: str = None + self.event: str = None + self.et: str = None + self.tid: int = None + self.start: int = None + self.end: int = None + self.iteration: str = None + + + @staticmethod + def from_record(start: Record, end: Record): + se_rec = SERecord() + for attr in dir(end): + if callable(getattr(end, attr)) or attr.startswith("__"): + continue + if attr == "et": + continue + if attr == "timestamp": + se_rec.end = getattr(end, attr) + else: + setattr(se_rec, attr, getattr(end, attr)) + se_rec.start = start.timestamp + se_rec.et = "range" + return se_rec + + + def __str__(self): + return "{} {} {} -> {}".format(self.node_name, self.event, self.start, self.end) + + +class SEProfilingData: + def __init__(self): + self.records: List[SERecord] = [] + + + @staticmethod + def from_profiling_data(pds: List[ProfilingData]): + results = [] + for pd in pds: + se_pd = SEProfilingData() + results.append(se_pd) + events_to_start_rec = {} + for rec in pd.records: + name = get_name(rec) + if rec.et == 'Start': + events_to_start_rec[name] = rec + else: + try: + start_rec = events_to_start_rec.pop(name) + except KeyError as _: + continue + se_pd.records.append(SERecord.from_record(start_rec, rec)) + return results + + +class Ranges: + def __init__(self): + self._ranges_to_names = OrderedDict() + + + def find_ranges(self, num): + results = [] + for (start, end), names in self._ranges_to_names.items(): + if start > num: + break + if end > num: + results += names + return results + + + def add_range(self, name, start, end): + pair = (start, end) + if pair in self._ranges_to_names: + self._ranges_to_names[pair].append(name) + else: + self._ranges_to_names[pair] = [name, ] + + +class XrReporter: + def __init__(self, pds: [ProfilingData]): + self._pds = SEProfilingData.from_profiling_data(pds) + + + @staticmethod + def analyse_records(pd: SEProfilingData): + pds_by_execution = [] + for rec in pd.records: + if rec.event == STANDARD_EVENT: + ranges = Ranges() + ranges.add_range("_", rec.start, rec.end) + pds_by_execution.append((ranges, [rec, ])) + for rec in pd.records: + if rec.event != STANDARD_EVENT: + for pd_by_exe in pds_by_execution: + if len(pd_by_exe[0].find_ranges(rec.start)) > 0: + pd_by_exe[1].append(rec) + break + else: + print("WARNING: record {} does not fit in any execution".format(rec)) + return pds_by_execution + + + @staticmethod + def is_op_execution(records : List[SERecord]): + tids = set() + for rec in records: + tids.add(rec.tid) + # 单算子执行只有一个线程,因此判断tid就可以了,当前也有其他很多判断方法 + return len(tids) == 1 + + + @staticmethod + def read_in_events(pds): + events = [] + for pd in pds: + events_to_time_len = {} + for rec in pd.records: + name = rec.event + if name in events_to_time_len: + events_to_time_len[name].append((rec.end - rec.start) / 1000) + else: + 
+                    events_to_time_len[name] = [(rec.end - rec.start) / 1000, ]
+
+            events.append(events_to_time_len)
+        return events
+
+
+    @staticmethod
+    def report_to_file(file_path, pds, exe_type):
+        events = XrReporter.read_in_events(pds)
+        for i, events_to_time_len in enumerate(events):
+            summary = []
+            total_time_len = sum(events_to_time_len.get(STANDARD_EVENT, []))
+            if total_time_len == 0:
+                print("WARNING: missing standard event, cannot generate the weighted percent")
+            for event, time_lens in events_to_time_len.items():
+                j = {
+                    "event": event,
+                    "count": len(time_lens),
+                    "avg(us)": sum(time_lens) / len(time_lens),
+                    "total(us)": sum(time_lens),
+                    "w-percent": 0
+                }
+                if total_time_len != 0:
+                    j["w-percent"] = sum(time_lens) / total_time_len
+                summary.append(j)
+
+            dump_path = "{}_summary_{}_{}{}".format(file_path, exe_type, i, ".csv")
+            print("Dump to summary file {}".format(dump_path))
+            with open(dump_path, "w", newline='') as f:
+                cf = csv.DictWriter(f, fieldnames=["event", "count", "avg(us)", "total(us)", "w-percent"])
+                cf.writeheader()
+                cf.writerows(summary)
+
+
+    def report(self, file_path):
+        op_exe_pds = []
+        graph_exe_pds = []
+        for pd in self._pds:
+            execution_ranges_to_records = XrReporter.analyse_records(pd)
+            op_exe_pd = SEProfilingData()
+            op_exe_pds.append(op_exe_pd)
+            graph_exe_pd = SEProfilingData()
+            graph_exe_pds.append(graph_exe_pd)
+
+            for ranges, records in execution_ranges_to_records:
+                if XrReporter.is_op_execution(records):
+                    op_exe_pd.records += records
+                else:
+                    graph_exe_pd.records += records
+        # report_to_file already calls read_in_events, so pass the SEProfilingData lists directly
+        XrReporter.report_to_file(file_path, op_exe_pds, "op")
+        XrReporter.report_to_file(file_path, graph_exe_pds, "graph")
+
+
+def report(pds: [ProfilingData], file_path):
+    reporter = XrReporter(pds)
+    reporter.report(file_path)
\ No newline at end of file
diff --git a/ada/tests/main_unittest.py b/ada/tests/main_unittest.py
new file mode 100644
index 0000000000000000000000000000000000000000..5616a4ee6ac859bb4270d3928ee89cecbac3f091
--- /dev/null
+++ b/ada/tests/main_unittest.py
@@ -0,0 +1,63 @@
+import unittest
+from unittest.mock import MagicMock
+import ada_cmd
+from ada import cip
+from ada import local_machine
+
+
+class TestMain(unittest.TestCase):
+    def test_parse_args_default(self):
+        args = ada_cmd.parse_args(['-i'])
+        self.assertIsNone(args.newest)
+        self.assertIsNone(args.compile)
+        self.assertTrue(args.install)
+        self.assertTrue(args.wait)
+        self.assertIsNone(args.directory)
+
+
+    def test_parse_args_dir(self):
+        args = ada_cmd.parse_args(['-i', '-d', "./"])
+        self.assertTrue(args.install)
+        self.assertEqual(args.directory, './')
+
+
+    def test_parse_args_download_and_install(self):
+        args = ada_cmd.parse_args(['-i', '-n', 'atc,acllib', '-d', "./"])
+        self.assertIsNone(args.compile)
+        self.assertTrue(args.install)
+        self.assertEqual(args.directory, './')
+        self.assertEqual(args.newest, 'atc,acllib')
+
+
+    def test_parse_args_download_compile_and_install(self):
+        args = ada_cmd.parse_args(['-i', '-c', 'atc,acllib', '-d', "./"])
+        self.assertIsNone(args.newest)
+        self.assertTrue(args.install)
+        self.assertEqual(args.directory, './')
+        self.assertEqual(args.compile, 'atc,acllib')
+
+
+    def test_download_newest(self):
+        cip.get_env = MagicMock(return_value=(cip.OsType.UBUNTU, "x86_64"))
+        args = ada_cmd.parse_args(['-n', 'atc,acllib', '-d', 'tada/packages'])
+        names = ada_cmd.download(args)
+
+
+    def test_download_compile(self):
"x86_64")) + args = ada_cmd.parse_args(['-c', '20211021_213524994_I2466ff9,fwkacllib,acllib', '-d', 'tada/packages']) + names = ada_cmd.download(args) + self.assertEqual(set([name.split('-')[1] for name in names]), {'fwkacllib', 'acllib'}) + + + def test_download_compile_and_newest(self): + cip.get_env = MagicMock(return_value=(cip.OsType.UBUNTU, "x86_64")) + args = ada_cmd.parse_args(['-c', '20210930_094741849_If6be636,atc,acllib', '-n', 'toolkit,opp,fwkacllib,tfplugin', '-d', 'tada/packages']) + names = ada_cmd.download(args) + self.assertEqual(set([name.split('-')[1] for name in names]), {'atc', 'acllib', 'toolkit', 'opp', 'fwkacllib', 'tfplugin'}) + + + def test_download_and_install_newest(self): + cip.get_env = MagicMock(return_value=(cip.OsType.UBUNTU, "x86_64")) + args = ada_cmd.parse_args(['-n', 'atc,acllib', '-d', 'tada/packages']) + names = ada_cmd.download(args) + ada_cmd.install("tada/packages", names) diff --git a/ada/tests/path_tools.py b/ada/tests/path_tools.py new file mode 100644 index 0000000000000000000000000000000000000000..6e22d532989d1aa1a7eec16c12c0d45231d4b985 --- /dev/null +++ b/ada/tests/path_tools.py @@ -0,0 +1,34 @@ +import os +import shutil + + +def get_test_root_path(): + self_path = os.path.abspath(__file__) + return os.path.dirname(self_path) + + +def get_temp_path(): + return os.path.join(get_test_root_path(), 'temp-output') + + +def get_test_path(name): + return os.path.join(get_temp_path(), name) + + +def create_test_path(name): + test_path = get_test_path(name) + if not os.path.isdir(test_path): + os.mkdir(test_path) + return test_path + + +def clear_test_path(name): + test_path = get_test_path(name) + if os.path.isdir(test_path): + shutil.rmtree(test_path) + + +def copy_all_files_to(from_path, to_path): + if os.path.isdir(to_path): + shutil.rmtree(to_path) + shutil.copytree(from_path, to_path) \ No newline at end of file diff --git a/ada/tests/tada/cip_unittest.py b/ada/tests/tada/cip_unittest.py new file mode 100644 index 0000000000000000000000000000000000000000..4de1b720ff0d73616bf760d2a603464cfeb074ae --- /dev/null +++ b/ada/tests/tada/cip_unittest.py @@ -0,0 +1,41 @@ +import logging +import unittest +from unittest.mock import MagicMock +from ada import cip + + +class TestCip(unittest.TestCase): + def setUp(self) -> None: + super().setUp() + logging.basicConfig(level=logging.INFO) + logging.getLogger('hdfs.client').setLevel(logging.WARNING) + + + def test_download_compile(self): + cip.get_env = MagicMock(return_value = (cip.OsType.UBUNTU, "x86_64")) + c = cip.HisiHdfs() + c.download_compile_packages("20210930_094741849_If6be636", "./", [cip.PackageType.ATC_ONETRACK, cip.PackageType.ACLLIB_ONETRACK]) + + + def test_download_newest0(self): + cip.get_env = MagicMock(return_value = (cip.OsType.UBUNTU, "x86_64")) + c = cip.HisiHdfs() + ret = c.download_newest("./packages", [cip.PackageType.ACLLIB_ONETRACK, cip.PackageType.COMPILER_CANN]) + self.assertTrue(ret[0].startswith("Ascend-acllib-")) + self.assertTrue(ret[0].find('ubuntu') > 0) + self.assertTrue(ret[0].find('x86_64') > 0) + self.assertTrue(ret[1].startswith("CANN-compiler-")) + self.assertTrue(ret[1].find('ubuntu') > 0) + self.assertTrue(ret[1].find('x86_64') > 0) + + + def test_download_newest1(self): + cip.get_env = MagicMock(return_value = (cip.OsType.LINUX, "x86_64")) + c = cip.HisiHdfs() + ret = c.download_newest("./packages", [cip.PackageType.RUNTIME_CANN, cip.PackageType.COMPILER_CANN]) + self.assertTrue(ret[0].startswith("CANN-runtime-")) + self.assertTrue(ret[0].find('linux') > 0) + 
+        self.assertTrue(ret[0].find('x86_64') > 0)
+        self.assertTrue(ret[1].startswith("CANN-compiler-"))
+        self.assertTrue(ret[1].find('linux') > 0)
+        self.assertTrue(ret[1].find('x86_64') > 0)
\ No newline at end of file
diff --git a/ada/tests/tada/eo_unittest.py b/ada/tests/tada/eo_unittest.py
new file mode 100644
index 0000000000000000000000000000000000000000..8f8d4f283cd252a8433fe6f02cccc05b1e11e924
--- /dev/null
+++ b/ada/tests/tada/eo_unittest.py
@@ -0,0 +1,28 @@
+import unittest
+import logging
+from ada import eo
+
+
+class TestEo(unittest.TestCase):
+    def setUp(self) -> None:
+        logging.basicConfig(level=logging.INFO)
+        self.handle = eo.AscendShell.create_ssh("10.138.254.157", "root", "root")
+
+
+    def test_file_exists(self):
+        self.assertTrue(self.handle.file_exists("/home/shengnan/packages/Ascend-atc-1.79.t30.0.b300-ubuntu18.04.x86_64.run"))
+        self.assertFalse(self.handle.file_exists("/home/shengnan/packages/Ascend-atc-1.79.t30.0.b300-ubuntu18.04.x86_64.run1"))
+        self.assertTrue(self.handle.file_exists("packages/Ascend-atc-1.79.t30.0.b300-ubuntu18.04.x86_64.run"))
+        self.assertFalse(self.handle.file_exists("packages/Ascend-atc-1.79.t30.0.b300-ubuntu18.04.x86_64.run1"))
+
+
+    def test_install(self):
+        self.handle.install("/home/shengnan/packages/0929_newest/Ascend-atc-1.79.t30.0.b300-ubuntu18.04.x86_64.run")
+
+
+    def test_install_multiple(self):
+        self.handle.install("/home/shengnan/packages/0929_newest/Ascend-acllib-1.79.t30.0.b300-ubuntu18.04.x86_64.run")
+        self.handle.install("/home/shengnan/packages/0929_newest/Ascend-atc-1.79.t30.0.b300-ubuntu18.04.x86_64.run")
+        self.handle.install("/home/shengnan/packages/0929_newest/Ascend-fwkacllib-1.79.t30.0.b300-ubuntu18.04.x86_64.run")
+        self.handle.install("/home/shengnan/packages/0929_newest/Ascend-opp-1.79.t30.0.b300-ubuntu18.04.x86_64.run")
+        self.handle.install("/home/shengnan/packages/0929_newest/Ascend-toolkit-1.79.t30.0.b300-ubuntu18.04.x86_64.run")
\ No newline at end of file