diff --git a/docs/mindspore/source_en/faq/index.rst b/docs/mindspore/source_en/faq/index.rst
index 28cebf75fdf7ce3972e1d9ce88c564e7737a0dd8..b82220f47ad56dac95a88a4a5d0724a35080cc8d 100644
--- a/docs/mindspore/source_en/faq/index.rst
+++ b/docs/mindspore/source_en/faq/index.rst
@@ -10,6 +10,7 @@ FAQ
    implement_problem
    network_compilation
    operators_compile
+   tools
    performance_tuning
    precision_tuning
    distributed_parallel
diff --git a/docs/mindspore/source_en/faq/tools.md b/docs/mindspore/source_en/faq/tools.md
new file mode 100644
index 0000000000000000000000000000000000000000..0178e0f262e3752a22f3b1aacf891ad84e93fa48
--- /dev/null
+++ b/docs/mindspore/source_en/faq/tools.md
@@ -0,0 +1,21 @@
+# Tools
+
+[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/faq/tools.md)
+
+## Q: When using the overflow detection Dump feature, I encounter the error `RuntimeError: aclnnAllFiniteGetWorkspaceSize call failed, please check!`. How can I resolve this?
+
+A: This error typically occurs because the custom operators used by the overflow detection feature are incompatible with the installed CANN version. The overflow detection Dump feature in MindSpore has strict CANN version requirements: a newer MindSpore release is not compatible with an older CANN release.
+
+To resolve this issue, consider the following approaches:
+
+- Upgrade your CANN version
+
+  Refer to the official [Version Compatibility Matrix](https://www.mindspore.cn/install/en/) and install a CANN version that matches or exceeds the requirement for your MindSpore version.
+
+- Switch to statistic-based Dump or other Dump methods
+
+  If upgrading CANN is not immediately feasible, you can disable overflow detection Dump and use one of the following instead:
+  - Statistic Dump: Records tensor statistics such as maximum and minimum values to help identify potential overflow;
+  - Full Dump or Selective Dump: Saves intermediate tensor data for offline analysis of numerical anomalies. For details, see [Using Dump in the Graph Mode](https://www.mindspore.cn/tutorials/en/master/debug/dump.html).
+
+These approaches avoid the runtime error while still letting you investigate overflow issues.
diff --git a/docs/mindspore/source_zh_cn/faq/index.rst b/docs/mindspore/source_zh_cn/faq/index.rst
index 28cebf75fdf7ce3972e1d9ce88c564e7737a0dd8..b82220f47ad56dac95a88a4a5d0724a35080cc8d 100644
--- a/docs/mindspore/source_zh_cn/faq/index.rst
+++ b/docs/mindspore/source_zh_cn/faq/index.rst
@@ -10,6 +10,7 @@ FAQ
    implement_problem
    network_compilation
    operators_compile
+   tools
    performance_tuning
    precision_tuning
    distributed_parallel
diff --git a/docs/mindspore/source_zh_cn/faq/tools.md b/docs/mindspore/source_zh_cn/faq/tools.md
new file mode 100644
index 0000000000000000000000000000000000000000..96efb7d26044f3e6545fb3b9ee3dead63d2d07ff
--- /dev/null
+++ b/docs/mindspore/source_zh_cn/faq/tools.md
@@ -0,0 +1,20 @@
+# Tools
+
+[![View Source File](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/faq/tools.md)
+
+## Q: When using the overflow detection Dump feature, I encounter the error `RuntimeError: aclnnAllFiniteGetWorkspaceSize call failed, please check!`. How can I resolve this?
+
+A: This error usually occurs because the custom operators that the overflow detection feature depends on are incompatible with the current CANN version. The overflow detection Dump feature in MindSpore has strict CANN version requirements, and a newer MindSpore release is not compatible with an older CANN release.
+
+We recommend resolving it in one of the following ways:
+
+- Upgrade the CANN version
+
+  Based on the MindSpore version you are using, refer to the [Version Compatibility Matrix](https://www.mindspore.cn/install) in the official documentation and install a matching or newer CANN version.
+- Switch to statistic Dump or another Dump method
+
+  If you cannot upgrade CANN for now, disable overflow detection Dump and use one of the following instead:
+  - Statistic Dump: Records information such as the maximum and minimum values of tensors, which helps determine whether overflow has occurred;
+  - Full Dump or Selective Dump: Saves intermediate tensor data to support offline analysis of numerical anomalies. For details, see [Debugging with the Dump Feature](https://www.mindspore.cn/tutorials/zh-CN/master/debug/dump.html).
+
+These methods avoid the runtime error and let you continue investigating the overflow issue.
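
Note on the second approach in the FAQ entry: a minimal sketch of what switching from overflow detection Dump to statistic Dump could look like in practice. It assumes Dump is driven by a JSON file referenced through the `MINDSPORE_DUMP_CONFIG` environment variable, and that the key names and values used here (`op_debug_mode`, `saved_data`, `e2e_dump_settings`, and so on) match the Dump tutorial linked in the entry for your MindSpore version; verify them there before relying on this. The helper names, dump directory, and network name are hypothetical and chosen only for illustration.

```python
import json
import os
import tempfile


def build_statistic_dump_config(dump_dir, net_name="Net"):
    """Build a Dump config dict that records tensor statistics only.

    All key names are assumptions taken from the Dump tutorial; check them
    against the documentation for the MindSpore version in use.
    """
    return {
        "common_dump_settings": {
            # Assumed: 0 means regular Dump, i.e. overflow detection is off.
            "op_debug_mode": 0,
            # Assumed: 0 dumps all operators rather than a selected subset.
            "dump_mode": 0,
            # Hypothetical output directory; an absolute path is expected.
            "path": dump_dir,
            # Hypothetical network name used to label the dump output.
            "net_name": net_name,
            "iteration": "all",
            # Assumed: "statistic" saves max/min style statistics
            # instead of full tensor data.
            "saved_data": "statistic",
            "input_output": 0,
            "kernels": [],
            "support_device": [0, 1, 2, 3, 4, 5, 6, 7],
        },
        "e2e_dump_settings": {
            "enable": True,
            "trans_flag": True,
        },
    }


def write_config(config, path):
    """Write the config as JSON and return the path that was written."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return path


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    config = build_statistic_dump_config(os.path.join(out_dir, "dump_data"))
    config_path = write_config(config, os.path.join(out_dir, "dump_config.json"))
    # Assumed mechanism: the variable must be set before the training
    # script initializes MindSpore so that the config is picked up.
    os.environ["MINDSPORE_DUMP_CONFIG"] = config_path
    print(config_path)
```

Generating the file from a small script, rather than editing JSON by hand, makes it easy to keep one config per run and to flip `saved_data` back to full tensor output once a suspicious operator has been narrowed down from the statistics.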