name | about | labels |
---|---|---|
RFC | New enhancement: Add a checking mechanism for the need of Renormalize pass in Parse pipeline | kind/enhancement |
During the Parse pipeline, many functional/optimization passes are run. They are grouped into so called "group passes
". For example, there are 4 groups named as: opt_a
, opt_b
, opt_control
and opt_prepare
(please see pass.cc
). Each group contains sub-groups, sub-groups contain sub-passes, etc. Each sub-pass check the actual graph parsed from the input code and may modify the graph. In order to ensure that the graph is fully inferred (with all the nodes have a correct deduced type), the Renormalize pass is called multiple times during the parsing process (minimum of 4 times, test_bert_train calls 6 times as seen in the following picture)
We have observed that depending on the actual input graph, many passes don't make actual changes to the graph, others do modify the graph but the newly generated nodes have the type inferred/inherited from the original nodes. In those cases the Renormalize pass doesn't need to be called again to save processing time.
The idea of this solution is to add a mechanism of collecting all newly created (or modified) nodes from the sub-passes and when a Renormalize pass is called, the pass will be skipped if the collection is empty. And because of the organization of group passes in Parse pipeline, we consider to pilot this mechanism at the level of group passes.
Further investigation shows that:
opt-b
and opt-control
(at this moment) consist of type-preserving sub-passes, i.e. if the input graph was fully inferred then the newly created node would have correct type inferred too. So we will start to test the design with those two groups of passes.opt_a
is not included in the test yet because opt_a
is called 1st, when the input graph is not yet inferred so the 1st full Renormalize is always needed. And opt_a
also contains the grad
sub-pass, which may totally change the graph so the whole new graph will need to be inferred.The implementation was tested with the network testing that are available in mindspore repo at this moment. All the tests were passed correctly. We shows below the percentage of time that opt_b and opt_c takes in the Parsing pipeline as the measure of the comparison. These tests are:
For BERT: Originally the opt_b.renormalize takes about 2.95% of the total (parsing) time, opt_control.renormalize takes 1.42%, which contributes a total of 4.37%. With the changes in this RRC, the total time drop significantly to almost 0%.
For RESNET train test: Originally the total time for opt_b and opt_control passes uses 5,36% of total time, with this RFC, the total time drops to 1,98%.
For RESNET infer test: Originally the time for opt_b (no opt_control in this case) was 0,66%. With this RFC, the time drops to almost 0%.
For LeNet network: Originally the time for opt_b and opt_control passes was 3,44%. With this RFC, the time again drops to almost 0%.
Notes for further development:
opt_a
group of passes is worth of consideration. The organization of this group is now as shown below:No. | Task Description | Related Issue(URL) |
---|---|---|
1 | ||
2 |
Hey @thlinh, Welcome to MindSpore Community.
All of the projects in MindSpore Community are maintained by @mindspore-ci-bot.
That means the developers can comment below every pull request or issue to trigger Bot Commands.
Please follow instructions at https://gitee.com/mindspore/community/blob/master/command.md to find the details.
此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。
The related PR from Linh: !121:Add a checking mechanism for the need of Renormalize pass in Parse pipeline
Hi, Linh. Thank you for your RFC & PR. It is a good step to do type-preserved opt to reduce the cost of renorm.
I have two question.
Question 1: Can renorm of opt_a also be checked? As in the last iteration of opt_a, the renorm may be reduntant.
Quetions 2: I think there maybe unsufficient that whether there is a case the new substituted result node is typed while some of its inputs are constructed with untyped? If there is no such case in our current opt_b passes currently, we need to add this as a new rule to all passes. If there are some untyped nodes in substituted nodes, the root of these nodes must be untyped.
There may be three choices we can choose to make sure all the sub-passes are "shallow".
Choice 1 is obey the rule manually. Every new subtree to replace has to obey the rule "if ever a subnode is untyped then the root must be untyped".
Choice 2 is to use "local type inference". Because before substitution every happens, type information is complete and thus all the types of the parameter are there; so every subnode in the subtree that is parameter can be easily typed -- in that way an easy bottom up, linear time inference is possible. But, if there is a global level renorm behind the current pass, the local level type infer is reduntant.
Choice 3 is to refact our IR interface to have a single "standardized" method of graph modifying in our infrastructure, like LLVM replaceAllUsesWith. So that, we can make sure all the new nodes are typed or untyped.
I think we can combine Choice 1 and Choice 3 to add a check when we relace the nodes. By using the nodes() or all_nodes() from graph manager, we can travel all the new nodes from the root node easyly.
The related PR from Linh: !121:Add a checking mechanism for the need of Renormalize pass in Parse pipeline
Hi, Linh. Thank you for your RFC & PR. It is a good step to do type-preserved opt to reduce the cost of renorm.
I have two question.
Question 1: Can renorm of opt_a also be checked? As in the last iteration of opt_a, the renorm may be reduntant.
Quetions 2: I think there maybe unsufficient that whether there is a case the new substituted result node is typed while some of its inputs are constructed with untyped? If there is no such case in our current opt_b passes currently, we need to add this as a new rule to all passes. If there are some untyped nodes in substituted nodes, the root of these nodes must be untyped.
There may be three choices we can choose to make sure all the sub-passes are "shallow".
Choice 1 is obey the rule manually. Every new subtree to replace has to obey the rule "if ever a subnode is untyped then the root must be untyped".
Choice 2 is to use "local type inference". Because before substitution every happens, type information is complete and thus all the types of the parameter are there; so every subnode in the subtree that is parameter can be easily typed -- in that way an easy bottom up, linear time inference is possible. But, if there is a global level renorm behind the current pass, the local level type infer is reduntant.
Choice 3 is to refact our IR interface to have a single "standardized" method of graph modifying in our infrastructure, like LLVM replaceAllUsesWith. So that, we can make sure all the new nodes are typed or untyped.
I think we can combine Choice 1 and Choice 3 to add a check when we relace the nodes. By using the nodes() or all_nodes() from graph manager, we can travel all the new nodes from the root node easyly.
@kungsen : Thank you very much for your sharing. Yes, we can have different possible solutions here. For long term purposes, in my opinion the Choice 3 is the best. I have just 2 concerns. One is once the new "standardized" method is implemented, will all the passes be re-written or they can stay as-is? The other is would the new "standardized" method need the "local type inference"? I think it would still need the "local type inference" since it's would be much cleaner if each pass takes care of the newly generated nodes, because at the graph level, for some cases the total number of nodes is very high (Bert may have ~30k nodes) so that a whole tree walk would be costly.
The related PR from Linh: !121:Add a checking mechanism for the need of Renormalize pass in Parse pipeline
Hi, Linh. Thank you for your RFC & PR. It is a good step to do type-preserved opt to reduce the cost of renorm.
I have two question.
Question 1: Can renorm of opt_a also be checked? As in the last iteration of opt_a, the renorm may be reduntant.Quetions 2: I think there maybe unsufficient that whether there is a case the new substituted result node is typed while some of its inputs are constructed with untyped? If there is no such case in our current opt_b passes currently, we need to add this as a new rule to all passes. If there are some untyped nodes in substituted nodes, the root of these nodes must be untyped.
There may be three choices we can choose to make sure all the sub-passes are "shallow".
Choice 1 is obey the rule manually. Every new subtree to replace has to obey the rule "if ever a subnode is untyped then the root must be untyped".
Choice 2 is to use "local type inference". Because before substitution every happens, type information is complete and thus all the types of the parameter are there; so every subnode in the subtree that is parameter can be easily typed -- in that way an easy bottom up, linear time inference is possible. But, if there is a global level renorm behind the current pass, the local level type infer is reduntant.
Choice 3 is to refact our IR interface to have a single "standardized" method of graph modifying in our infrastructure, like LLVM replaceAllUsesWith. So that, we can make sure all the new nodes are typed or untyped.
I think we can combine Choice 1 and Choice 3 to add a check when we relace the nodes. By using the nodes() or all_nodes() from graph manager, we can travel all the new nodes from the root node easyly.
@kungsen I actually don't quite understand your concerns on "redundant computation" in Choice 2 because in my opinion, Choice 1 and Choice 2 should be the same linear complexity (traversal of the subtree). The only reason I think Choice 1 should be preferred is its simplicity in implementation; but Choice 2 makes more sense because its logic is more direct compared to Choice 1. (we are facing problems of untyped nodes).
I think the "redundancy" only appears in the sense of code duplication.
@kungsen , @thlinh and @kaitingwang talked about the idea to amortize your Choice 1 by introducing a template class. We have come up with a RFC #I1DXI0:"Typed"-Tag Indicating well-typedness of a whole subtree for Type Preserving to be helpful for this patch. Please take a look. Thanks :)
@kungsen : Related to your question 1: Can renorm of opt_a also be checked? As in the last iteration of opt_a, the renorm may be reduntant. The answer is Yes, it's possible. I'll extend later a solution for signalling between the sub-passes of a group. In the 1st version of this RFC, we check for the whole group of sub-passes only.
More on your question 2: I think we can also add a property flag for each sub-pass to define if the given pass is "shallow" or not, it means that it will tell us that for a pass we just need to check the "result"
node or we need to check the whole tree rooted at "result"
node. In case that we need to check the whole tree rooted at "result"
, please see more at https://gitee.com/mindspore/dashboard?issue_id=I1DXI0 the idea of @Ende.Jin and @kaitingwang that we can save checking time if knowing which subtree is already fully typed.
@kungsen : Thank you very much for your sharing. Yes, we can have different possible solutions here. For long term purposes, in my opinion the Choice 3 is the best. I have just 2 concerns. One is once the new "standardized" method is implemented, will all the passes be re-written or they can stay as-is? The other is would the new "standardized" method need the "local type inference"? I think it would still need the "local type inference" since it's would be much cleaner if each pass takes care of the newly generated nodes, because at the graph level, for some cases the total number of nodes is very high (Bert may have ~30k nodes) so that a whole tree walk would be costly.
@thlinh I think small adaption of "standardized" method for all pass is acceptable. Before the "standardized" method is ready, we can obey the rule manully to get the gain, as "if ever a subnode is untyped then the root must be untyped" is easy to obey manually. The "local type inference" is still needed.
@kungsen I actually don't quite understand your concerns on "redundant computation" in Choice 2 because in my opinion, Choice 1 and Choice 2 should be the same linear complexity (traversal of the subtree). The only reason I think Choice 1 should be preferred is its simplicity in implementation; but Choice 2 makes more sense because its logic is more direct compared to Choice 1. (we are facing problems of untyped nodes).
I think the "redundancy" only appears in the sense of code duplication.
@Ende.Jin The "redundancy" means that if there is a global level renorm later doing "local type infer" in every pass is reduntant. If we can collect all untyped nodes, and only "local type infer" need to be done would be prefered.
@thlinh As you modified the checking action, could you please update the results of the different networks mentioned in this issue? And I prefer the total time should be added to the results. You can get it easily in the log.
@biffex : I have re-run the tests with MS master repo (before PR) and with changes in PR. Here are the latest results (run on my blue laptop). In general the checking takes some amount of time, but the total performance is still improved about 70% for the renormalize passes.
@kungsen : There is cases (for example in test_graph_return_const_param.py
), where the "return"
was single node of type AbstractTensor (no other nodes were generated) but still need a full Renormalize to be well-typed. And there are obviously (as can be seen in validator.cc) types that are not supported later (at back-end), which will cause the validating agent to raise Exception there. So for those types, I still set to call the Renormalize to: 1. have the node further processed, 2. there may be a chance that a part of Renormalize (like Specialize) can deal with those nodes with unsupport types so that it would pass the validating later. By adding more cases for "2", we sacrifice a little in the performance with the hope that we can support better more cases, but it will be safer for the parser.
@kungsen : There is cases (for example in
test_graph_return_const_param.py
), where the"return"
was single node of type AbstractTensor (no other nodes were generated) but still need a full Renormalize to be well-typed. And there are obviously (as can be seen in validator.cc) types that are not supported later (at back-end), which will cause the validating agent to raise Exception there. So for those types, I still set to call the Renormalize to: 1. have the node further processed, 2. there may be a chance that a part of Renormalize (like Specialize) can deal with those nodes with unsupport types so that it would pass the validating later. By adding more cases for "2", we sacrifice a little in the performance with the hope that we can support better more cases, but it will be safer for the parser.
@thlinh I see the changes in your PR. Thank you for your deep analysis. In my opinion, we prefer a simple rule to guide the need of renorm instead of a complex condition tested by CI. Otherwise, if others add a new pass, they have no idea whether the new node will be renorm. I have two suggestion for this PR before partial renorm is ready.
Suggestion1: Find a simple rule to follow, we can change some pass implementation to follow this rule.
Sugeestion2: Manually add need further renorm node in each pass, as the developer of each pass should known whether the replaced node needed renorm.
@thlinh I see the changes in your PR. Thank you for your deep analysis. In my opinion, we prefer a simple rule to guide the need of renorm instead of a complex condition tested by CI. Otherwise, if others add a new pass, they have no idea whether the new node will be renorm. I have two suggestion for this PR before partial renorm is ready.
Suggestion1: Find a simple rule to follow, we can change some pass implementation to follow this rule.
Sugeestion2: Manually add need further renorm node in each pass, as the developer of each pass should known whether the replaced node needed renorm.
@kungsen : Thank you very much for your suggestions. Since the main idea of this RFC is at level of group of sub-passes so I'll go with your Suggestion 2 first. I will add two optional flags for the Substitution: "is_renorm_in_watch" and "is_validate_abstract". The first flag will force to call Renormalize if that pass return a new nodes (if the pass was empty then Renormalize will depend on other passes). The second flag will force to check if the nodes have acceptable types (otherwise it will be later rejected too by validator). I set these flags for the two passes "make_ref_eliminate" and "replace_refkey_by_param" as example. The tests went good for me.
@gongchen-0812
whether the issue can be closed?
登录 后才可以发表评论