In deep learning, as dataset sizes and parameter counts grow, the time and hardware resources required for training increase and eventually become a bottleneck. Distributed parallel training reduces the demand on hardware resources such as memory and compute, which makes it an important optimization for training. Mainstream parallelism falls into several types according to its principle and mode of operation.
MindSpore currently offers the following four parallel modes:
DATA_PARALLEL
: Data parallel mode. A configuration sketch covering this mode and AUTO_PARALLEL appears after this list.
AUTO_PARALLEL
: Automatic parallel mode. A distributed parallel mode that combines data parallelism and operator-level model parallelism. It can automatically build a cost model, find a parallel strategy with a shorter training time, and select a suitable parallel mode for the user. MindSpore currently supports automatic search of operator parallel strategies and provides the following three strategy search algorithms:
dynamic_programming
: Dynamic programming strategy search algorithm. It can find the optimal strategy as characterized by the cost model, but searching for parallel strategies of huge network models is time consuming. Its cost model estimates training time from the memory-based computational overhead and the communication overhead of the Ascend 910 chip.
recursive_programming
: Double recursive strategy search algorithm. It generates an optimal strategy instantaneously, even for huge networks and large-scale multi-device slicing. Its cost model, based on symbolic operations, can be freely adapted to different accelerator clusters.
sharding_propagation
: Sharding strategy propagation algorithm. Parallel strategies are propagated from operators that are configured with sharding strategies to operators that are not. During propagation, the algorithm tries to select the strategy that triggers the least tensor redistribution communication. For operator sharding strategy configuration and tensor redistribution, refer to this design document.
SEMI_AUTO_PARALLEL
: Semi-automatic parallel mode. Compared with automatic parallelism, this mode requires the user to manually configure shard strategies for operators to achieve parallelism (see the operator sharding sketch after this list).
HYBRID_PARALLEL
: In MindSpore, this specifically refers to scenarios where the user achieves hybrid parallelism by manually slicing the model (see the manual slicing sketch after this list).
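As a rough sketch of how a mode is selected (assuming an 8-device job started with MindSpore's distributed launcher; argument names such as `search_mode` vary across MindSpore versions, with older releases using `auto_parallel_search_mode`):

```python
from mindspore import context
from mindspore.communication import init

# Initialize the communication backend (HCCL on Ascend, NCCL on GPU);
# this script is expected to be launched once per device.
init()

# Data parallel: every device keeps a full model replica and trains on its
# own slice of the data; gradients are averaged across devices.
context.set_auto_parallel_context(
    parallel_mode=context.ParallelMode.DATA_PARALLEL,
    gradients_mean=True)

# Automatic parallel instead: MindSpore searches for an operator sharding
# strategy; search_mode may be "dynamic_programming",
# "recursive_programming", or "sharding_propagation".
# context.set_auto_parallel_context(
#     parallel_mode=context.ParallelMode.AUTO_PARALLEL,
#     search_mode="sharding_propagation",
#     device_num=8)
```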
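For semi-automatic parallelism, the user attaches a shard strategy to individual operators. A minimal sketch, assuming graph mode on 8 devices; the `ShardedDense` cell and the `((2, 1), (1, 4))` strategy are illustrative choices, not prescribed values:

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import context
from mindspore.communication import init

init()
context.set_context(mode=context.GRAPH_MODE)
context.set_auto_parallel_context(
    parallel_mode=context.ParallelMode.SEMI_AUTO_PARALLEL, device_num=8)

class ShardedDense(nn.Cell):
    """A dense layer whose matmul carries a user-written shard strategy."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = ms.Parameter(
            ms.Tensor(np.ones((in_features, out_features)), ms.float32))
        # Shard the input's batch dimension over 2 devices and the weight's
        # output dimension over 4 devices (2 * 1 * 1 * 4 = 8 devices).
        self.matmul = ops.MatMul().shard(((2, 1), (1, 4)))

    def construct(self, x):
        return self.matmul(x, self.weight)
```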
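For hybrid parallelism, the slicing itself is done by hand. A minimal sketch, assuming the user has already decided how to cut a weight across devices; the column slice below is illustrative, while `layerwise_parallel` is the `Parameter` flag that exempts a hand-sharded parameter from broadcast and gradient aggregation:

```python
import numpy as np
import mindspore as ms
from mindspore import context

context.set_auto_parallel_context(
    parallel_mode=context.ParallelMode.HYBRID_PARALLEL)

# The user cuts the weight by hand; this column slice is purely illustrative.
full_weight = np.ones((16, 32), dtype=np.float32)
local_shard = full_weight[:, :8]

# layerwise_parallel=True marks the parameter as already sharded, so
# MindSpore skips broadcasting it and aggregating its gradients across devices.
weight = ms.Parameter(ms.Tensor(local_shard), name="w0",
                      layerwise_parallel=True)
```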
MindSpore provides a series of easy-to-use parallel training components. To better understand MindSpore's distributed parallel training components, we recommend reading this tutorial in the following order.