ZPaC committed on 2024-02-05 20:49: Add faqs and other hints.

Distributed Parallel Startup Methods

.. toctree::
  :maxdepth: 1
  :hidden:

  msrun_launcher
  dynamic_cluster
  mpirun
  rank_table

Startup Method

GPU, Ascend and CPU each currently support several startup methods. The four main ones are msrun, dynamic cluster, mpirun and rank table:

  • msrun: msrun is an encapsulation of dynamic cluster. It lets the user launch a distributed job with a single command on each node, and it is available as soon as MindSpore is installed. This method does not rely on third-party libraries or configuration files, provides disaster recovery and good security, and supports all three hardware platforms. Users are recommended to prefer this startup method.
  • Dynamic cluster: dynamic cluster requires the user to spawn multiple processes and export environment variables manually. It is the mechanism on which msrun is built. Use this method when running the Parameter Server training mode. For other distributed jobs, msrun is recommended.
  • mpirun: this method relies on the open source library OpenMPI, and its startup command is simple. In multi-machine scenarios, password-free login must be configured between every pair of machines. This method is recommended for users who already have experience with OpenMPI.
  • rank table: this method requires the Ascend hardware platform and does not rely on third-party libraries. After manually configuring the rank_table file, you can start the parallel program via a script. The script is identical across machines, which makes batch deployment easy.
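As a rough illustration of what "spawn multiple processes and export environment variables" means for the dynamic cluster method, the sketch below builds one set of environment variables per process. The variable names (`MS_WORKER_NUM`, `MS_SCHED_HOST`, `MS_SCHED_PORT`, `MS_ROLE`) and role values are assumptions taken from the dynamic cluster page rather than from this overview, and the helper `dynamic_cluster_envs` is hypothetical. Check the dynamic cluster documentation before relying on it.

```python
def dynamic_cluster_envs(worker_num, sched_host, sched_port):
    """Return one dict of environment overrides per process.

    The first entry is for the scheduler, the rest are for the workers.
    Variable names are assumed from the dynamic cluster documentation.
    """
    common = {
        "MS_WORKER_NUM": str(worker_num),   # total number of worker processes
        "MS_SCHED_HOST": sched_host,        # address of the scheduler
        "MS_SCHED_PORT": str(sched_port),   # port the scheduler listens on
    }
    roles = ["MS_SCHED"] + ["MS_WORKER"] * worker_num
    return [dict(common, MS_ROLE=role) for role in roles]


# Hypothetical single-machine job with 4 workers and 1 scheduler.
envs = dynamic_cluster_envs(worker_num=4, sched_host="127.0.0.1", sched_port=8118)
for env in envs:
    print(env["MS_ROLE"], env["MS_SCHED_HOST"], env["MS_SCHED_PORT"])
```

Each dict would be merged into the current environment (for example `dict(os.environ, **env)`) and passed to the process that runs the training script. msrun exists precisely so that this bookkeeping does not have to be done by hand.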

Warning

The rank table method will be deprecated in MindSpore 2.4.

The hardware support for the four startup methods is shown in the table below:

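+-----------------+---------------+---------------+---------------+
|                 | GPU           | Ascend        | CPU           |
+=================+===============+===============+===============+
| msrun           | Supported     | Supported     | Supported     |
+-----------------+---------------+---------------+---------------+
| Dynamic cluster | Supported     | Supported     | Supported     |
+-----------------+---------------+---------------+---------------+
| mpirun          | Supported     | Supported     | Not supported |
+-----------------+---------------+---------------+---------------+
| rank table      | Not supported | Supported     | Not supported |
+-----------------+---------------+---------------+---------------+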
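The support matrix can also be encoded as data, which may be convenient in launch scripts that need to pick a startup method for a given platform. This is only a sketch: the `SUPPORT` mapping mirrors the table, and the helper name `methods_for` is hypothetical and not part of MindSpore.

```python
# Hardware platforms supported by each startup method, as listed in the table.
SUPPORT = {
    "msrun": {"GPU", "Ascend", "CPU"},
    "dynamic cluster": {"GPU", "Ascend", "CPU"},
    "mpirun": {"GPU", "Ascend"},
    "rank table": {"Ascend"},
}


def methods_for(platform):
    """Return the startup methods available on `platform`, in table order."""
    return [method for method, platforms in SUPPORT.items() if platform in platforms]


print(methods_for("CPU"))     # ['msrun', 'dynamic cluster']
print(methods_for("Ascend"))  # ['msrun', 'dynamic cluster', 'mpirun', 'rank table']
```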