# ddl-platform

**Repository Path**: blackjack2015/ddl-platform

## Basic Information

- **Project Name**: ddl-platform
- **Description**: A resource scheduling platform for distributed training jobs
- **Primary Language**: Python
- **License**: GPL-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 3
- **Created**: 2023-08-30
- **Last Updated**: 2023-11-09

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ddl-platform: A Distributed Deep Learning Platform for Deep Learning Training Jobs

## Introduction

This is a training platform for (distributed) deep learning. The current version supports only PyTorch.

## Installation

### Prerequisites

- Python 3
- PyTorch 1.4+
- [OpenMPI 4.x](https://www.open-mpi.org/software/ompi/v4.0/)

### Install common packages

```
$ pip install -r requirements.txt
```

### Install a customized library

```
$ cd ddl-platform/ddl_platform/comm_core
$ export NCCL_DIR=/path/to/nccl
$ export MPI_DIR=/path/to/openmpi
$ ./compile.sh
```

## Quick Start

### Start a redis server

```
$ redis-server
```

### Start our Job Manager

```
$ cd ddl-platform/ddl_platform/scheduler
$ python job_manager.py
```

### Start our Resource Manager

```
$ cd ddl-platform/ddl_platform/scheduler
$ python resource_manager.py
```

### Start our Scheduler

```
$ cd ddl-platform/ddl_platform/scheduler
$ python scheduler.py
```

### Insert a job for testing

```
$ cd ddl-platform
$ ./scripts/generate_jobs.sh
```
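
### Check prerequisites (optional sketch)

Before working through the Quick Start, it can help to confirm that the tools named above are present. The script below is a minimal sketch and is not part of this repository: the function name `check_prerequisites` is hypothetical, and it assumes that OpenMPI exposes an `mpirun` executable and that `redis-server` is on `PATH`. It checks only for presence, not for the required versions (PyTorch 1.4+, OpenMPI 4.x), and it cannot tell OpenMPI apart from another MPI distribution that also ships `mpirun`.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite name to True (found) or False.

    Hypothetical helper, not part of ddl-platform. Presence checks only.
    """
    return {
        # The interpreter running this script must be Python 3.
        "python3": sys.version_info.major == 3,
        # PyTorch must be importable; the version is not inspected here.
        "torch": importlib.util.find_spec("torch") is not None,
        # Assumption: the MPI launcher is available as `mpirun` on PATH.
        "mpirun": shutil.which("mpirun") is not None,
        # Assumption: Redis is installed so that `redis-server` is on PATH.
        "redis-server": shutil.which("redis-server") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name:15s} {'found' if found else 'MISSING'}")
```

Anything reported as `MISSING` would need to be installed, or added to `PATH`, before the corresponding Quick Start step can run.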