# PyTorch-On-Angel

**Repository Path**: mirrors_Angel-ML/PyTorch-On-Angel

## Basic Information

- **Project Name**: PyTorch-On-Angel
- **Description**: PyTorch On Angel, arming PyTorch with a powerful Parameter Server, which enables PyTorch to train very big models.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-24
- **Last Updated**: 2026-03-15

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## PyTorch on Angel

A lightweight project that runs PyTorch on [Angel](https://github.com/Angel-ML/angel), giving PyTorch the ability to train high-dimensional models.

### Architecture

----

![][1]

The architecture of PyTorch on Angel consists of three modules:

- **python client**: generates the PyTorch script module.
- **angel ps**: provides a common Parameter Server (PS) service, responsible for distributed model storage, communication synchronization, and coordination of computing.
- **spark executor**: the worker process. It handles data processing, loads the PyTorch script module, and communicates with the Angel PS server to complete model training and prediction. The PyTorch C++ backend runs in native mode as the actual computing backend.

To use PyTorch on Angel, we need three components:

- a jar file generated by the java subproject;
- a .so file compiled by the cpp subproject, together with the set of shared libraries for the PyTorch C++ backend;
- the PyTorch algorithm script module generated by the python subproject.
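
The interaction between the spark executor and angel ps described in the architecture can be pictured with the general parameter-server pattern: a worker pulls the parameters its mini-batch touches, computes gradients locally, and pushes them back. The following is a toy, single-process sketch of that pattern in plain Python. `ToyParameterServer` and `worker_step` are hypothetical names for illustration and are not part of Angel's API. The real system distributes storage across PS instances and delegates computation to the PyTorch C++ backend.

```python
# Toy, single-process illustration of the parameter-server pattern.
# ToyParameterServer and worker_step are hypothetical, not Angel's API.
from typing import Dict, List, Tuple

Sample = Tuple[float, Dict[int, float]]  # (label, {feature index: value})


class ToyParameterServer:
    """Stores a sparse weight vector and applies pushed gradients."""

    def __init__(self) -> None:
        self.weights: Dict[int, float] = {}

    def pull(self, indices: List[int]) -> Dict[int, float]:
        # Return only the requested entries; unseen weights default to 0.0.
        return {i: self.weights.get(i, 0.0) for i in indices}

    def push(self, grads: Dict[int, float], step_size: float) -> None:
        # Plain SGD update on the entries touched by this batch.
        for i, g in grads.items():
            self.weights[i] = self.weights.get(i, 0.0) - step_size * g


def worker_step(ps: ToyParameterServer, batch: List[Sample], step_size: float) -> float:
    """One mini-batch of sparse linear regression: pull, compute, push."""
    if not batch:
        raise ValueError("batch must not be empty")
    indices = sorted({i for _, feats in batch for i in feats})
    w = ps.pull(indices)                      # fetch only the active weights
    grads = {i: 0.0 for i in indices}
    loss = 0.0
    for label, feats in batch:
        pred = sum(w[i] * v for i, v in feats.items())
        err = pred - label
        loss += 0.5 * err * err
        for i, v in feats.items():
            grads[i] += err * v / len(batch)  # mean squared-error gradient
    ps.push(grads, step_size)                 # send updates back to the server
    return loss / len(batch)                  # loss measured before the update


if __name__ == "__main__":
    ps = ToyParameterServer()
    # Feature indices can be arbitrarily large; only touched ones are stored.
    data = [(1.0, {3: 1.0, 1_000_000: 1.0}), (0.0, {7: 1.0})]
    for epoch in range(50):
        loss = worker_step(ps, data, step_size=0.5)
    print(sorted(ps.weights), loss)
```

Because only the indices present in a batch are pulled and pushed, the traffic per step depends on the batch's active features rather than on the full model dimension, which is what makes this pattern attractive for high-dimensional models.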
### Compilation & Deployment Instructions by Docker

#### Compile jar file and the shared c++ libraries package

```bash
# The script below builds the jar files and bundles the shared c++ libraries in containers
# The generated files 'pytorch-on-angel-<version>.jar' and 'torch.zip' are in ./dist
./build.sh
```

#### Generate a pytorch script model

```bash
# Several algorithms are implemented in the python directory under the root directory
# The script below generates a deepfm model, deepfm.pt, in ./dist
./gen_pt_model.sh python/recommendation/deepfm.py --input_dim 148 --n_fields 13 --embedding_dim 10 --fc_dims 10 5 1
```

### Compilation & Deployment Instructions Manually

If you don't have a Docker environment, you can compile manually, but you need to install all the dependencies on the machine.

**We strongly recommend using docker to compile**.

#### Install Pytorch

PyTorch versions 1.2.0 through 1.5.0 are supported; version 1.5.0 is recommended.

- pytorch = v1.5.0
- python = 3.7

We recommend using [anaconda](https://www.anaconda.com/) to install PyTorch. Run the command:

```bash
conda install -c pytorch pytorch==1.5.0 torchvision==0.6.0 cpuonly
```

For detailed installation instructions, refer to [pytorch installation](https://github.com/pytorch/pytorch#installation).

#### Compiling java submodule

1. **Compiling Environment Dependencies**

   - Jdk >= 1.8
   - Maven >= 3.0.5

2. **Source Code Download**

   ```bash
   git clone https://github.com/Angel-ML/PyTorch-On-Angel.git
   ```

3. **Compile**

   Run the following command in the java root directory of the source code:

   ```bash
   mvn clean package -Dmaven.test.skip=true
   ```

   After compiling, a jar package named '**pytorch-on-angel-<version>.jar**' will be generated in `target` under the java root directory.

#### Compiling cpp submodule

1. **Compiling Environment Dependencies**

   - gcc >= 5
   - cmake >= 3.12

2.
   **LibTorch Download**

   - Download the `libtorch` package from [here](https://pytorch.org/) and extract it to a directory of your choice
   - Set TORCH_HOME (the path to libtorch) in `CMakeLists.txt` under the cpp root directory

3. **Compile**

   Run the following commands in the `cmake-build-debug` directory under the cpp root directory:

   ```bash
   cmake ..
   make
   ```

   After compiling, a shared library named '**libtorch_angel.so**' will be generated in `cmake-build-debug` under the cpp root directory.

### Quick Start

#### Spark on Angel deployment

PyTorch on Angel runs on Angel, so you must deploy the Angel client first. For the deployment process, refer to the [documentation](https://github.com/Angel-ML/angel/blob/branch-3.2.0/docs/tutorials/spark_on_angel_quick_start.md).

Note: **It is recommended to run PyTorch on Angel on Angel 3.2.0**

#### Submit to Cluster

Use `$SPARK_HOME/bin/spark-submit` to submit the application to the cluster from the PyTorch on Angel client.
Here is a submit example for deepfm.

1. **Generate pytorch script model**

   Follow Compilation & Deployment Instructions by Docker to generate the pytorch model file, or generate it manually, for example:

   ```bash
   python deepfm.py --input_dim 148 --n_fields 13 --embedding_dim 10 --fc_dims 10 5 1
   ```

2. **Package c++ library files**

   Put the compiled libtorch_angel.so into the `lib` directory of libtorch, and then package it. Follow Compilation & Deployment Instructions by Docker or Manually to get the c++ library package, for example named `torch.zip`.

3. **Upload training data to hdfs**

   Upload the training data python/recommendation/census_148d_train.libsvm.tmp to an hdfs directory.

4.
   **Submit to Cluster**

   ```bash
   source ./spark-on-angel-env.sh
   $SPARK_HOME/bin/spark-submit \
       --master yarn-cluster \
       --conf spark.ps.instances=2 \
       --conf spark.ps.cores=1 \
       --conf spark.ps.jars=$SONA_ANGEL_JARS \
       --conf spark.ps.memory=3g \
       --conf spark.ps.log.level=INFO \
       --conf spark.driver.extraJavaOptions=-Djava.library.path=$JAVA_LIBRARY_PATH:.:./torch/torch-lib \
       --conf spark.executor.extraJavaOptions=-Djava.library.path=$JAVA_LIBRARY_PATH:.:./torch/torch-lib \
       --conf spark.executor.extraLibraryPath=./torch/torch-lib \
       --conf spark.driver.extraLibraryPath=./torch/torch-lib \
       --conf spark.executorEnv.OMP_NUM_THREADS=2 \
       --conf spark.executorEnv.MKL_NUM_THREADS=2 \
       --queue $queue \
       --name "deepfm for torch on angel" \
       --jars $SONA_SPARK_JARS \
       --archives torch.zip#torch \
       --files deepfm.pt \
       --driver-memory 1g \
       --num-executors 2 \
       --executor-cores 1 \
       --executor-memory 3g \
       --class com.tencent.angel.pytorch.examples.supervised.RecommendationExample \
       ./pytorch-on-angel-0.3.0.jar \
       trainInput:$input batchSize:128 torchModelPath:deepfm.pt \
       stepSize:0.001 numEpoch:10 testRatio:0.1 \
       angelModelOutputPath:$output
   ```

### Algorithms

Currently, PyTorch on Angel supports a series of recommendation and deep graph convolution network algorithms.

1. [Recommendation Algorithms](./docs/recommendation.md)
2. [Graph Algorithms](./docs/graph.md)

[1]: ./docs/img/pytorch_on_angel_framework.png
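
The application arguments at the end of the `spark-submit` example in Quick Start are space-separated `key:value` pairs. When the command is generated from a script, a small helper along these lines can assemble them. This is only a sketch: `format_app_args` is a hypothetical name, the hdfs paths are placeholders, and only the key names are taken from the example.

```python
# Hypothetical helper for building the key:value application arguments.
# Only the key names come from the submit example; the paths are placeholders.
import shlex
from typing import Dict, List


def format_app_args(params: Dict[str, object]) -> List[str]:
    """Turn {"batchSize": 128} into ["batchSize:128"], keeping insertion order."""
    args = []
    for key, value in params.items():
        text = str(value)
        # Each pair must stay a single shell word, so reject whitespace.
        if not key or not text or any(c.isspace() for c in key + text):
            raise ValueError(f"argument {key!r} must be non-empty and free of whitespace")
        args.append(f"{key}:{text}")
    return args


if __name__ == "__main__":
    params = {
        "trainInput": "hdfs:///path/to/train",          # placeholder path
        "batchSize": 128,
        "torchModelPath": "deepfm.pt",
        "stepSize": 0.001,
        "numEpoch": 10,
        "testRatio": 0.1,
        "angelModelOutputPath": "hdfs:///path/to/out",  # placeholder path
    }
    # shlex.quote protects any value that the shell would otherwise interpret.
    print(" ".join(shlex.quote(a) for a in format_app_args(params)))
```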