# PyTorch-On-Angel

**Repository Path**: mirrors_Angel-ML/PyTorch-On-Angel

## Basic Information

- **Project Name**: PyTorch-On-Angel
- **Description**: PyTorch On Angel arms PyTorch with a powerful Parameter Server, which enables PyTorch to train very large models.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-24
- **Last Updated**: 2026-03-15

## README

## PyTorch on Angel

A lightweight project that runs PyTorch on [Angel](https://github.com/Angel-ML/angel), giving PyTorch the ability to train high-dimensional models.

### Architecture

----

![][1]

The architecture of PyTorch on Angel consists of three modules:

  - **python client**: generates the PyTorch script module.
  - **angel ps**: provides a common Parameter Server (PS) service, responsible for distributed model storage, communication, synchronization, and coordination of computation.
  - **spark executor**: the worker process, responsible for processing data, loading the PyTorch script module, and communicating with the `Angel PS Server` to complete model training and prediction. The PyTorch C++ backend runs in native mode as the actual computing backend.
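
The division of labor between the PS and the workers can be illustrated with a toy, single-process sketch. It is not Angel's API: `ToyParameterServer`, `pull`, `push`, and `worker_step` are hypothetical names, and a plain sparse linear model stands in for the PyTorch script module. The point is only that a worker pulls the parameters its mini-batch touches, computes gradients locally, and pushes updates back, so the full high-dimensional model never has to fit on a single worker.

```python
class ToyParameterServer:
    """Hypothetical in-memory stand-in for a PS: a sparse model keyed by feature index."""

    def __init__(self):
        self.weights = {}

    def pull(self, keys):
        # Return only the requested parameters; unseen keys start at 0.0.
        return {k: self.weights.get(k, 0.0) for k in keys}

    def push(self, grads, lr):
        # Apply a plain SGD update on the server side.
        for k, g in grads.items():
            self.weights[k] = self.weights.get(k, 0.0) - lr * g


def worker_step(ps, batch, lr=0.1):
    """One step of squared-error linear regression on sparse examples.

    batch is a list of (features, label) pairs, where features maps index -> value.
    """
    keys = {k for features, _ in batch for k in features}
    w = ps.pull(keys)                      # fetch only the touched parameters
    grads = {k: 0.0 for k in keys}
    for features, label in batch:
        pred = sum(w[k] * v for k, v in features.items())
        err = pred - label
        for k, v in features.items():
            grads[k] += err * v / len(batch)
    ps.push(grads, lr)                     # send the averaged gradient back


ps = ToyParameterServer()
batch = [({3: 1.0, 7: 2.0}, 1.0), ({3: 1.0}, 0.0)]
for _ in range(100):
    worker_step(ps, batch, lr=0.1)
print(sorted(ps.weights))  # only the indices touched by the batch are stored
```

In the real system the pull and push calls cross the network to the Angel PS, and the local computation is delegated to the PyTorch C++ backend rather than written in Python.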

To use PyTorch on Angel, we need three components:
  - a jar file generated by the java subproject;
  - a .so file compiled by the cpp subproject, together with a set of shared libraries for the PyTorch C++ backend;
  - the PyTorch algorithm script module generated by the python subproject.

### Compilation & Deployment Instructions by Docker

#### Compile jar file and the shared c++ libraries package

```bash
# The script below builds the jar files and bundles the shared C++ libraries in containers
# The generated files 'pytorch-on-angel-<version>.jar' and 'torch.zip' are in ./dist
./build.sh
```
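
As a quick sanity check after the build, the sketch below looks for the two outputs named in the comments above. The helper name `find_build_outputs` is hypothetical, and the glob pattern assumes the jar keeps the `pytorch-on-angel-<version>.jar` naming.

```python
from pathlib import Path


def find_build_outputs(dist_dir="dist"):
    """Report which expected build outputs exist in dist_dir."""
    dist = Path(dist_dir)
    # Any version of the jar matches; an empty list means the jar is missing.
    jars = sorted(p.name for p in dist.glob("pytorch-on-angel-*.jar"))
    torch_zip = dist / "torch.zip"
    return {
        "jars": jars,
        "torch_zip": torch_zip.name if torch_zip.is_file() else None,
    }


if __name__ == "__main__":
    print(find_build_outputs("dist"))
```

An empty `jars` list or a `torch_zip` of `None` suggests the build did not finish.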

#### Generate a pytorch script model

```bash
# Several algorithms are implemented in the python directory under the root directory
# The script below generates a DeepFM model, deepfm.pt, in ./dist
./gen_pt_model.sh python/recommendation/deepfm.py --input_dim 148 --n_fields 13 --embedding_dim 10 --fc_dims 10 5 1
```
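
To make the flags concrete, here is a sketch of how such a command line could be parsed and what layer shapes it would imply. It is an assumption, not a copy of `deepfm.py`: the parser and the `fc_layer_shapes` helper are hypothetical, and the shapes assume the common DeepFM layout in which the deep part takes the concatenated field embeddings (`n_fields * embedding_dim` inputs).

```python
import argparse


def parse_model_args(argv):
    """Hypothetical parser mirroring the flags used in the command above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_dim", type=int, required=True)
    parser.add_argument("--n_fields", type=int, required=True)
    parser.add_argument("--embedding_dim", type=int, required=True)
    # nargs="+" collects "10 5 1" into the list [10, 5, 1]
    parser.add_argument("--fc_dims", type=int, nargs="+", required=True)
    return parser.parse_args(argv)


def fc_layer_shapes(n_fields, embedding_dim, fc_dims):
    """Weight shapes (in, out) of the fully connected stack, under the stated assumption."""
    shapes, in_dim = [], n_fields * embedding_dim
    for out_dim in fc_dims:
        shapes.append((in_dim, out_dim))
        in_dim = out_dim
    return shapes


args = parse_model_args(
    "--input_dim 148 --n_fields 13 --embedding_dim 10 --fc_dims 10 5 1".split()
)
print(args.fc_dims)  # [10, 5, 1]
print(fc_layer_shapes(args.n_fields, args.embedding_dim, args.fc_dims))
# [(130, 10), (10, 5), (5, 1)]
```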

### Compilation & Deployment Instructions Manually
If you don't have a docker environment, you can compile it manually, but you need to install all the dependencies on the machine. **We strongly recommend using docker to compile**.

#### Install Pytorch
We support PyTorch versions 1.2.0 through 1.5.0; version 1.5.0 is recommended.

  - pytorch = 1.5.0
  - python = 3.7

We recommend using [anaconda](https://www.anaconda.com/) to install PyTorch. Run the command:
```bash
conda install -c pytorch pytorch==1.5.0 torchvision==0.6.0 cpuonly
```
For detailed installation instructions, refer to the [PyTorch installation guide](https://github.com/pytorch/pytorch#installation).
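
A small check like the following can confirm that an installed version falls inside the supported range. It is only a sketch: `parse_version` and `is_supported` are hypothetical helpers, and in a real environment the string would come from `torch.__version__`.

```python
import re

SUPPORTED_MIN = (1, 2, 0)
SUPPORTED_MAX = (1, 5, 0)


def parse_version(version):
    """Extract (major, minor, patch), ignoring suffixes such as '+cpu'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(version):
    """True if the version lies in the supported range 1.2.0 to 1.5.0, inclusive."""
    return SUPPORTED_MIN <= parse_version(version) <= SUPPORTED_MAX


print(is_supported("1.5.0"))  # True
print(is_supported("1.6.0"))  # False
```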


#### Compiling java submodule
1. **Compiling Environment Dependencies**
   - JDK >= 1.8
   - Maven >= 3.0.5

2. **Source Code Download**
   ```bash
   git clone https://github.com/Angel-ML/PyTorch-On-Angel.git
   ```

3. **Compile**  
   Run the following command in the java root directory of the source code:
   ```bash
   mvn clean package -Dmaven.test.skip=true
   ```
   After compiling, a jar package named '**pytorch-on-angel-&lt;version&gt;.jar**' will be generated in `target` under the java root directory.


#### Compiling cpp submodule
1. **Compiling Environment Dependencies**
   - gcc >= 5
   - cmake >= 3.12

2. **LibTorch Download**
   - Download the `libtorch` package from [here](https://pytorch.org/) and extract it to a directory of your choice
   - Set `TORCH_HOME` (the path to libtorch) in `CMakeLists.txt` under the cpp root directory

3. **Compile**  
   Run the following commands in the `cmake-build-debug` directory under the cpp root directory:
   ```bash
   cmake ..
   make
   ```
   After compiling, a shared library named '**libtorch_angel.so**' will be generated in `cmake-build-debug` under the cpp root directory.
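
For the manual route, the compiled library still has to be bundled with libtorch's shared libraries before submission. The sketch below is one way to do that in Python. `package_torch_libs` is a hypothetical helper, and the `torch-lib/` folder name inside the archive is inferred from the `./torch/torch-lib` library path used in the submit example, so adjust it if your layout differs.

```python
import shutil
import zipfile
from pathlib import Path


def package_torch_libs(libtorch_lib_dir, angel_so, out_zip="torch.zip", arc_root="torch-lib"):
    """Copy the compiled .so next to libtorch's libraries and zip them together.

    arc_root is the folder name inside the archive (an assumption, see above).
    """
    lib_dir = Path(libtorch_lib_dir)
    target = lib_dir / Path(angel_so).name
    # Skip the copy when the library already sits in the lib directory.
    if Path(angel_so).resolve() != target.resolve():
        shutil.copy2(angel_so, target)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for lib in sorted(lib_dir.iterdir()):
            if lib.is_file():
                archive.write(lib, arcname=f"{arc_root}/{lib.name}")
    return Path(out_zip)
```

A call might look like `package_torch_libs("/path/to/libtorch/lib", "cmake-build-debug/libtorch_angel.so")`, where `/path/to/libtorch` is a placeholder for the directory the package was extracted to.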
   
### Quick Start

#### Spark on Angel deployment
PyTorch on Angel runs on Angel, so you must deploy the Angel client first. For the deployment process, refer to the [documentation](https://github.com/Angel-ML/angel/blob/branch-3.2.0/docs/tutorials/spark_on_angel_quick_start.md).  
Note: **It is recommended to run PyTorch on Angel on Angel 3.2.0**

#### Submit to Cluster
Use `$SPARK_HOME/bin/spark-submit` to submit the application to the cluster from the PyTorch on Angel client.  
Here is a submission example for DeepFM.
1. **Generate pytorch script model**  
   Follow Compilation & Deployment Instructions by Docker to generate the PyTorch model file, or generate it manually, for example:
   ```bash
   python deepfm.py --input_dim 148 --n_fields 13 --embedding_dim 10 --fc_dims 10 5 1
   ```
2. **Package c++ library files**

   Put the compiled libtorch_angel.so into the `lib` directory of libtorch, and then package it.
   Follow Compilation & Deployment Instructions by Docker or Manually to get the C++ library package, named for example `torch.zip`.

3. **Upload training data to hdfs**  
   Upload the training data python/recommendation/census_148d_train.libsvm.tmp to an HDFS directory.

4. **Submit to Cluster**
   ```bash
   # $queue, $input, and $output are placeholders to set for your cluster
   source ./spark-on-angel-env.sh
   $SPARK_HOME/bin/spark-submit \
          --master yarn-cluster \
          --conf spark.ps.instances=2 \
          --conf spark.ps.cores=1 \
          --conf spark.ps.jars=$SONA_ANGEL_JARS \
          --conf spark.ps.memory=3g \
          --conf spark.ps.log.level=INFO \
          --conf spark.driver.extraJavaOptions=-Djava.library.path=$JAVA_LIBRARY_PATH:.:./torch/torch-lib \
          --conf spark.executor.extraJavaOptions=-Djava.library.path=$JAVA_LIBRARY_PATH:.:./torch/torch-lib \
          --conf spark.executor.extraLibraryPath=./torch/torch-lib \
          --conf spark.driver.extraLibraryPath=./torch/torch-lib \
          --conf spark.executorEnv.OMP_NUM_THREADS=2 \
          --conf spark.executorEnv.MKL_NUM_THREADS=2 \
          --queue $queue \
          --name "deepfm for torch on angel" \
          --jars $SONA_SPARK_JARS \
          --archives torch.zip#torch \
          --files deepfm.pt \
          --driver-memory 1g \
          --num-executors 2 \
          --executor-cores 1 \
          --executor-memory 3g \
          --class com.tencent.angel.pytorch.examples.supervised.RecommendationExample \
          ./pytorch-on-angel-0.3.0.jar \
          trainInput:$input batchSize:128 torchModelPath:deepfm.pt \
          stepSize:0.001 numEpoch:10 testRatio:0.1 \
          angelModelOutputPath:$output
   ```
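
When the submission is driven from a script, assembling the command as an argument list avoids shell line-continuation pitfalls such as whitespace after a trailing backslash. The following is a sketch, not part of the project: `build_submit_command` is a hypothetical helper that mirrors the option layout of the example above and covers only a subset of its options. The Spark home and HDFS path are placeholders.

```python
import shlex


def build_submit_command(spark_home, jar, main_class, conf, app_args,
                         archives=None, files=None):
    """Assemble a spark-submit argument list (hypothetical helper)."""
    cmd = [f"{spark_home}/bin/spark-submit", "--master", "yarn-cluster"]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    if archives:
        cmd += ["--archives", archives]
    if files:
        cmd += ["--files", files]
    cmd += ["--class", main_class, jar]
    # Application arguments use the key:value form shown in the example.
    cmd += [f"{key}:{value}" for key, value in app_args.items()]
    return cmd


cmd = build_submit_command(
    spark_home="/opt/spark",  # placeholder
    jar="./pytorch-on-angel-0.3.0.jar",
    main_class="com.tencent.angel.pytorch.examples.supervised.RecommendationExample",
    conf={"spark.ps.instances": 2, "spark.ps.memory": "3g"},
    app_args={
        "trainInput": "hdfs:///path/to/train",  # placeholder
        "batchSize": 128,
        "torchModelPath": "deepfm.pt",
    },
    archives="torch.zip#torch",
    files="deepfm.pt",
)
# Print a shell-safe rendering; the list itself can be passed to subprocess.run.
print(" ".join(shlex.quote(arg) for arg in cmd))
```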

### Algorithms
Currently, PyTorch on Angel supports a series of recommendation and deep graph convolution network algorithms.

1. [Recommendation Algorithms](./docs/recommendation.md)
2. [Graph Algorithms](./docs/graph.md)


[1]: ./docs/img/pytorch_on_angel_framework.png