Benchmarks

1. Test on a single node

  1. Neighbor sampling:
     python bench_sampler.py --backend=glt --sample_mode=GPU
  2. Feature lookup:
     python bench_feature.py --backend=glt --split_ratio=0.2
  3. Sampling and feature lookup: please refer to examples/multi_gpu (a throughput-measurement sketch follows this list)
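
The throughput these scripts report is, in essence, the number of seed nodes processed per second. Below is a minimal sketch of such a measurement; make_loader() is a hypothetical placeholder for constructing the GLT sampling or feature-lookup loader, and batch.batch_size assumes PyG-style mini-batches, so treat this as an illustration rather than the scripts' actual code.

# A minimal throughput-measurement sketch (illustration only).
# make_loader() is a hypothetical placeholder for building a GLT sampling or
# feature-lookup loader; batch.batch_size assumes PyG-style mini-batches.
import time

def measure_throughput(loader, num_batches=100):
    """Return seed nodes processed per second over num_batches batches."""
    seeds = 0
    start = time.perf_counter()
    for i, batch in enumerate(loader):
        if i == num_batches:
            break
        seeds += batch.batch_size
    elapsed = time.perf_counter() - start
    return seeds / elapsed

# loader = make_loader(sample_mode='GPU')  # hypothetical constructor
# print(f'throughput: {measure_throughput(loader):.0f} seeds/s')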

Here, we present the scalability results on a single machine with A100 GPUs. We use CUDA 11.4, PyTorch 1.12, and GLT 0.2.0rc2, and report the overall throughput of neighbor sampling and feature lookup.

2. Test in distributed settings

Distributed neighbor loader (2 nodes, each with 2 GPUs)

# node 0:
CUDA_VISIBLE_DEVICES=0,1 python bench_dist_neighbor_loader.py --node_rank=0

# node 1:
CUDA_VISIBLE_DEVICES=2,3 python bench_dist_neighbor_loader.py --node_rank=1
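
The commands above only pass --node_rank; the master address, port, and world size come from the benchmark script's defaults or from the launch script in Step 3. The sketch below is only an illustration of how a node rank and the locally visible GPUs typically map to global worker ranks, using plain torch.distributed as a stand-in; bench_dist_neighbor_loader.py performs its own initialization, so the names and defaults here are assumptions.

# Illustration of the node_rank -> global rank mapping (not the script's code).
import torch
import torch.distributed as dist

def init_worker(node_rank, local_rank, gpus_per_node=2, num_nodes=2,
                master_addr='127.0.0.1', master_port=11234):
    # Each node contributes gpus_per_node workers; ranks are numbered globally.
    world_size = num_nodes * gpus_per_node
    global_rank = node_rank * gpus_per_node + local_rank
    dist.init_process_group(
        backend='gloo',  # or 'nccl' when every worker owns a GPU
        init_method=f'tcp://{master_addr}:{master_port}',
        rank=global_rank,
        world_size=world_size,
    )
    torch.cuda.set_device(local_rank)  # CUDA_VISIBLE_DEVICES restricts the choices
    return global_rank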

You can also use our script for the benchmark:

Step 0: Set up a distributed file system

Note: You may skip this step if you have already set up folders synchronized across your machines.

To perform distributed sampling, files and code need to be accessible across multiple machines. A distributed file system (e.g., NFS, Ceph, SSHFS) exempts you from synchronizing files such as partition information by hand.

For configuration details, please refer to the following links:

NFS: https://wiki.archlinux.org/index.php/NFS

SSHFS: https://www.digitalocean.com/community/tutorials/how-to-use-sshfs-to-mount-remote-file-systems-over-ssh

Ceph: https://docs.ceph.com/en/latest/install/

Step 1: Prepare data

Here we use ogbn-products and partition it into 2 partitions.

python ../../examples/distributed/partition_ogbn_dataset.py --dataset=ogbn-products --num_partitions=2
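
If you want to inspect the dataset before partitioning, it can be loaded with the standard ogb package. The sketch below only loads ogbn-products and performs a naive even split of node IDs into two parts for illustration; the actual partition_ogbn_dataset.py script additionally assigns edges and features to partitions, which this sketch does not attempt.

# Illustration only: load ogbn-products and split node IDs into 2 parts.
# The real partitioning script also distributes edges and features.
import torch
from ogb.nodeproppred import PygNodePropPredDataset

dataset = PygNodePropPredDataset(name='ogbn-products', root='./dataset')
data = dataset[0]  # PyG Data object with x, y, edge_index

num_partitions = 2
node_ids = torch.arange(data.num_nodes)
parts = torch.chunk(node_ids, num_partitions)  # naive contiguous split
for i, part in enumerate(parts):
    print(f'partition {i}: {part.numel()} nodes')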

Step 2: Set up the configuration file

An example configuration file template in YAML format is provided, e.g. bench_dist_config.yml.

# IP addresses for all nodes.
nodes:
  - 0.0.0.0
  - 1.1.1.1

# ssh IP ports for each node.
ports: [22, 22]

# Path to the Python interpreter with the GLT environment on each node.
python_bins:
  - /path/to/python
  - /path/to/python

# usernames for the remote IPs.
usernames:
  - root
  - root

# dataset name, e.g. products, papers.
dataset: products

node_ranks: [0, 1]

dst_paths:
  - /path/to/GLT
  - /path/to/GLT

# Setup visible cuda devices for each node.
visible_devices:
  - 0,1,2,3
  - 4,5,6,7
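
The launcher in Step 3 reads this file and starts one benchmark process per node. As a rough sketch of how such a config can be consumed with PyYAML (this is not the actual logic of run_dist_bench.py, and the assembled command line is an assumption that mirrors the manual invocation shown earlier):

# Sketch: read bench_dist_config.yml and build one launch command per node.
# Not the actual implementation of run_dist_bench.py.
import yaml

with open('bench_dist_config.yml') as f:
    cfg = yaml.safe_load(f)

launches = []
for host, port, user, python_bin, rank, dst, devices in zip(
        cfg['nodes'], cfg['ports'], cfg['usernames'], cfg['python_bins'],
        cfg['node_ranks'], cfg['dst_paths'], cfg['visible_devices']):
    cmd = (f'cd {dst} && CUDA_VISIBLE_DEVICES={devices} '
           f'{python_bin} bench_dist_neighbor_loader.py --node_rank={rank}')
    launches.append((host, port, user, cmd))

for host, port, user, cmd in launches:
    print(f'{user}@{host}:{port} -> {cmd}')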

Step 3: Launch distributed jobs

pip install paramiko
pip install click
python run_dist_bench.py --config=bench_dist_config.yml --master_addr=0.0.0.0 --master_port=11234

Optional parameters that you can append to the command above include:

--epochs: number of epochs to repeat sampling (default=1)
--batch_size: batch size for sampling (default=2048)
--shuffle: whether to shuffle the input seeds at each epoch (default=False)
--with_edge: whether to sample with edge IDs (default=False)
--collect_features: whether to collect features for the sampled results (default=False)
--worker_concurrency: concurrency of each sampling worker (default=4)
--channel_size: memory used for the shared-memory channel (default='4GB')
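
paramiko and click are required because the launcher drives the remote nodes over SSH and parses its command-line options. The following is a minimal, hypothetical sketch of that SSH launch pattern, not the actual code of run_dist_bench.py:

# Hypothetical sketch of launching one remote benchmark process over SSH with
# paramiko; run_dist_bench.py's actual logic may differ.
import paramiko

def launch_remote(host, port, user, command):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname=host, port=port, username=user)
    _, stdout, stderr = client.exec_command(command)
    print(stdout.read().decode())
    print(stderr.read().decode())
    client.close()

# launch_remote('1.1.1.1', 22, 'root',
#               'cd /path/to/GLT && CUDA_VISIBLE_DEVICES=4,5,6,7 '
#               '/path/to/python bench_dist_neighbor_loader.py --node_rank=1')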

Here, we present the scale-out results of distributed sampling and feature lookup. We report the throughput speedups for 2 nodes and 4 nodes relative to a single node, with 2 A100 GPUs per node. The tests were carried out using CUDA 11.4, PyTorch 1.12, and GLT 0.2.0rc2.
