# Multi-GPU Training with PyTorch: Data and Model Parallelism

### About

The material in this repo demonstrates multi-GPU training using PyTorch. Part 1 covers how to optimize single-GPU training. The subsequent parts show the code changes needed to enable multi-GPU training using the data-parallel and model-parallel approaches (a minimal data-parallel sketch is also included at the end of this README). This workshop aims to prepare researchers to use the new H100 GPU nodes provided through Princeton Language and Intelligence.

### Setup

Make sure you can run Python on Adroit:

```bash
$ ssh <YourNetID>@adroit.princeton.edu  # VPN required if off-campus
$ git clone https://github.com/PrincetonUniversity/multi_gpu_training.git
$ cd multi_gpu_training
$ module load anaconda3/2023.9
(base) $ python --version
Python 3.11.5
```

### Getting Help

If you encounter any difficulties with the material in this guide, please send an email to cses@princeton.edu or attend a help session.

### Authorship

This guide was created by Mengzhou Xia, Alexander Wettig, and Jonathan Halverson. Members of Princeton Research Computing made contributions to this material.
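
### Data-Parallel Example (sketch)

The following is a minimal sketch, not the workshop's own training script, of single-node data-parallel training with `torch.nn.parallel.DistributedDataParallel` (DDP). The model, toy dataset, hyperparameters, and the file name `ddp_example.py` are illustrative placeholders; it assumes a launch with `torchrun`, e.g. `torchrun --nproc_per_node=2 ddp_example.py`.

```python
# ddp_example.py -- minimal DDP sketch (toy model and data, for illustration only)
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy dataset: 1024 random samples with 32 features and a scalar target
    x = torch.randn(1024, 32)
    y = torch.randn(1024, 1)
    dataset = TensorDataset(x, y)

    # DistributedSampler gives each rank a disjoint shard of the data
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    # Wrap the model in DDP so gradients are averaged across ranks
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1)).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the per-rank shards each epoch
        for xb, yb in loader:
            xb, yb = xb.cuda(local_rank), yb.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()  # DDP all-reduces gradients during backward
            optimizer.step()
        if dist.get_rank() == 0:
            print(f"epoch {epoch}: loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The key changes relative to a single-GPU script are the process-group initialization, the `DistributedSampler`, and wrapping the model in `DDP`; the training loop itself is otherwise unchanged.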