AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
PyTorch Extension Library of Optimized Scatter Operations
TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs)
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen)