1 Star 0 Fork 0

黎瑞/TensorRT-LLM

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
文件
克隆/下载
README.md 5.27 KB
一键复制 编辑 原始数据 按行查看 历史

TensorRT-LLM

A TensorRT Toolbox for Optimized Large Language Model Inference

Documentation python cuda trt version license

Architecture   |   Results   |   Examples   |   Documentation


Latest News

TensorRT-LLM Overview

TensorRT-LLM is an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM contains components to create Python and C++ runtimes that execute those TensorRT engines. It also includes a backend for integration with the NVIDIA Triton Inference Server; a production-quality system to serve LLMs. Models built with TensorRT-LLM can be executed on a wide range of configurations going from a single GPU to multiple nodes with multiple GPUs (using Tensor Parallelism and/or Pipeline Parallelism).

The TensorRT-LLM Python API architecture looks similar to the PyTorch API. It provides a functional module containing functions like einsum, softmax, matmul or view. The layers module bundles useful building blocks to assemble LLMs; like an Attention block, a MLP or the entire Transformer layer. Model-specific components, like GPTAttention or BertAttention, can be found in the models module.

TensorRT-LLM comes with several popular models pre-defined. They can easily be modified and extended to fit custom needs. Refer to the Support Matrix for a list of supported models.

To maximize performance and reduce memory footprint, TensorRT-LLM allows the models to be executed using different quantization modes (refer to support matrix). TensorRT-LLM supports INT4 or INT8 weights (and FP16 activations; a.k.a. INT4/INT8 weight-only) as well as a complete implementation of the SmoothQuant technique.

Getting Started

To get started with TensorRT-LLM, visit our documentation:

Loading...
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
1
https://gitee.com/lirui2017/TensorRT-LLM.git
git@gitee.com:lirui2017/TensorRT-LLM.git
lirui2017
TensorRT-LLM
TensorRT-LLM
main

搜索帮助