# OpenPPL

**Repository Path**: mirrors/OpenPPL

## Basic Information

- **Project Name**: OpenPPL
- **Description**: OpenPPL is an inference engine built on a self-developed high-performance operator library, with extensively tuned performance; it provides multi-backend deployment of AI models in cloud-native environments and supports efficient deployment of deep-learning models such as those from OpenMMLab.
- **Primary Language**: C/C++
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: https://www.oschina.net/p/openppl
- **GVP Project**: No

## Statistics

- **Stars**: 5
- **Forks**: 3
- **Created**: 2021-07-14
- **Last Updated**: 2025-10-11

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# PPLNN

[![website](docs/images/Website-OpenPPL-brightgreen.svg)](https://openppl.ai/)
[![License](docs/images/License-Apache-2.0-green.svg)](LICENSE)
[![qq](docs/images/Chat-on-QQ-red.svg)](https://qm.qq.com/cgi-bin/qm/qr?k=X7JWUqOdBih71dUU9AZF2gD3PKjxaxB-)
[![zhihu](docs/images/Discuss-on-Zhihu.svg)](https://www.zhihu.com/people/openppl)

### Overview

`PPLNN`, which is short for "**P**PLNN is a **P**rimitive **L**ibrary for **N**eural **N**etwork", is a high-performance deep-learning inference engine for efficient AI inferencing. It can run various ONNX models and has better support for [OpenMMLab](https://github.com/open-mmlab).

![alt arch](docs/images/arch.png)

### **Important Notice**

- PMX was renamed to OPMX on 2024-04-25.
    - ChatGLM1 will not be supported in OPMX.
    - All LLM models must be converted (or simply rename `pmx_params.json` to `opmx_params.json`) and exported again.
    - You can find the old code at [llm_v1](https://github.com/openppl-public/ppl.nn/tree/llm_v1).

### **Known Issues**

- NCCL issue on some devices: it has been reported that L40S and H800 may encounter illegal memory access during NCCL AllReduce. We suggest disabling the NCCL `Simple` protocol by setting the environment variable `NCCL_PROTO=^Simple`.

### LLM Features

- New LLM Engine ([Overview](docs/en/llm-cuda-overview.md))
- Flash Attention
- Split-k Attention (similar to Flash Decoding)
- Group-query Attention
- Dynamic Batching (also called Continuous Batching or In-flight Batching)
- Tensor Parallelism
- Graph Optimization
- INT8 groupwise KV Cache (numerical accuracy is very close to FP16 🚀)
- INT8 per-token per-channel Quantization (W8A8)

### LLM Model Zoo

- [LLaMA 1/2/3](https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/llama)
- [ChatGLM 2/3](https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/chatglm2)
- [Baichuan 1/2 7B](https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/baichuan)
- [InternLM 1](https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/internlm)
- [InternLM 2](https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/internlm2)
- [Mixtral](https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/mixtral)
- [Qwen 1/1.5](https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/qwen)
- [Falcon](https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/falcon)
- [Bigcode](https://github.com/openppl-public/ppl.pmx/tree/master/model_zoo/bigcode)

### Hello, world!

* Installing prerequisites:
  - On Debian or Ubuntu:
    ```bash
    apt-get install build-essential cmake git python3 python3-dev
    ```
  - On RedHat or CentOS:
    ```bash
    yum install gcc gcc-c++ cmake3 make git python3 python3-devel
    ```
* Cloning source code:
  ```bash
  git clone https://github.com/openppl-public/ppl.nn.git
  ```
* Building from source:
  ```bash
  cd ppl.nn
  ./build.sh -DPPLNN_USE_X86_64=ON -DPPLNN_ENABLE_PYTHON_API=ON
  ```
* Running the python demo (a sketch for exporting your own ONNX model follows this list):
  ```bash
  PYTHONPATH=./pplnn-build/install/lib python3 ./tools/pplnn.py --use-x86 --onnx-model tests/testdata/conv.onnx
  ```

Refer to [Documents](#documents) for more details.
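PPLNN consumes standard ONNX models, so any exporter can produce input for the demo above. As a minimal, hypothetical sketch (not part of this repository; it assumes PyTorch is installed and uses an arbitrary model and file name), a toy convolution model could be exported like this:

```python
import torch

# A toy convolution module; exported to ONNX so it can be fed to tools/pplnn.py.
class TinyConv(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

model = TinyConv().eval()
dummy_input = torch.randn(1, 3, 224, 224)  # one 3-channel 224x224 image

# Export with fixed input shapes; "tiny_conv.onnx" is an arbitrary output name.
torch.onnx.export(model, dummy_input, "tiny_conv.onnx",
                  opset_version=11,
                  input_names=["input"], output_names=["output"])
```

The exported file can then be passed to the demo in place of the bundled test model, e.g. `PYTHONPATH=./pplnn-build/install/lib python3 ./tools/pplnn.py --use-x86 --onnx-model tiny_conv.onnx`.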
### Documents

* [Building from Source](docs/en/building-from-source.md)
* [How to Integrate](docs/en/how-to-integrate.md)
* APIs
  - C++
    - [Getting Started](docs/en/cpp-getting-started.md)
    - [API Reference](docs/en/cpp-api-reference.md)
  - Python
    - [Getting Started](docs/en/python-getting-started.md)
    - [API Reference](docs/en/python-api-reference.md)
* Develop Guide
  - [Adding New Engines and Ops](docs/en/add-new-engines-and-ops.md)
  - X86
    - [Supported Ops and Platforms](docs/en/x86-doc/supported-ops-and-platforms.md)
    - [Adding Ops](docs/en/x86-doc/add_op.md) ([Chinese](docs/cn/x86-doc/add_op.md))
    - [Benchmark](docs/en/x86-doc/benchmark_tool.md) ([Chinese](docs/cn/x86-doc/benchmark_tool.md))
  - CUDA
    - [Supported Ops and Platforms](docs/en/cuda-doc/supported-ops-and-platforms.md)
    - [Adding Ops](docs/en/cuda-doc/add_op.md) ([Chinese](docs/cn/cuda-doc/add_op.md))
    - [Benchmark](docs/en/cuda-doc/benchmark_tool.md) ([Chinese](docs/cn/cuda-doc/benchmark_tool.md))
  - RISCV
    - [Supported Ops and Platforms](docs/en/riscv-doc/supported-ops-and-platforms.md)
    - [Adding Ops](docs/en/riscv-doc/add_op.md) ([Chinese](docs/cn/riscv-doc/add_op.md))
    - [Benchmark](docs/en/riscv-doc/benchmark_tool.md) ([Chinese](docs/cn/riscv-doc/benchmark_tool.md))
  - ARM
    - [Adding Ops](docs/en/arm-doc/add_op.md) ([Chinese](docs/cn/arm-doc/add_op.md))
    - [Benchmark](docs/en/arm-doc/benchmark_tool.md) ([Chinese](docs/cn/arm-doc/benchmark_tool.md))
  - LLM-CUDA
    - [Overview](docs/en/llm-cuda-overview.md)
* Models
  - [Converting ONNX Opset](docs/en/onnx-model-opset-convert-guide.md)
  - [Generating ONNX models from OpenMMLab](docs/en/model-convert-guide.md)
* [Implementation Details](docs/cn/details.md) (in Chinese)

### Contact Us

Questions, reports, and suggestions are welcome through GitHub Issues!

| WeChat Official Account | QQ Group |
| :----: | :----: |
| OpenPPL | 627853444 |
| ![OpenPPL](docs/images/qrcode_for_gh_303b3780c847_258.jpg) | ![QQGroup](docs/images/qqgroup_s.jpg) |

### Contributions

This project uses the [Contributor Covenant](https://www.contributor-covenant.org/) as its code of conduct. Any contributions would be highly appreciated.

### Acknowledgements

* [onnxruntime](https://github.com/microsoft/onnxruntime)
* [onnx](https://github.com/onnx/onnx)
* [openvino](https://github.com/openvinotoolkit/openvino)
* [oneDNN](https://github.com/oneapi-src/oneDNN)
* [TensorRT](https://github.com/NVIDIA/TensorRT)
* [OpenMMLab](https://github.com/open-mmlab)

### License

This project is distributed under the [Apache License, Version 2.0](LICENSE).