ONNX Runtime is a performance-focused inference engine for ONNX (Open Neural Network Exchange) models.
Models in TensorFlow, Keras, PyTorch, scikit-learn, Core ML, and other popular supported formats can be converted to the standard ONNX format, providing framework interoperability and helping to maximize the reach of hardware optimization investments. This lets systems integrate a single inference engine that supports models trained in a variety of frameworks, while taking advantage of specific hardware accelerators where available.
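For example, a PyTorch model can be exported to ONNX with the exporter built into PyTorch. The sketch below is illustrative only; the model, input shape, opset, and output file name are placeholders:

```python
import torch
import torchvision

# Any torch.nn.Module can be exported; a pretrained ResNet-18 is used here as a placeholder.
model = torchvision.models.resnet18(pretrained=True)
model.eval()

# The exporter traces the model with a dummy input of the expected shape.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "resnet18.onnx", opset_version=11)
```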
ONNX Runtime was designed with a focus on performance and scalability in order to support heavy workloads in high-scale production scenarios. It also has extensibility options for compatibility with emerging hardware developments.
ONNX Runtime stays up to date with the ONNX standard, supporting all operators from the ONNX v1.2+ spec, and is backward compatible with earlier versions. Please refer to this page for ONNX opset compatibility details.
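To check which opset a given model targets before running it, the model's `opset_import` field can be inspected with the `onnx` Python package (a small sketch; `"model.onnx"` is a placeholder path):

```python
import onnx

model = onnx.load("model.onnx")
for opset in model.opset_import:
    # An empty domain string denotes the default "ai.onnx" operator set.
    print(opset.domain or "ai.onnx", opset.version)
```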
High-level architectural design
Using various graph optimizations and accelerators, ONNX Runtime can provide lower latency compared to other runtimes for faster end-to-end customer experiences and minimized machine utilization costs. See Performance Tuning guidance.
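As one illustration of the tuning options available, the Python API exposes the graph optimization level through `SessionOptions`. This is a sketch, not a complete tuning guide; the model path and optimized-model output path are placeholders, and serializing the optimized graph is optional:

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
# Enable all graph-level optimizations.
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
# Optionally serialize the optimized graph so the optimization cost is paid once.
sess_options.optimized_model_filepath = "model.optimized.onnx"

session = ort.InferenceSession("model.onnx", sess_options)
```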
The list of currently supported accelerators (termed Execution Providers) is below. Please see BUILD.md for build instructions. If you are interested in contributing a new execution provider, please see this page.
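From the Python API, the execution providers compiled into a given build can be listed and reordered per session. The sketch below assumes a GPU-enabled build (otherwise CUDAExecutionProvider will not be available) and uses a placeholder model path:

```python
import onnxruntime as ort

# Shows which execution providers were compiled into this build.
print(ort.get_available_providers())

session = ort.InferenceSession("model.onnx")
# Prefer CUDA when available, falling back to the default CPU provider.
session.set_providers(["CUDAExecutionProvider", "CPUExecutionProvider"])
```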
Quick Start: The ONNX-Ecosystem Docker container image is available on Dockerhub and includes ONNX Runtime (CPU, Python), dependencies, tools to convert from various frameworks, and Jupyter notebooks to help get started. Additional dockerfiles can be found here.
Language | Supported Versions | Samples |
---|---|---|
Python | 3.5, 3.6, 3.7 (Python Dev Notes) | Samples |
C# | | Samples |
C++ | | Samples |
C | | Samples |
WinRT | Windows.AI.MachineLearning | Samples |
Java | 8-13 | Samples |
Ruby (external project) | 2.4-2.7 | Samples |
Official builds are published for the default CPU Provider (Eigen + MLAS), as well as GPU with CUDA. Python packages can be found on PyPI, and C#/C/C++ packages on NuGet. Please view the table on aka.ms/onnxruntime for instructions for different build combinations.
For additional build flavors and/or dockerfiles, please see BUILD.md. For production scenarios, it's strongly recommended to build only from an official release branch.
If using pip to download the Python binaries, run `pip install --upgrade pip` prior to downloading.
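Once the Python package is installed, scoring a model takes only a few lines. This is a minimal sketch; the model path, input shape, and data type are placeholders that depend on the model being run:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name

# A random input matching the model's expected shape; replace with real data.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: x})
print(outputs[0].shape)
```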
Contributed non-official packages (including Homebrew, Linuxbrew, and nixpkgs) are listed here. These are not maintained by the core ONNX Runtime team and will have limited support; use at your discretion.
These system requirements must be met to use the compiled binaries.
The en_US.UTF-8 locale is required, as certain operators make use of system locales; it can be enabled with `locale-gen en_US.UTF-8` followed by `update-locale LANG=en_US.UTF-8`.
libgomp1 is also required and can be installed with `apt-get install libgomp1`.
Please see Samples and Tutorials for examples.
To get an ONNX model, please view these ONNX Tutorials. ONNX Runtime supports all versions of ONNX 1.2+. Full versioning compatibility information can be found under Versioning.
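As an example of producing an ONNX model from one of the supported frameworks, a scikit-learn estimator can be converted with the skl2onnx converter. This is a sketch under simple assumptions; the classifier, input name, feature count, and output file name are placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=200).fit(X, y)

# Declare the input name and shape expected by the exported graph.
initial_type = [("float_input", FloatTensorType([None, 4]))]
onnx_model = convert_sklearn(clf, initial_types=initial_type)

with open("logreg_iris.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```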
ONNX Runtime can be deployed to the cloud for model inferencing using Azure Machine Learning Services. See detailed instructions and sample notebooks.
ONNX Runtime Server (beta) is a hosted application for serving ONNX models using ONNX Runtime, providing a REST API for prediction. Usage details can be found here, and image installation instructions are here.
The expanding focus and selection of IoT devices with sensors and consistent signal streams introduces new opportunities to move AI workloads to the edge.
This is particularly important when there are massive volumes of incoming data/signals that are not efficient or useful to push to the cloud due to storage or latency considerations. Consider surveillance footage where 99% of the content is uneventful, or real-time person detection scenarios where immediate action is required. In these scenarios, executing model inferencing directly on the target device is crucial.
To deploy AI workloads to these edge devices and take advantage of hardware acceleration capabilities on the target device, see these reference implementations.
Install or build the package you need to use in your application. Check this page for installation/package guidance. See sample implementations using the C++ API.
On newer Windows 10 devices (1809+), ONNX Runtime is available by default as part of the OS and is accessible via the Windows Machine Learning APIs. Find tutorials here for building a Windows Desktop or UWP application using WinML.
This project may collect usage data and send it to Microsoft to help improve our products and services. See the privacy statement for more details.
We welcome contributions! Please see the contribution guidelines.
For any feedback or to report a bug, please file a GitHub Issue.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.