# rknn-llm

**Repository Path**: l6yang/rknn-llm

## Basic Information

- **Project Name**: rknn-llm
- **Description**: The RKLLM SDK helps users quickly deploy large language models to the ROC-RK3588-RT
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-09
- **Last Updated**: 2026-02-09

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Description

The RKLLM software stack helps users quickly deploy AI models to Rockchip chips. The overall framework is as follows:
To use the RKNPU, users first run the RKLLM-Toolkit on a computer to convert the trained model into an RKLLM-format model, and then run inference on the development board using the RKLLM C API.
- RKLLM-Toolkit is a software development kit for users to perform model conversion and quantization on PC.
- RKLLM Runtime provides C/C++ programming interfaces for Rockchip NPU platform to help users deploy RKLLM models and accelerate the implementation of LLM applications.
- RKNPU kernel driver is responsible for interacting with the NPU hardware. It is open source and can be found in the Rockchip kernel code.
# Support Platform
- RK3588 Series
- RK3576 Series
- RK3562 Series
- RV1126B Series
# Support Models
- [x] [LLAMA models](https://huggingface.co/meta-llama)
- [x] [TinyLLAMA models](https://huggingface.co/TinyLlama)
- [x] [Qwen2/Qwen2.5/Qwen3](https://huggingface.co/Qwen)
- [x] [Phi2/Phi3](https://huggingface.co/microsoft)
- [x] [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b/tree/103caa40027ebfd8450289ca2f278eac4ff26405)
- [x] [Gemma2/Gemma3/Gemma3n](https://huggingface.co/google)
- [x] [InternLM2 models](https://huggingface.co/collections/internlm/internlm2-65b0ce04970888799707893c)
- [x] [MiniCPM3/MiniCPM4](https://huggingface.co/openbmb)
- [x] [TeleChat2](https://huggingface.co/Tele-AI)
- [x] [Qwen2-VL/Qwen3-VL](https://huggingface.co/Qwen)
- [x] [MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
- [x] [DeepSeek-R1-Distill](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d)
- [x] [Janus-Pro-1B](https://huggingface.co/deepseek-ai/Janus-Pro-1B)
- [x] [InternVL2-1B/InternVL3-1B](https://huggingface.co/OpenGVLab)
- [x] [SmolVLM](https://huggingface.co/HuggingFaceTB)
- [x] [RWKV7](https://huggingface.co/fla-hub)
- [x] [DeepSeekOCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR)
# Quickstart
The easiest way to try it yourself is to download our multimodal vision model example. This demo runs entirely on your local device using **RKNN** (for vision) and **RKLLM** (for language). You can use your own images and ask questions about them. With **RKLLM**, all processing happens locally on your device, so your data never leaves it.
1. Download the pre-converted models and the demo executable (located in the `quickstart` directory) from the [rkllm_model_zoo](https://console.box.lenovo.com/l/l0tXb8) using the fetch code `rkllm`.
2. Open a terminal and push the demo and model files to your local device:
```bash
adb push ./demo_Linux_aarch64 /data
adb push model.rkllm /data/demo_Linux_aarch64
adb push model.rknn /data/demo_Linux_aarch64
```
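The push commands above can also be scripted from the host. The following is a minimal Python sketch and not part of the RKLLM SDK: the helper name `build_push_commands` is hypothetical, and it assumes `adb` is on the `PATH` and that the file names match the ones used above. It only builds the argument lists; actually running them is left to the commented `subprocess.run` line.

```python
from pathlib import PurePosixPath

# Destination used in the adb commands above.
DEVICE_ROOT = PurePosixPath("/data")
DEMO_DIR = "demo_Linux_aarch64"


def build_push_commands(model_files, demo_dir=DEMO_DIR, device_root=DEVICE_ROOT):
    """Return adb argument lists: the demo directory first, then each model file."""
    target = device_root / demo_dir
    commands = [["adb", "push", f"./{demo_dir}", str(device_root)]]
    for name in model_files:
        commands.append(["adb", "push", name, str(target)])
    return commands


if __name__ == "__main__":
    for cmd in build_push_commands(["model.rkllm", "model.rknn"]):
        print(" ".join(cmd))
        # To actually push, uncomment (requires a connected device):
        # import subprocess; subprocess.run(cmd, check=True)
```

Pushing the demo directory first matters because the model files are copied into it on the device.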
3. Enter the demo directory and set up environment variables:
```bash
adb shell
cd /data/demo_Linux_aarch64
export LD_LIBRARY_PATH=./lib
```
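If you prefer a single non-interactive `adb shell` call over an interactive session, the directory change, library path, and demo invocation can be combined into one command line. The sketch below is an illustration only: `build_remote_command` is a hypothetical helper, the model file names are placeholders, and the argument order is assumed to follow the `Usage` line in the next step. It sets `LD_LIBRARY_PATH` inline for the one command rather than with `export`, and it quotes each argument because image tokens such as `<|image_pad|>` contain characters the shell would otherwise interpret.

```python
import shlex


def build_remote_command(demo_args, workdir="/data/demo_Linux_aarch64", lib_dir="./lib"):
    """Compose one shell line: cd, set LD_LIBRARY_PATH, then run ./demo."""
    quoted_args = " ".join(shlex.quote(str(a)) for a in demo_args)
    return (
        f"cd {shlex.quote(workdir)} && "
        f"LD_LIBRARY_PATH={shlex.quote(lib_dir)} ./demo {quoted_args}"
    )


if __name__ == "__main__":
    # Placeholder file names; substitute the models you pushed.
    args = ["demo.jpg", "model.rknn", "model.rkllm", 2048, 4096, 3,
            "<|vision_start|>", "<|vision_end|>", "<|image_pad|>"]
    # Pass this list to subprocess.run(...) on a host with a connected device.
    print(["adb", "shell", build_remote_command(args)])
```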
4. Run the demo:
```
Usage: ./demo image_path encoder_model_path llm_model_path max_new_tokens max_context_len rknn_core_num [img_start] [img_end] [img_content]
# for Qwen2.5-VL
./demo demo.jpg ./qwen2_5_vl_3b_vision_rk3588.rknn ./qwen2.5-vl-3b-w8a8_level1_rk3588.rkllm 2048 4096 3 "<|vision_start|>" "<|vision_end|>" "<|image_pad|>"
# for Qwen3-VL
./demo demo.jpg ./qwen3-vl-2b_vision_rk3588.rknn ./qwen3-vl-2b-instruct_w8a8_rk3588.rkllm 2048 4096 3 "<|vision_start|>" "<|vision_end|>" "<|image_pad|>"
# for InternVL3
./demo demo.jpg ./internvl3-1b_vision_fp16_rk3588.rknn ./internvl3-1b_w8a8_rk3588.rkllm 2048 4096 3 "