From b4edba2c6b0c35667e2487ec54afeddf43fa9ed4 Mon Sep 17 00:00:00 2001
From: gongzequn
Date: Wed, 6 Aug 2025 14:03:38 +0800
Subject: [PATCH 1/2] Update a README about dllm

---
 dllm-feature-introduce.md | 84 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 dllm-feature-introduce.md

diff --git a/dllm-feature-introduce.md b/dllm-feature-introduce.md
new file mode 100644
index 0000000..aa2f13c
--- /dev/null
+++ b/dllm-feature-introduce.md
@@ -0,0 +1,84 @@
+# dllm
+
+`dllm` stands for "distributed LLM" and aims to provide better tooling for the distributed vLLM serving framework.
+
+## Build guide
+
+> **TL;DR**
+>
+> ```
+> yum install python3-pip gcc g++ cmake spdlog-devel -y
+> pip install --upgrade pip
+> pip install --upgrade wheel setuptools ninja pybind11 chariot-ds
+>
+> python3 setup.py bdist_wheel
+> ```
+
+### Build requirements

+**Build tools**
+
+* `gcc/g++/make/cmake`: install with `yum install gcc g++ cmake -y`
+* `ninja`: install with `pip install ninja`
+* `python/pip`: install with `yum install python3-pip; pip install --upgrade pip`
+* `wheel/setuptools`: install with `pip install --upgrade pip wheel setuptools`
+
+> NOTE: Upgrading setuptools is necessary on most operating systems.
+
+**Dependencies**
+
+* `spdlog`: install with `yum install spdlog-devel -y`
+* `pybind11`: install with `pip install pybind11`
+* `chariot-ds`: install with `pip install chariot-ds`
+* `Ascend CANN`: see https://www.hiascend.com/software/cann for installation instructions
+
+### Build command
+
+```bash
+bash build.sh
+# or: python3 setup.py bdist_wheel
+```
+
+## Install guide
+
+```bash
+pip install dist/dllm-*.whl
+```
+
+## Usage guide
+
+### Deploy dependencies
+
+> NOTE: After deploying chariot-ds, set the environment variable `DS_WORKER_ADDR="{IP}:{PORT}"` on each node before starting Ray.
+
+1. chariot-ds: follow https://pypi.org/project/chariot-ds/
+2. 
Ray: follow https://docs.ray.io/en/latest/cluster/vms/user-guides/launching-clusters/on-premises.html#on-prem
+
+### Deploy dllm
+
+Take vllm-mindspore as an example, deployed with:
+
+* 1 prefill instance, with parallel config [TP: 4, DP: 4, EP: 16]
+* 1 decode instance, with parallel config [TP: 4, DP: 4, EP: 16]
+
+The deploy command then looks like this:
+
+```bash
+dllm deploy \
+  --prefill-instances-num=1 \
+  --decode-instances-num=1 \
+  -ptp=4 -dtp=4 -pdp=4 -ddp=4 -pep=16 -dep=16 \
+  --prefill-startup-params="vllm-mindspore serve --model=/workspace/models/qwen2.5_7B --trust_remote_code --max-num-seqs=256 --max_model_len=1024 --max-num-batched-tokens=1024 --block-size=128 --gpu-memory-utilization=0.93" \
+  --decode-startup-params="vllm-mindspore serve --model=/workspace/models/qwen2.5_7B --trust_remote_code --max-num-seqs=256 --max_model_len=1024 --max-num-batched-tokens=1024 --block-size=128 --gpu-memory-utilization=0.93"
+```
+
+After the deployment succeeds, `localhost:8000` serves a fully OpenAI-compatible API endpoint:
+
+```bash
+curl -X POST "http://127.0.0.1:8000/v1/completions" -H "Content-Type: application/json" -H "Authorization: Bearer YOUR_API_KEY" -d '{
+  "model": "/workspace/models/qwen2.5_7B",
+  "prompt": "Alice is ",
+  "max_tokens": 50,
+  "temperature": 0
+}'
+```
\ No newline at end of file
--
Gitee

From 19670ff8144c35c4d17a9420b5ef8f6e96340f2a Mon Sep 17 00:00:00 2001
From: gongzequn
Date: Wed, 6 Aug 2025 14:09:09 +0800
Subject: [PATCH 2/2] Update a README about dllm

---
 dllm-feature-introduce.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/dllm-feature-introduce.md b/dllm-feature-introduce.md
index aa2f13c..ee70a1b 100644
--- a/dllm-feature-introduce.md
+++ b/dllm-feature-introduce.md
@@ -81,4 +81,12 @@ curl -X POST "http://127.0.0.1:8000/v1/completions" -H "Content-Type: applicatio
   "max_tokens": 50,
   "temperature": 0
 }'
-```
\ No newline at end of file
+```
+
+### Enable KV cache protection
+
+To prevent 
private data leakage, dllm supports KV cache protection by encrypting KV cache data while it is transmitted between the prefill and decode instances in a PD-disaggregated deployment.
+
+KV cache data is encrypted by sec-mask in parallel with inference, which reduces the encryption overhead.
+
+To enable KV cache protection, set the following environment variable **before starting Ray**: `ENABLE_KVC_PROTECT=True`
--
Gitee
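
The README above asks for two environment variables to be in place on every node before Ray starts. A minimal per-node bootstrap sketch follows; the IP and port values are hypothetical placeholders (not dllm defaults), and `ray start` flags are standard Ray CLI usage:

```shell
# Per-node setup -- run on every node in the cluster before starting Ray.
# NOTE: the address values below are placeholders; substitute your own.
export DS_WORKER_ADDR="192.168.0.10:31501"  # chariot-ds worker endpoint for this node
export ENABLE_KVC_PROTECT=True              # optional: encrypt KV cache between prefill/decode

# Start Ray only after the variables are exported, so Ray workers inherit them.
# Head node shown; worker nodes would run: ray start --address="<head-ip>:6379"
if command -v ray >/dev/null 2>&1; then
  ray start --head --port=6379
fi
```

The ordering matters because Ray workers inherit the environment of the `ray start` process; exporting the variables after Ray is already running has no effect on existing workers.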