From b4edba2c6b0c35667e2487ec54afeddf43fa9ed4 Mon Sep 17 00:00:00 2001
From: gongzequn
Date: Wed, 6 Aug 2025 14:03:38 +0800
Subject: [PATCH 1/2] Update a README about dllm

---
 dllm-feature-introduce.md | 84 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 dllm-feature-introduce.md

diff --git a/dllm-feature-introduce.md b/dllm-feature-introduce.md
new file mode 100644
index 0000000..aa2f13c
--- /dev/null
+++ b/dllm-feature-introduce.md
@@ -0,0 +1,84 @@
+# dllm
+
+`dllm` stands for "distributed LLM" and aims to provide better tooling for the distributed vLLM serving framework.
+
+## Build guide
+
+> **TL;DR**
+>
+> ```
+> yum install python3-pip gcc g++ cmake spdlog-devel -y
+> pip install --upgrade pip
+> pip install --upgrade wheel setuptools ninja pybind11 chariot-ds
+>
+> python3 setup.py bdist_wheel
+> ```
+
+### Build requirements

+**Build tools**
+
+* `gcc/g++/make/cmake`: install with `yum install gcc g++ cmake -y`
+* `ninja`: install with `pip install ninja`
+* `python/pip`: install with `yum install python3-pip; pip install --upgrade pip`
+* `wheel/setuptools`: install with `pip install --upgrade pip wheel setuptools`
+
+> NOTE: Upgrading setuptools is necessary on most operating systems.
+
+**Dependencies**
+
+* `spdlog`: install with `yum install spdlog-devel -y`
+* `pybind11`: install with `pip install pybind11`
+* `chariot-ds`: install with `pip install chariot-ds`
+* `Ascend CANN`: see https://www.hiascend.com/software/cann for installation instructions
+
+### Build command
+
+```bash
+bash build.sh
+# or: python3 setup.py bdist_wheel
+```
+
+## Install guide
+
+```bash
+pip install dist/dllm-*.whl
+```
+
+## Usage guide
+
+### Deploy dependencies
+
+> NOTE: After deploying chariot-ds, set the environment variable `DS_WORKER_ADDR="{IP}:{PORT}"` on each node before starting Ray.
+
+1. chariot-ds: follow https://pypi.org/project/chariot-ds/
+2. 
Ray: follow https://docs.ray.io/en/latest/cluster/vms/user-guides/launching-clusters/on-premises.html#on-prem
+
+### Deploy dllm
+
+Take vllm-mindspore as an example, deployed with:
+
+* 1 prefill instance, with parallel config [TP: 4, DP: 4, EP: 16]
+* 1 decode instance, with parallel config [TP: 4, DP: 4, EP: 16]
+
+The deploy command then looks like this:
+
+```bash
+dllm deploy \
+  --prefill-instances-num=1 \
+  --decode-instances-num=1 \
+  -ptp=4 -dtp=4 -pdp=4 -ddp=4 -pep=16 -dep=16 \
+  --prefill-startup-params="vllm-mindspore serve --model=/workspace/models/qwen2.5_7B --trust_remote_code --max-num-seqs=256 --max_model_len=1024 --max-num-batched-tokens=1024 --block-size=128 --gpu-memory-utilization=0.93" \
+  --decode-startup-params="vllm-mindspore serve --model=/workspace/models/qwen2.5_7B --trust_remote_code --max-num-seqs=256 --max_model_len=1024 --max-num-batched-tokens=1024 --block-size=128 --gpu-memory-utilization=0.93"
+```
+
+After the deployment succeeds, `localhost:8000` serves a fully OpenAI-compatible API endpoint:
+
+```bash
+curl -X POST "http://127.0.0.1:8000/v1/completions" -H "Content-Type: application/json" -H "Authorization: Bearer YOUR_API_KEY" -d '{
+  "model": "/workspace/models/qwen2.5_7B",
+  "prompt": "Alice is ",
+  "max_tokens": 50,
+  "temperature": 0
+}'
+```
\ No newline at end of file
--
Gitee

From 19670ff8144c35c4d17a9420b5ef8f6e96340f2a Mon Sep 17 00:00:00 2001
From: gongzequn
Date: Wed, 6 Aug 2025 14:09:09 +0800
Subject: [PATCH 2/2] Update a README about dllm

---
 dllm-feature-introduce.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/dllm-feature-introduce.md b/dllm-feature-introduce.md
index aa2f13c..ee70a1b 100644
--- a/dllm-feature-introduce.md
+++ b/dllm-feature-introduce.md
@@ -81,4 +81,12 @@ curl -X POST "http://127.0.0.1:8000/v1/completions" -H "Content-Type: applicatio
   "max_tokens": 50,
   "temperature": 0
 }'
-```
\ No newline at end of file
+```
+
+### Enable KV cache protection
+
+To prevent 
private data leakage, dllm supports KV cache protection by encrypting KV cache data while it is transmitted between the prefill and decode instances in a PD-disaggregated deployment.
+
+KV cache data is encrypted by sec-mask in parallel with inference, which reduces the encryption overhead.
+
+To enable KV cache protection, set the following environment variable **before starting Ray**: `ENABLE_KVC_PROTECT=True`
--
Gitee
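
The README above asks for two environment variables to be in place on every node before Ray starts. A minimal per-node bootstrap sketch follows; the IP and port values are hypothetical placeholders (not dllm defaults), and `ray start` flags are standard Ray CLI usage:

```shell
# Per-node setup -- run on every node in the cluster before starting Ray.
# NOTE: the address values below are placeholders; substitute your own.
export DS_WORKER_ADDR="192.168.0.10:31501"  # chariot-ds worker endpoint for this node
export ENABLE_KVC_PROTECT=True              # optional: encrypt KV cache between prefill/decode

# Start Ray only after the variables are exported, so Ray workers inherit them.
# Head node shown; worker nodes would run: ray start --address="<head-ip>:6379"
if command -v ray >/dev/null 2>&1; then
  ray start --head --port=6379
fi
```

The ordering matters because Ray workers inherit the environment of the `ray start` process; exporting the variables after Ray is already running has no effect on existing workers.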