# tritonservice_ascend_yolov8s **Repository Path**: sundaelite/tritonservice_ascend_yolov8s ## Basic Information - **Project Name**: tritonservice_ascend_yolov8s - **Description**: tritonservice 运行昇腾模型 - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-10-29 - **Last Updated**: 2025-10-30 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README ## 在昇腾环境上部署Triton Inference Server ### 1、背景随着人工智能技术的快速发展，模型推理服务的部署效率与性能优化成为实际应用中的关键挑战。Triton Inference Server作为NVIDIA开源的推理服务框架，以其多框架支持、动态批处理和高并发能力受到广泛关注。昇腾800IA2作为国产AI加速卡，在推理场景下展现出优异的能效比和计算性能。本文将详细介绍在昇腾800IA2硬件环境上部署Triton Inference Server的完整流程，并以目标检测小模型的服务化推理为实践案例，验证部署方案的可行性和性能表现。通过本实践，读者将了解： - 昇腾CANN软件栈与Triton Server的兼容性配置 - 针对昇腾硬件的模型转换 - 基于Triton的推理服务部署本博客不仅适用于目标检测模型，其方法论亦可扩展至其他计算机视觉乃至自然语言处理任务的模型部署，为在国产AI硬件上构建高效推理服务提供参考。 ### 2、环境介绍硬件： - Atlas800IA2 - 910B （32GB显存） - CPU （KUNPENG) - 内存（16GB）软件： - NPU驱动： 25.2.0 - CANN版本：8.2.RC1 - Triton: 2.42.0 ### 3、资源获取 - 集成docker环境获取地址： - 相关模型和源码获取地址：https://gitee.com/sundaelite/tritonservice_ascend_yolov8s ### 4、创建triton环境 1、使用以下命令创建一个全新的tritonserver+ascend的环境； ```shell docker run -itd -u root \ --net=host \ --ipc=host \ --privileged \ --device=/dev/davinci_manager \ --device=/dev/devmm_svm \ --device=/dev/hisi_hdc \ -v /var/log/npu/:/usr/slog \ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \ -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \ -v /usr/local/sbin:/usr/local/sbin \ -v /home/data:/data \ --name tritonserver \ --entrypoint=/bin/bash \ 28961f2d1d8b ``` 如图所示：![输入图片说明](image/image.png) 2、使用docker exec -it -u 0 tritonserver bash进入环境； 3、进入源码的models路径，运行atc.sh，将onnx模型转换成om模型 ![输入图片说明](image/image3.pngimage.png) atc内容： ```shell atc --model=yolov8s.onnx --framework=5 --output=yolov8s --input_format=NCHW --op_select_implmode=high_performance --input_shape='images:1,3,640,640' --log=info --soc_version=Ascend910B4 ``` 运行完成提示：ATC run success, welcome to the next use. 4、运行python infer.py使用脚本做模型纯推理运行后在本目录下生成一个output_image.jpg的推理结果图运行结果：正在初始化推理会话... [INFO] acl init success [INFO] open device 0 success [INFO] create new context [INFO] load model yolov8s.om success [INFO] create model description success 推理会话初始化成功。正在进行图像预处理... 原始图像尺寸: 1024x683 预处理信息 - 缩放因子: 0.625, 填充: (0, 107) 开始推理... 推理完成。正在进行后处理... (8400, 84) 最终检测到 1 个目标。检测到 dog: 0.945 坐标范围: x1:113, y1:130, x2:917, y2:589 结果图像已保存至: output_image.jpg [INFO] unload model success, model Id is 1 资源已释放。 [INFO] end to reset device 0 [INFO] end to finalize acl ![输入图片说明](image/infer.pngimage.png) 至此说明环境没有问题； ### 5、triton服务化推理 1、进入源代码主目录下，有一个run_triton_service.sh的脚本，内容如下： tritonserver --model-repository=./model_repository --log-info=true --log-verbose=1 它主要是使用tritonserver 运行模型仓；我们进入查看./model_repository的目录结构： ![输入图片说明](image/treeimage.png) ```shell model_repository/ # Triton模型仓库根目录 └── yolov8s # 模型目录（模型名称：yolov8s） ├── 1 # 版本目录（版本号：1） │ ├── model.py # Python模型推理脚本 │ ├── __pycache__/ # Python缓存目录 │ │ └── model.cpython-310.pyc # 编译后的Python字节码 │ └── yolov8s.om # 昇腾离线模型文件（通过ATC转换） └── config.pbtxt # 模型配置文件 ``` - 关键文件说明 - 1. config.pbtxt- 模型配置 ```yaml name: "yolov8s" # 模型名称 backend: "python" # 使用Python后端 max_batch_size: 1 # 最大批处理大小 input: [ { name: "images", # 输入名称 data_type: TYPE_FP32, # 数据类型 dims: [3, 640, 640] # 输入维度 [通道, 高, 宽] } ] output: [ { name: "output0", # 输出名称 data_type: TYPE_FP32, dims: [84, 8400] # 输出维度 } ] ``` - 2. model.py- Python推理脚本 ```python import cv2 import numpy as np from ais_bench.infer.interface import InferSession from torch.utils.dlpack import to_dlpack, from_dlpack import triton_python_backend_utils as pb_utils class TritonPythonModel: def initialize(self, args): self.model_path = "./model_repository/yolov8s/1/yolov8s.om" self.device_id = 0 self.session = InferSession(self.device_id, self.model_path) def execute(self, requests): responses = [] for res in requests: image = from_dlpack(pb_utils.get_input_tensor_by_name(res, 'images').to_dlpack()) image = np.array(image) outputs = self.session.infer([image], mode="static") output_tensor = pb_utils.Tensor('output0', outputs[0]) response = pb_utils.InferenceResponse(output_tensors=[output_tensor]) responses.append(response) return responses def finalize(self): print("Inference finalized") ``` - 3. yolov8s.om- 昇腾离线模型通过ATC工具从ONNX/PyTorch转换而来针对昇腾硬件优化，包含图编译和算子优化 2、启动服务化程序：bash run_triton_service.sh 如图所示，说明运行成功： ![输入图片说明](image/runimage.png) I1029 16:37:47.487552 4741 server.cc:676] +---------+---------+--------+ | Model | Version | Status | +---------+---------+--------+ | yolov8s | 1 | READY | +---------+---------+--------+ ### 6、客户端代码运行进入client目录下，该目录是triton的客户端代码目录； clienet.py的代码样例： ```python #!/usr/bin/env python import argparse import numpy as np import sys import cv2 import tritonclient.grpc as grpcclient from tritonclient.utils import InferenceServerException CLASS_NAMES = [ "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush" ] CONF_THRESHOLD = 0.25 IOU_THRESHOLD = 0.45 INPUT_NAMES = ["images"] OUTPUT_NAMES = ["output0"] def preprocess_image(image_path, input_shape=(640, 640)): """ """ original_image = cv2.imread(image_path) if original_image is None: raise ValueError(f"无法读取图像: {image_path}") original_h, original_w = original_image.shape[:2] input_h, input_w = input_shape scale = min(input_h / original_h, input_w / original_w) new_h, new_w = int(original_h * scale), int(original_w * scale) resized_image = cv2.resize(original_image, (new_w, new_h)) padded_image = np.full((input_h, input_w, 3), 114, dtype=np.uint8) dx = (input_w - new_w) // 2 dy = (input_h - new_h) // 2 padded_image[dy:dy+new_h, dx:dx+new_w] = resized_image normalized_image = padded_image.astype(np.float32) / 255.0 blob = np.transpose(normalized_image, (2, 0, 1)) blob = np.expand_dims(blob, axis=0) return blob, original_image, (scale, dx, dy) def postprocess_yolov8(outputs, original_shape, preprocess_info, input_shape=(640, 640), conf_threshold=0.25, iou_threshold=0.45): """ 修正后的后处理函数：正确的坐标映射 """ original_h, original_w = original_shape[:2] input_h, input_w = input_shape scale, pad_x, pad_y = preprocess_info prediction = outputs.squeeze(0).transpose(1, 0) # (8400, 84) boxes = [] scores = [] class_ids = [] for i in range(prediction.shape[0]): # 模型输出的是基于640×640输入尺寸的绝对坐标（包含填充） cx, cy, w, h = prediction[i, :4] classes_scores = prediction[i, 4:] max_score = np.max(classes_scores) class_id = np.argmax(classes_scores) if max_score >= conf_threshold: # 将中心点坐标转换为边界框坐标 x1 = cx - w / 2 y1 = cy - h / 2 x2 = cx + w / 2 y2 = cy + h / 2 # 关键修正：正确的坐标映射步骤 # 1. 去除填充的影响 x1 = max(0, x1 - pad_x) y1 = max(0, y1 - pad_y) x2 = max(0, x2 - pad_x) y2 = max(0, y2 - pad_y) # 2. 缩放到原始图像尺寸（除以缩放因子） x1 = int(x1 / scale) y1 = int(y1 / scale) x2 = int(x2 / scale) y2 = int(y2 / scale) # 3. 确保坐标在有效范围内 x1 = max(0, min(x1, original_w)) y1 = max(0, min(y1, original_h)) x2 = max(0, min(x2, original_w)) y2 = max(0, min(y2, original_h)) # 只保留有效的检测框 if x2 > x1 and y2 > y1 and (x2 - x1) > 5 and (y2 - y1) > 5: boxes.append([x1, y1, x2, y2]) scores.append(max_score) class_ids.append(class_id) # 应用NMS if len(boxes) > 0: boxes = np.array(boxes) scores = np.array(scores) class_ids = np.array(class_ids) indices = cv2.dnn.NMSBoxes(boxes.tolist(), scores.tolist(), conf_threshold, iou_threshold) if len(indices) > 0: indices = indices.flatten() return boxes[indices], scores[indices], class_ids[indices] return np.array([]), np.array([]), np.array([]) def draw_detections(image, boxes, scores, class_ids): """ 绘制检测结果 """ h, w = image.shape[:2] for i in range(len(boxes)): box = boxes[i] score = scores[i] class_id = class_ids[i] x1 = int(box[0]) y1 = int(box[1]) x2 = int(box[2]) y2 = int(box[3]) # 确保坐标在有效范围内 x1 = max(0, min(x1, w-1)) y1 = max(0, min(y1, h-1)) x2 = max(0, min(x2, w-1)) y2 = max(0, min(y2, h-1)) color = (0, 255, 0) # 绿色框 cv2.rectangle(image, (x1, y1), (x2, y2), color, 2) label = f"{CLASS_NAMES[class_id]}: {score:.2f}" label_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2)[0] cv2.rectangle(image, (x1, y1 - label_size[1] - 10), (x1 + label_size[0], y1), color, -1) cv2.putText(image, label, (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1) print(f"检测到 {CLASS_NAMES[class_id]}: {score:.3f}") print(f"坐标范围: x1:{x1}, y1:{y1}, x2:{x2}, y2:{y2}") return image if __name__ == '__main__': parser = argparse.ArgumentParser() parser.add_argument('--input', type=str, help='Input file to load from in image or video mode') parser.add_argument('-m', '--model', type=str, required=False, default='yolov8s', help='Inference model name, default yolov7') parser.add_argument('--width', type=int, required=False, default=640, help='Inference model input width, default 640') parser.add_argument('--height', type=int, required=False, default=640, help='Inference model input height, default 640') parser.add_argument('-u', '--url', type=str, required=False, default='localhost:8001', help='Inference server URL, default localhost:8001') parser.add_argument('-o', '--out', type=str, required=False, default='', help='Write output into file instead of displaying it') parser.add_argument('-f', '--fps', type=float, required=False, default=24.0, help='Video output fps, default 24.0 FPS') parser.add_argument('-i', '--model-info', action="store_true", required=False, default=False, help='Print model status, configuration and statistics') parser.add_argument('-v', '--verbose', action="store_true", required=False, default=False, help='Enable verbose client output') parser.add_argument('-t', '--client-timeout', type=float, required=False, default=None, help='Client timeout in seconds, default no timeout') parser.add_argument('-s', '--ssl', action="store_true", required=False, default=False, help='Enable SSL encrypted channel to the server') parser.add_argument('-r', '--root-certificates', type=str, required=False, default=None, help='File holding PEM-encoded root certificates, default none') parser.add_argument('-p', '--private-key', type=str, required=False, default=None, help='File holding PEM-encoded private key, default is none') parser.add_argument('-x', '--certificate-chain', type=str, required=False, default=None, help='File holding PEM-encoded certicate chain default is none') FLAGS = parser.parse_args() try: triton_client = grpcclient.InferenceServerClient( url=FLAGS.url, verbose=FLAGS.verbose, ssl=FLAGS.ssl, root_certificates=FLAGS.root_certificates, private_key=FLAGS.private_key, certificate_chain=FLAGS.certificate_chain) except Exception as e: print("context creation failed: " + str(e)) sys.exit() # Health check if not triton_client.is_server_live(): print("FAILED : is_server_live") sys.exit(1) if not triton_client.is_server_ready(): print("FAILED : is_server_ready") sys.exit(1) if not triton_client.is_model_ready(FLAGS.model): print("FAILED : is_model_ready") sys.exit(1) if FLAGS.model_info: # Model metadata try: metadata = triton_client.get_model_metadata(FLAGS.model) print(metadata) except InferenceServerException as ex: if "Request for unknown model" not in ex.message(): print("FAILED : get_model_metadata") print("Got: {}".format(ex.message())) sys.exit(1) else: print("FAILED : get_model_metadata") sys.exit(1) # Model configuration try: config = triton_client.get_model_config(FLAGS.model) if not (config.config.name == FLAGS.model): print("FAILED: get_model_config") sys.exit(1) print(config) except InferenceServerException as ex: print("FAILED : get_model_config") print("Got: {}".format(ex.message())) sys.exit(1) if True: if not FLAGS.input: print("FAILED: no input image") sys.exit(1) inputs = [] outputs = [] inputs.append(grpcclient.InferInput(INPUT_NAMES[0], [1, 3, FLAGS.width, FLAGS.height], "FP32")) outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[0])) print("Creating buffer from image file...") # 图像预处理 print("正在进行图像预处理...") input_blob, original_image, preprocess_info = preprocess_image(str(FLAGS.input)) original_h, original_w = original_image.shape[:2] print(f"原始图像尺寸: {original_w}x{original_h}") print(f"预处理信息 - 缩放因子: {preprocess_info[0]:.3f}, 填充: ({preprocess_info[1]}, {preprocess_info[2]}) input shape:{input_blob.shape}") # input_image = cv2.imread(str(FLAGS.input)) # input_image = cv2.resize(input_image, (FLAGS.width, FLAGS.height)) # input_image = cv2.cvtColor(input_image, cv2.COLOR_BGR2RGB) # print("input shape:", input_image.shape) # input_image = np.transpose(input_image, (2, 0, 1)) # 从(H,W,C)变为(C,H,W) # input_image = np.expand_dims(input_image, axis=0) # 增加批次维度，变为(1,C,H,W) # input_image = np.array(input_image, dtype=np.float32) # print("get input shape:", input_image.shape) inputs[0].set_data_from_numpy(input_blob) print("Invoking inference...") results = triton_client.infer(model_name=FLAGS.model, inputs=inputs, outputs=outputs, client_timeout=FLAGS.client_timeout) if FLAGS.model_info: statistics = triton_client.get_inference_statistics(model_name=FLAGS.model) if len(statistics.model_stats) != 1: print("FAILED: get_inference_statistics") sys.exit(1) print(statistics) print("Done") for output in OUTPUT_NAMES: result = results.as_numpy(output) print(f"Received result buffer \"{output}\" of size {result.shape}") print(f"Naive buffer sum: {np.sum(result)}") det_outputs = results.as_numpy(OUTPUT_NAMES[0]) # 4. 后处理 print("正在进行后处理...") boxes, scores, class_ids = postprocess_yolov8(det_outputs, (original_h, original_w), preprocess_info) print(f"最终检测到 {len(boxes)} 个目标。") # 5. 绘制结果 if len(boxes) > 0: result_image = draw_detections(original_image.copy(), boxes, scores, class_ids) # 保存结果图像 cv2.imwrite('out.jpg', result_image) print(f"结果图像已保存至: out.jpg") else: print("未检测到任何目标。") # 保存原始图像用于调试 cv2.imwrite('out.jpg', original_image) ``` 新建一个终端，进入tritonserver容器，使用命令行运行python client.py --input dog.jpg 程序，如下输出结果，并且生成一个out.jpg的结果图。 Creating buffer from image file... 正在进行图像预处理... 原始图像尺寸: 1024x683 预处理信息 - 缩放因子: 0.625, 填充: (0, 107) input shape:(1, 3, 640, 640) Invoking inference... Done Received result buffer "output0" of size (1, 84, 8400) Naive buffer sum: 6695313.5 正在进行后处理... 最终检测到 1 个目标。检测到 dog: 0.945 坐标范围: x1:113, y1:130, x2:917, y2:589 结果图像已保存至: out.jpg ![输入图片说明](image/out1image.png) 至此，完成在昇腾环境上使用tritonserver部署yolov8s的服务化推理。