Linux | Ascend | Serving | Intermediate | Expert
MindSpore allows a model to generate multiple subgraphs, and scheduling these subgraphs can improve performance. For example, in the GPT-3 scenario, the graph is split into two phases: the initialization graph in the first phase needs to be executed only once, while the inference graph in the second phase is executed multiple times depending on the length N of the input sentence. Before this optimization, the two graphs were executed together N times; after splitting them, the performance of the inference service improves by 5 to 6 times. MindSpore Serving provides the pipeline function to schedule multiple graphs, improving inference service performance in such scenarios.
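The arithmetic behind that speedup can be sketched with a back-of-envelope model. The costs and step count below are illustrative assumptions, not measured values:

```python
# Before the optimization, both graphs run together N times; with the
# pipeline, the initialization graph runs only once.
init_cost, infer_cost, n = 50.0, 10.0, 32   # assumed per-run costs and step count

combined = n * (init_cost + infer_cost)     # initialization repeated every step
pipelined = init_cost + n * infer_cost      # initialization executed once

print(round(combined / pipelined, 1))       # ~5.2x with these assumed costs
```

With an initialization graph that is several times more expensive than one inference step, the ratio lands in the 5-6x range described above.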
Currently, the pipeline function has some usage restrictions.
The following uses a distributed scenario as an example to describe the pipeline deployment process.
Before running the sample network, ensure that MindSpore Serving has been properly installed and the environment variables are configured. To install and configure MindSpore Serving on your PC, go to the MindSpore Serving installation page.
For details about the files required for exporting a distributed model, see the export_model directory. The following files are required:
export_model
├── distributed_inference.py
├── export_model.sh
├── net.py
└── rank_table_8pcs.json
- net.py defines the MatMul network.
- distributed_inference.py is used to configure distributed parameters.
- export_model.sh creates the device directories on the current machine and exports the model files corresponding to each device.
- rank_table_8pcs.json is the JSON file for configuring the networking information of the current multi-device environment. For details, see rank_table.

Use net.py to build a network that contains the MatMul and Neg operators.
import numpy as np
from mindspore import Tensor, Parameter, ops
from mindspore.nn import Cell


class Net(Cell):
    def __init__(self, matmul_size, init_val, transpose_a=False, transpose_b=False, strategy=None):
        super().__init__()
        matmul_np = np.full(matmul_size, init_val, dtype=np.float32)
        self.matmul_weight = Parameter(Tensor(matmul_np))
        self.matmul = ops.MatMul(transpose_a=transpose_a, transpose_b=transpose_b)
        self.neg = ops.Neg()
        if strategy is not None:
            self.matmul.shard(strategy)

    def construct(self, inputs):
        x = self.matmul(inputs, self.matmul_weight)
        x = self.neg(x)
        return x
Use distributed_inference.py to generate a multi-graph model. For details, see Distributed Inference.
import numpy as np
from net import Net
from mindspore import context, Model, Tensor, export
from mindspore.communication import init


def test_inference():
    """distributed inference after distributed training"""
    context.set_context(mode=context.GRAPH_MODE)
    init(backend_name="hccl")
    context.set_auto_parallel_context(full_batch=True, parallel_mode="semi_auto_parallel",
                                      device_num=8, group_ckpt_save_file="./group_config.pb")
    predict_data = create_predict_data()
    network = Net(matmul_size=(96, 16), init_val=0.5)
    model = Model(network)
    model.infer_predict_layout(Tensor(predict_data))
    export(model._predict_network, Tensor(predict_data), file_name="matmul_0", file_format="MINDIR")
    network_1 = Net(matmul_size=(96, 16), init_val=1.5)
    model_1 = Model(network)
    model_1.infer_predict_layout(Tensor(predict_data))
    export(model_1._predict_network, Tensor(predict_data), file_name="matmul_1", file_format="MINDIR")


def create_predict_data():
    """user-defined predict data"""
    inputs_np = np.random.randn(128, 96).astype(np.float32)
    return Tensor(inputs_np)
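Before wiring the exported files into Serving, the math of each graph can be sanity-checked with plain numpy (no MindSpore needed). graph_fn below is a hypothetical stand-in for what an exported graph computes, Neg(MatMul(x, W)):

```python
import numpy as np

def graph_fn(x, init_val, matmul_size=(96, 16)):
    """Numpy stand-in for one exported graph: Neg(MatMul(x, W)),
    where W is a matmul_size matrix filled with init_val."""
    w = np.full(matmul_size, init_val, dtype=np.float32)
    return -(x @ w)

x = np.ones((128, 96), np.float32)
out = graph_fn(x, 0.5)
print(out.shape, out[0, 0])   # (128, 16) -48.0, i.e. -(1 * 0.5 * 96) per element
```

An all-ones input makes the result easy to verify by hand: each output element is the negated sum of 96 products.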
Use export_model.sh to export a multi-graph model. After the script is executed successfully, the model directory is created in the upper-level directory. The structure is as follows:
model
├── device0
│   ├── group_config.pb
│   ├── matmul_0.mindir
│   └── matmul_1.mindir
├── device1
├── device2
├── device3
├── device4
├── device5
├── device6
└── device7
Each device directory contains the model group configuration file group_config.pb and the model files matmul_0.mindir and matmul_1.mindir, which correspond to the two graphs.
Start the distributed inference service. For details, see pipeline_distributed. The following files are required:
matmul_distributed
├── serving_agent.py
├── serving_server.py
├── matmul
│   └── servable_config.py
├── model
└── rank_table_8pcs.json
- model is the directory for storing model files.
- serving_server.py is used to start the service processes, including the Main and Distributed Worker processes.
- serving_agent.py is used to start the Agents.
- servable_config.py is the model configuration file. It uses distributed.declare_servable to declare a distributed model whose rank_size is 8 and stage_size is 1, and defines a pipeline method predict.

Content of the configuration file:
import numpy as np
from mindspore_serving.server import distributed
from mindspore_serving.server import register
from mindspore_serving.server.register import PipelineServable

distributed.declare_servable(rank_size=8, stage_size=1, with_batch_dim=False)


def add_preprocess(x):
    """define preprocess, this example has one input and one output"""
    x = np.add(x, x)
    return x


@register.register_method(output_names=["y"])
def fun1(x):
    x = register.call_preprocess(add_preprocess, x)
    y = register.call_servable(x, subgraph=0)
    return y


@register.register_method(output_names=["y"])
def fun2(x):
    y = register.call_servable(x, subgraph=1)
    return y


servable1 = PipelineServable(servable_name="matmul", method="fun1", version_number=0)
servable2 = PipelineServable(servable_name="matmul", method="fun2", version_number=0)


@register.register_pipeline(output_names=["x", "z"])
def predict(x, y):
    x = servable1.run(x)
    for i in range(10):
        print(i)
    z = servable2.run(y)
    return x, z
The subgraph parameter of the call_servable method specifies the graph number, which starts from 0 and is the sequence number in which the graphs are loaded. In a standalone system, the number corresponds to the position in the servable_file parameter list of the declare_servable interface; in a distributed system, it corresponds to the position in the model_files parameter list of the startup_agents interface.
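The mapping is purely positional. For the distributed example in this tutorial it can be illustrated as follows (the paths are assumed from the export step above):

```python
# Per-device model list as later passed to startup_agents; a file's position in
# this list is the graph number addressed by call_servable's subgraph parameter.
model_files_device0 = ["model/device0/matmul_0.mindir",   # subgraph=0
                       "model/device0/matmul_1.mindir"]   # subgraph=1

for subgraph, path in enumerate(model_files_device0):
    print(f"subgraph={subgraph} -> {path}")
```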
The PipelineServable class declares a service function of the model: servable_name specifies the model name, method specifies the function method, and version_number specifies the version number. register_pipeline registers the pipeline function, and its input parameter output_names specifies the output list.
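The control flow that the registered predict method expresses can be imitated with a plain-numpy mock; mindspore_serving is not required here, and the 0.5-filled weights are an assumption chosen to mirror the exported MatMul+Neg graphs:

```python
import numpy as np

def make_subgraph(init_val, matmul_size=(96, 16)):
    """Mock of one loaded graph: computes Neg(MatMul(x, W))."""
    w = np.full(matmul_size, init_val, dtype=np.float32)
    return lambda x: -(x @ w)

subgraphs = [make_subgraph(0.5), make_subgraph(0.5)]  # assumed weights

def fun1(x):                     # mirrors fun1: preprocess, then subgraph 0
    return subgraphs[0](np.add(x, x))

def fun2(y):                     # mirrors fun2: subgraph 1 only
    return subgraphs[1](y)

def predict(x, y):               # mirrors the registered pipeline method
    return fun1(x), fun2(y)

x_out, z_out = predict(np.ones((128, 96), np.float32),
                       np.ones((128, 96), np.float32))
print(x_out[0, 0], z_out[0, 0])  # -96.0 -48.0
```

The first output reflects the doubling preprocess (2 x 0.5 x 96 = 96, then negated), the second the raw input through the other graph.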
Use serving_server.py to call the distributed.start_servable method to deploy the distributed Serving server.
import os
import sys
from mindspore_serving import server
from mindspore_serving.server import distributed


def start():
    servable_dir = os.path.dirname(os.path.realpath(sys.argv[0]))
    distributed.start_servable(servable_dir, "matmul",
                               rank_table_json_file="rank_table_8pcs.json",
                               version_number=1,
                               distributed_address="127.0.0.1:6200")
    server.start_grpc_server("127.0.0.1:5500")
    server.start_restful_server("127.0.0.1:1500")


if __name__ == "__main__":
    start()
- servable_dir is the servable directory.
- servable_name indicates the servable name, which corresponds to a directory for storing the model configuration file.
- rank_table_json_file is the JSON file for configuring the networking information of the multi-device environment.
- distributed_address is the Distributed Worker address.
- wait_agents_time_in_seconds specifies the time limit for waiting for all Agent registrations to complete. The default value is 0, indicating that the system keeps waiting until all Agents are registered.

Use serving_agent.py to call the startup_agents method to start the eight Agent processes on the current machine. The Agents obtain the rank_table from the Distributed Worker so that they can communicate with each other using HCCL.
from mindspore_serving.server import distributed


def start_agents():
    """Start all the agents in current machine"""
    model_files = []
    group_configs = []
    for i in range(8):
        model_files.append([f"model/device{i}/matmul_0.mindir", f"model/device{i}/matmul_1.mindir"])
        group_configs.append([f"model/device{i}/group_config.pb"])
    distributed.startup_agents(distributed_address="127.0.0.1:6200", model_files=model_files,
                               group_config_files=group_configs)


if __name__ == '__main__':
    start_agents()
- distributed_address is the Distributed Worker address.
- model_files is a list of model file paths. Passing multiple model files indicates that multiple graphs are supported; the order in which the files are passed determines the graph number used by the subgraph parameter of the call_servable method.
- group_config_files is the list of model group configuration file paths.
- agent_start_port is the start port occupied by the Agents. The default value is 7000.
- agent_ip is the IP address of the Agents. The default value is None. By default, the IP address used for communication between the Agents and the Distributed Worker is obtained from the rank_table; if that address is unavailable, you need to set both agent_ip and rank_start.
- rank_start is the start rank_id of the current machine. The default value is None.

To access the inference service through gRPC, specify the IP address and port number of the gRPC server on the client. Execute serving_client.py to call the predict method of the MatMul distributed model. This method corresponds to the registered pipeline method and performs the inference.
import numpy as np
from mindspore_serving.client import Client


def run_matmul():
    """Run client of distributed matmul"""
    client = Client("localhost:5500", "matmul", "predict")
    instance = {"x": np.ones((128, 96), np.float32), "y": np.ones((128, 96), np.float32)}
    result = client.infer(instance)
    print("result:\n", result)


if __name__ == '__main__':
    run_matmul()
If the following information is displayed, the Serving distributed inference service has correctly executed the multi-graph inference of the pipeline:
result:
[{'x': array([[-96., -96., -96., ..., -96., -96., -96.],
[-96., -96., -96., ..., -96., -96., -96.],
[-96., -96., -96., ..., -96., -96., -96.],
...,
[-96., -96., -96., ..., -96., -96., -96.],
[-96., -96., -96., ..., -96., -96., -96.],
[-96., -96., -96., ..., -96., -96., -96.]], dtype=float32), 'z': array([[-48., -48., -48., ..., -48., -48., -48.],
[-48., -48., -48., ..., -48., -48., -48.],
[-48., -48., -48., ..., -48., -48., -48.],
...,
[-48., -48., -48., ..., -48., -48., -48.],
[-48., -48., -48., ..., -48., -48., -48.],
[-48., -48., -48., ..., -48., -48., -48.]], dtype=float32)}]