# scaling

**Repository Path**: lifang535/scaling

## Basic Information

- **Project Name**: scaling
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2024-01-22
- **Last Updated**: 2024-05-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Pipeline scaling project code

The authoritative code is stored in **TJU-NSL/pipeline_scaling_lifang535**, mainly comprising a **Client, Frontend, Worker, and Scheduler**.

Email: 2474061008@qq.com

## Introduction

- This is simulation code for a model inference pipeline, which mainly contains a Client, a Frontend, Workers, and a Scheduler.
- The Client provides a dynamic workload and collects the returned requests.
- The Frontend receives requests, combines them into batches, and sends them to Workers.
- The Scheduler calculates the batch size and number of Workers required at each stage of the pipeline based on the request load, and adjusts them dynamically.

The details follow.

### Client

- **Client**
  - Reads a .csv file and replays it as a workload.
  - Collects requests and calculates the percentage of requests that are completed and meet the SLO.
- **Receiver**
  - `rpc SendRequest (Request) returns (Response) {}` - Receives requests completed and returned by the last Worker, or dropped by the Frontend.

### Frontend

- **Frontend**
  - Combines received requests into batches and sends them to Workers.
  - Drops requests that will certainly miss their deadline.
  - Uses the Scheduler to determine the number of Workers needed, sends orders to Workers to control their status, and reads feedback from Workers via `worker_control` to learn which Workers are active or inactive.
- **Receiver**
  - `rpc SendRequest (Request) returns (Response) {}` - Receives requests sent by the Client.
  - `rpc SendFeedback (Feedback) returns (FeedbackResponse) {}` - Receives feedback from Workers (used to control Worker status).
  - `rpc SendQueueInformation (QueueInformation) returns (QueueInformationResponse) {}` - Receives queue information from Workers.
  - `rpc SendBatchSize (BatchSize) returns (BatchSizeResponse) {}` - Receives batch-size scheduling information from the Scheduler.

### Worker

- **Worker**
  - Processes each received batch.
  - Sends the processed requests in the batch to the next-stage Frontend in order.
  - Synchronizes queue information with the Frontend.
- **Receiver**
  - `rpc SendBatch (Batch) returns (BatchResponse) {}` - Receives batches sent by the Frontend.
  - `rpc SendOrder (Order) returns (OrderResponse) {}` - Receives orders from the Frontend that control Worker status.

### Scheduler

- **Scheduler**
  - Receives information from the Frontends.
  - Calculates the batch size allocated to each Frontend.
- **Receiver**
  - `rpc SendFrontendInformation (FrontendInformation) returns (FrontendInformationResponse) {}` - Receives information from a Frontend, including its current input rate and number of Workers.

## Install

```
$ pip install -r requirements.txt
```

## Quick start

```
$ cd modules
$ python pipeline.py
```

This creates the pipeline and starts the Client sending requests. During the experiment, request completion can be monitored with TensorBoardX:

```
$ tensorboard --logdir=modules/logs/tensorboardx --port=6006
```

Then open http://localhost:6006/ in a browser (or another forwarded port).
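As a rough illustration of the Scheduler's job (deriving a batch size and Worker count per stage from the request load), here is a minimal sketch. All names and the sizing formula are hypothetical assumptions for illustration, not taken from the repository:

```python
import math

def plan_stage(input_rate, batch_size, batch_latency, slo):
    """Estimate how many Workers one pipeline stage needs.

    input_rate:    requests/second arriving at the stage (hypothetical)
    batch_size:    requests processed per batch
    batch_latency: seconds a Worker takes to process one batch
    slo:           latency budget for the stage, in seconds

    Returns a Worker count, or None if this batch size cannot meet the SLO.
    """
    # Worst-case wait for a batch to fill up, plus processing time,
    # must fit inside the SLO budget for this stage.
    queueing = batch_size / input_rate
    if queueing + batch_latency > slo:
        return None
    # Each Worker sustains batch_size / batch_latency requests per second.
    throughput_per_worker = batch_size / batch_latency
    return math.ceil(input_rate / throughput_per_worker)
```

For example, at 1000 req/s with batches of 8 that each take 50 ms, one Worker sustains 160 req/s, so seven Workers are needed; a larger, slower batch configuration may be rejected outright because filling the batch already exhausts the SLO.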