# scaling
**Repository Path**: lifang535/scaling
## Basic Information
- **Project Name**: scaling
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 1
- **Forks**: 0
- **Created**: 2024-01-22
- **Last Updated**: 2024-05-24
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# This is the code for a pipeline scaling project.
The correct code is stored in **TJU-NSL/pipeline_scaling_lifang535**.
It mainly comprises the **Client, Frontend, Worker, and Scheduler** components.
Email: 2474061008@qq.com
## Introduction
- This is simulation code for a model inference pipeline, consisting mainly of a Client, a Frontend, Workers, and a Scheduler.
- The Client generates a dynamic workload and collects the requests that come back.
- The Frontend receives requests, combines them into batches, and sends the batches to Workers.
- The Scheduler calculates the batch size and number of Workers required at each stage of the pipeline based on the request load, and adjusts them dynamically.

The details are as follows.
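As a rough sketch of how these roles fit together (all names and numbers below are illustrative, not taken from the repository):

```python
# Illustrative end-to-end sketch of the roles above; all names and numbers
# are hypothetical, not taken from the repository.

def client_workload(n=6, interval=0.1, slo=0.5):
    """Client: emit (request_id, arrival_time, slo) tuples."""
    return [(i, i * interval, slo) for i in range(n)]

def frontend_batches(requests, batch_size):
    """Frontend: combine queued requests into fixed-size batches."""
    return [requests[i:i + batch_size] for i in range(0, len(requests), batch_size)]

def worker_process(batch):
    """Worker: mark each request in the batch as completed."""
    return [(rid, arrival, slo, True) for rid, arrival, slo in batch]

batch_size = 2   # in the real system, chosen dynamically by the Scheduler
completed = []
for batch in frontend_batches(client_workload(), batch_size):
    completed.extend(worker_process(batch))
print(len(completed))  # 6
```

In the actual system these hand-offs happen over gRPC rather than direct function calls, as the per-component RPCs below show.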
### Client
- **Client**
- Reads a .csv file and generates the corresponding workload.
- Collects returned requests and calculates the percentage of requests that are completed and the percentage that meet their SLO.
- **Receiver**
- `rpc SendRequest (Request) returns (Response) {}`
- Receives requests that were completed and returned by the last Worker, as well as requests dropped by the Frontend.
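The Client's SLO accounting could look like the following sketch (the record format is an assumption, not taken from the code):

```python
def slo_stats(records):
    """records: list of (completed, latency, slo) tuples, one per request.

    Returns (completion_rate, slo_rate): the fraction of requests that
    finished at all, and the fraction that finished within their SLO.
    Field names and shapes here are hypothetical.
    """
    if not records:
        return 0.0, 0.0
    total = len(records)
    completed = sum(1 for done, _, _ in records if done)
    within_slo = sum(1 for done, lat, slo in records if done and lat <= slo)
    return completed / total, within_slo / total

# Example: 3 of 4 requests completed, 2 of them within a 0.5 s SLO.
records = [(True, 0.3, 0.5), (True, 0.4, 0.5), (True, 0.9, 0.5), (False, None, 0.5)]
print(slo_stats(records))  # (0.75, 0.5)
```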
### Frontend
- **Frontend**
- Combines received requests into batches and sends them to Workers.
- Drops requests that will definitely time out.
- Uses the Scheduler to determine the number of Workers needed, sends orders to Workers to control their status, and reads feedback from Workers via `worker_control` to learn which Workers are active or inactive.
- **Receiver**
- `rpc SendRequest (Request) returns (Response) {}`
- Receives requests sent by the Client.
- `rpc SendFeedback (Feedback) returns (FeedbackResponse) {}`
- Receives feedback from Workers (used to control Worker status).
- `rpc SendQueueInformation (QueueInformation) returns (QueueInformationResponse) {}`
- Receives queue information from Workers.
- `rpc SendBatchSize (BatchSize) returns (BatchSizeResponse) {}`
- Receives batch size scheduling information from the Scheduler.
### Worker
- **Worker**
- Processes the received batch.
- Sends the processed requests in the batch sequentially to the next-stage Frontend.
- Synchronizes queue information with the Frontend.
- **Receiver**
- `rpc SendBatch (Batch) returns (BatchResponse) {}`
- Receives batches sent by the Frontend.
- `rpc SendOrder (Order) returns (OrderResponse) {}`
- Receives orders from the Frontend to control Worker status.
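A Worker's batch handling — run inference on the whole batch, then forward results one request at a time — might look like this sketch (the `infer` callable stands in for the real model; all names are hypothetical):

```python
def process_batch(batch, infer):
    """Worker: run inference on the whole batch, then forward each result
    individually (standing in for per-request RPCs to the next Frontend).
    Illustrative only; not the repository's actual code.
    """
    outputs = infer(batch)            # batched model inference
    forwarded = []
    for req, out in zip(batch, outputs):
        forwarded.append((req, out))  # stand-in for sending to the next stage
    return forwarded

# Toy "model": doubles each input.
results = process_batch([1, 2, 3], infer=lambda xs: [2 * x for x in xs])
print(results)  # [(1, 2), (2, 4), (3, 6)]
```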
### Scheduler
- **Scheduler**
- Receives information from the Frontend.
- Calculates the batch size allocated to each Frontend.
- **Receiver**
- `rpc SendFrontendInformation (FrontendInformation) returns (FrontendInformationResponse) {}`
- Receives information from the Frontend, including current input rate and number of Workers.
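One common way to size a stage from its input rate — not necessarily the repository's exact formula — is to choose enough Workers that their combined throughput covers the incoming request rate:

```python
import math

def required_workers(rate, batch_size, batch_latency):
    """Workers needed so that combined throughput >= input rate.

    One worker processes batch_size requests every batch_latency seconds,
    i.e. batch_size / batch_latency requests per second. This is a generic
    capacity calculation, not the Scheduler's actual algorithm.
    """
    per_worker_throughput = batch_size / batch_latency
    return max(1, math.ceil(rate / per_worker_throughput))

# 100 req/s, batches of 8 taking 0.2 s each -> 40 req/s per worker -> 3 workers
print(required_workers(rate=100, batch_size=8, batch_latency=0.2))  # 3
```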
## Install
```shell
$ pip install -r requirements.txt
```
## Quick start
```shell
$ cd modules
$ python pipeline.py
```
This will create the pipeline and start the Client to send requests.
During the experiment, we can monitor request completion using TensorBoardX as follows.
```shell
$ tensorboard --logdir=modules/logs/tensorboardx --port=6006
```
Then open http://localhost:6006/ (or another forwarded port) in a browser.