# mt-dnn

**Repository Path**: hy_song/mt-dnn

## Basic Information

- **Project Name**: mt-dnn
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2025-08-28
- **Last Updated**: 2025-08-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# mt-dnn

**mt-dnn** is a runtime system that serves multiple DNN queries simultaneously with stable, predictable latency. It enforces latency predictability through deterministic operator overlap.

**mt-dnn** comprises an overlap-aware latency predictor, a headroom-based query controller, and segmental model executors. Once the operator overlap is determined, the latency predictor precisely predicts the latency of each query. The query controller chooses an operator overlap that guarantees the QoS of all the DNN services on a GPU. The model executors run operators on demand to realize the deterministic operator overlap.

Our evaluation using seven popular DNNs on an NVIDIA A100 GPU shows that **mt-dnn** significantly reduces QoS violations and improves throughput compared with state-of-the-art solutions.

## Environment Preparation

- Hardware & software requirements
  1. Hardware requirements
     1. CPU: Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
     2. Memory: 252 GB
     3. GPU: NVIDIA A100 (Ampere)
  2. Software requirements
     1. Ubuntu 20.04.1 (kernel 5.8.0)
     2. GPU driver: 460.39
     3. CUDA 11.2
     4. cuDNN 8.1
     5. Anaconda3-2020.07
     6. PyTorch 1.8.0
- Preparing the Python environment
  1. Install Anaconda3 as the Python runtime

     ```shell
     $ cd ~ && mkdir -p .local && cd .local
     $ wget -O Anaconda3-2020.07-Linux-x86_64.sh https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh
     $ chmod +x Anaconda3-2020.07-Linux-x86_64.sh
     $ ./Anaconda3-2020.07-Linux-x86_64.sh -b -p "$HOME/.local/anaconda3"
     ```

  2. Activate conda and create a Python environment with the essential dependencies

     ```shell
     $ eval "$($HOME/.local/anaconda3/bin/conda shell.zsh hook)"
     $ # cd into the mt-dnn repository
     $ conda create --name mt-dnn
     $ conda activate mt-dnn
     $ pip3 install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
     $ pip3 install -r requirements.txt
     ```

## Getting Started

The following sections walk through the steps required to run **mt-dnn**.

### Profiling

**mt-dnn** needs to profile essential data for training a precise overlap-aware latency predictor. The data is profiled with both MPS and MIG disabled.

- Profiling for pair-wise co-location on a dedicated A100:

  ```shell
  $ python main.py --task profile --model_num 2 --platform single --test 100 --gpu A100 --device 0
  ```

- Profiling for triplet-wise co-location on a dedicated A100:

  ```shell
  $ python main.py --task profile --model_num 3 --platform single --test 100 --gpu A100 --device 0
  ```

- Profiling for quadruplet-wise co-location on a dedicated A100:

  ```shell
  $ python main.py --task profile --model_num 4 --platform single --test 100 --gpu A100 --device 0
  ```

### Training the Predictor

After obtaining all the profiling data, we train a latency predictor for each case.

#### Training the MLP model

- Training the predictor for pair-wise co-location on a dedicated A100, one combination at a time:

  ```shell
  $ python main.py --task train --model_num 2 --mode onebyone --platform single --modeling mlp --gpu A100 --device 0 --predictor operator
  ```

- Training the predictor for pair-wise co-location on a dedicated A100 over all combinations:

  ```shell
  $ python main.py --task train --model_num 2 --mode all --platform single --modeling mlp --gpu A100 --device 0 --predictor operator
  ```

- Training the predictor for triplet-wise co-location on a dedicated A100 over all combinations:

  ```shell
  $ python main.py --task train --model_num 3 --mode all --platform single --modeling mlp --gpu A100 --device 0 --predictor operator
  ```

- Training the predictor for quadruplet-wise co-location on a dedicated A100 over all combinations:

  ```shell
  $ python main.py --task train --model_num 4 --mode all --platform single --modeling mlp --gpu A100 --device 0 --predictor operator
  ```

#### Training the LR/SVM model

Set `--modeling` to `lr` or `svm` to choose the model type; the examples below use `lr`.

- Training the predictor for pair-wise co-location on a dedicated A100 over all combinations:

  ```shell
  $ python main.py --task train --model_num 2 --mode all --platform single --modeling lr --gpu A100 --device 0 --predictor operator
  ```

- Training the predictor for pair-wise co-location on a dedicated A100, one combination at a time:

  ```shell
  $ python main.py --task train --model_num 2 --mode onebyone --platform single --modeling lr --gpu A100 --device 0 --predictor operator
  ```

### Online Serving

After profiling and training, we can serve multiple DNN services with **mt-dnn**.

- Testing **mt-dnn** for pair-wise co-location on a dedicated A100, taking the setup `model id: 0, 1; qos_target: 50ms; total tested queries: 1000; search ways: 2` as an example:

  ```shell
  $ python main.py --task server --model_num 2 --comb 0 1 --policy mt-dnn --load 50 --qos 50 --queries 1000 --thld 5 --ways 2 --predictor layer --gpu A100 --device 0 --node 0 --platform single
  ```

- Testing **mt-dnn** for triplet-wise co-location on a dedicated A100, taking the setup `model id: 0, 1, 2; qos_target: 50ms; total tested queries: 1000; search ways: 2` as an example:

  ```shell
  $ python main.py --task server --model_num 3 --comb 0 1 2 --policy mt-dnn --load 50 --qos 50 --queries 1000 --thld 5 --ways 2 --predictor layer --gpu A100 --device 0 --node 0 --platform single
  ```

- Testing **mt-dnn** for quadruplet-wise co-location on a dedicated A100, taking the setup `model id: 0, 1, 2, 3; qos_target: 50ms; total tested queries: 1000; search ways: 2` as an example:

  ```shell
  $ python main.py --task server --model_num 4 --comb 0 1 2 3 --policy mt-dnn --load 50 --qos 50 --queries 1000 --thld 5 --ways 2 --predictor layer --gpu A100 --device 0 --node 0 --platform single
  ```
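The headroom-based query controller described at the top of this README is easiest to see with a small sketch. Everything below — the function names, the candidate list, and the 5 ms headroom (loosely mirroring the `--thld 5` flag in the serving commands) — is a hypothetical illustration of the idea, not mt-dnn's actual API:

```python
# Hypothetical sketch of a headroom-based admission check, in the spirit of
# mt-dnn's query controller.  Names, signatures, and numbers are assumptions.

def admits(predicted_latency_ms: float, qos_target_ms: float,
           headroom_ms: float = 5.0) -> bool:
    """Accept a candidate operator overlap only if its predicted latency
    stays below the QoS target by at least the given headroom."""
    return predicted_latency_ms + headroom_ms <= qos_target_ms


def pick_overlap(candidates, qos_target_ms, headroom_ms=5.0):
    """Among (overlap, predicted_latency_ms) candidates, return the overlap
    with the highest predicted latency that still meets QoS, i.e. the most
    aggressive co-location that keeps every query safe."""
    feasible = [(ov, lat) for ov, lat in candidates
                if admits(lat, qos_target_ms, headroom_ms)]
    if not feasible:
        return None  # no safe overlap: fall back to running queries alone
    return max(feasible, key=lambda p: p[1])[0]


# Example with a 50 ms QoS target, matching the --qos 50 flag above.
candidates = [("no-overlap", 20.0), ("pairwise", 38.0), ("triplet", 47.0)]
print(pick_overlap(candidates, qos_target_ms=50.0))  # -> pairwise
```

The "triplet" candidate is rejected here because 47 ms plus the 5 ms headroom exceeds the 50 ms target; the controller therefore settles on the pair-wise overlap, which is the role the overlap-aware latency predictor's estimates play in the serving commands above.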