# ServeGen

**Repository Path**: alibaba/ServeGen

## Basic Information

- **Project Name**: ServeGen
- **Description**: A framework for generating realistic LLM serving workloads
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-06-06
- **Last Updated**: 2025-12-31

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ServeGen

ServeGen is a framework for generating realistic large language model (LLM) serving workloads.
Powered by the analysis of billions of inference requests across 12 production models on Alibaba Cloud Model Studio ([百炼](https://www.aliyun.com/product/bailian)), ServeGen is able to replicate the nuanced complexity of real-world workloads, such as:

+ **Bursty** request arrivals beyond simple Poisson models
+ **Shifting** input/output length distributions over days and weeks
+ **Heterogeneous** data composition in multimodal workloads (Qwen-VL)
+ **Bimodal** reasoning length distribution in reasoning workloads (DeepSeek-R1)

We hope ServeGen can become a data-driven bridge between frontier research and production realities when designing and deploying new LLM serving systems.

For more detailed analysis results, check out our characterization [paper](https://www.arxiv.org/abs/2505.09999)!


## Requirements

ServeGen requires Python 3.8 or higher and the following dependencies:

- **numpy** (>=1.20.0): For numerical computations and array operations
- **scipy** (>=1.7.0): For statistical distributions and sampling
- **pytest** (>=7.0.0): For running tests (optional, only needed for development)

You can install all dependencies and this project using pip:

```bash
pip install -r requirements.txt
pip install -e .
```

## Examples

### Basic Usage

```python
from servegen import Category, ClientPool
from servegen.construct import generate_workload

# Load client data
pool = ClientPool(Category.LANGUAGE, "m-large")

# Generate workload
rate_fn = {0: 100.0, 600: 150.0}  # requests per second
requests = generate_workload(pool, rate_fn, duration=1200)
```

### Custom Workloads

```python
# Create custom clients with different patterns
bursty_client = create_bursty_client(1)  # High CV, concentrated distributions
stable_client = create_stable_client(2)  # Low CV, Pareto/Exponential distributions

# Generate workload with custom rate function
rate_fn = {0: 10.0, 60: 15.0, 120: 8.0}
requests = generate_workload(pool, rate_fn, duration=180)
```

### Multimodal and Reasoning Workloads

```python
# Generate multimodal workload
pool = ClientPool(Category.MULTIMODAL, "mm-image")
requests = generate_workload(pool, rate_fn, duration=600)

# Generate reasoning workload
pool = ClientPool(Category.REASON, "deepseek-r1")
requests = generate_workload(pool, rate_fn, duration=3600)
```

See `examples/` for more detailed examples:
- `basic_usage.py`: Basic workload generation and saving to CSV
- `generate_custom.py`: Custom workload patterns
- `generate_realistic.py`: Realistic workload generation
- `generate_advanced.py`: Multimodal and reasoning workloads
- `clientpool_example.py`: Client pool analysis and filtering

### Filtering Client Data and Getting CDFs

```python
from servegen import Category, ClientPool
import numpy as np

# Load client pool
pool = ClientPool(Category.LANGUAGE, "m-large")

# Filter clients by various criteria
filtered_view = (
    pool
    .span(72000, 75600)  # 20:00-21:00
    .filter_by_cv(0.5, 1.5)  # Filter by coefficient of variation
    .filter_by_avg_input_len(100, 1000)  # Filter by average input length
    .filter_by_max_output_len(2000)  # Filter by maximum output length
)

# Get CDFs of client behaviors
cdfs = filtered_view.get_cdfs()

# Print information about available CDFs
print("\nAvailable CDFs:")
for field in cdfs:
    if field in ["rate", "cv"]:
        timestamps = sorted(cdfs[field].keys())
        print(f"  {field}: {len(timestamps)} timestamps")
    else:
        stats = cdfs[field].keys()
        print(f"  {field}: {len(stats)} statistics")

# Print detailed information for the first timestamp
first_ts = min(cdfs["rate"].keys())
values, probs = cdfs["rate"][first_ts]
print(f"\nRate CDF at timestamp {first_ts}:")
print(f"  Values: {values}")
print(f"  Probabilities: {probs}")

# Print statistics for input tokens
print("\nInput token statistics:")
for stat in ["avg", "p50", "p95", "p99"]:
    if stat in cdfs["input_tokens"] and first_ts in cdfs["input_tokens"][stat]:
        values, probs = cdfs["input_tokens"][stat][first_ts]
        print(f"  {stat.upper()}:")
        print(f"    Values: {values}")
        print(f"    Probabilities: {probs}")
```

## Data Structure

The framework comes with data organized as follows:

```
data/
├── language/
│   ├── m-large/
│   │   ├── chunk-1-dataset.json
│   │   ├── chunk-1-trace.csv
│   │   ├── chunk-2-dataset.json
│   │   └── chunk-2-trace.csv
│   ├── m-mid/
│   │   └── ...
│   ├── m-small/
│   │   └── ...
├── reason/
│   ├── deepseek-r1/
│   │   ├── chunk-1-dataset.json
│   │   └── chunk-1-trace.csv
└── multimodal/
    ├── mm-image/
    │   ├── chunk-1-dataset.json
    │   └── chunk-1-trace.csv
```

Each category (LANGUAGE, REASON, MULTIMODAL) contains model-specific data with:
- `chunk-i-dataset.json`: Request data distributions
- `chunk-i-trace.csv`: Rate and arrival pattern information


## Citation

If you find our work helpful, feel free to give us a cite.

```txt
@misc{servegen,
    title={ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production}, 
    author={Yuxing Xiang and Xue Li and Kun Qian and Wenyuan Yu and Ennan Zhai and Xin Jin},
    year={2025},
    eprint={2505.09999},
    archivePrefix={arXiv},
    primaryClass={cs.DC},
    url={https://arxiv.org/abs/2505.09999}, 
}
```