# memload

**Repository Path**: qiming-007/memload

## Basic Information

- **Project Name**: memload
- **Description**: A lightweight toolset to generate controlled memory bandwidth load for performance experiments
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2025-10-29
- **Last Updated**: 2025-11-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# memload + spawn scripts
A lightweight toolset to generate controlled **memory bandwidth load**
for performance experiments.

## Overview

- **memload**: A single-process C program that copies memory in chunks and
  throttles itself to a configurable ratio (`-r`, in percent) to reach a
  target memory bandwidth usage.
- **spawn_memload.sh**: Launches multiple `memload` instances on selected
  NUMA nodes and CPUs using `numactl`.


## memload

### Description

- Allocates two memory buffers (`src` and `dst`) and performs one
  `memcpy` per chunk.
- Automatically measures the average time `t` for one chunk copy unless
  `-t` is provided.
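- Adjusts load according to the target bandwidth ratio `r` (the `-r` value
  as a fraction, e.g. `0.5` for `-r 50`) by sleeping `s = t * (1 - r) / r`
  after each copy, so that copying occupies a fraction `r = t / (t + s)` of
  each cycle.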
- Reports per-window and total throughput (`win_MB/s`, `total_MB/s`)
  every 2 seconds.
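
The measure-then-throttle scheme described above can be sketched as follows. This is a minimal illustration, not the code in `memload.c`: the function names (`sleep_for_ratio`, `measure_chunk_time`, `throttled_copy`) are hypothetical, and the use of `clock_gettime` and `nanosleep` is an assumption about how timing and sleeping might be done on a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Sleep time after each copy so that copying fills a fraction r of a cycle:
 * r = t / (t + s)  =>  s = t * (1 - r) / r.
 * r_percent is the -r value (1-100); t is the per-chunk copy time in seconds. */
double sleep_for_ratio(double t, double r_percent)
{
    assert(t >= 0.0 && r_percent > 0.0 && r_percent <= 100.0);
    double r = r_percent / 100.0;
    return t * (1.0 - r) / r;
}

/* Monotonic clock reading in seconds. */
double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Average time of one chunk-sized memcpy over `repeats` runs. */
double measure_chunk_time(void *dst, const void *src, size_t chunk, int repeats)
{
    assert(repeats > 0);
    double start = now_sec();
    for (int i = 0; i < repeats; i++)
        memcpy(dst, src, chunk);
    return (now_sec() - start) / repeats;
}

/* Copy one chunk, then sleep s seconds; repeat `iterations` times. */
void throttled_copy(void *dst, const void *src, size_t chunk,
                    int iterations, double s)
{
    struct timespec req;
    req.tv_sec = (time_t)s;
    req.tv_nsec = (long)((s - (double)req.tv_sec) * 1e9);
    for (int i = 0; i < iterations; i++) {
        memcpy(dst, src, chunk);
        if (s > 0.0)
            nanosleep(&req, NULL);
    }
}
```

With `-r 50` the sleep time equals the copy time, which matches the `t` and `s` values printed in the output example further down; with `-r 100` the sleep time is zero and the copy loop runs unthrottled.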

### Build

```bash
gcc -O3 -march=native -o memload memload.c
```

### Usage

```bash
./memload -size 1G -chunk 32M -r 50 -duration 60
```

**Parameters:**

| Option             | Description                                           |
| ------------------ | ----------------------------------------------------- |
| `-size`            | Total buffer size (e.g., 1G, 512M)                    |
| `-chunk`           | Chunk size per memcpy iteration                       |
| `-r`               | Target memory bandwidth ratio in percent (1–100)      |
| `-duration`        | Run time in seconds (`0` = infinite)                  |
| `-t`               | (Optional) Per-chunk copy time in seconds             |
| `-measure-repeats` | (Optional) Repeat count for measuring `t` (default 5) |
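
Size arguments such as `1G` and `32M` use binary multiples, as the output example shows (`1G` is reported as 1073741824 bytes, `32M` as 33554432). A suffix parser along these lines could handle them. This is a sketch only: `parse_size` is a hypothetical name, and the `K` suffix, case-insensitivity, and the return-0-on-error convention are assumptions rather than documented behavior of `memload`.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "4096", "512M", or "1G" into a byte count using binary multiples
 * (K = 2^10, M = 2^20, G = 2^30). Returns 0 on malformed or overflowing input. */
uint64_t parse_size(const char *text)
{
    assert(text != NULL);
    if (!isdigit((unsigned char)text[0]))
        return 0;                       /* rejects "", "-1", "abc" */

    char *end = NULL;
    errno = 0;
    unsigned long long value = strtoull(text, &end, 10);
    if (errno == ERANGE)
        return 0;

    unsigned shift = 0;
    switch (toupper((unsigned char)*end)) {
    case 'K': shift = 10; end++; break;
    case 'M': shift = 20; end++; break;
    case 'G': shift = 30; end++; break;
    case '\0': break;                   /* plain byte count, no suffix */
    default: return 0;                  /* unknown suffix */
    }
    if (*end != '\0')
        return 0;                       /* trailing characters after suffix */
    if (value > (UINT64_MAX >> shift))
        return 0;                       /* shifted value would overflow */
    return (uint64_t)value << shift;
}
```

For example, `parse_size("1G")` yields 1073741824 and `parse_size("32M")` yields 33554432, the same byte counts that appear in the `size=` and `chunk=` fields of the output example.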

### Output Example

```
Measuring chunk memcpy time (chunk=33554432 bytes, repeats=5)...
Measured t = 0.025816824 s
size=1073741824 chunk=33554432 r=50.000% t=0.025816824 s=0.025816824 duration=60.0
t=2.1s window=2.1s win_MB/s=955.0 total_MB/s=955.0
...
t=59.9s window=2.1s win_MB/s=987.0 total_MB/s=991.6
```

---


## spawn_memload.sh

### Description

- Spawns multiple `memload` instances and **blocks** until all child
  processes finish.
- After completion, parses the per-instance logs to report per-instance and
  aggregate throughput.

### Usage

```bash
./spawn_memload.sh -numa 0 -- ./memload -size 1G -chunk 32M -r 50 -duration 60
```

### Example Output

```
PID     CPU NODE RUNTIME(s) AVG_MB/s TRANSFER_MB LOGFILE
12345   0   0    60.1       990.5    59430.0     ./logs/memload_cpu0.log
...
Total transferred: 475.2 GB
Wall time: 60.1 s
Aggregate throughput: 7920.4 MB/s
Instances: 8
```

---


## Example

```bash
# Compile
gcc -O3 -march=native -o memload memload.c

# Single-instance test
numactl --physcpubind=0 --membind=0 -- ./memload -size 1G -chunk 32M -r 50 -duration 10

# Multi-core test
./spawn_memload.sh -numa 0 -- ./memload -size 1G -chunk 32M -r 50 -duration 60
```

---

## License

MIT License © 2025