# memload

**Repository Path**: qiming-007/memload

## Basic Information

- **Project Name**: memload
- **Description**: A lightweight toolset to generate controlled memory bandwidth load for performance experiments
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2025-10-29
- **Last Updated**: 2025-11-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# memload + spawn scripts

A lightweight toolset to generate controlled **memory bandwidth load** for performance experiments.

## Overview

- **memload**: A single-process C program that copies memory chunks and throttles itself to a configurable ratio (`-r`, in percent) to hit a target memory bandwidth usage.
- **spawn_memload.sh**: Launches multiple `memload` instances on selected NUMA nodes and CPUs using `numactl`.

## memload

### Description

- Allocates two memory buffers (`src` and `dst`) and performs one `memcpy` per chunk.
- Automatically measures the average time `t` for one chunk copy unless `-t` is provided.
- Throttles to the target bandwidth ratio `r` (the `-r` value expressed as a fraction, so `-r 50` means `r = 0.5`) by sleeping `s = t * (1 - r) / r` after each copy. With `r = 0.5` this gives `s = t`, as in the sample output below.
- Reports per-window and total throughput (`win_MB/s`, `total_MB/s`) every 2 seconds.

### Build

```bash
gcc -O3 -march=native -o memload memload.c
```

### Usage

```bash
./memload -size 1G -chunk 32M -r 50 -duration 60
```

**Parameters:**

| Option             | Description                                                   |
| ------------------ | ------------------------------------------------------------- |
| `-size`            | Total buffer size (e.g., 1G, 512M)                            |
| `-chunk`           | Chunk size per `memcpy` iteration                             |
| `-r`               | Target memory bandwidth ratio in percent (1–100)              |
| `-duration`        | Run time in seconds (`0` = infinite)                          |
| `-t`               | (Optional) Per-chunk copy time in seconds; skips measurement  |
| `-measure-repeats` | (Optional) Repeat count for measuring `t` (default 5)         |

### Output Example

```
Measuring chunk memcpy time (chunk=33554432 bytes, repeats=5)...
Measured t = 0.025816824 s
size=1073741824 chunk=33554432 r=50.000% t=0.025816824 s=0.025816824 duration=60.0
t=2.1s window=2.1s win_MB/s=955.0 total_MB/s=955.0
...
t=59.9s window=2.1s win_MB/s=987.0 total_MB/s=991.6
```

---

## spawn_memload.sh

### Description

- Spawns multiple `memload` instances and **blocks** until all child processes finish.
- After completion, parses the logs to report per-instance and aggregate throughput.

### Usage

```bash
./spawn_memload.sh -numa 0 -- ./memload -size 1G -chunk 32M -r 50 -duration 60
```

### Example Output

```
PID     CPU   NODE   RUNTIME(s)   AVG_MB/s   TRANSFER_MB   LOGFILE
12345   0     0      60.1         990.5      59430.0       ./logs/memload_cpu0.log
...
Total transferred: 475.2 GB
Wall time: 60.1 s
Aggregate throughput: 7920.4 MB/s
Instances: 8
```

---

## Example

```bash
# Compile
gcc -O3 -march=native -o memload memload.c

# Single-instance test
numactl --physcpubind=0 --membind=0 -- ./memload -size 1G -chunk 32M -r 50 -duration 10

# Multi-core test
./spawn_memload.sh -numa 0 -- ./memload -size 1G -chunk 32M -r 50 -duration 60
```

---

## License

MIT License © 2025
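
---

## Pacing Sketch

The sleep rule from the memload description, `s = t * (1 - r) / r`, can be checked by hand before choosing a `-r` value. The snippet below is a minimal sketch of that arithmetic only. The helper names `sleep_for_ratio` and `duty_cycle` are hypothetical and are not part of `memload` or `spawn_memload.sh`. It assumes `-r` is given in percent and that `awk` is available.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with memload) that reproduce the
# pacing arithmetic described in the README.

# Print the per-chunk sleep time s = t * (1 - r) / r.
#   $1: chunk copy time t in seconds
#   $2: target ratio in percent (1-100), as passed to memload's -r
sleep_for_ratio() {
    local t="$1" r_percent="$2"
    awk -v t="$t" -v p="$r_percent" 'BEGIN {
        if (p <= 0 || p > 100) { exit 1 }   # reject ratios outside (0, 100]
        r = p / 100                          # percent -> fraction
        printf "%.9f\n", t * (1 - r) / r
    }'
}

# Print the fraction of each copy+sleep period spent copying: t / (t + s).
# If s came from sleep_for_ratio, this should reproduce r.
duty_cycle() {
    local t="$1" s="$2"
    awk -v t="$t" -v s="$s" 'BEGIN { printf "%.3f\n", t / (t + s) }'
}

t=0.025816824                     # measured copy time from the sample output
s=$(sleep_for_ratio "$t" 50)      # same ratio as the -r 50 example
echo "s=$s duty=$(duty_cycle "$t" "$s")"
# prints: s=0.025816824 duty=0.500
```

At `-r 50` the sleep equals the copy time, which matches the `t=` and `s=` fields in the sample output. Lower ratios lengthen the sleep quickly: at `-r 25` the process sleeps three times as long as it copies. The throughput actually achieved still depends on how closely the measured `t` tracks the real copy time under load.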