# mybench

**Repository Path**: mirrors_Shopify/mybench

## Basic Information

- **Project Name**: mybench
- **Description**: A high-performance framework for rapid prototyping of database benchmarks
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-11-16
- **Last Updated**: 2026-01-10

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

`mybench`
=========

`mybench` is a benchmark authoring library that helps you create your own database benchmark in Go. The central features of mybench include:

- A library approach to database benchmarking
- Discretized, precise rate control: the rate at which events run is discretized to a relatively low frequency (default: 50 Hz), as Linux + Go cannot reliably maintain 100 to 1000 Hz. The number of events run on each iteration is determined by sampling a uniform or Poisson distribution. The rate control is very precise and has achieved standard deviations of <0.2% of the desired rate.
- Ability to parallelize a single workload into multiple goroutines, each with its own connection.
- Ability to run multiple workloads simultaneously with data being logged from all workloads.
- Uses [HDR Histogram](https://github.com/HdrHistogram/hdrhistogram-go) to keep track of latency online.
- Web UI for live monitoring throughput and latency of the current benchmark.
- A simple interface for implementing the data loader (which creates the tables and seeds them with data) and the benchmark driver.
- A number of built-in data generators, including thread-safe auto-incrementing generators.
- Command line wrapper: A wrapper library to help build command line apps for the benchmark.

Design
------

For more details, see [the design doc](https://shopify.github.io/mybench/detailed-design-doc.html). Some of the information in this section may eventually move there.

There are a few important structs defined in this library, and they are:

- `Benchmark`: The main "entrypoint" to running a benchmark. This keeps track of multiple `Workload`s and performs data aggregation across all the `Workload`s and their `BenchmarkWorker`s.
- `WorkloadInterface`: An interface that is implemented by the end-user who wants to create a benchmark. Notably, the end-user will implement an `Event()` function that is called at a specified `EventRate` (concurrently from a number of goroutines).
- `Workload`: Responsible for creating and running the workers (goroutines) to call the `Event()` function of the `WorkloadInterface`.
- `BenchmarkWorker`: Responsible for setting up the `Looper` and keeping track of the worker-local statistics (such as the event latency/histograms for the local goroutine).
- `Looper`: Responsible for discretizing the desired event rate into something that's achievable on Linux. Actually calls the `Event()` function. It can also perform complex discretization such as Poisson-distribution based event sampling.
- `BenchmarkDataLoader`: A data loader helper that lets you load data concurrently by specifying only a few options, such as the number of rows and the type of data generator for each column.
- `BenchmarkApp[T]`: A wrapper to help create a command line app for a benchmark.
- `Table`: An object that helps you create the database and track a default set of data generators.

### Data collection and flow

The benchmark system mainly collects data about the throughput and latency of the `Event()` function call, which contains custom logic (usually MySQL calls). Since `Event()` can be called from a large number of `BenchmarkWorker`s, each `BenchmarkWorker` collects its own statistics for performance reasons. The data collected by the `BenchmarkWorker`s are:

- The count and rate of `Event()`
- The latency distribution of `Event()` as tracked via the [HDR Histogram](https://github.com/HdrHistogram/hdrhistogram-go).
- Unimplemented:
  - How long the worker spent in "saturation" (i.e. `Event()` is slower than the requested event rate). This is probably an important metric for later.
  - The amount of time spent sleeping (could be useful for debugging saturation problems in case the looper is incorrectly implemented).
  - Everything in `OuterLoopStat`: wakeup latency, event batch size. This is probably less important than the above.

Having all this data in hundreds of independent goroutines (`BenchmarkWorker`s) is not particularly useful. The data must be aggregated. This aggregation is done at the workload level by the `Workload`, whose results are then aggregated at the `Benchmark` level via the data logger. This description may make it sound like the data collection is initiated by the `BenchmarkWorker`s -- it is not. Instead, every few seconds, the data logger calls the appropriate functions to aggregate data. During data collection, a lock is taken for each `BenchmarkWorker`, which allows the data to be read safely. This is fine, as each `BenchmarkWorker` has its own mutex and there is never much contention. If this becomes a problem, lockless programming may be a better approach.

Run a benchmark
---------------

- Shopify orders benchmark: `make examplebench && build/examplebench -host mysql-1 -user sys.admin_rw -pass hunter2 -bench -eventrate 3000`
  - Change the host
  - Change the event rate. The command above specifies 3000 events/s.
- Go to https://localhost:8005 to see the monitoring web UI.

Write your own benchmark
------------------------

See [benchmarks](./benchmarks) for examples and read the [docs](https://shopify.github.io/mybench/).