# apm-transport-stress-tests

**Repository Path**: mirrors_DataDog/apm-transport-stress-tests

## Basic Information

- **Project Name**: apm-transport-stress-tests
- **Description**: Stress and chaos testing the tracer UDS implementations
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-06-24
- **Last Updated**: 2026-04-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# APM Transport Stress Tests

Stress and chaos testing the tracer transports.

Refer to https://github.com/DataDog/apm-transport-stress-tests/blob/main/example-use.sh for the latest way to run this in your CI or as a job external to this repository.

## Basic usage

There are options available to customize the runs:
 - export CONCURRENT_SPAMMERS=$number_of_extra_spammer_samples_to_run
 - export TRANSPORT_STRESS_TIMEOUT_MS=$approximate_milliseconds_to_run_samples_for

```mermaid
flowchart TD
    APIKEY[export DD_API_KEY=$api_key] --> LANG
    LANG[export language=$folder_under_languages] --> BUILD
    BUILD[./build.sh $language realagent] -->|wait| RUNTCP
    BUILD -->|wait| RUNUDS
    RUNTCP[./run.sh tcpip] -->|wait| FINISH
    RUNUDS[./run.sh uds] -->|wait| FINISH
    FINISH[Tests complete] --> LOGS
    FINISH[Tests complete] --> DASHBOARD
    DASHBOARD[Metrics tagged env:transport-tests-$hostname]
    LOGS[Logs in ./results]
```

## Architecture

To explain the architecture in a readable way, the charts are split into several focus areas.

This tool is designed to emulate APM load against a shared agent in a k8s cluster, without the additional noise of web frameworks and automatic instrumentation.
The purpose of this tool is to smoke test the overhead of the UDS transport against the TCPIP transport in highly concurrent scenarios.

The first goal is to not negatively affect the application with UDS compared to the well worn path of TCPIP.
The secondary goal is to understand potential span/data loss per language and transport.

The worst result would be application crashing behavior. 
We guard against this by watching for any non-graceful exits and failing the `./run.sh` script if they are detected.
If there is a graceful exit, we can expect that the metrics reported are reliable.

### Spammers and the Agent

Each language has a `spammer` application defined in the corresponding language folder (`./languages`).

The spammer application is ultimately a script that runs until `SIGINT` is received.
Approximately every millisecond, a trace is created with that tracer's manual instrumentation API.
The tracer is configured with the variables needed to enable either UDS or TCPIP depending on the ./run.sh parameters.

There is a single `spammer` container which is measured and the logs are saved to a shared volume.
There are many `concurrent-spammer` containers which are responsible for generating load for that language against the Agent.

Both the `spammer` and `concurrent-spammer` use the same image defined by the language's Dockerfile.

The numbers of `concurrent-spammer` applications can be increased until the socket the agent is listening on is overloaded.

```mermaid
flowchart TD
    AGENT[Agent]
    SPAMMER[Spammer] -->|traces| AGENT
    CONCURRENTSPAMMER1[Concurrent Spammer 1] -->|traces| AGENT
    CONCURRENTSPAMMER2[Concurrent Spammer 2] -->|traces| AGENT
    CONCURRENTSPAMMERX[Concurrent Spammer ...] -->|traces| AGENT
```


# Observer and Container Metrics

The `observer` is standalone as it's only responsibility is to collect information about the test runs, and it is not the target of attempted overload. 
The `observer` image is ultimately the datadog agent, and collects container metrics from the `spammer` and `agent` for comparison across languages, transports, and concurrency profiles.
The `observer` is also the destination for any custom metrics from the `spammer` applications.

```mermaid
flowchart TD
    OBSERVER[Observer]
    OBSERVER -->|container-metrics| SPAMMER
    OBSERVER -->|container-metrics| AGENT
    SPAMMER[Spammer]
    AGENT[Agent]
```


# Spammers and Custom Metrics

The `spammer` image is the source of truth for how many spans are created.
In order to understand throughput of spans, the spammer image self reports spans created to the `observer` container.
This allows us to compare the number of spans sent to the number of spans received for each language to understand how transport type (UDS/TCPIP) and concurrency level contribute to data loss.
Obviously, the first priority is to not negatively affect applications if communication problems occur, but this gives us data on the secondary goal of preventing or limiting data loss.

This requires that each language's `spammer` application submit metrics in a reliable and consistent way.

```mermaid
flowchart TD
    OBSERVER[Observer]
    SPAMMER -->|metrics| OBSERVER
    CONCURRENTSPAMMER1[Concurrent Spammer 1]-->|metrics| OBSERVER
    CONCURRENTSPAMMER2[Concurrent Spammer 2] -->|metrics| OBSERVER
    CONCURRENTSPAMMERX[Concurrent Spammer ...] -->|metrics| OBSERVER
```