# Testing GenAI Applications Workshop
This workshop walks you through testing Generative AI (GenAI) applications
using the OpenAI CLI, the OpenAI Python SDK, and a compatible LLM provider.

Each exercise builds on the last, so do them in order! We'll start with the
OpenAI CLI, learn how to inspect traffic to the OpenAI API with a proxy, and
enable observability with OpenTelemetry. We'll then move to coding in Python
and write our first test that accounts for LLM quirks like variability and
hallucinations. We'll get progressively more sophisticated with offline testing
before finishing with evals using the LLM-as-a-judge pattern.

Let's begin!
## Example Application Flow
Regardless of whether we use the OpenAI CLI or its Python SDK, our example
remains consistent. The user asks a simple question:
> Answer in up to 3 words: Which ocean contains Bouvet Island?
We configure `temperature=0` to make the LLM response as consistent as
possible, ideally "Atlantic Ocean" or a close variant. However, the LLM might
offer a debatable answer like "Southern Ocean" or even hallucinate, which adds
an engaging twist to the learning experience!
We’ll explore this example first with the OpenAI CLI and then with the Python
SDK. Since both tools interact with the OpenAI API in the same way, the flow
remains identical:
```mermaid
sequenceDiagram
participant User
participant Application
participant API as OpenAI API
User ->> Application: "Answer in up to 3 words: Which ocean contains Bouvet Island?"
Note over Application: OpenAI CLI or Python SDK
activate Application
Application ->> API: POST /chat/completions with {model, messages, temperature=0}
activate API
Note over API: Generates text with the model
API ->> Application: JSON response with completion text
deactivate API
Application ->> User: "Atlantic Ocean"
deactivate Application
```
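To make this concrete, here is a minimal sketch of the same request using the
OpenAI Python SDK (exercise 4 builds the real script; the model name below is
illustrative, and credentials come from the environment):

```python
# Minimal sketch of the example flow, assuming the `openai` package is
# installed and OPENAI_API_KEY / OPENAI_BASE_URL are set (e.g., via .env).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; the exercises read a CHAT_MODEL variable
    messages=[
        {
            "role": "user",
            "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?",
        }
    ],
    temperature=0,  # reduce randomness for a repeatable answer
)
print(response.choices[0].message.content)  # ideally "Atlantic Ocean"
```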
## Exercises
This workshop is designed to be done in order. Make sure you complete all
prerequisites before starting the exercises.
1. [Use the OpenAI CLI](01-start): Use the OpenAI CLI to ask a simple question.
2. [Inspect OpenAI traffic with mitmproxy](02-proxy): See
underlying HTTP requests made by the OpenAI CLI.
3. [Trace OpenAI traffic with OpenTelemetry](03-opentelemetry): Transparently
export logs, metrics and traces from the OpenAI CLI.
4. [Write an OpenAI application](04-main): Write a Python script that asks the
same question we did with the OpenAI CLI.
5. [Integration test your application](05-test): Write a test that compensates
   for variability or hallucinations in the LLM's response (a minimal sketch of
   the idea follows this list).
6. [Unit test your application with recorded HTTP responses](06-http-replay):
Record traffic from the OpenAI SDK, so that you can run unit tests offline.
7. [Unit test driven LLM evaluation](07-eval): Evaluate LLM responses in unit
tests, using the Phoenix Evals library and Pytest.
8. [Run LLM evaluations on application requests](08-eval-platform): Use Arize
Phoenix to evaluate LLM responses captured in OpenTelemetry traces.
9. [Attach user feedback to application requests](09-user-feedback): Use Arize
Phoenix to attach user feedback to OpenTelemetry traces.
10. [Elastic Stack and Arize Phoenix via EDOT Collector](10-elastic-phoenix):
Configure Elastic Distribution of OpenTelemetry (EDOT) Collector to send
observability data to both Elastic Stack and Arize Phoenix.
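To preview the idea behind exercise 5, here is a hedged sketch of a
variability-tolerant integration test: rather than asserting exact text, it
accepts any answer naming a plausible ocean. The function and model names are
illustrative; the exercise directory contains the real code.

```python
# Hypothetical sketch: a pytest integration test tolerant of LLM variability.
# Assumes the `openai` package and a configured environment (.env).
from openai import OpenAI

QUESTION = "Answer in up to 3 words: Which ocean contains Bouvet Island?"


def ask(question: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; the exercises read CHAT_MODEL
        messages=[{"role": "user", "content": question}],
        temperature=0,
    )
    return response.choices[0].message.content


def test_bouvet_island_ocean():
    answer = ask(QUESTION)
    # Accept the expected answer or the common, debatable alternative,
    # instead of an exact string match that variability would break.
    assert any(ocean in answer for ocean in ("Atlantic", "Southern"))
```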
## Prerequisites
Docker and Python are required. You'll also need an OpenAI API-compatible
inference platform and an OpenTelemetry collector.
First of all, you need to be in a directory containing this repository. If you
don't have one yet, you can get it like this:
```bash
curl -L https://github.com/elastic/testing-genai-applications/archive/refs/heads/main.tar.gz | tar -xz
cd testing-genai-applications-main
```
### Podman
If you are using [Podman](https://podman.io/) to run docker containers, export
`HOST_IP`. If you don't, you'll get this error when running exercises:

> unable to upgrade to tcp, received 500

Here's how to export your `HOST_IP`:

* On macOS: `export HOST_IP=$(ipconfig getifaddr en0)`
* On Ubuntu: `export HOST_IP=$(hostname -I | awk '{print $1}')`
### Python
All examples use the same Python virtual environment. This reduces repetition,
even if some examples need more dependencies than others.
First, set up a Python virtual environment like this:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install 'python-dotenv[cli]'
```
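The `dotenv` CLI installed above is how the workshop runs commands with
variables from your `.env` file. Once you've created `.env` in the next
section, you can sanity-check it like this (the echoed variable is just an
example):

```bash
dotenv list                                  # show variables parsed from .env
dotenv run -- sh -c 'echo $OPENAI_BASE_URL'  # run a command with them exported
```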
### OpenAI
Exercises use either the OpenAI CLI or the Python SDK it wraps. OpenAI SDKs
use the following main environment variables for configuration:
* `OPENAI_BASE_URL` - The OpenAI API base URL, which defaults to `https://api.openai.com/v1`
* `OPENAI_API_KEY` - Your [Secret Key](https://platform.openai.com/account/api-keys).
If you are using an alternate inference platform, you will need to change
`OPENAI_BASE_URL`, and if the platform is unauthenticated, pass a placeholder
value for `OPENAI_API_KEY`.
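For example, a `.env` for a local, unauthenticated platform might look like
this (values illustrative; the provided `.env.*` files below are the source of
truth):

```bash
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=unused
```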
You can choose one of the following options to set up `.env` with your
preferred inference platform:
*Note*: Do not share your `.env` file or API key publicly.
#### OpenAI Platform
[OpenAI Platform](https://platform.openai.com/) is a cloud-based service for
accessing OpenAI models. It requires an API key and may incur usage costs.
To use OpenAI, do the following:
1. Copy [.env.openai](.env.openai) to a file named `.env`.
- `cp .env.openai .env`
2. Set `OPENAI_API_KEY` in your `.env` file to your [Secret Key](https://platform.openai.com/account/api-keys).
#### Ollama
[Ollama](https://ollama.com/) is an open-source solution for running models
locally. It is free to use, but requires sufficient computational resources.
To start and use Ollama, do the following:
1. Ensure `ollama` is installed
- On macOS/Linux: `brew install ollama`
- For Windows or otherwise, see the [download page][ollama-dl].
2. Copy [.env.ollama](.env.ollama) to a file named `.env`.
- `cp .env.ollama .env`
3. In a separate terminal, run `OLLAMA_HOST=0.0.0.0 OLLAMA_CONTEXT_LENGTH=8192 ollama serve`
- This accepts OpenAI requests for any model on http://localhost:11434/v1
4. In this terminal, pull the chat and eval models
- `dotenv run -- sh -c 'ollama pull ${CHAT_MODEL}'`
- `dotenv run -- sh -c 'ollama pull ${EVAL_MODEL}'`
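Optionally, sanity-check that Ollama's OpenAI-compatible endpoint is up
(listing models is standard OpenAI API surface; the exact JSON shape may vary):

```bash
curl -s http://localhost:11434/v1/models
```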
#### RamaLama

[RamaLama](https://ramalama.ai/) is an open-source solution for running models
locally. It is free to use, but requires sufficient computational resources.

To start and use RamaLama, do the following:
1. Make sure `ramalama` is installed
- On macOS/Linux: `brew install ramalama`
- For Windows or otherwise, see the [installation guide][ramalama-dl].
2. Copy [.env.ramalama](.env.ramalama) to a file named `.env`.
- `cp .env.ramalama .env`
3. In a separate terminal, run `dotenv run -- sh -c 'ramalama serve ${CHAT_MODEL}'`
- This accepts OpenAI requests for `${CHAT_MODEL}` on http://localhost:8080/v1
### OpenTelemetry
OpenTelemetry is a framework used to collect and export telemetry signals
(logs, metrics, and traces) from your applications.
For most examples in this workshop, you’ll need to configure OpenTelemetry to
export this data to either your console or a collector. Follow the steps below
to set up your preferred export method.
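If OpenTelemetry is new to you, here is a minimal, hedged sketch of what a
single trace span looks like when exported to the console. The exercises do
this transparently via environment variables and zero-code instrumentation;
this sketch assumes the `opentelemetry-sdk` package is installed.

```python
# Minimal sketch: emit one span to the console with the OpenTelemetry SDK.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("workshop-sketch")
with tracer.start_as_current_span("ask-llm"):
    pass  # an OpenAI call would go here; the span prints as JSON on exit
```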
#### Console
If you want to view logs, metrics, and traces directly in your terminal, you
can configure OpenTelemetry to export telemetry data to the console.
To set this up, append [.env.otel.console](.env.otel.console) to your `.env`
file like this:
```bash
cat .env.otel.console >> .env
```
#### Elastic Stack

Elastic Stack is an open-source search platform. The Elastic Distribution of
OpenTelemetry (EDOT) Collector receives logs, metrics, and traces, and Kibana
visualizes them.
To use a local Elastic Stack with EDOT Collector, append
[.env.otel.elastic](.env.otel.elastic) to your `.env` file like this:
```bash
cat .env.otel.elastic >> .env
```
##### Local Elastic Stack

The following starts Elasticsearch, Kibana, and the EDOT Collector, and
requires only Docker. Before you begin, ensure your Docker host (laptop) has
free capacity: assume 4 CPUs and 4GB of memory for the Elastic Stack
containers.
First, get a copy of `docker-compose-elastic.yml`:
```bash
wget https://raw.githubusercontent.com/elastic/elasticsearch-labs/refs/heads/main/docker/docker-compose-elastic.yml
```
Next, start this Elastic Stack in the background:
```bash
docker compose -f docker-compose-elastic.yml up --force-recreate --wait -d
```
If you start your Elastic Stack this way, you can access Kibana at the URL
below, authenticating with the username "elastic" and password "elastic":
http://localhost:5601/app/apm/traces?rangeFrom=now-15m&rangeTo=now
Clean up when finished, like this:
```bash
docker compose -f docker-compose-elastic.yml down
```
#### otel-tui

[otel-tui][otel-tui] is an easy-to-navigate, single-binary OpenTelemetry system
that runs in your terminal.
Choose one of the following ways to run `otel-tui` in a separate terminal.
To run in docker:
```bash
docker run --rm -it --name otel-tui -p 4318:4318 ymtdzzz/otel-tui:latest
```
Or, to run on your host:
```bash
brew install ymtdzzz/tap/otel-tui
# or go install github.com/ymtdzzz/otel-tui@latest
otel-tui
```
#### Arize Phoenix
[Arize Phoenix][phoenix] is an OpenTelemetry compatible AI Observability and
Evaluation tool.
Choose one of the following ways to run `phoenix` in the background, with
authentication disabled.
To run in docker:
```bash
docker run --rm -d --name phoenix -p 6006:6006 -e PHOENIX_ENABLE_AUTH=false arizephoenix/phoenix:latest
```
Or, to run on your host:
```bash
brew install uv
PHOENIX_ENABLE_AUTH=false uvx arize-phoenix serve
```
Finally, append [.env.otel.phoenix](.env.otel.phoenix) to your `.env` file like
this:
```bash
cat .env.otel.phoenix >> .env
```
---
[ollama-dl]: https://ollama.com/download
[ramalama-dl]: https://github.com/containers/ramalama?tab=readme-ov-file#install
[otel-tui]: https://github.com/ymtdzzz/otel-tui
[phoenix]: https://arize.com/docs/phoenix