# testing-genai-applications

**Repository Path**: mirrors_elastic/testing-genai-applications

## Basic Information

- **Project Name**: testing-genai-applications
- **Description**: Testing GenAI Applications Workshop
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-18
- **Last Updated**: 2026-04-25

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Testing GenAI Applications Workshop

This workshop walks you through testing Generative AI (GenAI) applications using the OpenAI CLI, SDK, and a compatible LLM provider. Each exercise builds on the last, so do them in order!

We'll start with the OpenAI CLI: how to inspect traffic to the OpenAI API using a proxy, and how to enable observability with OpenTelemetry. We'll then move on to coding in Python and write our first test that accounts for LLM quirks like variability and hallucinations. We'll get progressively more sophisticated with offline testing before we finish with evals using the LLM-as-a-judge pattern.

Let's begin!

## Example Application Flow

Regardless of whether we use the OpenAI CLI or its Python SDK, our example remains consistent. The user asks a simple question:

> Answer in up to 3 words: Which ocean contains Bouvet Island?

We configure `temperature=0` to encourage a consistent LLM response, ideally "Atlantic Ocean" or a close variant. However, the LLM might offer a debatable answer like "Southern Ocean" or even hallucinate, which adds an engaging twist to the learning experience!

We'll explore this example first with the OpenAI CLI and then with the Python SDK. Since both tools interact with the OpenAI API in the same way, the flow remains identical:

```mermaid
sequenceDiagram
    participant User
    participant Application
    participant API as OpenAI API

    User ->> Application: "Answer in up to 3 words: Which ocean contains Bouvet Island?"
    Note over Application: OpenAI CLI or Python SDK

    activate Application
    Application ->> API: POST /chat/completions with {model, messages, temperature=0}
    activate API
    Note over API: Generates text with the model
    API ->> Application: JSON response with completion text
    deactivate API
    Application ->> User: "Atlantic Ocean"
    deactivate Application
```

## Exercises

This workshop is designed to run in order. Make sure you perform all prerequisites before starting the exercises.

1. [Use the OpenAI CLI](01-start): Use the OpenAI CLI to ask a simple question.
2. [Inspect OpenAI traffic with mitmproxy](02-proxy): See the underlying HTTP requests made by the OpenAI CLI.
3. [Trace OpenAI traffic with OpenTelemetry](03-opentelemetry): Transparently export logs, metrics, and traces from the OpenAI CLI.
4. [Write an OpenAI application](04-main): Write a Python script that asks the same question we did with the OpenAI CLI.
5. [Integration test your application](05-test): Write a test that compensates for variability or hallucinations in the LLM's response.
6. [Unit test your application with recorded HTTP responses](06-http-replay): Record traffic from the OpenAI SDK, so that you can run unit tests offline.
7. [Unit test driven LLM evaluation](07-eval): Evaluate LLM responses in unit tests, using the Phoenix Evals library and pytest.
8. [Run LLM evaluations on application requests](08-eval-platform): Use Arize Phoenix to evaluate LLM responses captured in OpenTelemetry traces.
9. [Attach user feedback to application requests](09-user-feedback): Use Arize Phoenix to attach user feedback to OpenTelemetry traces.
10. [Elastic Stack and Arize Phoenix via EDOT Collector](10-elastic-phoenix): Configure the Elastic Distribution of OpenTelemetry (EDOT) Collector to send observability data to both the Elastic Stack and Arize Phoenix.

## Prerequisites

Docker and Python are required. You'll also need an OpenAI API compatible inference platform and an OpenTelemetry collector.
First of all, you need a local copy of this repository. If you don't have one yet, get it like this:

```bash
curl -L https://github.com/elastic/testing-genai-applications/archive/refs/heads/main.tar.gz | tar -xz
cd testing-genai-applications-main
```

### Podman

If you are using [Podman](https://podman.io/) to run docker containers, export `HOST_IP`. If you don't, you'll get this error running exercises:

> unable to upgrade to tcp, received 500

Here's how to export your `HOST_IP`:

* If macOS: `export HOST_IP=$(ipconfig getifaddr en0)`
* If Ubuntu: `export HOST_IP=$(hostname -I | awk '{print $1}')`

### Python

All examples use the same Python virtual environment. This reduces repetition, even if some examples need more dependencies than others.

First, set up a Python virtual environment like this:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install 'python-dotenv[cli]'
```

### OpenAI

Exercises use either the OpenAI CLI or the Python SDK it wraps. OpenAI SDKs use the following main environment variables for configuration:

* `OPENAI_BASE_URL` - The OpenAI API base URL, which defaults to https://api.openai.com/v1
* `OPENAI_API_KEY` - Your [Secret Key](https://platform.openai.com/account/api-keys)

If you are using an alternate inference platform, you will need to change `OPENAI_BASE_URL`, and if it is unauthenticated, pass a fake value for `OPENAI_API_KEY`.

You can choose one of the following options to set up `.env` with your preferred inference platform.

*Note*: Do not share your `.env` file or API key publicly.
#### OpenAI Platform

[OpenAI Platform](https://platform.openai.com/) is a cloud-based service for accessing OpenAI models. It requires an API key and may incur usage costs.

To use OpenAI, do the following:

1. Copy [.env.openai](.env.openai) to a file named `.env`.
   - `cp .env.openai .env`
2. Set `OPENAI_API_KEY` in your `.env` file to your [Secret Key](https://platform.openai.com/account/api-keys).
#### Ollama

[Ollama](https://ollama.com/) is an open-source solution for running models locally. It is free to use, but requires sufficient computational resources.

To start and use Ollama, do the following:

1. Ensure `ollama` is installed.
   - On macOS/Linux: `brew install ollama`
   - For Windows or otherwise, see the [download page][ollama-dl].
2. Copy [.env.ollama](.env.ollama) to a file named `.env`.
   - `cp .env.ollama .env`
3. In a separate terminal, run `OLLAMA_HOST=0.0.0.0 OLLAMA_CONTEXT_LENGTH=8192 ollama serve`.
   - This accepts OpenAI requests for any model on http://localhost:11434/v1
4. In this terminal, pull the chat and eval models.
   - `dotenv run -- sh -c 'ollama pull ${CHAT_MODEL}'`
   - `dotenv run -- sh -c 'ollama pull ${EVAL_MODEL}'`
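A note on the `dotenv run -- sh -c '…'` pattern used above: `dotenv run` only injects the variables from `.env` into the environment; it is the inner `sh -c` that expands `${CHAT_MODEL}`. A minimal sketch with a stand-in value (not a real model name):

```shell
# Stand-in for a value that dotenv would load from .env
export CHAT_MODEL=demo-model

# Without a shell, '${CHAT_MODEL}' would reach the command literally.
# With `sh -c`, the inner shell expands it from the environment:
sh -c 'echo "ollama pull ${CHAT_MODEL}"'
# prints: ollama pull demo-model
```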
#### RamaLama

[RamaLama](https://ramalama.ai/) is an open-source solution for running models locally. It is free to use, but requires sufficient computational resources.

1. Make sure `ramalama` is installed.
   - On macOS/Linux: `brew install ramalama`
   - For Windows or otherwise, see the [installation guide][ramalama-dl].
2. Copy [.env.ramalama](.env.ramalama) to a file named `.env`.
   - `cp .env.ramalama .env`
3. In a separate terminal, run `dotenv run -- sh -c 'ramalama serve ${CHAT_MODEL}'`.
   - This accepts OpenAI requests for `${CHAT_MODEL}` on http://localhost:8080/v1
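Whichever platform you choose, the application ends up sending the same chat-completions request. A sketch of the JSON body, where the model name is a placeholder (your `.env` supplies the real `CHAT_MODEL`):

```python
import json

# Placeholder model name; the workshop's .env files define the real CHAT_MODEL.
payload = {
    "model": "example-chat-model",
    "messages": [
        {
            "role": "user",
            "content": "Answer in up to 3 words: Which ocean contains Bouvet Island?",
        }
    ],
    # temperature=0 encourages (but cannot guarantee) a consistent answer.
    "temperature": 0,
}

# This is the body POSTed to ${OPENAI_BASE_URL}/chat/completions.
print(json.dumps(payload, indent=2))
```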
### OpenTelemetry

OpenTelemetry is a framework used to collect and export telemetry signals (logs, metrics, and traces) from your applications. For most examples in this workshop, you'll need to configure OpenTelemetry to export this data to either your console or a collector. Follow the steps below to set up your preferred export method.
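OpenTelemetry SDKs are configured largely through standard environment variables, which is why each export method below boils down to appending a small `.env` fragment. The variables involved look roughly like this (illustrative values only; check the actual `.env.otel.*` files for what the workshop sets):

```shell
# Standard OpenTelemetry SDK configuration variables (illustrative values):
OTEL_SERVICE_NAME=testing-genai                      # logical name of your application
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318    # OTLP/HTTP collector endpoint
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf            # wire format for the exporter
```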
#### Console

If you want to view logs, metrics, and traces directly in your terminal, you can configure OpenTelemetry to export telemetry data to the console. To set this up, append [.env.otel.console](.env.otel.console) to your `.env` file like this:

```bash
cat .env.otel.console >> .env
```
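Appending fragments with `cat ... >> .env` works because a `.env` file is parsed top to bottom, so lines added later simply extend (and, for duplicate keys, shadow) what came before. A simplified parser sketch illustrating this, using dict semantics; this is an illustration with made-up values, not python-dotenv's actual parser:

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, ignoring blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key] = value  # a later occurrence of a key wins
    return env

# Illustrative contents; the real files ship with the workshop repo.
base = "OPENAI_BASE_URL=http://localhost:11434/v1\nOPENAI_API_KEY=unused\n"
otel_fragment = "OTEL_SERVICE_NAME=testing-genai\n"

# `cat .env.otel.console >> .env` amounts to concatenating the text:
combined = parse_env(base + otel_fragment)
print(combined["OTEL_SERVICE_NAME"])  # → testing-genai
```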
#### Elastic Stack

The Elastic Stack is an open-source search platform. The Elastic Distribution of OpenTelemetry (EDOT) Collector receives logs, metrics, and traces, and Kibana visualizes them.

To use a local Elastic Stack with the EDOT Collector, append [.env.otel.elastic](.env.otel.elastic) to your `.env` file like this:

```bash
cat .env.otel.elastic >> .env
```

##### Local Elastic Stack

The following starts Elasticsearch, Kibana, and the EDOT Collector, and only requires Docker. Before you begin, ensure you have free CPU and memory on your Docker host (laptop). Assume 4 CPUs and 4GB of memory for the containers in the Elastic Stack.

First, get a copy of docker-compose-elastic.yml:

```bash
wget https://raw.githubusercontent.com/elastic/elasticsearch-labs/refs/heads/main/docker/docker-compose-elastic.yml
```

Next, start this Elastic Stack in the background:

```bash
docker compose -f docker-compose-elastic.yml up --force-recreate --wait -d
```

If you start your Elastic Stack this way, you can access Kibana at the URL below, authenticating with the username "elastic" and password "elastic":

http://localhost:5601/app/apm/traces?rangeFrom=now-15m&rangeTo=now

Clean up when finished, like this:

```bash
docker compose -f docker-compose-elastic.yml down
```
#### otel-tui

[otel-tui][otel-tui] is an easy-to-navigate, single-binary OpenTelemetry system that runs in your terminal.

Choose one of the following ways to run `otel-tui` in a separate terminal.

To run in Docker:

```bash
docker run --rm -it --name otel-tui -p 4318:4318 ymtdzzz/otel-tui:latest
```

Or, to run on your host:

```bash
brew install ymtdzzz/tap/otel-tui  # or go install github.com/ymtdzzz/otel-tui@latest
otel-tui
```
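Port 4318 above is the conventional OTLP/HTTP port. To give a feel for what instrumented applications send to it, here is a minimal trace payload hand-built in the OTLP/JSON encoding; this is an illustration only (ids, timestamps, and the span name are made up), as in the workshop the OpenTelemetry SDK constructs and sends these for you:

```python
import json

# A minimal, hand-built span in the OTLP/JSON encoding (illustrative values).
span = {
    "traceId": "5b8aa5a2d2c872e8321cf37308d69df2",  # 16-byte id, hex-encoded
    "spanId": "051581bf3cb55c13",                   # 8-byte id, hex-encoded
    "name": "chat example-chat-model",              # example span name
    "kind": 3,                                      # SPAN_KIND_CLIENT
    "startTimeUnixNano": "1700000000000000000",
    "endTimeUnixNano": "1700000001000000000",
}

payload = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "testing-genai"}}
        ]},
        "scopeSpans": [{"spans": [span]}],
    }]
}

# The SDK would POST JSON like this to http://localhost:4318/v1/traces.
print(json.dumps(payload)[:80], "...")
```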
#### Arize Phoenix

[Arize Phoenix][phoenix] is an OpenTelemetry-compatible AI observability and evaluation tool.

Choose one of the following ways to run `phoenix` in the background, with authentication disabled.

To run in Docker:

```bash
docker run --rm -d --name phoenix -p 6006:6006 -e PHOENIX_ENABLE_AUTH=false arizephoenix/phoenix:latest
```

Or, to run on your host:

```bash
brew install uv
PHOENIX_ENABLE_AUTH=false uvx arize-phoenix serve
```

Finally, append [.env.otel.phoenix](.env.otel.phoenix) to your `.env` file like this:

```bash
cat .env.otel.phoenix >> .env
```
---

[ollama-dl]: https://ollama.com/download
[ramalama-dl]: https://github.com/containers/ramalama?tab=readme-ov-file#install
[otel-tui]: https://github.com/ymtdzzz/otel-tui
[phoenix]: https://arize.com/docs/phoenix