# WARNING: This project has been archived and is no longer actively maintained.

_If you wish to continue to use or develop this code yourself, we recommend you fork this repository._

![status: inactive](https://img.shields.io/badge/status-inactive-red.svg)

# OSD Alert Analysis

Web application and supporting CLI tools for analyzing OpenShift Dedicated PagerDuty alert data.

## Requirements

* Python 3.9+
* Don't forget to run `pip3 install -r requirements.txt`*
* a MariaDB/MySQL server*
* a PagerDuty API token
* Docker/Podman/K8s/OpenShift (optional)

*If pip complains about being unable to find a specific version of a dependency module, it's probably because you're using a version of Python older than 3.9. If pip instead complains about failing to build the MariaDB module, make sure you have the MariaDB C Connector, GCC, and the Python headers installed (e.g., on Fedora/RHEL: `dnf install mariadb-connector-c mariadb-connector-c-devel gcc python39-devel`).

## Quick Start

_This section will help you set up a basic testing/development environment. These settings should **not** be used for production installations._

Clone this repo and `cd` into it. Assuming you're using RHEL or Fedora, run the following commands. Users of other Linux distributions should translate the commands accordingly (e.g., Debian/Ubuntu users can use `apt` instead of `dnf`, non-podman users can replace `podman` with `docker`, etc.).

```bash
# Install dependencies
sudo dnf install python39 python39-devel mariadb-connector-c mariadb-connector-c-devel gcc podman
pip3.9 install -r requirements.txt

# We'll use a MariaDB container as our local database
podman pull mariadb

# Fill in the values before running the command below, which creates a root user,
# an empty database, and a database-owning user.
# You may use any arbitrary alphanumeric string for each of these values.
podman run --detach --env MARIADB_DATABASE= --env MARIADB_USER= --env MARIADB_PASSWORD= --env MARIADB_ROOT_PASSWORD= --name oaa-mariadb -p 3306:3306 mariadb:latest
```

After creating the empty database as shown above, create a basic `.env` file at the root of the repo using the template below, filling in the values with the same ones you used when creating your database, where applicable.

```bash
AA_PD_API_TOKEN=
AA_PD_TEAMS=
AA_RO_DB_STRING=mariadb+mariadbconnector://:@127.0.0.1:3306/
AA_RW_DB_STRING=mariadb+mariadbconnector://:@127.0.0.1:3306/
AA_QUESTION_CLASSES=QMostFrequent:QNeverAcknowledged:QNeverAcknowledgedSelfResolved:QAcknowledgedUnresolved:QSelfResolvedImmediately:QSREResolvedImmediately:QFlappingShift
```
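Before running the updater, you can optionally sanity-check that the connection strings in your `.env` actually reach the MariaDB container. The following is a minimal sketch, not part of this repo; it assumes the `python-dotenv` package is available alongside SQLAlchemy (which the project already uses for its database engine URLs):

```python
# check_db.py -- hypothetical helper, not part of this repo.
# Verifies that AA_RW_DB_STRING in your .env can reach the database.
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed
from sqlalchemy import create_engine, text

load_dotenv()  # read .env from the current directory

engine = create_engine(os.environ["AA_RW_DB_STRING"])
with engine.connect() as conn:
    # A successful "SELECT 1" confirms host, port, and credentials are correct.
    print("Database reachable:", conn.execute(text("SELECT 1")).scalar() == 1)
```

If this raises an error, double-check the user, password, and database name embedded in your connection strings before moving on.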
You may now proceed with populating your incident cache database using `updater.py`. As explained below, fully populating your database may take hours. You can instead download only the last 100 incidents using the following command (which should take only 5-10 minutes):

```bash
python3 updater.py --limit 100 --verbose
```

Once your database is populated with at least 10 incidents/alerts, you can start a development web server by running the `wsgi.py` file, as shown below. Note that you may have to replace `python3` with `python39`, `python3.9`, or similar depending on your distribution.

```bash
python3 wsgi.py
```

This command should print a URL that you can open in your browser to view the web UI. From this point on, you can make changes to the codebase and then kill and relaunch the `wsgi.py` process to see your changes take effect. If you reboot your computer or otherwise stop the Docker/Podman container hosting your database, you can relaunch it by running `podman start oaa-mariadb`.

You have now completed *Quick Start*. The following sections contain more detailed information about how to configure and use OAA.

## Initial Caching Database Setup

The PagerDuty API is too slow/rate-limited to be used directly by the web application. Instead, you'll need to set up, populate, and regularly refresh a caching database.

Create an empty database in your SQL server and two service accounts: one with full (admin) privileges over the database, and another with read-only privileges. Create a file named `.env` at the root of this repo and fill it in like so:

```bash
AA_PD_API_TOKEN=
AA_PD_TEAMS=A_list_of:colon-separated_PagerDuty_team_IDs:that_look_like_this:XY1234Z

# DB_STRINGs should be SQLAlchemy database engine URLs.
# See docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine
AA_RO_DB_STRING=sqlite+pysqlite:///:memory:  # Use the read-only account
AA_RW_DB_STRING=sqlite+pysqlite:///:memory:  # Use the read-write account

# QUESTION_CLASSES is a list of the questions you'd like to display on the web UI.
# These should be class names from questions.py
AA_QUESTION_CLASSES=QMostFrequent:QNeverAcknowledgedSelfResolved:QFlappingShift
```

*Note*: if you're just experimenting/testing, feel free to leave the SQLite database strings shown above as is. Just know that this creates the database in memory, and it will be dropped as soon as the updater script/web application exits.

Then populate the database using `updater.py`, as specified in the help message shown below.

```
$ ./updater.py --help
usage: updater.py [-h] [-s SINCE] [-u UNTIL] [-l LIMIT] [-b DAYS] [-v]

Updates the OSD alert-analysis tool's cache of PagerDuty incidents and alerts

options:
  -h, --help            show this help message and exit
  -s SINCE, --since SINCE
                        start of the caching time window in ISO-8601 format (default: 30d ago)
  -u UNTIL, --until UNTIL
                        end of the caching time window in ISO-8601 format (default: now)
  -l LIMIT, --limit LIMIT
                        maximum number of incidents to cache (default: 10000)
  -b DAYS, --backfill DAYS
                        do a normal run, then check to see if the oldest record in the cache
                        is at least DAYS days old. if it's not, update cache until it is,
                        batching in sizes of LIMIT if necessary. --since and --until have no
                        effect after the initial run.
  -v, --verbose
```

Note that this process can take several hours. If successful, the output will look like:

```
$ ./updater.py --since="2022-01-01" --until="2022-02-01"
Updating incident cache...done. Cached 9786 incidents.
Updating alert cache...done. Cached 10012 alerts.
```

We recommend setting up a cronjob that runs `./updater.py --backfill 90` once or twice per day in order to cache at least 90 days of history and refresh the 10,000 most-recently-created incidents. Incidents older than 90 days will not be deleted from the cache; they're simply not updated.
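For example, a crontab entry (added via `crontab -e`) along these lines would refresh the cache twice daily; the repo path and log file location are placeholders you'd adjust for your own system:

```bash
# Hypothetical crontab entry: refresh/backfill the alert cache at 06:00 and 18:00.
0 6,18 * * * cd /path/to/osd-alert-analysis && ./updater.py --backfill 90 >> /var/log/oaa-updater.log 2>&1
```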
## Web Application Setup

We recommend running the web application as a container. This project conforms to the [Source-To-Image](https://github.com/openshift/source-to-image) standard, so you have several build options.

*Note*: the following instructions assume you've got your database running, populated, and network-accessible to the execution environment (e.g., container) you're about to create.

### Building and running a Docker image locally

Clone this repo, `cd` into it, and fill out your `.env` file (see above). Then run:

```
docker build . --tag "alert-analysis:latest"
docker run -d -p 8080:8080 --name my_aa --env-file .env alert-analysis:latest
```

Then navigate to http://localhost:8080 in your browser to see the UI. If you get a blank screen, the application is probably having trouble connecting to your database. Run `docker logs my_aa` and see the *note* above for more details.

### Deploying on OpenShift

Create and fill out your `.env` file (see above). We'll use `oc new-app` to build our image.

If your OpenShift cluster can access this repository (e.g., because this is a public repo and your cluster has internet access), run the following to create an app that OpenShift can automatically rebuild whenever an update is pushed to `main`*:

```bash
export REPO_URL=""
oc new-app $REPO_URL --strategy=source --env-file=.env --name=my-aa
oc patch svc/my-aa --type=json -p '[{"op": "replace", "path": "/spec/ports/0/port", "value":80}]'
```

If your OpenShift cluster does not have access to this repo (e.g., because this is a private or VPN-restricted repo), clone the repo onto your local machine and run the following to generate a one-time binary build:

```bash
export SRC_PATH=""
oc new-app --binary --strategy=docker --env-file=.env --name=my-aa
oc start-build my-aa --from-dir=$SRC_PATH
oc expose deployment my-aa --port 80 --target-port 8080
```

*Note*: the use of the `--env-file` flag bakes your `.env` directly into the generated `Deployment` in plaintext. A much more secure deployment would instead use `Secret`s and `ConfigMap`s to provide the necessary config values to the running pod.

Finally, expose the app to the world outside the cluster using:

```bash
oc expose svc/my-aa
echo "Now head to http://$(oc get route/my-aa -o jsonpath={.spec.host})"
```

*To find the webhook URL needed for Git-triggered builds, open the OpenShift web console, find the `BuildConfig` generated by `oc new-app`, scroll to the bottom of the page, and copy the webhook URL shown.

## Web Application Usage

The web application is currently a single-page collection of tables answering the questions specified by the `AA_QUESTION_CLASSES` config value. The columns of each table support sorting (click the little arrows next to the column name) and filtering (enter your search term or [filter](https://dash.plotly.com/datatable/filtering#filtering-operators) into the cell below the column name).

## Errata

* Some alert names are abbreviated when loaded into the cache. See the `Alert.standardize_name()` function in `models.py` for the exact rules; a purely illustrative sketch follows below.
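To give a flavor of what such name standardization can look like, here is a minimal, invented example. The abbreviation rule shown (stripping a trailing numeric suffix) is an assumption made purely for illustration; the real logic lives in `models.py`:

```python
import re

# Purely illustrative -- this is NOT the repo's actual implementation.
# The invented rule below collapses "SomeAlert-42" and "SomeAlert-7" into
# the single cached name "SomeAlert", so repeat alerts group together.
def standardize_name_sketch(raw_name: str) -> str:
    return re.sub(r"-\d+$", "", raw_name.strip())

assert standardize_name_sketch("ClusterProvisioningDelay-42") == "ClusterProvisioningDelay"
```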