# EvalAgent

**Repository Path**: region-ai/EvalAgent

## Basic Information

- **Project Name**: EvalAgent
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-13
- **Last Updated**: 2026-02-13

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Eval Agent

Eval Agent is an end-to-end, agentic application evaluation system that plans, executes, observes, and evaluates real applications running in real environments. It is built around a strict control plane / execution plane split: the backend owns all intelligence, and the desktop runner owns all physical interaction.

This repository was formed by merging two separate projects:

* `app_eval_desktop` -> `desktop`
* `app_evaluation_agent` -> `backend`

The unified project name is **Eval_Agent**.

## Table of Contents

- [Overview](#overview)
- [Project Lineage](#project-lineage)
- [Architecture At A Glance](#architecture-at-a-glance)
- [System Architecture Diagram](#system-architecture-diagram)
- [Execution Model](#execution-model)
- [Desktop Runner (Execution Plane)](#desktop-runner-execution-plane)
  - [Core Responsibilities](#core-responsibilities)
  - [Desktop System Architecture](#desktop-system-architecture)
  - [Data and Control Flow](#data-and-control-flow)
  - [User Interface](#user-interface)
  - [IPC Contracts](#ipc-contracts)
  - [Backend API Contract (Desktop)](#backend-api-contract-desktop)
  - [Desktop Configuration](#desktop-configuration)
  - [Desktop Build and Run](#desktop-build-and-run)
  - [Desktop Development Notes](#desktop-development-notes)
  - [Desktop Troubleshooting](#desktop-troubleshooting)
- [Backend (Control Plane)](#backend-control-plane)
  - [Product Overview](#product-overview)
  - [Core Functional Requirements](#core-functional-requirements)
  - [Evaluation System Metrics](#evaluation-system-metrics)
  - [Core Agent Design](#core-agent-design)
  - [Bug Management Specification](#bug-management-specification)
  - [Test Case Management](#test-case-management)
  - [Product Deliverables](#product-deliverables)
  - [Execution Modes](#execution-modes)
  - [High-Level Workflow](#high-level-workflow)
  - [Component Breakdown](#component-breakdown)
  - [Bug Tracking and Triage](#bug-tracking-and-triage)
  - [Vision and Coordinate Mapping](#vision-and-coordinate-mapping)
  - [Persistence Layer](#persistence-layer)
  - [Background Tasks](#background-tasks)
  - [Integrations and Utilities](#integrations-and-utilities)
  - [Backend File Structure](#backend-file-structure)
  - [Vision Pipeline Notes](#vision-pipeline-notes)
- [Database Schema](#database-schema)
- [API Reference](#api-reference)
- [Backend Setup and Installation](#backend-setup-and-installation)
- [Testing](#testing)
- [Docs Map](#docs-map)

## Overview

Eval Agent evaluates applications by executing real steps on real targets while a backend LLM-driven control plane decides the next action. The system is intentionally split:

* **Backend (Control Plane):** planning, vision analysis, action selection, bug reasoning, metrics, storage.
* **Desktop Runner (Execution Plane):** capture, deterministic action execution, visualization.

## Project Lineage

Eval Agent is the merged successor of two projects:

* `app_eval_desktop` -> **desktop** (Electron + TypeScript executor)
* `app_evaluation_agent` -> **backend** (FastAPI control plane)

All prior architecture, endpoints, and README material has been consolidated here for a single, canonical entrypoint.

## Architecture At A Glance

* Control plane orchestrates evaluation lifecycles, test plans, test cases, LLM reasoning, bug management, and summaries.
* Execution plane performs capture and input on real machines, using deterministic action execution.
* Two execution modes are supported: secure cloud execution (headless VM) and interactive local execution (desktop runner).

## System Architecture Diagram

```mermaid
graph TD
    subgraph "Client / CI/CD"
        U1[Desktop Runner]
        CI[CI/CD Pipeline]
    end

    subgraph "Control Plane (FastAPI Backend)"
        API[FastAPI Server]
        DB[(PostgreSQL)]
        REDIS[(Redis Queue)]
        S3[S3 Artifact Storage]
        SCAN[Virus Scanner]
    end

    subgraph "Execution - Cloud"
        EXE[Executor Host]
        VM[Ephemeral Test VM]
    end

    subgraph "Execution - Local"
        DR[Desktop Runner Client]
        AUT[App Under Test]
    end

    %% Cloud Path
    CI -->|POST /evaluations cloud| API
    API --> SCAN
    SCAN -->|Clean| S3
    API -->|enqueue| REDIS
    EXE -->|poll queue| REDIS
    EXE --> VM
    VM -->|test + send results| API
    API --> DB

    %% Local Path
    U1 -->|POST /evaluations/upload local| API
    API --> SCAN
    SCAN -->|Clean| S3
    U1 --> DR
    DR -->|GET /jobs/next| API
    API --> DR
    DR --> AUT
    DR -->|Screenshots + Context| API
    API -->|LLM reasoning + coord map| DR
    DR -->|Exec actions| AUT
    DR -->|Results| API
    API --> DB
```

## Execution Model

Eval Agent uses a TestCase runner model:

1. Desktop runner polls for the next assigned TestCase.
2. If assigned, the runner launches the target application (desktop or web).
3. The runner enters a deterministic step loop, sending a screenshot + context and receiving exactly one action per step.
4. The backend terminates the TestCase with a `finish_task` action.

```mermaid
sequenceDiagram
    participant DR as Desktop Runner
    participant API as Backend
    participant AUT as App Under Test

    DR->>API: GET /api/v1/testcases/next?executor_id=...
    alt TestCase assigned
        DR->>AUT: Launch app
        loop Step loop (max 40)
            DR->>API: POST /api/v1/vision/analyze (screenshot + context)
            API-->>DR: {thought, action, description}
            DR->>AUT: Execute action
        end
        DR->>API: PATCH /api/v1/testcases/{id} (result)
        DR->>AUT: Tear down app
    else None assigned
        DR->>DR: Idle or stop
    end
```

## Desktop Runner (Execution Plane)

App Eval Desktop is the Electron + TypeScript executor. It is intentionally thin: no local perception, UI parsing, or model inference. All reasoning and decision-making lives in the backend.

### Core Responsibilities

**Capture**

* High-performance BGRA capture via Windows Desktop Duplication API
* Multi-monitor aware
* Optional exclusion from capture (`WDA_EXCLUDEFROMCAPTURE`)
* Conversion to PNG before upload

**Execution**

* Mouse actions: click, double-click, right-click, hover, drag, scroll
* Keyboard actions: shortcuts, simulated typing
* Clipboard-based direct text entry (paste)
* Deterministic waits and task completion

**Orchestration**

* Polls backend for next assigned TestCase execution
* Runs TestCases sequentially
* Handles pause, resume, stop
* Manages app lifecycle (launch + teardown)

**Visualization**

* Live screenshot preview
* Structured logs
* Step-by-step run timeline
* Agent context inspection
* Evaluation and TestCase history

### Desktop System Architecture

```
desktop/
├── scripts/
│   ├── utils/copy-recursive.js
│   ├── copy-renderer.js
│   └── copy-native.js
├── src/
│   ├── main.ts                 # Electron entry, windows, IPC
│   ├── preload.ts              # Secure IPC bridge
│   ├── config.ts               # Runtime configuration
│   │
│   ├── core/
│   │   ├── orchestrator.ts     # TestCase runner loop
│   │   ├── context.ts          # AgentExecutionContext
│   │   └── logger.ts           # Structured logging
│   │
│   ├── agent/
│   │   ├── executor.ts         # nut-js action executor
│   │   ├── coord-mapper.ts     # analysis/capture -> screen mapping
│   │   └── capture/native/     # C++ Desktop Duplication addon
│   │
│   ├── api/
│   │   └── client.ts           # REST + vision calls
│   │
│   ├── renderer/
│   │   ├── locales/
│   │   ├── pages/
│   │   ├── shared/
│   │   └── styles/
│   │
│   └── types/
│       └── evaluations.d.ts
├── test/
│   └── test-window-capture.ts
```

### Data and Control Flow

**Per-step loop**

1. Capture native screenshot -> PNG (with brightness sanity checks).
2. Assemble `AgentExecutionContext` and last focus coordinates.
3. POST screenshot + context to `/api/v1/vision/analyze`.
4. Receive `{ thought, action, description }`.
5. Map coordinates via `coord-mapper.ts` and execute.
6. Update scratchpad, action history, and UI timeline.

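The per-step loop above can be sketched roughly as follows. This is an illustration only, written in Python for brevity rather than the runner's actual TypeScript. `capture_png`, `analyze`, and `execute` are hypothetical callables standing in for native capture, the `/api/v1/vision/analyze` call, and the action executor, and the `type` field on the action is an assumed shape, not the documented schema.

```python
from typing import Callable

MAX_STEPS = 40  # step cap stated in this README


def run_testcase(
    capture_png: Callable[[], bytes],
    analyze: Callable[[bytes, dict], dict],
    execute: Callable[[dict], None],
    max_steps: int = MAX_STEPS,
) -> dict:
    """Run one TestCase: capture, request exactly one action, execute it."""
    # Hypothetical, simplified stand-in for AgentExecutionContext.
    context = {"action_history": [], "scratchpad": []}
    for step in range(1, max_steps + 1):
        screenshot = capture_png()
        decision = analyze(screenshot, context)  # {thought, action, description}
        action = decision["action"]
        context["action_history"].append(action)
        context["scratchpad"].append(decision.get("thought", ""))
        # Assumption: the terminating action is identified by a "type" field.
        if action.get("type") == "finish_task":
            return {"finished": True, "steps": step}
        execute(action)
    # The backend never sent finish_task within the step budget.
    return {"finished": False, "steps": max_steps}


# Usage with a fake backend that finishes on its third decision.
def fake_analyze(png: bytes, ctx: dict) -> dict:
    n = len(ctx["action_history"])
    kind = "finish_task" if n == 2 else "click"
    return {"thought": f"step {n}", "action": {"type": kind}, "description": ""}


executed: list = []
result = run_testcase(lambda: b"png", fake_analyze, executed.append)
```

Passing the three collaborators in as callables keeps the loop deterministic and easy to exercise without a screen or a backend, which mirrors the thin-executor design described above.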
### User Interface

**Agent View (Run)**

* Live screenshot preview
* Step timeline (thought + action + screenshot)
* Structured logs (SYSTEM / JOB / AGENT / TOOL / CAPTURE / WARN / ERROR)
* Pause / resume
* Compact mode toggle

**Apps**

* Browse apps, versions, and evaluations
* Focus a lineage branch and reset focus
* Create apps + versions (upload or URL)
* Delete apps or versions
* Jump to evaluation history

**Evaluations**

* Assigned evaluations list
* Metadata: goal, app type, timestamps
* Link to history
* Delete evaluation
* Regenerate or edit summary (for completed evaluations)

**History**

* Infinite scroll TestCase history
* Markdown rendering of results
* Copy / download summary
* Re-run TestCase

**Bugs**

* Bug list per app with filtering and search
* Create, edit, delete bugs (status, severity, priority, fingerprint)
* Track occurrences tied to evaluations/TestCases
* Record branch-scoped fixes and verification notes

**Compact Mode**

* Always-on-top minimal window
* Logs + status
* Execution controls

### IPC Contracts

Renderer communicates only via preload IPC.
Key channels include:

* `getAssignedEvaluations`
* `fetchEvaluation`
* `deleteEvaluation`
* `run:start / pause / resume / stop`
* `injectHumanPrompt`
* `agent-context-updated`
* `evaluation-attached`
* `run-timeline-entry`
* `history:refresh`
* `getLogBuffer / onLogUpdate`
* `listBugs / getBug / createBug / updateBug / deleteBug`
* `listBugOccurrences / createBugOccurrence`
* `listBugFixes / createBugFix / deleteBugFix`

### Backend API Contract (Desktop)

Key endpoints used by the desktop runner:

* `GET /api/v1/apps`
* `POST /api/v1/apps`
* `DELETE /api/v1/apps/{app_id}`
* `GET /api/v1/apps/{app_id}/versions`
* `POST /api/v1/apps/{app_id}/versions`
* `DELETE /api/v1/apps/{app_id}/versions/{version_id}`
* `GET /api/v1/apps/{app_id}/versions/{version_id}/evaluations`
* `GET /api/v1/apps/{app_id}/bugs`
* `POST /api/v1/bugs`
* `GET /api/v1/bugs/{bug_id}`
* `PATCH /api/v1/bugs/{bug_id}`
* `DELETE /api/v1/bugs/{bug_id}`
* `GET /api/v1/bugs/{bug_id}/occurrences`
* `POST /api/v1/bugs/{bug_id}/occurrences`
* `GET /api/v1/bugs/{bug_id}/fixes`
* `POST /api/v1/bugs/{bug_id}/fixes`
* `DELETE /api/v1/bugs/{bug_id}/fixes/{fix_id}`
* `GET /api/v1/testcases/next`
* `PATCH /api/v1/testcases/{id}`
* `POST /api/v1/vision/analyze`
* `GET /api/v1/evaluations/{id}`
* `PATCH /api/v1/evaluations/{id}/summary`
* `POST /api/v1/evaluations/{id}/regenerate-summary`

See `docs/endpoints.md` for the authoritative schema.

### Desktop Configuration

`.env`:

```env
API_BASE_URL=http://127.0.0.1:8000
EXECUTOR_ID=
```

Additional behavior:

* Capture defaults controlled in `desktop/src/config.ts`
* Theme, language, executor ID configurable via Settings UI
* Executor ID persists across restarts

### Desktop Build and Run

Install dependencies:

```bash
npm install
```

Build:

```bash
npm run build
```

Dev (renderer + Electron with Vite HMR):

```bash
npm run dev
```

Dev (separate terminals):

```bash
npm run dev:renderer
npm run dev:electron
```

Run:

```bash
npm start
```

Package:

```bash
npm run make
```

Test native capture:

```bash
npx ts-node test/test-window-capture.ts
```

### Desktop Development Notes

* Renderer is fully sandboxed (no Node access)
* All side effects happen in `main` / `orchestrator`
* Vision is non-streaming
* Clipboard is restored after direct text entry
* Click-through is reference-counted to avoid stuck windows
* Max 40 steps per TestCase

### Desktop Troubleshooting

**No screenshots**

* Rebuild native addon
* Update GPU drivers
* Run capture test script

**Actions misaligned**

* Check `space` + `normalized` flags
* Verify capture resolution vs model space
* Inspect `desktop/src/agent/coord-mapper.ts`

**Agent stuck**

* Confirm backend returns `finish_task`
* Check TestCase status transitions
* Inspect vision analyze logs

## Backend (Control Plane)

### Product Overview

The backend is an AI-based automated application evaluation system. It uses a visual multimodal model and multi-agent collaboration to achieve end-to-end app exploration, evaluation metric calculation, full bug lifecycle management, and test case generation.

### Core Functional Requirements

**App Information Parsing**

* Textual material grading
  * Level 1: functional brief of <=200 words
  * Level 2: introductory guide covering basic operations
  * Level 3: full official documentation/manual
* Interface understanding
  * Automatically detects and classifies UI elements (buttons, inputs, icons)
  * Supports structural layout parsing

**Automated Evaluation Process**

1. Test case generation
2. Feature exploration execution
3. Bug detection and management
4. Version difference analysis
5. Evaluation metric calculation
6. Evaluation report generation

### Evaluation System Metrics

| Metric       | Calculation Method                                                               | Core Parameters           |
| ------------ | -------------------------------------------------------------------------------- | ------------------------- |
| Stability    | `1 - (Crash Rate * 0.7 + Functional Abnormality Rate * 0.3)`                     | Crash Count / Total Tasks |
| Usability    | `1 - (Step Efficiency * 0.5 + Time Efficiency * 0.5)`                            | Steps / Avg Steps         |
| Learnability | `(1 - Basic Exploration Efficiency) * Text Level Coeff + Feature Coverage * 0.2` | Exploration time / Avg    |
| Completeness | `Feature Coverage * 0.4 + Integrity * 0.6`                                       | Implemented Features      |

### Core Agent Design

* `app_evaluation_agent/services/agents/coordinator.py`: **CoordinatorAgent** (bootstraps plans and test cases)
* `app_evaluation_agent/services/agents/planner.py`: **PlannerAgent** (LLM plan + test case generation)
* `app_evaluation_agent/services/agents/analyzer.py`: **AnalyzerAgent** (vision analysis + coordinate mapping)
* `app_evaluation_agent/services/agents/summarizer.py`: **SummarizerAgent** (final evaluation summary)
* `app_evaluation_agent/services/agents/bug_triage.py`: **BugTriageAgent** (extracts bugs from results)

### Bug Management Specification

**Severity level definitions**

| Level | Definition          | Response Time      | Example               |
| ----- | ------------------- | ------------------ | --------------------- |
| P0    | Critical blocker    | 24 hours           | Crash on launch       |
| P1    | Severe abnormality  | 3 days             | Payment cannot submit |
| P2    | General abnormality | One iteration      | Button unresponsive   |
| P3    | Minor issue         | Next major release | UI contrast issue     |

**Status transition rules**

`New -> In Progress -> Pending Verification -> Closed -> (optional Reopen)`

### Test Case Management

**General Task Description**

* Task ID
* Description
* Expected Result
* Priority

**Version-Specific Steps**

* Numbered operational steps for each version

### Product Deliverables

* Functional specification
* Evaluation report
* Bug list
* Bug tracking sheet
* Test case set
* Operation process dataset

### Execution Modes

**Secure Cloud Execution (Headless VM Testing)**

* Intended for CI/CD, regression testing, and scalable automation
* Backend enqueues background tasks via Redis ARQ
* Cloud executor (outside this repo) consumes jobs and launches ephemeral VMs

**Interactive Local Execution (Desktop Runner)**

* Used for developer debugging and exploratory testing
* Desktop runner polls for test cases, captures screenshots, and executes actions locally
* Coordinate correction is handled server-side

### High-Level Workflow

**Cloud path**

1. Client or CI submits an evaluation
2. Backend scans and stores artifacts
3. Evaluation is enqueued via Redis
4. Cloud executor pulls the job
5. VM runs the app headlessly
6. Results are sent back and persisted

**Local path**

1. Desktop runner uploads or selects an evaluation
2. Backend scans and stores artifacts
3. Runner polls `/testcases/next`
4. Runner captures screenshots and sends context
5. Backend vision agent returns actions
6. Runner executes actions locally
7. Results are stored and summarized

### Component Breakdown

**Entry and API Layer**

* FastAPI app: `backend/app_evaluation_agent/main.py`
* Routes under `api/v1/`:
  * `apps.py` - app + version management
  * `evaluations.py` - evaluation CRUD and lifecycle
  * `testplans.py` - test plan access
  * `testcases.py` - test case assignment and updates
  * `vision.py` - screenshot + LLM vision reasoning
  * `logs.py` - log streaming/export

**Agent Layer (`services/agents/`)**

* PlannerAgent: generates high-level plans and test cases
* CoordinatorAgent: bootstraps evaluations and assigns test cases to executors
* AnalyzerAgent: handles `/vision/analyze`, builds prompts, calls vLLM, remaps coords
* SummarizerAgent: produces final evaluation reports after test case completion
* BugTriageAgent: extracts and dedupes bugs by fingerprint
* LLM client: `llm_client.py`

**Business Services**

* `services/apps.py` - app + version management, evaluation creation
* `services/evaluations.py` - evaluation lifecycle and planner bootstrap
* `services/testcases.py` - test case assignment, completion, bug triage

### Bug Tracking and Triage

* Bug extraction happens when a runner patches a test case with results via `PATCH /api/v1/testcases/{testcase_id}`.
* `BugTriageAgent` parses result payloads and emits 0..N bug drafts.
* Bugs are deduped per app by `fingerprint`, with `last_seen_at` updated on repeats.
* Each observation is stored as a `BUG_OCCURRENCE` linked to evaluation, test case, app version, step index, action/expected/actual, plus optional artifact URIs.
* Fixes are recorded in `BUG_FIX` with `fixed_in_version_id` and optional `verified_by_evaluation_id`.
* Severity/status enums are validated; state transitions are not enforced by the backend.

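The per-app fingerprint dedup described above might look roughly like the sketch below. It is an in-memory illustration under stated assumptions, not the backend's code: `BugStore` and `record` are hypothetical names, a dict stands in for the `BUG` and `BUG_OCCURRENCE` tables, and the draft is assumed to carry `fingerprint` and `title` keys.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Bug:
    app_id: int
    fingerprint: str
    title: str
    first_seen_at: datetime
    last_seen_at: datetime
    occurrences: list = field(default_factory=list)


class BugStore:
    """Hypothetical in-memory stand-in for the BUG / BUG_OCCURRENCE tables."""

    def __init__(self) -> None:
        # Bugs are keyed per app, so equal fingerprints in different apps stay separate.
        self._bugs: dict = {}

    def record(
        self, app_id: int, draft: dict, occurrence: dict, now: Optional[datetime] = None
    ) -> tuple:
        """Store one observation; return (bug, created)."""
        now = now or datetime.now(timezone.utc)
        key = (app_id, draft["fingerprint"])
        bug = self._bugs.get(key)
        created = bug is None
        if created:
            bug = Bug(app_id, draft["fingerprint"], draft["title"], now, now)
            self._bugs[key] = bug
        else:
            bug.last_seen_at = now  # repeat sighting: only bump last_seen_at
        bug.occurrences.append(occurrence)  # every sighting is kept
        return bug, created


store = BugStore()
t1 = datetime(2026, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2026, 1, 2, tzinfo=timezone.utc)
first, created_first = store.record(1, {"fingerprint": "fp", "title": "Crash"}, {"step_index": 3}, now=t1)
again, created_again = store.record(1, {"fingerprint": "fp", "title": "Crash"}, {"step_index": 5}, now=t2)
```

Keying on `(app_id, fingerprint)` means a repeat sighting adds an occurrence and moves `last_seen_at` without creating a second bug, while `first_seen_at` stays fixed.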
### Vision and Coordinate Mapping

* AnalyzerAgent consumes screenshot + AgentContext and calls the vision LLM
* Coordinates are remapped using `services/vllm_coordinate_mapper.py`
* Supports letterboxing, normalization, and capture origin offsets

### Persistence Layer

* SQLAlchemy models: `backend/app_evaluation_agent/storage/models.py`
* Async engine/session: `backend/app_evaluation_agent/storage/database.py`
* Schemas: `backend/app_evaluation_agent/schemas/`
* Migrations: Alembic (`backend/alembic/`)

### Background Tasks

* Redis ARQ worker: `backend/app_evaluation_agent/worker.py`
* Used for summarization and cloud job enqueueing
* Local execution bypasses Redis where possible

### Integrations and Utilities

* Config: `backend/app_evaluation_agent/utils/config.py` (TOML-based)
* Virus scanning: `backend/app_evaluation_agent/integrations/virus_scanner.py`
* Artifact storage: `backend/app_evaluation_agent/integrations/s3_client.py`
* Real-time events: `backend/app_evaluation_agent/realtime.py`
* Logging: `backend/app_evaluation_agent/logging_utils.py`

### Backend File Structure

```
backend/app_evaluation_agent/
├── main.py
├── worker.py
├── logging_utils.py
├── logs/
├── api/
│   └── v1/
│       ├── apps.py
│       ├── evaluations.py
│       ├── testplans.py
│       ├── testcases.py
│       ├── vision.py
│       └── logs.py
├── services/
│   ├── apps.py
│   ├── agents/
│   │   ├── planner.py
│   │   ├── coordinator.py
│   │   ├── analyzer.py
│   │   ├── summarizer.py
│   │   ├── bug_triage.py
│   │   ├── llm_client.py
│   │   └── prompt_loader.py
│   ├── prompts/
│   │   ├── planner/
│   │   ├── bug_triage/
│   │   └── summarizer/
│   ├── evaluations.py
│   ├── testcases.py
│   └── vllm_coordinate_mapper.py
├── storage/
│   ├── models.py
│   └── database.py
├── schemas/
│   ├── evaluation.py
│   ├── testplan.py
│   ├── testcase.py
│   └── agent.py
├── integrations/
│   ├── virus_scanner.py
│   └── s3_client.py
└── utils/
    └── config.py
```

### Vision Pipeline Notes

* Classical UI element detection is currently disabled
* Vision endpoints accept a PNG screenshot and execution context
* Returned model coordinates are preserved as raw values and remapped to screen pixels
* This design supports future detector insertion and coordinate drift debugging

## Database Schema

```mermaid
erDiagram
    APP {
        int id PK
        string name
        enum app_type
        datetime created_at
        datetime updated_at
    }

    APP_VERSION {
        int id PK
        int app_id FK
        int previous_version_id FK
        string version
        string artifact_uri
        string app_url
        datetime release_date
        text change_log
        datetime created_at
        datetime updated_at
    }

    APP_VERSION_LINEAGE {
        int app_version_id PK, FK
        int previous_version_id PK, FK
    }

    EVALUATION {
        int id PK
        int app_version_id FK
        enum status
        string execution_mode
        string assigned_executor_id
        json results
        string local_application_path
        string high_level_goal
        bool run_on_current_screen
        datetime created_at
        datetime updated_at
    }

    TEST_PLAN {
        int id PK
        int evaluation_id FK
        enum status
        json summary
        datetime created_at
        datetime updated_at
    }

    TEST_CASE {
        int id PK
        int plan_id FK
        int evaluation_id FK
        string name
        text description
        json input_data
        enum status
        json result
        int execution_order
        string assigned_executor_id
        datetime created_at
        datetime updated_at
    }

    BUG {
        int id PK
        int app_id FK
        string title
        text description
        enum severity_level
        int priority
        enum status
        int discovered_version_id FK
        string fingerprint
        json environment
        json reproduction_steps
        datetime first_seen_at
        datetime last_seen_at
        datetime created_at
        datetime updated_at
    }

    BUG_OCCURRENCE {
        int id PK
        int bug_id FK
        int evaluation_id FK
        int test_case_id FK
        int app_version_id FK
        int step_index
        json action
        text expected
        text actual
        json result_snapshot
        string screenshot_uri
        string log_uri
        json raw_model_coords
        datetime observed_at
        string executor_id
        datetime created_at
        datetime updated_at
    }

    BUG_FIX {
        int id PK
        int bug_id FK
        int fixed_in_version_id FK
        int verified_by_evaluation_id FK
        text note
        datetime created_at
    }

    APP ||--o{ APP_VERSION : has
    APP_VERSION ||--o{ EVALUATION : runs
    EVALUATION ||--o{ TEST_PLAN : owns
    EVALUATION ||--o{ TEST_CASE : owns
    TEST_PLAN ||--o{ TEST_CASE : contains
    APP_VERSION ||--o{ APP_VERSION_LINEAGE : has
    APP_VERSION_LINEAGE }o--|| APP_VERSION : previous
    APP ||--o{ BUG : owns
    APP_VERSION ||--o{ BUG : discovered_in
    BUG ||--o{ BUG_OCCURRENCE : observed
    EVALUATION ||--o{ BUG_OCCURRENCE : observed_in
    TEST_CASE ||--o{ BUG_OCCURRENCE : linked_to
    APP_VERSION ||--o{ BUG_OCCURRENCE : observed_on
    BUG ||--o{ BUG_FIX : fixed_in
    APP_VERSION ||--o{ BUG_FIX : fixed_on
    EVALUATION ||--o{ BUG_FIX : verified_by
```

## API Reference

The canonical API documentation lives at `docs/endpoints.md`. Use the interactive docs when running the backend:

* `/docs` (Swagger UI)
* `/redoc`

Key operational endpoints:

* `POST /api/v1/evaluations` (JSON)
* `POST /api/v1/evaluations/upload` (desktop app upload)
* `POST /api/v1/evaluations/url` (web app URL)
* `POST /api/v1/evaluations/live` (use runner current screen)
* `GET /api/v1/testplans/{plan_id}`
* `GET /api/v1/testcases/next`
* `PATCH /api/v1/testcases/{testcase_id}`
* `POST /api/v1/bugs`
* `GET /api/v1/bugs/{bug_id}`
* `PATCH /api/v1/bugs/{bug_id}`
* `DELETE /api/v1/bugs/{bug_id}`
* `GET /api/v1/bugs/{bug_id}/occurrences`
* `POST /api/v1/bugs/{bug_id}/occurrences`
* `GET /api/v1/bugs/{bug_id}/fixes`
* `POST /api/v1/bugs/{bug_id}/fixes`
* `DELETE /api/v1/bugs/{bug_id}/fixes/{fix_id}`
* `POST /api/v1/vision/analyze`
* `GET /api/v1/logs/export`

## Backend Setup and Installation

**Prerequisites**

* Python 3.10+
* Poetry
* Docker + Docker Compose

**1) Clone**

```bash
git clone https://github.com/Region-AI/EvalAgent
cd EvalAgent
```

**2) Install dependencies**

```bash
cd backend
poetry install
```

**3) Configure**

```bash
cp config/settings.example.toml config/settings.toml
```

Edit `backend/config/settings.toml` to configure:

* PostgreSQL URL
* Redis host
* LLM base URL and API key
* Model paths (if applicable)

**4) Start backend services**

```bash
cp docker-compose.example.yaml docker-compose.yaml
docker-compose up -d
```

**5) Run migrations**

```bash
cp env.example.py alembic/env.py
poetry run alembic upgrade head
```

**6) Run the backend**

Terminal 1 (background worker):

```bash
arq app_evaluation_agent.worker.WorkerSettings
```

Terminal 2 (API server):

```bash
uvicorn app_evaluation_agent.main:app --reload
```

API docs are available at `http://127.0.0.1:8000/docs`.

## Testing

**Backend**

Most vision tests depend on a configured LLM endpoint and a `screenshot.png` fixture:

```bash
cd backend
poetry run pytest tests/test_vllm_coordinate_mapper.py -q
```

**Desktop**

```bash
cd desktop
npx ts-node test/test-window-capture.ts
```

## Docs Map

* `docs/overview.md`
* `docs/architecture.md`
* `docs/backend.md`
* `docs/desktop.md`
* `docs/endpoints.md`
* `docs/troubleshooting.md`
* `ARCHITECTURE_backend.md` (legacy full backend architecture)
* `ARCHITECTURE_frontend.md` (legacy full desktop architecture)
* `README_backend.md` (legacy backend README)
* `README_frontend.md` (legacy desktop README)

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=Region-AI/EvalAgent&type=Date)](https://star-history.com/#Region-AI/EvalAgent&Date)

## License

Apache-2.0. See `LICENSE`.