# FirmAgent **Repository Path**: wellstudy0806/FirmAgent ## Basic Information - **Project Name**: FirmAgent - **Description**: No description available - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2026-05-24 - **Last Updated**: 2026-05-24 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # FirmAgent **⚠️ Legal/Ethical Notice** Only test targets you own or have explicit written permission to assess. You are solely responsible for compliance with local laws, licenses, and organizational policies. ------ ## Table of Contents - [Overview](#overview) - [Prerequisites](#prerequisites) - [Quick Start](#quick-start) - [Workflow](#workflow) - [1) Pre-fuzzing Analysis](#1-pre-fuzzing-analysis) - [2) Runtime Monitoring](#2-runtime-monitoring) - [3) Fuzzing](#3-fuzzing) - [4) Taint-to-PoC Agent](#4-taint-to-poc-agent) - [Input/Output Schemas](#inputoutput-schemas) - [Pre_fuzzing.json](#pre_fuzzingjson-api-definitions) - [Fuzzing output files](#fuzzing-output-files) - [Configuration Reference](#configuration-reference) - [Tips & Best Practices](#tips--best-practices) - [Troubleshooting](#troubleshooting) - [FAQ](#faq) ------ ## Overview ![alt text](Pre_Fuzzing/image.png) This toolkit automates an end-to-end workflow for discovering and validating vulnerabilities in re-hosted IoT firmware: 1. **Fuzzing-Driven Information Collection** - Extract all target URIs and input parameters. - Locate sink functions and compute distances from basic blocks to sinks. - Produce mutation dictionaries for the fuzzer. - Instrumented QEMU binaries collect control-flow data during fuzzing. - Optional: Build emulation images directly with a helper script. - Generic, device-aware API fuzzer with configurable `Host` header. - Uses the pre-fuzzing artifacts to generate requests and mutations. 2. **Taint-to-PoC Agent** - LLM-assisted taint analysis to findings vulnerabilities and produce PoCs. ------ ## Prerequisites - **Python**: 3.8+ Install Python dependencies: ``` pip install -r requirements.txt ``` - **Emulation**: Greenhouse (Details can be found in https://github.com/sefcom/greenhouse.git) - We ship instrumented binaries under `FirmAgent/FuzzingRecord/gh3fuzz/fuzz_bins/qemu/`. - Control-flow collector: `libibresolver.so`. - **Target firmware**: Obtain and re-host the firmware image - **Setting Environment Variables:** export IDAT_BIN=/path/to/idat64 export Private_API_KEY={API_KEY} ------ ## Quick Start 1. **Run pre-fuzzing** (single command): Simply run the automated pipeline script: ``` ./Pre_Fuzzing/run.sh ``` This single command will: - Export decompiled code and strings from IDA Pro - Extract API endpoints and parameters using LLM - Identify sink function address ranges - Compute distances from basic blocks to sinks **Outputs** (in the same directory as ``): - `Pre_fuzzing.json` - Complete fuzzing input schema with APIs and parameters - `sink_scope_addr.txt` - Sink function address ranges - `sink_distance_scores.json` / `sink_distance_scores.csv` - Distance scores for prioritization - `export-for-ai-/` - Decompiled evidence directory 2. **Build an emulated image (optional, automated wiring)**: ``` python build_fuzz_img.py ``` This integrates the instrumentation and supporting binaries into the emulation image. 2. **Start fuzzing** (from the host): **⚠️ Important:** Before running the fuzzer, you need to customize the request templates in `FuzzingRecord/Fuzzer.py` to match your target device's API format (e.g., request headers, authentication, payload structure). Also set the runtime filter and taint seed before fuzzing: ``` export QEMU_DFILTER="Sink scope" (docker) ``` Use `TAINTTAG` as the one-shot parameter seed value for request parameters. ``` python FuzzingRecord/Fuzzer.py \ --json-file Pre_fuzzing.json \ --delay 0.5 \ --host {target_ip_or_domain} ``` **Fuzzing outputs:** - `result.json` - Request packets and responses produced by `FuzzingRecord/Fuzzer.py` - `fuzzing_results.log` - Detailed fuzzing logs - `Source.json` / `source.json` - Runtime-monitoring artifact containing source addresses and reachable test cases; this file is consumed by `LLMATaint.py` but is not generated by the current `FuzzingRecord/Fuzzer.py` alone 3. **Run LLM-assisted taint analysis to generate PoCs**: Place the runtime-monitoring artifact (`Source.json` or `source.json`) in the **same folder as the target binary**, then run: ``` python LLMATaint.py \ -b {path_to_binary} \ -p {True|False} \ -t {vuln_type} \ -o {path_to_resultfolder} \ -m {model} ``` Supported model values currently used by the code include: `R1_official`, `V3_official` for the current `LLMAPIThree` path used by `LLMATaint.py` The taint analysis agent will: - Analyze the taint flow from source points identified during fuzzing - Use the reachable test cases as base request packets during later validation / PoC generation - First determine whether an alert exists, then validate it and generate concrete PoCs ------ ## Workflow ### 1) Pre-fuzzing Analysis **Goal:** Collect everything needed to drive effective mutations and triage. The pre-fuzzing pipeline (`Pre_Fuzzing/run.sh`) automates the following steps: 1. **Decompilation export** (`Pre_Fuzzing/decompile.py`) - Exports decompiled functions, strings, memory dumps, imports/exports - Creates `export-for-ai-/` directory with all evidence 2. **API & parameter extraction** (`Pre_Fuzzing/llm_extract_api_params.py`) - Uses LLM to analyze decompiled code and extract: - API endpoints (goform handlers, URL routes, etc.) - Parameter names and types from function callsites - Generates `Pre_fuzzing.json` with complete fuzzing schema 3. **Sink scope extraction** (`Pre_Fuzzing/Get_SinkFunc.py`) - Identifies dangerous sink functions (system calls, strcpy, etc.) - Traces backward from sinks to find all reachable functions - Outputs `sink_scope_addr.txt` with address ranges 4. **Distance calculation** (`Pre_Fuzzing/Distance.py`) - Computes shortest path distances from basic blocks to sinks - Prioritizes fuzzing targets closer to dangerous sinks - Outputs `sink_distance_scores.json` / `sink_distance_scores.csv` **Outputs:** - `Pre_fuzzing.json` - Complete fuzzing input schema (APIs + parameters) - `sink_scope_addr.txt` - Sink function address ranges - `sink_distance_scores.json` / `sink_distance_scores.csv` - Distance scores for prioritization ### 2) Runtime Monitoring We provide an instrumented QEMU stack and a control-flow recording library: - **Location:** `FirmAgent/FuzzingRecord/gh3fuzz/fuzz_bins/qemu/` - `libibresolver.so` collects control-flow events. - **Recommended path:** - Use `build_fuzz_img.py` to assemble the emulation image. - Instrumentation and collectors are automatically integrated. - **Customization:** You can extend `qemuafl` to log additional runtime signals. We provide our **Fuzzing-SA** source for reference (see repo). ### 3) Fuzzing **⚠️ Before running:** Customize `FuzzingRecord/Fuzzer.py` to match your target device's API format: - Update request headers (authentication tokens, cookies, etc.) - Adjust payload structure if your device uses non-standard formats - Configure device-specific error detection patterns - Export `QEMU_DFILTER="Sink scope"` before fuzzing - Use `TAINTTAG` as the one-shot parameter seed value Run the generic fuzzer against your re-hosted device: ``` python FuzzingRecord/Fuzzer.py \ --json-file Pre_fuzzing.json \ --delay 0.5 \ --host {target_ip_or_domain} ``` - `--json-file` — Pre-fuzzing output containing API definitions. - `--delay` — Inter-request sleep to avoid rate-limits / WAFs. - `--host` — Value for the `Host` header (supports SNI / vhosts). **What happens:** - The fuzzer parses `Pre_fuzzing.json` and parameter templates. - For each endpoint & parameter, it generates mutations. - Requests and responses are logged to `fuzzing_results.log`. - Structured request/response packets are written to `result.json` next to the input JSON. - Potential vulnerabilities are flagged based on error codes, timing, and content indicators. - Source-address artifacts used by `LLMATaint.py` must be supplied separately by the runtime monitoring / instrumentation component as `Source.json` or `source.json`. **Fuzzing outputs:** - `result.json` - Structured request packets and responses written by `FuzzingRecord/Fuzzer.py` - `Source.json` / `source.json` - Source addresses and their corresponding reachable test cases (required for taint analysis; produced by runtime monitoring rather than by the current fuzzer script alone) - `fuzzing_results.log` - Human-readable progress and warnings ### 4) Taint-to-PoC Agent Once fuzzing identifies source points and generates test cases, use LLM-assisted taint analysis to validate and transform findings into concrete PoCs: ``` python LLMATaint.py \ -b {path_to_binary} \ -p {True|False} \ -t {vuln_type} \ -o {path_to_resultfolder} \ -m {model} ``` Supported model values currently used by the code include `R1_official` and `V3_official` for the default `LLM_analysis()` path. **Prerequisites:** - Place `Source.json` or `source.json` (contains source addresses and reachable test cases) in the **same directory** as the target binary - If available, place `Indirect_call.json` or `indirect_data.json` (caller-callee pairs) in the same directory **What the agent does:** - Analyzes taint flow from source points identified during fuzzing - Uses reachable test cases from `Source.json` / `source.json` as base packets during later PoC generation - Traces data flow through decompiled code to identify vulnerable paths - Uses a two-stage process: first taint/alert judgment, then validation + PoC generation - Produces validated exploit code with proper input formatting ------ ## Input/Output Schemas ### Pre_fuzzing.json (API definitions) Preferred format used by the current `FuzzingRecord/Fuzzer.py`: ``` { "api_endpoints": [ "/nitro/v1/config/example_endpoint", "/apply.cgi" ], "para": [ "username", "cmd", "action" ] } ``` Legacy array-of-objects input is still partially supported for endpoint extraction, but the current fuzzer primarily expects: - `api_endpoints`: Relative API paths. - `para`: Parameter names used to build per-parameter taint payloads. ### Fuzzing output files - **`fuzzing_results.log`** — Human-readable progress and warnings. - **`result.json`** — Structured request packets and responses emitted by `FuzzingRecord/Fuzzer.py`. - **`Source.json` / `source.json`** — Source addresses and their corresponding reachable test cases (required for taint analysis; produced by runtime monitoring / instrumentation). - **`Indirect_call.json`** / **`indirect_data.json`** (optional) — Caller-callee pairs for indirect call resolution. ### `Source.json` format and generation `LLMATaint.py` supports both simple address lists and structured source entries. You can generate either format from runtime taint logs using: `FuzzingRecord/build_source_json.py` 1) Address-list format (minimal): ``` [ "0x401000", "0x4023a4", "0x403000" ] ``` Generate: ``` python FuzzingRecord/build_source_json.py \ --qemu-log /scratch/output/qemu_taint.log \ --format address-list \ --output /scratch/output/Source.json ``` 2) Structured format with reachable testcases (recommended): ``` { "sources": [ { "address": "0x401000", "reachable_testcase": { "api_url": "/apply.cgi", "method": "POST", "post_payload": { "username": "TAINTTAG", "action": "login" } } } ] } ``` Generate (attach testcase info from `result.json`): ``` python FuzzingRecord/build_source_json.py \ --qemu-log /scratch/output/qemu_taint.log \ --format sources \ --result-json /scratch/output/result.json \ --taint-tag TAINTTAG \ --output /scratch/output/Source.json ``` Notes: - `--qemu-log` should point to the log file generated by your instrumented `afl-qemu-trace` run. - If both `Source.json` and `source.json` exist, explicitly set `--dynamic-source-file` when running `LLMATaint.py`. ------ ### Example Commands Recap **Run pre-fuzzing (single command):** ``` ./Pre_Fuzzing/run.sh /path/to/httpd ``` This generates `Pre_fuzzing.json`, `sink_scope_addr.txt`, and distance scores in the same directory as the binary. **Run fuzzing (host):** ⚠️ Remember to customize `FuzzingRecord/Fuzzer.py` request templates first! ``` export QEMU_DFILTER="Sink scope" (docker) python FuzzingRecord/Fuzzer.py \ --json-file Pre_fuzzing.json \ --delay 0.5 \ --host 192.168.0.1 ``` This generates `result.json` (request packets and responses). A separate runtime-monitoring step should generate `Source.json` / `source.json` for taint analysis. **Run taint-to-PoC agent:** Place `Source.json` or `source.json` in the same directory as the binary, then: ``` python LLMATaint.py \ -b ./bin/httpd \ -p False \ -t ci \ -o ./results/ASUS \ -m R1_official ``` **Reference** If you use or cite this work, please reference: ```bibtex @inproceedings{Ji2026FirmAgent, author = {Ji, Jiangan and Zhang, Chao and Gan, Shuitao and Lin, Jian and Liu, Hangtian and Liu, Tieming and Zheng, Lei and Jia, Zhipeng}, title = {FirmAgent: Leveraging Fuzzing to Assist LLM Agents with IoT Firmware Vulnerability Discovery}, booktitle = {Network and Distributed System Security (NDSS) Symposium}, year = {2026}, month = {February}, pages = {1--16}, address = {San Diego, CA, USA}, doi = {10.14722/ndss.2026.231943}, isbn = {979-8-9919276-8-0}, url = {https://dx.doi.org/10.14722/ndss.2026.231943}, } ```