# BinSight

**Repository Path**: edge006/BinSight

## Basic Information

- **Project Name**: BinSight
- **Description**: Analyzes the `.so` files in an APK with an LLM and Capstone to determine each library's main purpose and its developer (company)
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-02
- **Last Updated**: 2026-04-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# BinSight

**BinSight** is an APK analysis tool that dissects native libraries (`.so` files) and uses Large Language Models (LLMs), augmented by **live web search results**, to determine their purpose, developer, and potential security implications. It automates the reverse-engineering steps of binary analysis, aiming to make the process faster, more accurate, and more accessible.

## Features

- **APK Extraction**: Automatically extracts all `.so` native libraries from a given Android APK.
- **Live Web Search**: Performs a web search for each library to gather real-time, public information about its developer and purpose.
- **Multi-Arch Disassembly**: Uses `pyelftools` and `capstone` to disassemble code for ARM, ARM64, x86, and x86-64 architectures.
- **Rich Data Extraction**: Pulls not just assembly code, but also function names and embedded strings for a more context-rich analysis.
- **Flexible LLM Integration**: Powered by `litellm`, it supports over 100 LLMs from various providers (OpenAI, Google, Anthropic, Cohere, and any OpenAI-compatible API).
- **Configurable & Easy to Use**: A simple command-line interface lets you specify the target APK, choose your LLM, and configure custom API endpoints and keys.

## Installation

1. **Clone the repository (optional):** If you already have the project files, skip this step.

   ```bash
   git clone <repository-url>
   cd binsight-project
   ```

2.
   **Install dependencies:** Ensure you have Python 3.6+ installed, then install the required packages from `requirements.txt`.

   ```bash
   pip install -r requirements.txt
   ```

## Usage

The script is run from the command line, with several options to customize its behavior.

### Command Syntax

```bash
python binsight.py <target_path> [options]
```

### Arguments

- `target_path`: (Required) The path to a single `.apk` file or a directory containing multiple `.apk` files.
- `--model`: The LLM to use, in `litellm` format (e.g., `gemini/gemini-1.5-flash`).
- `--api_key`: Your API key for the chosen provider. If not set, the tool will look for a corresponding environment variable (e.g., `GOOGLE_API_KEY`, `OPENAI_API_KEY`).
- `--api_base`: The API base URL for non-standard providers like SiliconFlow or a self-hosted model.

---

### Examples

#### Example 1: Standard Analysis with Gemini

This is the simplest use case. It assumes your Google API key is set in the environment.

1. **Set the environment variable:**

   ```bash
   export GOOGLE_API_KEY="your_google_api_key"
   ```

2. **Run the analysis:**

   ```bash
   python binsight.py /path/to/your_app.apk --model "gemini/gemini-1.5-flash"
   ```

#### Example 2: Custom OpenAI-Compatible Provider (e.g., SiliconFlow)

This example shows how to use an OpenAI-compatible endpoint, like SiliconFlow. Based on the official [LiteLLM Documentation](https://docs.litellm.ai/docs/providers/openai_compatible), you must prefix the model name with `openai/` to route the request correctly.

```bash
python binsight.py /path/to/your_app.apk \
  --model "openai/Qwen/Qwen3-235B-A22B" \
  --api_base "https://api.siliconflow.cn/v1" \
  --api_key "your_siliconflow_api_key"
```

*Note: The `openai/` prefix is required for `litellm` to use its standard OpenAI-compatible client.*

## How It Works

1. **Extract**: The input APK is unzipped, and all `.so` files are copied to a temporary location.
2.
   **Web Search**: For each `.so` file, BinSight performs a web search to find its likely purpose and developer.
3. **Disassemble**: The tool identifies the library's architecture, locates the executable `.text` section, and disassembles the machine code into human-readable assembly instructions.
4. **Analyze**: A detailed prompt containing the **web search results**, filename, assembly code, function names, and strings is sent to the configured LLM via `litellm`.
5. **Report**: The LLM's conclusion about each library's purpose and developer is collected and printed in a final summary report.
6. **Clean Up**: All temporary files are deleted.

## Supported Models

This tool uses **`litellm`** to interact with language models. This means you can use any of the 100+ models supported by `litellm`.

- To find the correct model identifier string, please refer to the official **[LiteLLM Provider List](https://docs.litellm.ai/docs/providers)**.

## Core Workflow

1. **Universal LLM Interface**: BinSight uses `litellm` as a unified gateway to over 100 LLM providers. This removes the need for provider-specific code and allows for seamless integration of new models.
2. **Dynamic Analysis Pipeline**:
   * **APK Deconstruction**: Extracts all unique `.so` libraries from the target APK.
   * **Metadata Extraction**: Uses `pyelftools` and `Capstone` to get assembly code, function names, and embedded strings from each library.
   * **Intelligent Analysis via LLM**: Sends this rich metadata package to the user-selected LLM. The prompt directs the model to act as a security expert, first identifying the library by name using its internal knowledge, then corroborating that with the provided binary evidence.
3. **Unified Results**: It presents a clear, concise summary of the likely purpose for each analyzed library.

## Setup

1. **Clone the repository.**
2. **Install Dependencies**:

   ```bash
   pip install -r requirements.txt
   ```

3.
   **Set API Keys (Environment Variables)**: `litellm` automatically finds API keys set as environment variables. Set the key for the provider you intend to use.

   ```bash
   # For OpenAI models (gpt-4o, gpt-4-turbo, etc.)
   export OPENAI_API_KEY="YOUR_OPENAI_KEY"

   # For Google models (gemini/gemini-1.5-pro, etc.)
   export GEMINI_API_KEY="YOUR_GEMINI_KEY"

   # For SiliconFlow models
   export SILICONFLOW_API_KEY="YOUR_SILICONFLOW_KEY"
   ```

## Usage

Run BinSight against a single APK file or an entire directory. The `--model` argument is the central piece of the command.

### Model Selection

You specify the model using the format recognized by `litellm`. Here are some common examples:

* **Gemini**: `gemini/gemini-2.5-pro`
* **SiliconFlow**: `openai/Qwen/Qwen3-32B`

### Command Examples

```bash
# Analyze with OpenAI's GPT-4o (omit --api_key if OPENAI_API_KEY is set)
python binsight.py /path/to/app.apk --model gpt-4o --api_key "YOUR_KEY"

# Analyze with Google's Gemini 2.5 Flash (omit --api_key if GEMINI_API_KEY is set)
python binsight.py /path/to/app.apk --model gemini/gemini-2.5-flash --api_key "YOUR_KEY"

# Analyze with SiliconFlow's Qwen/Qwen3-32B, providing the key directly
python binsight.py /path/to/app.apk --model "openai/Qwen/Qwen3-32B" --api_base "https://api.siliconflow.cn/v1" --api_key "YOUR_KEY"
```

### All Arguments

* `input_path`: **Required**. Path to an APK file or a directory of APKs.
* `--model`: **Required**. The model identifier for `litellm`.
* `--api_key`: Optional. Provide the API key directly. Overrides environment variables.
* `--api_base`: Optional. The API base URL for custom providers (e.g., SiliconFlow, local models).

## Demo Analysis Result

Here is a sample output from analyzing an APK using `openai/Qwen/Qwen3-32B`.
```plaintext
$ python binsight.py /path/to/some.apk --model openai/Qwen/Qwen3-32B --api_base "https://api.siliconflow.cn/v1" --api_key "sk-xxx"
==================================================
Starting analysis for: some.apk
==================================================
--- Comprehensive Analysis (Disassembly + LLM) ---
[*] Processing: libflutter.so (from lib/arm64-v8a/libflutter.so)
-> Analyzing with litellm (model: openai/Qwen/Qwen3-32B, attempt: 1/3)...
[+] LLM analysis result: Intent: Google Flutter UI Framework | Confidence: High | Evidence: Known library name confirmed by numerous 'flutter::' and 'dart::' function names and strings like 'Flutter Engine'.
------------------------------------
Final Analysis Summary for some.apk
------------------------------------
Analysis complete! Found 1 SDKs or code intents:
- Intent Analysis - libflutter.so
Intent: Google Flutter UI Framework | Confidence: High | Evidence: Known library name confirmed by numerous 'flutter::' and 'dart::' function names and strings like 'Flutter Engine'.