# intercode
**Repository Path**: xiaoyao-xdr/intercode
## Basic Information
- **Project Name**: intercode
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-31
- **Last Updated**: 2025-03-31
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
### TL;DR
Our research demonstrates that Large Language Models (LLMs) have quietly evolved into capable security testing tools, achieving an unprecedented **95% success rate** on the [InterCode-CTF](https://intercode-benchmark.github.io/) benchmark, a significant step up from previously reported results of 29% and 72%.
### Key Findings
- **Hidden Potential:** Language models possess significantly stronger cybersecurity capabilities than previously documented in research
- **Elicitation is hard:** While capabilities vary across models and tasks, success rates dramatically improve with the right experimental approach
- **Simplicity Wins:** Standard tools and basic prompting techniques achieved superior results compared to more complex frameworks
### Quick Start
> **Prerequisites**
> * `python` >= 3.8
> * `docker`: Learn how to install it [here](https://docs.docker.com/get-docker/). Before running the commands below, make sure the Docker daemon/application is running.
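For example, you can quickly verify both prerequisites from a terminal before continuing (these are standard checks, not project-specific commands):
```bash
# Confirm Python 3.8+ is available
python3 --version

# Confirm the Docker CLI is installed and the daemon is reachable
docker info > /dev/null && echo "Docker daemon is running"
```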
1. Clone this repository, create a virtual environment, and install the necessary dependencies:
```bash
git clone https://github.com/PalisadeResearch/intercode.git
cd intercode
# Create and activate a virtual environment, then install dependencies
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
```
2. Run `setup.sh` to create the Docker images for the InterCode-CTF environment.
3. Go to the [`scripts`](scripts) folder and run any of the scripts to start the experiments (see the example after this list).
4. To modify the experiment settings, edit any of the scripts in the [`experiments`](experiments) folder. To adjust the command execution timeout, edit [`intercode/utils/utils.py`](intercode/utils/utils.py).
You can press `Ctrl+C` at any time to exit the experiment.
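A minimal sketch of steps 2 and 3, assuming the default repository layout; the experiment script name below is a placeholder, so substitute any script actually present in the [`scripts`](scripts) folder:
```bash
# Step 2: build the Docker images for the InterCode-CTF environment
bash setup.sh

# Step 3: run an experiment script
cd scripts
ls
bash run_experiment.sh   # hypothetical script name; replace with a real one
```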
### Run Experiments
To run the scripts in the [`experiments`](experiments) folder, make sure you have specified an OpenAI API key in a `keys.cfg` file located in the root of this repository and formatted as follows:
```
OPENAI_API_KEY: 'key here'
```
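For example, assuming your key is already exported in the `OPENAI_API_KEY` environment variable, you could generate the file from the repository root:
```bash
# Write keys.cfg in the repository root, quoting the key as shown above
printf "OPENAI_API_KEY: '%s'\n" "$OPENAI_API_KEY" > keys.cfg
```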
### Calculate Statistics
The [`solution_stats`](solution_stats) folder contains scripts to analyze experiment results:
For single-attempt logs:
```bash
# Analyze a single log file
python solution_stats/count_single_attempts_file.py path/to/logfile.json
# Analyze all log files in a folder
python solution_stats/count_single_attempts_folder.py path/to/logs/folder
```
For multi-attempt logs:
```bash
# Analyze a single log file
python solution_stats/count_multi_attempts_file.py path/to/logfile.json
# Analyze all log files in a folder
python solution_stats/count_multi_attempts_folder.py path/to/logs/folder
```
These scripts will output:
- List of solved tasks
- Total number of solved tasks
- Breakdown of solutions by category
- List of unsolved tasks
Note: The data from our paper can be found in:
- [`solution_stats/results_section_logs`](solution_stats/results_section_logs)
- [`solution_stats/ablationstudy_section_logs`](solution_stats/ablationstudy_section_logs)
When running new experiments, logs will be saved to [`logs/experiments`](logs/experiments) and will only contain multi-attempt results.
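For example, once a new run has finished, the multi-attempt folder script shown above can summarize everything it produced (assuming logs were written to the default location):
```bash
# Summarize all multi-attempt logs from new experiment runs
python solution_stats/count_multi_attempts_folder.py logs/experiments
```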
### License
MIT, see [`LICENSE.md`](LICENSE.md).