# intercode
**Repository Path**: xiaoyao-xdr/intercode
## Basic Information
- **Project Name**: intercode
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-31
- **Last Updated**: 2025-03-31
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
### TL;DR
Our research demonstrates that Large Language Models (LLMs) have quietly evolved into capable security testing tools, achieving an unprecedented **95% success rate** on the [InterCode-CTF](https://intercode-benchmark.github.io/) benchmark, a significant step up from previously reported results of 29% and 72%.
### Key Findings
- **Hidden Potential:** Language models possess significantly stronger cybersecurity capabilities than previously documented in research
- **Elicitation is hard:** While capabilities vary across models and tasks, success rates dramatically improve with the right experimental approach
- **Simplicity Wins:** Standard tools and basic prompting techniques achieved superior results compared to more complex frameworks
### Quick Start
> **Prerequisites**
> * `python` >= 3.8
> * `docker`: Learn how to install it [here](https://docs.docker.com/get-docker/). Before running the commands below, make sure the Docker daemon/application is running.
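For example, you can quickly verify both prerequisites from a terminal before continuing (these are standard checks, not project-specific commands):
```bash
# Confirm Python 3.8+ is available
python3 --version

# Confirm the Docker CLI is installed and the daemon is reachable
docker info > /dev/null && echo "Docker daemon is running"
```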
1. Clone this repository, create a virtual environment, and install the necessary dependencies:
```bash
git clone https://github.com/PalisadeResearch/intercode.git
cd intercode
# Create and activate a virtual environment, then install dependencies
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
```
2. Run `setup.sh` to create the Docker images for the InterCode-CTF environment.
3. Go to the [`scripts`](scripts) folder and run any of the scripts to start the experiments (see the example after this list).
4. To modify the experiment settings, edit any of the scripts in the [`experiments`](experiments) folder. To adjust the command execution timeout, edit [`intercode/utils/utils.py`](intercode/utils/utils.py).
You can press `Ctrl+C` at any time to exit the experiment.
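A minimal sketch of steps 2 and 3, assuming the default repository layout; the experiment script name below is a placeholder, so substitute any script actually present in the [`scripts`](scripts) folder:
```bash
# Step 2: build the Docker images for the InterCode-CTF environment
bash setup.sh

# Step 3: run an experiment script
cd scripts
ls
bash run_experiment.sh   # hypothetical script name; replace with a real one
```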
### Run Experiments
To run the scripts in the [`experiments`](experiments) folder, make sure you have specified an OpenAI API key in a `keys.cfg` file located in the root of this repository and formatted as follows:
```
OPENAI_API_KEY: 'key here'
```
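For example, assuming your key is already exported in the `OPENAI_API_KEY` environment variable, you could generate the file from the repository root:
```bash
# Write keys.cfg in the repository root, quoting the key as shown above
printf "OPENAI_API_KEY: '%s'\n" "$OPENAI_API_KEY" > keys.cfg
```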
### Calculate Statistics
The [`solution_stats`](solution_stats) folder contains scripts to analyze experiment results:
For single-attempt logs:
```bash
# Analyze a single log file
python solution_stats/count_single_attempts_file.py path/to/logfile.json
# Analyze all log files in a folder
python solution_stats/count_single_attempts_folder.py path/to/logs/folder
```
For multi-attempt logs:
```bash
# Analyze a single log file
python solution_stats/count_multi_attempts_file.py path/to/logfile.json
# Analyze all log files in a folder
python solution_stats/count_multi_attempts_folder.py path/to/logs/folder
```
These scripts will output:
- List of solved tasks
- Total number of solved tasks
- Breakdown of solutions by category
- List of unsolved tasks
Note: The data from our paper can be found in:
- [`solution_stats/results_section_logs`](solution_stats/results_section_logs)
- [`solution_stats/ablationstudy_section_logs`](solution_stats/ablationstudy_section_logs)
When running new experiments, logs will be saved to [`logs/experiments`](logs/experiments) and will only contain multi-attempt results.
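For example, once a new run has finished, the multi-attempt folder script shown above can summarize everything it produced (assuming logs were written to the default location):
```bash
# Summarize all multi-attempt logs from new experiment runs
python solution_stats/count_multi_attempts_folder.py logs/experiments
```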
### License
MIT, see [`LICENSE.md`](LICENSE.md).