SWE-bench: Mirror of https://huggingface.co/datasets/princeton-nlp/SWE-bench

dataset_info

features

name	dtype
repo	string
instance_id	string
base_commit	string
patch	string
test_patch	string
problem_statement	string
hints_text	string
created_at	string
version	string
FAIL_TO_PASS	string
PASS_TO_PASS	string
environment_setup_commit	string

splits

name	num_bytes	num_examples
dev	4783179	225
test	44142926	2294
train	367610377	19008

download_size: 120092029

dataset_size: 416536482

configs

config_name: default

data_files

split	path
dev	data/dev-*
test	data/test-*
train	data/train-*
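As a quick consistency check on the metadata above, the per-split byte counts can be summed and compared with dataset_size. This is a minimal sketch that only restates the numbers listed in the metadata; the variable names are chosen for illustration.

```python
# Split metadata copied from the dataset_info block: (num_bytes, num_examples).
splits = {
    "dev": (4783179, 225),
    "test": (44142926, 2294),
    "train": (367610377, 19008),
}
dataset_size = 416536482  # dataset_size as listed in the metadata

total_bytes = sum(num_bytes for num_bytes, _ in splits.values())
total_examples = sum(num_examples for _, num_examples in splits.values())

# The split sizes should add up to the reported dataset_size.
print("bytes match:", total_bytes == dataset_size)
print("total examples:", total_examples)

# Share of examples in each split, as a percentage.
for name, (_, num_examples) in splits.items():
    print(f"{name}: {100 * num_examples / total_examples:.1f}%")
```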

Dataset Summary

SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution.

The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Want to run inference now?

This dataset only contains the problem_statement (i.e. the issue text) and the base_commit, which represents the state of the codebase before the issue was resolved. If you want to run inference using the "Oracle" or BM25 retrieval settings mentioned in the paper, consider the following datasets.

princeton-nlp/SWE-bench_oracle

princeton-nlp/SWE-bench_bm25_13K

princeton-nlp/SWE-bench_bm25_27K

princeton-nlp/SWE-bench_bm25_40K

princeton-nlp/SWE-bench_bm25_50k_llama
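A minimal sketch of how one might pick among the datasets listed above and load a split with the Hugging Face `datasets` library. The short setting keys (`"oracle"`, `"bm25_13K"`, and so on) and the helper names are assumptions made for illustration, and the sketch assumes the retrieval variants expose the same split names as this dataset.

```python
# Base dataset (this card) and the retrieval variants listed above.
BASE_DATASET = "princeton-nlp/SWE-bench"
RETRIEVAL_DATASETS = {
    # Keys are illustrative labels, not official setting names.
    "oracle": "princeton-nlp/SWE-bench_oracle",
    "bm25_13K": "princeton-nlp/SWE-bench_bm25_13K",
    "bm25_27K": "princeton-nlp/SWE-bench_bm25_27K",
    "bm25_40K": "princeton-nlp/SWE-bench_bm25_40K",
    "bm25_50k_llama": "princeton-nlp/SWE-bench_bm25_50k_llama",
}


def dataset_name(setting=None):
    """Return the dataset name for a retrieval setting, or the base dataset."""
    if setting is None:
        return BASE_DATASET
    if setting not in RETRIEVAL_DATASETS:
        raise ValueError(f"unknown retrieval setting: {setting!r}")
    return RETRIEVAL_DATASETS[setting]


def load_swebench(setting=None, split="test"):
    """Load one split; needs the `datasets` package and network access."""
    # Imported lazily so the name lookup above works without the package.
    from datasets import load_dataset

    return load_dataset(dataset_name(setting), split=split)
```

For example, `load_swebench("oracle", split="test")` would request the test split of the Oracle variant, while `load_swebench()` requests the test split of this dataset.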

Supported Tasks and Leaderboards

SWE-bench proposes a new task: issue resolution, given a full repository and a GitHub issue. The leaderboard can be found at www.swebench.com

Languages

The text of the dataset is primarily English, but we make no effort to filter or otherwise clean based on language type.

Dataset Structure

Data Instances

Each SWE-bench instance has the following fields:

instance_id: (str) - A formatted instance identifier, usually as repo_owner__repo_name-PR-number.
patch: (str) - The gold patch, the patch generated by the PR (minus test-related code), that resolved the issue.
repo: (str) - The repository owner/name identifier from GitHub.
base_commit: (str) - The commit hash of the repository representing the HEAD of the repository before the solution PR is applied.
hints_text: (str) - Comments made on the issue before the creation date of the solution PR’s first commit.
created_at: (str) - The creation date of the pull request.
test_patch: (str) - A test-file patch that was contributed by the solution PR.
problem_statement: (str) - The issue title and body.
version: (str) - Installation version to use for running evaluation.
environment_setup_commit: (str) - The commit hash to use for environment setup and installation.
FAIL_TO_PASS: (str) - A JSON list of strings representing the tests that the PR resolves and that are tied to the issue resolution.
PASS_TO_PASS: (str) - A JSON list of strings representing the tests that should pass both before and after the PR is applied.
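Because FAIL_TO_PASS and PASS_TO_PASS are stored as JSON strings rather than lists, they need decoding before use. The sketch below decodes them, splits an instance_id into its parts, and applies an assumed resolution check (every listed test passes). The record values, helper names, and the exact criterion are illustrative assumptions, not the official evaluation harness.

```python
import json

# Hypothetical record: field names follow the list above, values are invented.
record = {
    "instance_id": "octo-org__widget-lib-123",
    "repo": "octo-org/widget-lib",
    "FAIL_TO_PASS": '["tests/test_parse.py::test_empty_input"]',
    "PASS_TO_PASS": '["tests/test_parse.py::test_basic", "tests/test_io.py::test_read"]',
}


def split_instance_id(instance_id):
    """Split 'repo_owner__repo_name-PR-number' into (owner, name, number).

    Assumes the usual format; instance ids that deviate from it will fail.
    """
    # The PR number follows the last hyphen, since repo names may contain hyphens.
    prefix, number = instance_id.rsplit("-", 1)
    owner, name = prefix.split("__", 1)
    return owner, name, int(number)


def is_resolved(record, passed_tests):
    """Assumed criterion: all FAIL_TO_PASS and PASS_TO_PASS tests pass."""
    required = json.loads(record["FAIL_TO_PASS"]) + json.loads(record["PASS_TO_PASS"])
    return all(test in passed_tests for test in required)


print(split_instance_id(record["instance_id"]))

# A run in which only the previously passing tests still pass is not resolved.
only_old_tests = set(json.loads(record["PASS_TO_PASS"]))
print(is_resolved(record, only_old_tests))
```

Here `passed_tests` stands for whatever set of test identifiers a test run reports as passing after a candidate patch is applied at base_commit.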

