Hugging Face Dataset Mirror / CodeFeedback-Filtered-Instruction

language: en
pipeline_tag: text-generation
tags: code
license: apache-2.0
task_categories: question-answering
size_categories: 10K

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

OpenCodeInterpreter

[🏠Homepage] | [🛠️Code]


OpenCodeInterpreter is a family of open-source code generation systems designed to bridge the gap between large language models and advanced proprietary systems like the GPT-4 Code Interpreter. It significantly advances code generation capabilities by integrating execution and iterative refinement functionalities.

For further information and related work, refer to our paper, "OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement", available on arXiv.

Dataset Description

CodeFeedback-Filtered-Instruction is a curated collection of code instruction queries extracted from four open-source code instruction tuning datasets: Magicoder-OSS-Instruct, the Python code subset of ShareGPT, Magicoder-Evol-Instruct, and Evol-Instruct-Code. Initially, 287k queries were aggregated from these datasets. To isolate the most intricate and informative instructions, the queries were then filtered with Qwen-72B-Chat, an open-source chat model. The model evaluated each code query together with its corresponding response from the source datasets and assigned a complexity score from 1 to 5; only queries rated 4 or 5 were retained for the seed set. This filtering yielded a final collection of 156k high-quality single-turn code instructions.
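The score-and-threshold step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the field names `query` and `answer`, the function `filter_by_complexity`, and the word-count scorer `toy_score` are all assumptions made for the example. In the real process the score comes from Qwen-72B-Chat, whose prompt is not given in this README.

```python
from typing import Callable, Dict, List

Pair = Dict[str, str]  # one instruction record; field names are assumed


def filter_by_complexity(
    pairs: List[Pair],
    score_fn: Callable[[str, str], int],
    min_score: int = 4,
) -> List[Pair]:
    """Keep only the pairs whose complexity score is at least min_score."""
    kept = []
    for pair in pairs:
        # The scorer sees the query together with its response.
        score = score_fn(pair["query"], pair["answer"])
        if not 1 <= score <= 5:
            raise ValueError(f"score {score} is outside the 1-5 range")
        if score >= min_score:
            kept.append(pair)
    return kept


def toy_score(query: str, answer: str) -> int:
    """Hypothetical stand-in for the LLM judge: longer queries score higher."""
    return min(5, 1 + len(query.split()) // 5)


pairs = [
    {"query": "Print hello", "answer": "print('hello')"},
    {
        "query": "Implement a thread-safe LRU cache with O(1) get and put, "
                 "plus TTL-based eviction and unit tests",
        "answer": "...",
    },
]

kept = filter_by_complexity(pairs, toy_score)
print(len(kept))  # 1
```

Passing the scorer as a callable keeps the threshold logic separate from whichever model does the rating, so the same function works with a real LLM-backed scorer.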

In the subsequent processing steps described in the paper, with the exception of Single-turn Packing, we used only the queries and disregarded the responses. Here, however, we retain all responses so that the dataset is more convenient to use.
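For readers who want the query-only view used in those later steps, a small sketch is shown here. The record layout, with `query` and `answer` fields, is an assumption for illustration and should be checked against the actual dataset files; `queries_only` is a hypothetical helper name.

```python
from typing import Dict, List


def queries_only(records: List[Dict[str, str]]) -> List[str]:
    """Return just the instruction text, dropping the responses."""
    return [record["query"] for record in records]


# Hypothetical records in the assumed layout.
records = [
    {"query": "Write a function that merges two sorted lists.", "answer": "..."},
    {"query": "Explain and fix the race condition in this code.", "answer": "..."},
]

queries = queries_only(records)
print(queries[0])  # Write a function that merges two sorted lists.
```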

Contact

If you have any inquiries, please feel free to raise an issue or reach out to us via email at: xiangyue.work@gmail.com, zhengtianyu0428@gmail.com. We're here to assist you!

⚠️ The dataset contains some data generated by OpenAI's language models. Please review OpenAI's usage policies before adopting this dataset: https://openai.com/policies/usage-policies.

About

Mirror of https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction

Clone (HTTPS): https://gitee.com/hf-datasets/CodeFeedback-Filtered-Instruction.git
Clone (SSH): git@gitee.com:hf-datasets/CodeFeedback-Filtered-Instruction.git