language | pipeline_tag | tags | license | task_categories | size_categories | ||||
---|---|---|---|---|---|---|---|---|---|
|
text-generation |
|
apache-2.0 |
|
|
OpenCodeInterpreter is a family of open-source code generation systems designed to bridge the gap between large language models and advanced proprietary systems like the GPT-4 Code Interpreter. It significantly advances code generation capabilities by integrating execution and iterative refinement functionalities.
For further information and related work, refer to our paper: "OpenCodeInterpreter: A System for Enhanced Code Generation and Execution" available on arXiv.
CodeFeedback-Filtered-Instruction is a curated collection of code instruction queries extracted from four prominent open-source code instruction tuning datasets: Magicoder-OSS-Instruct, Python code subset of ShareGPT, Magicoder-Evol-Instruct, and Evol-Instruct-Code. Initially, 287k queries were aggregated from these datasets. To isolate the most intricate and informative instructions, a rigorous filtering process was employed. This involved utilizing the Qwen-72B-Chat, an open-source chat model, for selective filtering. The code queries are evaluated along with their corresponding responses within the compiled datasets by the LLM, assigning a complexity score ranging from 1 to 5, and only those rated 4 or 5 were retained for the seed set. This meticulous filtering process resulted in a final collection of 156k high-quality single-turn code instructions.
In subsequent processing steps mentioned in the paper, besides Single-turn Packing, we exclusively utilized queries without considering responses. However, here we retained all responses to provide users with more convenient usage options.
If you have any inquiries, please feel free to raise an issue or reach out to us via email at: xiangyue.work@gmail.com, zhengtianyu0428@gmail.com. We're here to assist you!
⚠️The dataset contains part data generated by OpenAI's language models, please pay attention to OpenAI's usage policy when adopting this dataset: https://openai.com/policies/usage-policies.
此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。