Provides high-performance model inference, mainly supporting the CodeFuse model from Ant Group.
A DevOps Domain Knowledge Evaluation Benchmark for Large Language Models
This project is an open-source AI assistant designed for the entire software development lifecycle, covering stages such as design, coding, testing, deployment, and operations.
CodeFuseEval is a code-generation benchmark that combines the multi-task scenarios of the CodeFuse model with the HumanEval-x and MBPP benchmarks.
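Benchmarks in the HumanEval family typically score generations with the unbiased pass@k estimator introduced alongside HumanEval (Chen et al., 2021); pass@1 with greedy decoding reduces to plain accuracy over one sample per problem. The sketch below shows that standard estimator for reference only; CodeFuseEval's own harness may compute scores differently, and the helper name `pass_at_k` is illustrative rather than part of this project.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: of n generated samples, c passed the tests.

    pass@k = 1 - C(n - c, k) / C(n, k), computed in product form for numerical stability.
    """
    if n - c < k:
        return 1.0  # every size-k subset must contain at least one correct sample
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 10 samples per problem, 3 of them correct -> pass@1 = 0.3
print(pass_at_k(n=10, c=3, k=1))
```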
The mission of CodeFuse is to develop Code Large Language Models (Code LLMs) specifically designed to support the entire software development lifecycle, covering crucial stages such as design, requirements, coding, testing, deployment, operations, and maintenance. We are passionate about creating innovative solutions that empower developers throughout the software development process.
In this release, we are open sourcing the resulting model ensemble, which includes CodeFuse-13B (ModelScope Repo) and CodeFuse-CodeLlama-34B (ModelScope Repo). It supports various code-related tasks such as code completion, text-to-code conversion, and unit test generation. In particular, CodeFuse-CodeLlama-34B, built upon CodeLlama as the base model and fine-tuned with the proposed MFT framework, achieves an impressive score of 74.4% (greedy decoding) on the HumanEval Python pass@1 evaluation, surpassing the performance of GPT-4 (67%). We plan to incorporate additional base LLMs into the ensemble in the near future.
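As a rough illustration of code completion with greedy decoding (the setting behind the reported pass@1 score), here is a minimal sketch using Hugging Face `transformers`. The repository ID, prompt, and generation settings are assumptions for illustration rather than the official recipe; please consult the model card for the recommended prompt format and inference setup.

```python
# Minimal greedy-decoding sketch with Hugging Face transformers.
# The repo ID below is an assumption for illustration; check the official
# model card (ModelScope / Hugging Face) for the recommended setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codefuse-ai/CodeFuse-CodeLlama-34B"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=False,  # greedy decoding, matching the reported pass@1 setting
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```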
We believe that our solution can significantly enhance the performance of pretrained LLMs across multiple related tasks simultaneously. We are committed to further exploring this direction and providing more open-source contributions, and we encourage engineers and researchers in this community to join us in building CodeFuse together.