MindSpore/mindformers
alpaca_converter.py 1.74 KB
zhouxq committed on 2024-09-05 15:48 +08:00: fix MindFormers static issues
"""
FastChat Stanford Alpaca data conversion tool.
"""
import argparse
import json
import os
import pathlib


def main(data_path, output_path):
    """Convert Alpaca records to chatml-style samples, one JSON object per line."""
    data_path = pathlib.Path(data_path)
    with data_path.open(encoding='utf-8') as f:
        data = json.load(f)

    # Build the user prompt: the instruction alone, or "instruction: input"
    # when the record carries a non-empty input.
    sources = []
    for example in data:
        if example.get("input", "") == "":
            sources.append(example['instruction'])
        else:
            instruction = example['instruction']
            # Drop one trailing period before appending the input.
            # endswith() is safe even if the instruction is an empty string.
            if instruction.endswith("."):
                instruction = instruction[:-1]
            instruction = instruction + ": " + example['input']
            sources.append(instruction)

    # The expected assistant reply for each record.
    targets = []
    for example in data:
        targets.append(example['output'])

    # Wrap each prompt/reply pair in a three-message chatml conversation.
    new_data = []
    for s, t in zip(sources, targets):
        new_data.append({
            "type": "chatml",
            "messages": [
                {
                    "role": "system",
                    "content": "You are a helpful assistant.",
                },
                {
                    "role": "user",
                    "content": s,
                },
                {
                    "role": "assistant",
                    "content": t,
                },
            ]
        })

    # Create or truncate the output file with explicit permissions, then
    # write one JSON object per line.
    flags_ = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    with os.fdopen(os.open(output_path, flags_, 0o750), 'w', encoding='utf-8') as f:
        for sample in new_data:
            f.write(json.dumps(sample) + '\n')


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_path", type=str, default="alpaca-data.json")
    parser.add_argument(
        "--output_path", type=str, default="alpaca-data-conversation.json"
    )
    args = parser.parse_args()
    main(args.data_path, args.output_path)
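
The per-record mapping performed by the script can be sketched on a single in-memory record, which may help when checking the expected output format without touching any files. This is a minimal sketch, not part of the repository: the helper name `to_chatml` and the sample record are hypothetical, and the logic is assumed to mirror the loop in `main` above.

```python
import json


def to_chatml(example):
    """Convert one Alpaca-style record to a chatml-style sample (hypothetical helper)."""
    instruction = example["instruction"]
    extra_input = example.get("input", "")
    if extra_input != "":
        # Same rule as the script: drop one trailing period,
        # then append the input after ": ".
        if instruction.endswith("."):
            instruction = instruction[:-1]
        instruction = instruction + ": " + extra_input
    return {
        "type": "chatml",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": example["output"]},
        ],
    }


# Made-up record, for illustration only.
sample = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}
converted = to_chatml(sample)
line = json.dumps(converted)  # the script writes one such JSON object per line
print(line)
```

Because each sample is serialized on its own line, the output file is effectively in JSON Lines form even though the default output name ends in `.json`.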