Note: this repository is a mirror maintained to speed up downloads in mainland China, synced once daily. Original repository: https://github.com/sentient-agi/OpenDeepSearch
README

Evaluation Scripts

This repository contains scripts for running evaluations and autograding on model outputs.

Available Commands

Autograde DataFrame Evaluation

To evaluate and autograde DataFrame outputs:

python -m evals.autograde_dataframe --csv_path <path_to_csv> --output_path <path_to_output_csv>

Example:

python evals/autograde_df.py output/fireworks_ai__accounts__fireworks__models__qwq-32b/codeact/simple_qa_test_set/fireworks_ai__accounts__fireworks__models__qwq-32b__codeact__simple_qa_test_set__trial1.jsonl

This command processes the specified JSONL file and performs automated grading on DataFrame outputs.
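As a rough illustration of what this grading step involves, the sketch below loads a JSONL results file (one JSON object per line) into a list of records and grades each one by normalized exact match. The field names `answer` and `gold`, and the matching rule itself, are assumptions for illustration; the repository's actual autograder may use different fields or a model-based judge.

```python
import json
import re

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace for lenient matching."""
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9\s]", "", text.lower())).strip()

def autograde_records(records):
    """Mark each record correct/incorrect by normalized exact match.

    Adds a boolean 'correct' field to every record and returns the records
    together with overall accuracy. Field names are hypothetical.
    """
    for rec in records:
        rec["correct"] = normalize(rec.get("answer", "")) == normalize(rec.get("gold", ""))
    accuracy = sum(r["correct"] for r in records) / len(records) if records else 0.0
    return records, accuracy

def autograde_jsonl(path):
    """Load a JSONL results file and grade it."""
    with open(path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    return autograde_records(records)
```

The normalization step matters because model outputs often differ from gold answers only in case, punctuation, or spacing.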

Run Task Evaluations

To run evaluations on a dataset with parallel processing:

python ./evals/eval_tasks.py --parallel-workers=8 --num-trials=1 --eval-tasks=./evals/datasets/frames_test_set.csv ./evals/datasets/simple_qa_test_set.csv

Parameters:

  • --date: Optional date for the evaluation
  • --eval-tasks: List of paths to CSV files containing evaluation tasks (default: ["./evals/datasets/frames_test_set.csv", "./evals/datasets/simple_qa_test_set.csv"])
  • --search-model-id: Model ID for the search tool (default: "fireworks_ai/accounts/fireworks/models/llama-v3p3-70b-instruct")
  • --model-type: Type of model to use, either "LiteLLMModel" or "HfApiModel" (default: "LiteLLMModel")
  • --model-id: ID of the model to use (default: "fireworks_ai/accounts/fireworks/models/qwq-32b")
  • --agent-action-type: Type of agent action: "codeact", "tool-calling", or "vanilla" (default: "codeact")
  • --parallel-workers: Number of parallel workers to use (default: 8)
  • --num-trials: Number of evaluation trials to run (default: 1)
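The flags above could be declared with `argparse` roughly as follows; this is a sketch reconstructed from the parameter list, not the script's actual source, though the names and defaults are taken directly from it.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser mirroring the documented eval_tasks flags."""
    p = argparse.ArgumentParser(description="Run task evaluations")
    p.add_argument("--date", default=None,
                   help="Optional date for the evaluation")
    p.add_argument("--eval-tasks", nargs="+",
                   default=["./evals/datasets/frames_test_set.csv",
                            "./evals/datasets/simple_qa_test_set.csv"],
                   help="Paths to CSV files containing evaluation tasks")
    p.add_argument("--search-model-id",
                   default="fireworks_ai/accounts/fireworks/models/llama-v3p3-70b-instruct")
    p.add_argument("--model-type", choices=["LiteLLMModel", "HfApiModel"],
                   default="LiteLLMModel")
    p.add_argument("--model-id",
                   default="fireworks_ai/accounts/fireworks/models/qwq-32b")
    p.add_argument("--agent-action-type",
                   choices=["codeact", "tool-calling", "vanilla"],
                   default="codeact")
    p.add_argument("--parallel-workers", type=int, default=8)
    p.add_argument("--num-trials", type=int, default=1)
    return p
```

Note that hyphenated flags such as --num-trials are exposed as underscored attributes (args.num_trials) after parsing.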

The results will be saved as a DataFrame in the evals directory.
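The parallel-worker pattern described above can be sketched with a thread pool: read every task row from the given CSVs, duplicate the task list once per trial, and fan the tasks out across workers. Here `run_task` is a hypothetical stand-in for one agent evaluation call, and the `question` column name is an assumption about the dataset schema.

```python
import csv
from concurrent.futures import ThreadPoolExecutor

def run_task(question: str) -> str:
    """Placeholder for a single agent evaluation call (hypothetical)."""
    return f"answer to: {question}"

def run_eval_tasks(csv_paths, parallel_workers=8, num_trials=1):
    """Run every task in the given CSVs num_trials times, in parallel.

    Returns one result per (task, trial) pair, preserving task order.
    """
    tasks = []
    for path in csv_paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                tasks.append(row["question"])
    tasks = tasks * num_trials  # repeat the whole task list once per trial
    with ThreadPoolExecutor(max_workers=parallel_workers) as pool:
        return list(pool.map(run_task, tasks))
```

A thread pool is a reasonable fit here because each task is dominated by network-bound model calls rather than local computation.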

Output

Evaluation results are stored in the following locations:

  • Task evaluation results: evals/ directory
  • DataFrame autograding results: written to the CSV given by --output_path