Evaluation scripts for LLM tasks

This folder includes popular third-party benchmarks for evaluating LLM accuracy.

The following instructions show how to evaluate Model Optimizer quantized LLMs with these benchmarks, including models deployed as TensorRT-LLM engines.

MMLU

Massive Multitask Language Understanding. A score (0-1, higher is better) will be printed at the end of the benchmark.

Setup

Download data

# Download and unpack the MMLU dataset into data/mmlu
mkdir -p data
wget https://people.eecs.berkeley.edu/~hendrycks/data.tar -O data/mmlu.tar
tar -xf data/mmlu.tar -C data && mv data/data data/mmlu

Baseline

python mmlu.py --model_name causal --model_path <HF model folder or model card>

Quantized (simulated)

# MODELOPT_QUANT_CFG: Choose from [INT8_SMOOTHQUANT_CFG|FP8_DEFAULT_CFG|INT4_AWQ_CFG|W4A8_AWQ_BETA_CFG]
python mmlu.py --model_name causal --model_path <HF model folder or model card> --quant_cfg MODELOPT_QUANT_CFG
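
For example, a simulated FP8 run might look like the following (the model card is only illustrative; substitute your own checkpoint):

python mmlu.py --model_name causal --model_path meta-llama/Llama-2-7b-hf --quant_cfg FP8_DEFAULT_CFG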

Evaluate the TensorRT-LLM engine

python mmlu.py --model_name causal --model_path <HF model folder or model card> --engine_dir <built TensorRT-LLM folder>

HumanEval

HumanEval. A score (0-1, higher is better) will be printed at the end of the benchmark.

Due to differences in prompt formatting and generation post-processing, the final score may differ from the numbers published by the model developer.

Setup

Clone the instruct-eval repository and add a softlink to its human_eval folder (i.e., human_eval should resolve to instruct_eval/human_eval); see the sketch below.
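
This sketch assumes the declare-lab/instruct-eval repository; the clone URL and directory names are illustrative, so adjust them to your checkout:

# Clone instruct-eval and link its human_eval folder next to the eval scripts
git clone https://github.com/declare-lab/instruct-eval.git instruct_eval
ln -s instruct_eval/human_eval human_eval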

Baseline

python humaneval.py --model_name causal --model_path <HF model folder or model card> --n_sample 1

Quantized (simulated)

# MODELOPT_QUANT_CFG: Choose from [INT8_SMOOTHQUANT_CFG|FP8_DEFAULT_CFG|INT4_AWQ_CFG|W4A8_AWQ_BETA_CFG]
python humaneval.py --model_name causal --model_path <HF model folder or model card> --n_sample 1 --quant_cfg MODELOPT_QUANT_CFG

Evaluate the TensorRT-LLM engine

python humaneval.py --model_name causal --model_path <HF model folder or model card> --engine_dir <built TensorRT-LLM folder> --n_sample 1

MT-Bench

MT-Bench. The responses are generated using FastChat.

Baseline

bash run_fastchat.sh <HF model folder or model card>
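
For example, generating baseline responses from a Hugging Face checkpoint (the model card is illustrative):

bash run_fastchat.sh meta-llama/Llama-2-7b-chat-hf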

Evaluate the TensorRT-LLM engine

bash run_fastchat.sh <HF model folder or model card> <built TensorRT-LLM folder>

Judging the responses

The responses to the MT-Bench questions are stored under data/mt_bench/model_answer. The quality of the responses can be judged using llm_judge from the FastChat repository; refer to the llm_judge documentation to compute the final MT-Bench score.
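
A hedged sketch of the typical llm_judge workflow follows (the script names and flags reflect the FastChat repository but may change between versions, so consult the FastChat documentation for the exact commands):

# Run from FastChat's fastchat/llm_judge directory after placing the generated
# answers under its data/mt_bench/model_answer folder
python gen_judgment.py --model-list <model id>
python show_result.py --model-list <model id>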
