# MixEval

**Repository Path**: mirrors_huggingface/MixEval

## Basic Information

- **Project Name**: MixEval
- **Description**: The official evaluation suite and dynamic data release for MixEval.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-09-08
- **Last Updated**: 2026-04-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


# MixEval (Fork)

[MixEval](https://github.com/Psycoy/MixEval/) is a dynamic benchmark that evaluates LLMs using real-world user queries and benchmarks. It achieves a 0.96 model ranking correlation with Chatbot Arena and costs around $0.6 to run using GPT-3.5 as a judge. You can find more information and access the MixEval leaderboard [here](https://mixeval.github.io/#leaderboard).

This is a fork of the original MixEval repository, which can be found [here](https://github.com/Psycoy/MixEval/). I created this fork to make MixEval easier to integrate and use while training new models. The fork includes several improvements that make usage easier and more flexible, including:

* Evaluation of local models during or after training with `transformers`
* Hugging Face Datasets integration, which avoids the need for local files
* Use of Hugging Face TGI or vLLM to accelerate evaluation and make it more manageable
* Improved markdown outputs and timing for the training
* Fixed `pip install` for remote or CI integration

## Getting started

```bash
# Fork with looser dependencies
pip install git+https://github.com/philschmid/MixEval --upgrade
```

_Note: If you want to evaluate models that are not included, take a look [here](https://github.com/philschmid/MixEval?tab=readme-ov-file#registering-new-models). A Zephyr example is available [here](https://github.com/philschmid/MixEval/blob/main/mix_eval/models/zephyr_7b_beta.py)._

## Evaluating open LLMs

**Local Hugging Face model from a path:**

```bash
# MODEL_PARSER_API=<your OpenAI API key>
MODEL_PARSER_API=$(echo $OPENAI_API_KEY) python -m mix_eval.evaluate \
    --data_path hf://zeitgeist-ai/mixeval \
    --model_path my/local/path \
    --output_dir results/agi-5 \
    --model_name local_chat \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --batch_size 20 \
    --api_parallel_num 20
```

**Remote Hugging Face model without a config, using defaults:**

_Note: We use the model name `local_chat` to avoid the need for a config file; the model is loaded from the Hugging Face model hub._

```bash
# MODEL_PARSER_API=<your OpenAI API key>
MODEL_PARSER_API=$(echo $OPENAI_API_KEY) python -m mix_eval.evaluate \
    --data_path hf://zeitgeist-ai/mixeval \
    --model_path alignment-handbook/zephyr-7b-sft-full \
    --output_dir results/handbook-zephyr \
    --model_name local_chat \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --batch_size 20 \
    --api_parallel_num 20
```
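
If you want to trigger the same command from a Python training script, one option is to wrap the CLI call in a small helper. The sketch below is only an illustration: `build_mixeval_command` and `run_mixeval` are hypothetical names that are not part of MixEval, and the helper uses only the flags shown in the commands above. It assumes the package is installed in the current interpreter and that `OPENAI_API_KEY` is set.

```python
import os
import subprocess
import sys


def build_mixeval_command(model_path, output_dir, *, model_name="local_chat",
                          data_path="hf://zeitgeist-ai/mixeval",
                          benchmark="mixeval_hard", version="2024-06-01",
                          batch_size=20, api_parallel_num=20):
    """Assemble the argv list for `python -m mix_eval.evaluate`.

    Hypothetical helper: it only mirrors the flags used in this README.
    """
    return [
        sys.executable, "-m", "mix_eval.evaluate",
        "--data_path", data_path,
        "--model_path", model_path,
        "--output_dir", output_dir,
        "--model_name", model_name,
        "--benchmark", benchmark,
        "--version", version,
        "--batch_size", str(batch_size),
        "--api_parallel_num", str(api_parallel_num),
    ]


def run_mixeval(model_path, output_dir, **overrides):
    """Run the evaluation in a subprocess (hypothetical helper)."""
    env = dict(os.environ)
    # The README commands pass the OpenAI key to the CLI as MODEL_PARSER_API.
    env["MODEL_PARSER_API"] = os.environ["OPENAI_API_KEY"]
    cmd = build_mixeval_command(model_path, output_dir, **overrides)
    # check=True raises CalledProcessError if the evaluation exits non-zero.
    return subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    # Print the assembled command instead of running it.
    print(" ".join(build_mixeval_command(
        "alignment-handbook/zephyr-7b-sft-full", "results/handbook-zephyr")))
```

Calling `run_mixeval("my/local/path", "results/agi-5", batch_size=8)` would launch the evaluation with a smaller batch size while keeping the other defaults from the examples above.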