[![Build Status](https://travis-ci.org/Maluuba/nlg-eval.svg?branch=master)](https://travis-ci.org/Maluuba/nlg-eval)

# nlg-eval

Evaluation code for various unsupervised automated metrics for NLG (Natural Language Generation).
It takes as input a hypothesis file and one or more reference files, and outputs values of the metrics. Rows across these files should correspond to the same example.

## Metrics ##

- BLEU
- METEOR
- ROUGE
- CIDEr
- SkipThought cosine similarity
- Embedding Average cosine similarity
- Vector Extrema cosine similarity
- Greedy Matching score

## Setup ##

Install Java 1.8.0 (or higher).

To install the Python dependencies, run:
```bash
pip install git+https://github.com/Maluuba/nlg-eval.git@master
```

If you are using macOS High Sierra or higher, then run this to allow multithreading:
```bash
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
```

For the simple setup (downloads the required data, e.g. models and embeddings, and external code files), run:
```bash
nlg-eval --setup
```

If you're setting this up from the source code, or you're on Windows and not using a Bash terminal, then you might get errors about `nlg-eval` not being found and will need to locate the `nlg-eval` script yourself. See [here](https://github.com/Maluuba/nlg-eval/issues/61) for details.

### Custom Setup ###

```bash
# If you don't like the default path (~/.cache/nlgeval) for the downloaded data,
# then specify a path where you want the files to be downloaded.
# The value for the data path is stored in ~/.config/nlgeval/rc.json and can be overwritten by
# setting the NLGEVAL_DATA environment variable.
nlg-eval --setup ${data_path}
```

### Validate the Setup (Optional) ###

(These examples were made with Git Bash on Windows.)

All of the data files should have been downloaded; you should see sizes like:
```
$ ls -l ~/.cache/nlgeval/
total 6003048
-rw-r--r-- 1 ...  289340074 Sep 12  2018 bi_skip.npz
-rw-r--r-- 1 ...        689 Sep 12  2018 bi_skip.npz.pkl
-rw-r--r-- 1 ... 2342138474 Sep 12  2018 btable.npy
-rw-r--r-- 1 ...    7996547 Sep 12  2018 dictionary.txt
-rw-r--r-- 1 ...   21494787 Jan 22  2019 glove.6B.300d.model.bin
-rw-r--r-- 1 ...  480000128 Jan 22  2019 glove.6B.300d.model.bin.vectors.npy
-rw-r--r-- 1 ...  663989216 Sep 12  2018 uni_skip.npz
-rw-r--r-- 1 ...        693 Sep 12  2018 uni_skip.npz.pkl
-rw-r--r-- 1 ... 2342138474 Sep 12  2018 utable.npy
```

You can also verify some checksums:
```
$ cd ~/.cache/nlgeval/
$ md5sum *
9a15429d694a0e035f9ee1efcb1406f3 *bi_skip.npz
c9b86840e1dedb05837735d8bf94cee2 *bi_skip.npz.pkl
022b5b15f53a84c785e3153a2c383df6 *btable.npy
26d8a3e6458500013723b380a4b4b55e *dictionary.txt
f561ab0b379e23cbf827a054f0e7c28e *glove.6B.300d.model.bin
be5553e91156471fe35a46f7dcdfc44e *glove.6B.300d.model.bin.vectors.npy
8eb7c6948001740c3111d71a2fa446c1 *uni_skip.npz
e1a0ead377877ff3ea5388bb11cfe8d7 *uni_skip.npz.pkl
5871cc62fc01b79788c79c219b175617 *utable.npy

$ sha256sum *
8ab7965d2db5d146a907956d103badfa723b57e0acffb75e10198ba9f124edb0 *bi_skip.npz
d7e81430fcdcbc60b36b92b3f879200919c75d3015505ee76ae3b206634a0eb6 *bi_skip.npz.pkl
4a4ed9d7560bb87f91f241739a8f80d8f2ba787a871da96e1119e913ccd61c53 *btable.npy
4dc5622978a30cddea8c975c871ea8b6382423efb107d27248ed7b6cfa490c7c *dictionary.txt
10c731626e1874effc4b1a08d156482aa602f7f2ca971ae2a2f2cd5d70998397 *glove.6B.300d.model.bin
20dfb1f44719e2d934bfee5d39a6ffb4f248bae2a00a0d59f953ab7d0a39c879 *glove.6B.300d.model.bin.vectors.npy
7f40ff16ff5c54ce9b02bd1a3eb24db3e6adaf7712a7a714f160af3a158899c8 *uni_skip.npz
d58740d46cba28417cbc026af577f530c603d81ac9de43ffd098f207c7dc4411 *uni_skip.npz.pkl
790951d4b08e843e3bca0563570f4134ffd17b6bd4ab8d237d2e5ae15e4febb3 *utable.npy
```
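If you prefer to check the downloaded files from Python, a minimal sketch along these lines should work. It assumes the default data path `~/.cache/nlgeval` and reuses the MD5 sums listed above; adjust `DATA_DIR` if you used a custom setup path.

```python
import hashlib
from pathlib import Path

# Assumed default data directory; change this if you ran `nlg-eval --setup ${data_path}`.
DATA_DIR = Path.home() / ".cache" / "nlgeval"

# MD5 sums copied from the listing above.
EXPECTED_MD5 = {
    "bi_skip.npz": "9a15429d694a0e035f9ee1efcb1406f3",
    "bi_skip.npz.pkl": "c9b86840e1dedb05837735d8bf94cee2",
    "btable.npy": "022b5b15f53a84c785e3153a2c383df6",
    "dictionary.txt": "26d8a3e6458500013723b380a4b4b55e",
    "glove.6B.300d.model.bin": "f561ab0b379e23cbf827a054f0e7c28e",
    "glove.6B.300d.model.bin.vectors.npy": "be5553e91156471fe35a46f7dcdfc44e",
    "uni_skip.npz": "8eb7c6948001740c3111d71a2fa446c1",
    "uni_skip.npz.pkl": "e1a0ead377877ff3ea5388bb11cfe8d7",
    "utable.npy": "5871cc62fc01b79788c79c219b175617",
}

for name, expected in EXPECTED_MD5.items():
    path = DATA_DIR / name
    if not path.exists():
        print(f"MISSING   {name}")
        continue
    md5 = hashlib.md5()
    with path.open("rb") as f:
        # Hash in chunks; some of these files are larger than 2 GB.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    status = "OK" if md5.hexdigest() == expected else "MISMATCH"
    print(f"{status:9} {name}")
```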
To make sure that the setup was successful, you can run the tests:
```bash
pip install pytest
pytest
```

This might take a few minutes, and you might see warnings, but the tests should pass.

## Usage ##

Once setup has completed, the metrics can be evaluated with a Python API or on the command line.

Examples of the Python API can be found in [test_nlgeval.py](nlgeval/tests/test_nlgeval.py).

### Standalone ###

```bash
nlg-eval --hypothesis=examples/hyp.txt --references=examples/ref1.txt --references=examples/ref2.txt
```

where each line in the hypothesis file is a generated sentence and the corresponding lines across the reference files are the ground-truth references for that hypothesis.

### functional API: for the entire corpus ###

```python
from nlgeval import compute_metrics

metrics_dict = compute_metrics(hypothesis='examples/hyp.txt',
                               references=['examples/ref1.txt', 'examples/ref2.txt'])
```

### functional API: for only one sentence ###

```python
from nlgeval import compute_individual_metrics

metrics_dict = compute_individual_metrics(references, hypothesis)
```

where `references` is a list of ground-truth reference text strings and `hypothesis` is the hypothesis text string.

### object oriented API for repeated calls in a script - single example ###

```python
from nlgeval import NLGEval

nlgeval = NLGEval()  # loads the models
metrics_dict = nlgeval.compute_individual_metrics(references, hypothesis)
```

where `references` is a list of ground-truth reference text strings and `hypothesis` is the hypothesis text string.

### object oriented API for repeated calls in a script - multiple examples ###

```python
from nlgeval import NLGEval

nlgeval = NLGEval()  # loads the models
metrics_dict = nlgeval.compute_metrics(references, hypothesis)
```

where `references` is a list of lists of ground-truth reference text strings and `hypothesis` is a list of hypothesis text strings. Each inner list in `references` is one set of references: a list of single reference strings, one for each sentence in `hypothesis`, in the same order.
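Putting these pieces together, here is a minimal end-to-end sketch that exercises both object-oriented calls described above; the sentences are invented purely for illustration:

```python
from nlgeval import NLGEval

# Load the models once and reuse them for every call.
nlgeval = NLGEval()

# Single example: a list of reference strings and one hypothesis string.
references = [
    "the cat sat on the mat",
    "a cat is sitting on the mat",
]
hypothesis = "the cat is on the mat"
print(nlgeval.compute_individual_metrics(references, hypothesis))

# Whole corpus: each inner list is one reference set, aligned with the hypotheses.
references_corpus = [
    ["the cat sat on the mat", "there is a dog in the garden"],  # first reference for each hypothesis
    ["a cat is sitting on the mat", "a dog is in the garden"],   # second reference for each hypothesis
]
hypotheses = ["the cat is on the mat", "a dog plays in the garden"]
print(nlgeval.compute_metrics(references_corpus, hypotheses))
```

Each call returns a dictionary mapping metric names to values, as in the example output shown below.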
## Reference ##

If you use this code as part of any published research, please cite the following paper:

Shikhar Sharma, Layla El Asri, Hannes Schulz, and Jeremie Zumer.
**"Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation".**
*arXiv preprint arXiv:1706.09799* (2017).

```bibtex
@article{sharma2017nlgeval,
    author  = {Sharma, Shikhar and El Asri, Layla and Schulz, Hannes and Zumer, Jeremie},
    title   = {Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation},
    journal = {CoRR},
    volume  = {abs/1706.09799},
    year    = {2017},
    url     = {http://arxiv.org/abs/1706.09799}
}
```

## Example ##

Running

```bash
nlg-eval --hypothesis=examples/hyp.txt --references=examples/ref1.txt --references=examples/ref2.txt
```

gives

```
Bleu_1: 0.550000
Bleu_2: 0.428174
Bleu_3: 0.284043
Bleu_4: 0.201143
METEOR: 0.295797
ROUGE_L: 0.522104
CIDEr: 1.242192
SkipThoughtsCosineSimilarity: 0.626149
EmbeddingAverageCosineSimilarity: 0.884690
VectorExtremaCosineSimilarity: 0.568696
GreedyMatchingScore: 0.784205
```

## Troubleshooting

If you have issues with METEOR, try lowering the `mem` variable in `meteor.py`.

## Important Note ##

By default (with the `idf` parameter set to "corpus" mode), CIDEr computes IDF values using the reference sentences provided. Thus, the CIDEr score for a reference dataset with only one image (or example, for NLG) will be zero. When evaluating with one (or only a few) images, set `idf` to "coco-val-df" instead, which uses IDF values from the MSCOCO Validation Dataset for more reliable results. This has not been adapted in this code; for this use case, apply patches from [vrama91/coco-caption](https://github.com/vrama91/coco-caption).

## External data directory

To mount an already prepared data directory into a Docker container, or to share it between users, you can set the `NLGEVAL_DATA` environment variable to let nlg-eval know where to find its models and data, e.g.

```
NLGEVAL_DATA=~/workspace/nlg-eval/nlgeval/data
```

This variable overrides the value provided during setup (stored in `~/.config/nlgeval/rc.json`).

## Microsoft Open Source Code of Conduct ##

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## License ##

See [LICENSE.md](LICENSE.md).