# speechmetrics

**Repository Path**: chenyuz618_admin/speechmetrics

## Basic Information

- **Project Name**: speechmetrics
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-10-25
- **Last Updated**: 2024-02-16

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

This repository is a wrapper around several freely available implementations of objective metrics for estimating the quality of speech signals. It includes both _relative_ and _absolute_ metrics, that is, metrics that do or do not need a reference signal, respectively.

If you find speechmetrics useful, you are welcome to cite the original papers for the corresponding metrics, since this is just a wrapper around the implementations that were kindly provided by the original authors.

> Please let me know if you think of some metric with an available Python implementation that could be included here!

# Installation

As of our recent tests, installation goes smoothly on Ubuntu, but there may be some compiler errors for `pypesq` on iOS.

For CPU usage:

```
pip install numpy
pip install git+https://github.com/aliutkus/speechmetrics#egg=speechmetrics[cpu]
```

For GPU usage (for MOSNet):

```
pip install numpy
pip install git+https://github.com/aliutkus/speechmetrics#egg=speechmetrics[gpu]
```

# Usage

`speechmetrics` has been designed to be easily used in a modular way. All you need to do is specify the metrics you want to use, and it will load them.

The process is to:

1. Load the metrics you want with the `load` function from the root of the package, which takes two arguments:

    * metrics: str or list of str
      the available metrics that match this argument will be automatically loaded. This matching is relative to the structure of the speechmetrics package.
      For instance:

      - 'absolute' will match all absolute metrics
      - 'absolute.srmr' or 'srmr' will only match SRMR
      - '' will match all

    * window: float or None
      gives the length in seconds of the windows on which to compute the actual scores. If None, the whole signals will be considered.

    ```my_metrics = speechmetrics.load('relative', window=5)```

2. Call the object returned by `load` with your estimated file (and your reference in the case of relative metrics).

    ```scores = my_metrics(path_to_estimate, path_to_reference)```

    NumPy arrays are also supported, but the corresponding sampling rate needs to be specified:

    ```scores = my_metrics(estimate_array, reference_array, rate=sampling_rate)```

> __WARNING__: The convention for relative metrics is to provide __estimate first, and reference second__.
> This is the opposite of the general convention.
> => The advantage is that you can still call absolute metrics with the same code; they will just ignore the reference.

## Example

```
# the case of absolute metrics
import speechmetrics
window_length = 5 # seconds
metrics = speechmetrics.load('absolute', window_length)
scores = metrics(path_to_audio_file)

# the case of relative metrics
metrics = speechmetrics.load(['bsseval', 'sisdr'], window_length)
scores = metrics(path_to_estimate_file, path_to_reference)

# mixed case, still works
metrics = speechmetrics.load(['bsseval', 'mosnet'], window_length)
scores = metrics(path_to_estimate_file, path_to_reference)
```

# Available metrics

## Absolute metrics (`absolute`)

### MOSNet (`absolute.mosnet` or `mosnet`)

*dimensionless, higher is better. 0=very bad, 5=very good*

As provided by the authors of [MOSNet: Deep Learning based Objective Assessment for Voice Conversion](https://arxiv.org/abs/1904.08352).
Original GitHub [here](https://github.com/lochenchou/MOSNet)

> @article{lo2019mosnet,
>   title={MOSNet: Deep Learning based Objective Assessment for Voice Conversion},
>   author={Lo, Chen-Chou and Fu, Szu-Wei and Huang, Wen-Chin and Wang, Xin and Yamagishi, Junichi and Tsao, Yu and Wang, Hsin-Min},
>   journal={arXiv preprint arXiv:1904.08352},
>   year={2019}
> }

### SRMR (`absolute.srmr` or `srmr`)

*dimensionless ratio, higher is better. 0=very bad, 1=very good*

As provided by the [SRMR Toolbox](https://github.com/jfsantos/SRMRpy), implemented by [@jfsantos](https://github.com/jfsantos).

* > @article{falk2010non,
  >   title={A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech},
  >   author={Falk, Tiago H and Zheng, Chenxi and Chan, Wai-Yip},
  >   journal={IEEE Transactions on Audio, Speech, and Language Processing},
  >   volume={18},
  >   number={7},
  >   pages={1766--1774},
  >   year={2010},
  > }

* > @inproceedings{santos2014updated,
  >   title={An updated objective intelligibility estimation metric for normal hearing listeners under noise and reverberation},
  >   author={Santos, Jo{\~a}o F and Senoussaoui, Mohammed and Falk, Tiago H},
  >   booktitle={Proc. Int. Workshop Acoust. Signal Enhancement},
  >   pages={55--59},
  >   year={2014}
  > }

* > @article{santos2014updating,
  >   title={Updating the SRMR-CI metric for improved intelligibility prediction for cochlear implant users},
  >   author={Santos, Jo{\~a}o F and Falk, Tiago H},
  >   journal={IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)},
  >   volume={22},
  >   number={12},
  >   pages={2197--2206},
  >   year={2014},
  > }

## Relative metrics (`relative`)

### BSSEval (`relative.bsseval` or `bsseval`)

*expressed in dB, higher is better.*

As presented in [this](https://hal-lirmm.ccsd.cnrs.fr/lirmm-01766791v2/document) paper and freely available in [the official museval page](https://github.com/sigsep/sigsep-mus-eval), corresponds to BSSEval v4. There are 3 submetrics handled here: SDR, SAR, ISR.
> @InProceedings{SiSEC18,
>   author="St{\"o}ter, Fabian-Robert and Liutkus, Antoine and Ito, Nobutaka",
>   title="The 2018 Signal Separation Evaluation Campaign",
>   booktitle="Latent Variable Analysis and Signal Separation: 14th International Conference, LVA/ICA 2018, Surrey, UK",
>   year="2018",
>   pages="293--305"
> }

### PESQ (`relative.pesq` or `pesq`)

*dimensionless, higher is better. 0=very bad, 5=very good*

Wide band PESQ. As implemented [there](https://github.com/ludlows/python-pesq) by [@ludlows](https://github.com/ludlows).

[Pranay Manocha](https://github.com/pranaymanocha): "[This implementation] matches with a very old matlab implementation of Phillip Loizou’s book. (I personally verified that)"

### NBPESQ (`relative.nb_pesq` or `nb_pesq`)

*dimensionless, higher is better. 0=very bad, 5=very good*

Narrow band PESQ. As implemented [there](https://github.com/vBaiCai/python-pesq) by [@vBaiCai](https://github.com/vBaiCai).

### STOI (`relative.stoi` or `stoi`)

*dimensionless correlation coefficient, higher is better. 0=very bad, 1=very good*

As implemented by [@mpariente](https://github.com/mpariente) [here](https://github.com/mpariente/pystoi)

* > @inproceedings{taal2010short,
  >   title={A short-time objective intelligibility measure for time-frequency weighted noisy speech},
  >   author={Taal, Cees H and Hendriks, Richard C and Heusdens, Richard and Jensen, Jesper},
  >   booktitle={2010 IEEE International Conference on Acoustics, Speech and Signal Processing},
  >   pages={4214--4217},
  >   year={2010},
  >   organization={IEEE}
  > }

* > @article{taal2011algorithm,
  >   title={An algorithm for intelligibility prediction of time--frequency weighted noisy speech},
  >   author={Taal, Cees H and Hendriks, Richard C and Heusdens, Richard and Jensen, Jesper},
  >   journal={IEEE Transactions on Audio, Speech, and Language Processing},
  >   volume={19},
  >   number={7},
  >   pages={2125--2136},
  >   year={2011},
  >   publisher={IEEE}
  > }

* > @article{jensen2016algorithm,
  >   title={An algorithm for predicting the intelligibility of speech masked by modulated noise maskers},
  >   author={Jensen, Jesper and Taal, Cees H},
  >   journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  >   volume={24},
  >   number={11},
  >   pages={2009--2022},
  >   year={2016},
  >   publisher={IEEE}
  > }

### SISDR: Scale-invariant SDR (`relative.sisdr` or `sisdr`)

*expressed in dB, higher is better.*

As described in the following paper and implemented by [@Jonathan-LeRoux](https://github.com/Jonathan-LeRoux) [here](https://github.com/sigsep/bsseval/issues/3#issuecomment-494995846)

* > @article{Roux_2019,
  >   title={SDR – Half-baked or Well Done?},
  >   ISBN={9781479981311},
  >   url={http://dx.doi.org/10.1109/ICASSP.2019.8683855},
  >   DOI={10.1109/icassp.2019.8683855},
  >   journal={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  >   publisher={IEEE},
  >   author={Roux, Jonathan Le and Wisdom, Scott and Erdogan, Hakan and Hershey, John R.},
  >   year={2019},
  >   month={May}
  > }
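
For readers who want to see what the `sisdr` number measures, here is a minimal sketch of the scale-invariant SDR definition from the paper cited above. It is an illustration only, not the code that speechmetrics runs: the function name `si_sdr_db` is hypothetical, the signals are plain Python lists rather than audio files or NumPy arrays, and no windowing is applied.

```python
import math


def si_sdr_db(estimate, reference):
    """Scale-invariant SDR in dB for two equal-length sequences of samples.

    Hypothetical helper for illustration; not part of the speechmetrics API.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    # Optimal scaling of the reference:
    # alpha = <estimate, reference> / <reference, reference>
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference must not be all zeros")
    alpha = dot / ref_energy

    # Energy of the scaled-reference target and of the residual
    target_energy = alpha * alpha * ref_energy
    noise_energy = sum((e - alpha * r) ** 2 for e, r in zip(estimate, reference))

    if noise_energy == 0.0:
        return math.inf   # estimate is an exact rescaling of the reference
    if target_energy == 0.0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / noise_energy)


# Toy signals: the distortion is orthogonal to the reference and 10x smaller,
# so the energy ratio is 100, i.e. close to 20 dB.
reference = [1.0, -1.0, 1.0, -1.0]
distortion = [1.0, 1.0, -1.0, -1.0]
estimate = [r + 0.1 * d for r, d in zip(reference, distortion)]

score = si_sdr_db(estimate, reference)
# Rescaling the estimate leaves the score unchanged (up to rounding).
rescaled_score = si_sdr_db([3.0 * e for e in estimate], reference)
```

The argument order follows the README's convention of estimate first, reference second. Rescaling the estimate does not change the result, which is the property the "scale-invariant" name refers to.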