# DP-RTF-Learning

**Repository Path**: halapano/DP-RTF-Learning

## Basic Information

- **Project Name**: DP-RTF-Learning
- **License**: MIT
- **Default Branch**: main

## Statistics

- **Created**: 2025-04-02
- **Last Updated**: 2025-04-02

## README

A Python implementation of “**Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization**”, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2021.

+ **Contributions**
  - A DP-RTF learning framework is designed that embeds the sensor signals into a low-dimensional localization feature space, disentangling the localization cues from other factors such as source signals, noise, and reverberation.
    - **A novel DP-RTF learning network**
    - **Leveraging monaural speech enhancement to improve the robustness of DP-RTF estimation**
    - **Generalization to unseen binaural configurations**
  - The DP-RTF learning based localization method makes full use of the spatial and spectral cues, and is shown to outperform several other methods on both simulated and real-world data in noisy and reverberant environments.
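
For readers new to the term, a DP-RTF is the ratio between the direct-path acoustic transfer functions of the two binaural channels, so its magnitude and phase carry the interaural level and phase differences. The sketch below illustrates only that definition with NumPy on toy impulse responses. The function names `dp_rtf` and `ild_ipd`, the FFT size, and the toy data are assumptions made for illustration and are not taken from this repository, whose network estimates the DP-RTF from noisy sensor signals instead of from known impulse responses.

```python
import numpy as np


def dp_rtf(h_left, h_right, n_fft=512, eps=1e-12):
    """Return the ratio of right to left direct-path transfer functions.

    h_left, h_right: direct-path impulse responses (1-D arrays).
    The result has n_fft // 2 + 1 complex frequency bins.
    """
    H_l = np.fft.rfft(h_left, n_fft)   # zero-pads to n_fft samples
    H_r = np.fft.rfft(h_right, n_fft)
    return H_r / (H_l + eps)           # eps guards against empty bins


def ild_ipd(rtf, eps=1e-12):
    """Split a complex RTF into level (dB) and phase (rad) differences."""
    ild = 20.0 * np.log10(np.abs(rtf) + eps)
    ipd = np.angle(rtf)
    return ild, ipd


# Toy direct-path impulse responses (hypothetical, not CIPIC data):
# the right channel is the left one delayed by 2 samples and halved.
h_left = np.zeros(64)
h_left[4] = 1.0
h_right = np.zeros(64)
h_right[6] = 0.5

rtf = dp_rtf(h_left, h_right)
ild, ipd = ild_ipd(rtf)

print(rtf.shape)         # one complex value per frequency bin
print(round(ild[0], 2))  # level difference in dB
print(ipd[:3])           # phase difference grows linearly with frequency
```

In this toy case the level difference is constant across frequency (the right channel is simply attenuated) and the phase difference is a linear function of frequency (a pure delay). Measured HRIRs produce frequency-dependent curves for both.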
## Datasets

+ **Head-related impulse responses (HRIRs)**: from the CIPIC database
+ **Binaural room impulse responses (BRIRs)**: generated with the Roomsim toolbox
+ **TIMIT dataset**
+ **Diffuse noise**: generated with the arbitrary noise field generator, using noise signals from the NOISEX-92 database

## Quick start

+ **Preparation**
  - Add a soft link to the "common" directory inside the "DPRTF" directory
    ```
    ln -s [original path] [target path]
    ```
  - Generate the lists of source signals and BRIRs, direct-path relative transfer functions (DP-RTFs), room acoustic settings, and sensor signals for the training, validation and test stages.
    ```
    python -m common.getData --stage [*] --data [*]
    ```
+ **Training**
  ```
  python run.py --gpu-id [*]
  ```
+ **Test**
  ```
  python run.py --gpu-id [*] --test
  ```
+ **Pretrained models**
  - exp/00000000/model_12.pth: trained with fixed data
  - exp/00000001/model_52.pth: trained with random data (generated on-the-fly)

## Citation

If you find our work useful in your research, please consider citing:

```
@article{yang2021dprtf,
    Author = "Bing Yang and Hong Liu and Xiaofei Li",
    Title = "Learning deep direct-path relative transfer function for binaural sound source localization",
    Journal = "{IEEE/ACM} Transactions on Audio, Speech, and Language Processing (TASLP)",
    Volume = {29},
    Pages = {3491-3503},
    Year = {2021}}
```

```
@InProceedings{yang2021dprtf1,
    author = "Bing Yang and Xiaofei Li and Hong Liu",
    title = "Supervised direct-path relative transfer function learning for binaural sound source localization",
    booktitle = "Proceedings of {IEEE} International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
    year = "2021",
    pages = "825-829"}
```

## Licence

MIT