# pit-speech-separation

**Repository Path**: cmy_program/pit-speech-separation

## Basic Information

- **Project Name**: pit-speech-separation
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-12-10
- **Last Updated**: 2021-12-10

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

This project implements Permutation Invariant Training (PIT) for two-speaker speech separation. We use a TensorFlow (1.0) LSTM (BLSTM) network trained with PIT; a sketch of the two-speaker PIT loss is given in the example section at the end of this README.

Reference: Kolbæk, M., Yu, D., Tan, Z.-H., & Jensen, J. (2017). Multi-talker Speech Separation and Tracing with Permutation Invariant Training of Deep Recurrent Neural Networks, 1–10. Retrieved from http://arxiv.org/abs/1703.06284

# How to prepare data

## Generate mixed speech and corresponding target speech files

If you have the WSJ0 data, you can use the code at http://www.merl.com/demos/deep-clustering/create-speaker-mixtures.zip to create the mixed speech. You can also use your own data.

## Extract FFT spectrum features for every utterance

For every utterance, extract the feature matrices of the mixed speech, speaker1, and speaker2, and use the function in 'io_funcs/tfrecords_io.py'

make_sequence_example_two_labels(inputs, inputs_cmvn, labels1, labels2)

to generate TensorFlow examples, where:

- inputs: the mixed speech feature matrix, with shape (num_frames, dim)
- inputs_cmvn: the mixed speech feature matrix after mean and variance normalization. This is not strictly necessary; you can pass the same data as inputs.
- labels1, labels2: speaker1's and speaker2's feature matrices, used as targets.

A sketch of this step is given in the example section below.

## Generate tfrecord file lists for the training, cv, and test sets

Make a directory named lists. Put the paths of all the training tfrecord files into 'lists/tr.lst', and do the same for 'lists/cv.lst' and 'lists/tt.lst' (a sketch is given in the example section below).

## Run run.sh

Once you have prepared the list files for tr, cv, and tt (test), you can run 'run.sh' starting from step 3 (train RNN). Make sure you pass the correct list directory.
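
# Example sketches

## Two-speaker PIT loss

The repository's training code defines the actual objective; the following is only a minimal sketch of the two-speaker PIT loss described in Kolbæk et al. (2017), written against the TF 1.x API. The tensor names are hypothetical.

```python
import tensorflow as tf

def pit_mse_loss(est1, est2, ref1, ref2):
    """Two-speaker PIT loss; all tensors have shape (batch, frames, dim)."""
    def mse(a, b):
        # Per-utterance mean squared error over frames and feature dims.
        return tf.reduce_mean(tf.squared_difference(a, b), axis=[1, 2])

    loss_12 = mse(est1, ref1) + mse(est2, ref2)  # assignment (1->1, 2->2)
    loss_21 = mse(est1, ref2) + mse(est2, ref1)  # assignment (1->2, 2->1)
    # Keep the better assignment for each utterance, then average the batch.
    return tf.reduce_mean(tf.minimum(loss_12, loss_21))
```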
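
## Writing a tfrecord for one utterance

A minimal sketch of the feature-extraction step. It assumes make_sequence_example_two_labels returns a tf.train.SequenceExample; the STFT parameters, file paths, and the stft_mag helper are illustrative, not the repo's.

```python
import numpy as np
import scipy.io.wavfile as wav
import tensorflow as tf

from io_funcs.tfrecords_io import make_sequence_example_two_labels

def stft_mag(path, n_fft=512, hop=256):
    """Magnitude spectrogram of a mono wav, shape (num_frames, n_fft//2 + 1)."""
    rate, sig = wav.read(path)
    sig = sig.astype(np.float32)
    win = np.hanning(n_fft)
    frames = [sig[i:i + n_fft] * win
              for i in range(0, len(sig) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).astype(np.float32)

inputs = stft_mag('data/mix/utt1.wav')    # mixed speech (hypothetical path)
labels1 = stft_mag('data/s1/utt1.wav')    # speaker1 target
labels2 = stft_mag('data/s2/utt1.wav')    # speaker2 target
# Per-utterance mean/variance normalization (optional; see above).
inputs_cmvn = (inputs - inputs.mean(0)) / (inputs.std(0) + 1e-8)

ex = make_sequence_example_two_labels(inputs, inputs_cmvn, labels1, labels2)
writer = tf.python_io.TFRecordWriter('data/tfrecords/tr/utt1.tfrecords')
writer.write(ex.SerializeToString())
writer.close()
```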
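
## Building the list files

One way to generate 'lists/tr.lst', 'lists/cv.lst', and 'lists/tt.lst', assuming one tfrecord file per utterance under a hypothetical data/tfrecords/{tr,cv,tt}/ layout; adjust the glob pattern to your own directory structure.

```python
import glob
import os

if not os.path.isdir('lists'):
    os.makedirs('lists')
for subset in ('tr', 'cv', 'tt'):
    # Collect every utterance-level tfrecord file for this subset.
    paths = sorted(glob.glob('data/tfrecords/%s/*.tfrecords' % subset))
    with open('lists/%s.lst' % subset, 'w') as f:
        f.write('\n'.join(paths) + '\n')
```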