# Conv-TasNet

**:bangbang:new:bangbang:: The modified training and testing code is now able to separate speech properly.**

**:bangbang:new:bangbang:: Updated the model code and added the skip-connection part of the network.**

**:bangbang:notice:bangbang:: Set the training batch size to 8 or 16.**

**:bangbang:notice:bangbang:: An implementation of another paper that optimizes Conv-TasNet has been open-sourced in ["Deep-Encoder-Decoder-Conv-TasNet"](https://github.com/JusperLee/Deep-Encoder-Decoder-Conv-TasNet).**

Demo page: [Results of the pure speech separation model](https://www.likai.show/Pure-Audio/index.html)

A PyTorch implementation of "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation".

> Luo Y, Mesgarani N. Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(8): 1256-1266.

[![GitHub issues](https://img.shields.io/github/issues/JusperLee/Conv-TasNet)](https://github.com/JusperLee/Conv-TasNet/issues) [![GitHub forks](https://img.shields.io/github/forks/JusperLee/Conv-TasNet)](https://github.com/JusperLee/Conv-TasNet/network) [![GitHub stars](https://img.shields.io/github/stars/JusperLee/Conv-TasNet)](https://github.com/JusperLee/Conv-TasNet/stargazers) [![Twitter](https://img.shields.io/twitter/url?style=social)](https://twitter.com/intent/tweet?text=Wow:&url=https%3A%2F%2Fgithub.com%2FJusperLee%2FConv-TasNet)

### Requirements

- **PyTorch 1.3.0**
- **TorchAudio 0.3.1**
- **PyYAML 5.1.2**

### Accomplished goals

- [x] **Support multi-GPU training (see train.yml)**
- [x] **Use PyTorch's built-in DataLoader**
- [x] **Provide pre-trained models**

### Preparing files before training

1. Generate the dataset using [create-speaker-mixtures.zip](http://www.merl.com/demos/deep-clustering/create-speaker-mixtures.zip) with WSJ0 or TIMIT.
2. Generate the scp files with the create_scp.py script (a minimal sketch of the expected scp format follows the inference section below).

### Training the model

- To adjust the network parameters and the paths of the training files, modify **option/train/train.yml**.
- Training command:

```shell
python train.py ./option/train/train.yml
```

### Inference

- Inference command (use this if you need to test a **large number** of audio files):

```shell
python Separation.py -mix_scp 1.scp -yaml ./config/train/train.yml -model best.pt -gpuid [0,1,2,3,4,5,6,7] -save_path ./checkpoint
```

- Inference command (use this if you need to test a **single** audio file):

```shell
python Separation_wav.py -mix_wav 1.wav -yaml ./config/train/train.yml -model best.pt -gpuid [0,1,2,3,4,5,6,7] -save_path ./checkpoint
```
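The `-mix_scp` argument expects a Kaldi-style scp file: one `<utterance_id> <wav_path>` pair per line. Below is a minimal sketch of how such a file can be generated; the directory layout and function name are illustrative assumptions, not the actual contents of the repository's create_scp.py.

```python
import os
import argparse


def write_scp(wav_dir, scp_path):
    """Write a Kaldi-style scp file: one '<utt_id> <absolute wav path>' per line.

    Illustrative sketch only; not the repository's create_scp.py.
    """
    with open(scp_path, "w") as f:
        for name in sorted(os.listdir(wav_dir)):
            if name.endswith(".wav"):
                utt_id = os.path.splitext(name)[0]  # file stem as the utterance ID
                wav_path = os.path.abspath(os.path.join(wav_dir, name))
                f.write("{} {}\n".format(utt_id, wav_path))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate a wav scp file (sketch)")
    parser.add_argument("wav_dir", help="directory containing the mixture .wav files")
    parser.add_argument("scp_path", help="output scp path, e.g. 1.scp")
    args = parser.parse_args()
    write_scp(args.wav_dir, args.scp_path)
```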
### Results

- Currently training; the results will be displayed when the training is over.
- The following table lists the paper's experimental results for different hyperparameter settings. The receptive-field column follows directly from L, P, X, and R; see the sanity-check sketch at the end of this README.

| N | L | B | H | Sc | P | X | R | Normalization | Causal | Receptive field (s) | Model size | SI-SNRi (dB) | SDRi (dB) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 128 | 40 | 128 | 256 | 128 | 3 | 7 | 2 | gLN | x | 1.28 | 1.5M | 13.0 | 13.3 |
| 256 | 40 | 128 | 256 | 128 | 3 | 7 | 2 | gLN | x | 1.28 | 1.5M | 13.1 | 13.4 |
| 512 | 40 | 128 | 256 | 128 | 3 | 7 | 2 | gLN | x | 1.28 | 1.7M | 13.3 | 13.6 |
| 512 | 40 | 128 | 256 | 256 | 3 | 7 | 2 | gLN | x | 1.28 | 2.4M | 13.0 | 13.3 |
| 512 | 40 | 128 | 512 | 128 | 3 | 7 | 2 | gLN | x | 1.28 | 3.1M | 13.3 | 13.6 |
| 512 | 40 | 128 | 512 | 512 | 3 | 7 | 2 | gLN | x | 1.28 | 6.2M | 13.5 | 13.8 |
| 512 | 40 | 256 | 256 | 256 | 3 | 7 | 2 | gLN | x | 1.28 | 3.2M | 13.0 | 13.3 |
| 512 | 40 | 256 | 512 | 256 | 3 | 7 | 2 | gLN | x | 1.28 | 6.0M | 13.4 | 13.7 |
| 512 | 40 | 256 | 512 | 512 | 3 | 7 | 2 | gLN | x | 1.28 | 8.1M | 13.2 | 13.5 |
| 512 | 40 | 128 | 512 | 128 | 3 | 6 | 4 | gLN | x | 1.27 | 5.1M | 14.1 | 14.4 |
| 512 | 40 | 128 | 512 | 128 | 3 | 4 | 6 | gLN | x | 0.46 | 5.1M | 13.9 | 14.2 |
| 512 | 40 | 128 | 512 | 128 | 3 | 8 | 3 | gLN | x | 3.83 | 5.1M | 14.5 | 14.8 |
| 512 | 32 | 128 | 512 | 128 | 3 | 8 | 3 | gLN | x | 3.06 | 5.1M | 14.7 | 15.0 |
| 512 | 16 | 128 | 512 | 128 | 3 | 8 | 3 | gLN | x | 1.53 | 5.1M | **15.3** | **15.6** |
| 512 | 16 | 128 | 512 | 128 | 3 | 8 | 3 | cLN | √ | 1.53 | 5.1M | 10.6 | 11.0 |

### Pre-Trained Models

:bangbang:new:bangbang:: [Hugging Face pre-trained model](https://huggingface.co/JusperLee/Conv-TasNet/tree/main)

[**Google Drive**](https://drive.google.com/open?id=18xCr-N_Ashf9X9q0nxQSVZbDXDk2ONVQ)

### Our Results

![](https://github.com/JusperLee/Conv-TasNet/blob/master/Conv_TasNet_Pytorch/conv_tasnet_loss.png)

### Reference

- [Luo Yi's Conv-TasNet code](https://github.com/naplab/Conv-TasNet)
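### Receptive Field Sanity Check

The receptive-field column in the results table follows directly from the hyperparameters: the separator stacks R repeats of X dilated convolutional blocks with kernel size P and dilations 1, 2, ..., 2^(X-1), operating on encoder frames produced every L/2 samples. Below is a minimal sketch that reproduces the column; the 8 kHz sample rate and the 50% encoder overlap are assumptions taken from the paper's setup, not from this repository's code.

```python
def receptive_field_seconds(L, P, X, R, sample_rate=8000):
    """Receptive field of the Conv-TasNet separator, in seconds.

    Assumes 8 kHz audio and an encoder hop of L / 2 samples (50% overlap),
    as in the paper. Each of the R repeats stacks X dilated conv blocks
    with kernel size P and dilations 1, 2, ..., 2**(X - 1).
    """
    hop = L // 2
    # In encoder frames: a block with dilation d widens the field by (P - 1) * d,
    # and the dilations within one repeat sum to 2**X - 1.
    frames = 1 + R * (P - 1) * (2 ** X - 1)
    samples = (frames - 1) * hop + L  # convert the frame count back to raw samples
    return samples / sample_rate


# Reproducing the "Receptive field (s)" column for a few rows of the table:
print(receptive_field_seconds(L=40, P=3, X=7, R=2))  # 1.275 -> 1.28 in the table
print(receptive_field_seconds(L=40, P=3, X=8, R=3))  # 3.83
print(receptive_field_seconds(L=16, P=3, X=8, R=3))  # 1.532 -> 1.53 in the table
```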