Tacotron 2 is a speech synthesis framework proposed by Google Brain in 2017.

- Paper: https://arxiv.org/pdf/1712.05884.pdf
- Code: https://github.com/NVIDIA/tacotron2

WaveGlow is a flow-based vocoder published at ICASSP 2019.

- Paper: https://arxiv.org/pdf/1811.00002v1.pdf
- Code: https://github.com/NVIDIA/waveglow
Training support:

Models | Framework | Supported MLU | Supported Data Precision | Multi-MLUs |
---|---|---|---|---|
Tacotron2 | PyTorch1.6 | MLU370-X8 | FP32 | Yes |
WaveGlow | PyTorch1.6 | MLU370-X8 | AMP/FP32 | Yes |
Inference support:

Models | Framework | Supported MLU | Supported Data Precision | Supported Infer Mode |
---|---|---|---|---|
Tacotron2 | PyTorch1.6 | MLU370-S4/X4 | FP32 | CNNL |
WaveGlow | PyTorch1.6 | MLU370-S4/X4 | FP32 | CNNL |
The default optimizer is Adam. The main parameters are listed below (an example invocation follows the list):

- `--learning-rate`: learning rate (Tacotron 2: 1e-3, WaveGlow: 1e-4)
- `--batch-size`: batch size (Tacotron 2 FP16/FP32: 104/48, WaveGlow FP16/FP32: 10/4)
- `--weight-decay`: weight decay (Tacotron 2: 1e-6, WaveGlow: 0)
- `--grad-clip-thresh`: clip threshold for gradients (Tacotron 2: 1.0, WaveGlow: 3.4028234663852886e+38)
- `--grad-clip`: maximum gradient norm for clipping (Tacotron 2: 5.0, WaveGlow: 0)
- `--epochs`: number of training epochs (Tacotron 2: 1501, WaveGlow: 1001)
- `--pyamp`: enable mixed-precision (AMP) training
- `--sampling-rate`: sampling rate of the input and output audio (22050)
- `--filter-length`: FFT filter length (1024)
- `--hop-length`: FFT hop length, i.e. the sample stride between consecutive FFTs (256)
- `--win-length`: FFT window size (1024)
- `--mel-fmin`: lowest frequency in Hz (0.0)
- `--mel-fmax`: highest frequency in Hz (8000)
- `--anneal-steps`: epochs after which the learning rate is annealed (500 1000 1500)
- `--anneal-factor`: factor by which the learning rate is annealed (FP16/FP32: 0.3/0.1)
- `--segment-length`: segment length of the input audio processed by the network (8000)
- `--wn-channels`: number of residual channels in the coupling-layer networks (512)
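As a reference, a minimal single-process Tacotron 2 run with the defaults above might look like the following sketch. The paths are placeholders; the flags match the `train.py` help output shown later in this document:

```bash
# Sketch of a Tacotron 2 training run using the default hyperparameters above.
# <dataset_dir> and ./output are placeholders; adjust to your environment.
python train.py \
    -m Tacotron2 \
    -o ./output \
    -d <dataset_dir> \
    -lr 1e-3 \
    --epochs 1501 \
    -bs 48 \
    --weight-decay 1e-6 \
    --grad-clip-thresh 1.0 \
    --anneal-steps 500 1000 1500 \
    --anneal-factor 0.1 \
    --use-mlu \
    --log-file train_log.json
```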
1. Download the dataset from https://keithito.com/LJ-Speech-Dataset/ and extract it. It contains the following directory structure:
```
├── mels
│   ├── LJ001-0001.pt
│   ├── LJ001-0002.pt
│   ├── ...
├── wavs
│   ├── LJ001-0001.wav
│   ├── LJ001-0002.wav
│   ├── ...
├── metadata.csv
├── README
```
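For reference, each line of `metadata.csv` in LJSpeech is pipe-separated, mapping a wav file name to its raw and normalized transcripts (the transcripts below are schematic placeholders, not verbatim dataset content):

```
LJ001-0001|<raw transcript>|<normalized transcript>
```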
## Download the Cambricon PyTorch Docker image
```bash
# Load the Cambricon PyTorch image (replace xxx.tar.gz with the actual archive name)
docker load -i xxx.tar.gz
# Set the image name and start a container
export IMAGE_NAME=YOUR_IMAGE_NAME
bash run_docker.sh [CONTAINER_NAME]
# Inside the container: activate the PyTorch virtualenv and install dependencies
source /torch/venv3/pytorch/bin/activate
pip install -r ./models/requirements.txt
```
Alternatively, build the image from the provided Dockerfile and start a container:

```bash
# Build the image from the repository's Dockerfile
export IMAGE_NAME=demo_tacotron2
docker build --network=host -t $IMAGE_NAME -f DOCKERFILE ../../../
# Start a container with the MLU device mapped in
docker run -it --ipc=host -v /data:/data --device /dev/cambricon_ctl --privileged --name tacotron2 --network=host $IMAGE_NAME
```
Inside the container, set up the environment and launch training:

```bash
source env.sh
source /torch/venv3/pytorch/bin/activate
# 4-MLU training runs
bash run_scripts/Tacotron2/Tacotron2_FP32_1501E_4MLUs_Train.sh
bash run_scripts/WaveGlow/Waveglow_AMP_1001E_4MLUs_Train.sh
```
Models | Framework | MLU | Data Precision | Cards | Run |
---|---|---|---|---|---|
Tacotron2 | PyTorch | MLU370-X8 | FP32 | 4 | bash run_scripts/Tacotron2/Tacotron2_FP32_1501E_4MLUs_Train.sh |
WaveGlow | PyTorch | MLU370-X8 | FP32 | 4 | bash run_scripts/WaveGlow/Waveglow_AMP_1001E_1MLUs_Train.sh |
WaveGlow | PyTorch | MLU370-X8 | AMP | 4 | bash run_scripts/WaveGlow/Waveglow_AMP_1001E_4MLUs_Train.sh |
Before running the inference scripts, you must first train the models yourself to obtain the checkpoints.
Models | Framework | MLU | Data Precision | Description | Run |
---|---|---|---|---|---|
Tacotron2 | PyTorch | MLU370-S4 | FP32 | inference script | bash run_scripts/Tacotron2_Infer.sh |
Waveglow | PyTorch | MLU370-S4 | FP32 | inference script | bash run_scripts/Waveglow_Infer.sh |
To run with command-line options directly, all available training options are listed by `python train.py -h`:

```
usage: train.py [-h] -o OUTPUT [-d DATASET_PATH] -m MODEL_NAME
                [--log-file LOG_FILE]
                [--anneal-steps [ANNEAL_STEPS [ANNEAL_STEPS ...]]]
                [--anneal-factor {0.1,0.3}] [--config-file CONFIG_FILE]
                [--use-mlu] [--seed SEED] --epochs EPOCHS [--iter ITER]
                [--epochs-per-checkpoint EPOCHS_PER_CHECKPOINT]
                [--checkpoint-path CHECKPOINT_PATH] [--resume-multi-device]
                [--resume-from-last]
                [--dynamic-loss-scaling DYNAMIC_LOSS_SCALING] [--pyamp]
                [--cudnn-enabled] [--cudnn-deterministic]
                [--disable-uniform-initialize-bn-weight]
                [--use-saved-learning-rate USE_SAVED_LEARNING_RATE]
                -lr LEARNING_RATE [--weight-decay WEIGHT_DECAY]
                [--grad-clip-thresh GRAD_CLIP_THRESH] -bs BATCH_SIZE
                [--grad-clip GRAD_CLIP] [--load-mel-from-disk]
                [--training-files TRAINING_FILES]
                [--validation-files VALIDATION_FILES]
                [--text-cleaners [TEXT_CLEANERS [TEXT_CLEANERS ...]]]
                [--max-wav-value MAX_WAV_VALUE]
                [--sampling-rate SAMPLING_RATE]
                [--filter-length FILTER_LENGTH] [--hop-length HOP_LENGTH]
                [--win-length WIN_LENGTH] [--mel-fmin MEL_FMIN]
                [--mel-fmax MEL_FMAX] [--rank RANK]
                [--world-size WORLD_SIZE] [--dist-url DIST_URL]
                [--group-name GROUP_NAME] [--dist-backend {nccl,cncl}]
                [--bench-class BENCH_CLASS]

PyTorch Tacotron 2 Training

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Directory to save checkpoints
  -d DATASET_PATH, --dataset-path DATASET_PATH
                        Path to dataset
  -m MODEL_NAME, --model-name MODEL_NAME
                        Model to train
  --log-file LOG_FILE   Filename for logging
  --anneal-steps [ANNEAL_STEPS [ANNEAL_STEPS ...]]
                        Epochs after which decrease learning rate
  --anneal-factor {0.1,0.3}
                        Factor for annealing learning rate
  --config-file CONFIG_FILE
                        Path to configuration file
  --use-mlu             Enable MLU
  --seed SEED           manually set random seed for torch

training setup:
  --epochs EPOCHS       Number of total epochs to run
  --iter ITER           Number of total epochs to run
  --epochs-per-checkpoint EPOCHS_PER_CHECKPOINT
                        Number of epochs per checkpoint
  --checkpoint-path CHECKPOINT_PATH
                        Checkpoint path to resume training
  --resume-multi-device
                        Resumes training from the last multidevice checkpoint.
  --resume-from-last    Resumes training from the last checkpoint; uses the
                        directory provided with '--output' option to search
                        for the checkpoint "checkpoint_<model_name>_last.pt"
  --dynamic-loss-scaling DYNAMIC_LOSS_SCALING
                        Enable dynamic loss scaling
  --pyamp               Enable PYAMP
  --cudnn-enabled       Enable cudnn
  --cudnn-deterministic
                        Run cudnn deterministic
  --disable-uniform-initialize-bn-weight
                        disable uniform initialization of batchnorm layer weight

optimization setup:
  --use-saved-learning-rate USE_SAVED_LEARNING_RATE
  -lr LEARNING_RATE, --learning-rate LEARNING_RATE
                        Learing rate
  --weight-decay WEIGHT_DECAY
                        Weight decay
  --grad-clip-thresh GRAD_CLIP_THRESH
                        Clip threshold for gradients
  -bs BATCH_SIZE, --batch-size BATCH_SIZE
                        Batch size per GPU
  --grad-clip GRAD_CLIP
                        Enables gradient clipping and sets maximum gradient norm value

dataset parameters:
  --load-mel-from-disk  Loads mel spectrograms from disk instead of computing
                        them on the fly
  --training-files TRAINING_FILES
                        Path to training filelist
  --validation-files VALIDATION_FILES
                        Path to validation filelist
  --text-cleaners [TEXT_CLEANERS [TEXT_CLEANERS ...]]
                        Type of text cleaners for input text

audio parameters:
  --max-wav-value MAX_WAV_VALUE
                        Maximum audiowave value
  --sampling-rate SAMPLING_RATE
                        Sampling rate
  --filter-length FILTER_LENGTH
                        Filter length
  --hop-length HOP_LENGTH
                        Hop (stride) length
  --win-length WIN_LENGTH
                        Window length
  --mel-fmin MEL_FMIN   Minimum mel frequency
  --mel-fmax MEL_FMAX   Maximum mel frequency

distributed setup:
  --rank RANK           Rank of the process, do not set! Done by multiproc module
  --world-size WORLD_SIZE
                        Number of processes, do not set! Done by multiproc module
  --dist-url DIST_URL   Url used to set up distributed training
  --group-name GROUP_NAME
                        Distributed group name
  --dist-backend {nccl,cncl}
                        Distributed run backend

benchmark:
  --bench-class BENCH_CLASS
```
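The help text notes that `--rank` and `--world-size` are filled in by a multiproc module. Assuming this port keeps the upstream `python -m multiproc` launch convention (an assumption; the provided `run_scripts` remain the supported entry points), a multi-card WaveGlow AMP run might look like:

```bash
# Hypothetical multi-MLU launch via the multiproc module; it spawns one
# process per device and sets --rank/--world-size automatically.
python -m multiproc train.py \
    -m WaveGlow \
    -o ./output \
    -d <dataset_dir> \
    -lr 1e-4 \
    --epochs 1001 \
    -bs 10 \
    --segment-length 8000 \
    --weight-decay 0 \
    --grad-clip-thresh 3.4028234663852886e+38 \
    --pyamp \
    --use-mlu \
    --dist-backend cncl
```

Here `cncl` is the MLU collective backend offered by `--dist-backend {nccl,cncl}`.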
**All available inference options are listed by `python inference.py -h`:**

```
usage: inference.py [-h] -i INPUT -o OUTPUT [--suffix SUFFIX]
                    [--tacotron2 TACOTRON2] [--waveglow WAVEGLOW]
                    [-s SIGMA_INFER] [-d DENOISING_STRENGTH]
                    [-sr SAMPLING_RATE] [--fp16] [--device DEVICE]
                    [--log-file LOG_FILE] [--include-warmup]
                    [--stft-hop-length STFT_HOP_LENGTH]

PyTorch Tacotron 2 Inference

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        full path to the input text (phareses separated by new line)
  -o OUTPUT, --output OUTPUT
                        output folder to save audio (file per phrase)
  --suffix SUFFIX       output filename suffix
  --tacotron2 TACOTRON2
                        full path to the Tacotron2 model checkpoint file
  --waveglow WAVEGLOW   full path to the WaveGlow model checkpoint file
  -s SIGMA_INFER, --sigma-infer SIGMA_INFER
  -d DENOISING_STRENGTH, --denoising-strength DENOISING_STRENGTH
  -sr SAMPLING_RATE, --sampling-rate SAMPLING_RATE
                        Sampling rate
  --fp16                Run inference with mixed precision
  --device DEVICE       Run inference on GPU, MLU or CPU
  --log-file LOG_FILE   Filename for logging
  --include-warmup      Include warmup
  --stft-hop-length STFT_HOP_LENGTH
                        STFT hop length for estimating audio length from mel size
```
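A sketch of an end-to-end inference run, assuming checkpoints saved by the training runs above. The checkpoint file names follow the `checkpoint_<model_name>_last.pt` pattern from the training help; the `--device` value and file paths are assumptions:

```bash
# Hypothetical inference run; phrases.txt holds one input phrase per line.
python inference.py \
    -i phrases.txt \
    -o ./audio_out \
    --tacotron2 ./output/checkpoint_Tacotron2_last.pt \
    --waveglow ./output/checkpoint_WaveGlow_last.pt \
    -sr 22050 \
    --device mlu
```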
**Training results**

Training accuracy results on MLU370-X8:
Models | MLUs | val_loss |
---|---|---|
Tacotron2 | 4 | 2.6804521083831787 |
WaveGlow | 4 | -2.939938988004412 |
**Inference results**

@TODO

**Disclaimer**

You expressly understand and agree that the software, data, or models available via the links below are provided and maintained by third parties. Any third-party names, trademarks, logos, products, or services appearing in those links do not constitute an express or implied endorsement, guarantee, or recommendation of that third party or of its software, data, or models. You further understand and agree that the use of any third-party software, data, or models, including any information or personal data you provide (whether intentionally or unintentionally), is governed by the applicable terms of use, license agreements, privacy policies, or other such agreements. All risks arising from using the software, data, or models in these links are therefore borne by you.

Public dataset: https://keithito.com/LJ-Speech-Dataset/