# ASR_Syllable

**Repository Path**: lzh272915893/ASR_Syllable

## Basic Information

- **Project Name**: ASR_Syllable
- **Description**: Research on speech recognition acoustic models based on convolutional neural networks
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 1
- **Created**: 2020-10-06
- **Last Updated**: 2023-12-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ASR_Syllable

Research on Speech Recognition Acoustic Models Based on Convolutional Neural Networks
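
This project summarizes my work on DCNN-CTC from my first year through the first semester of my second year of graduate study. It proposes the MCNN-CTC and DenseNet-CTC acoustic models. The final experimental results are shown below:
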
#### 1) Thchs30_TrainingResults

![Thchs30 training and fine-tuning curves](https://github.com/zw76859420/ASR_Syllable/blob/master/training_results/Thchs_Training_Loss.png)

#### 2) Thchs30_Results

![Thchs30 experimental results](https://github.com/zw76859420/ASR_Syllable/blob/master/training_results/Thchs_Results.png)

#### 3) Stcmds_Results

![Stcmds experimental results](https://github.com/zw76859420/ASR_Syllable/blob/master/training_results/STCMDS_Results.png)

## Acoustic Models

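#### 1) DCNN-CTC Acoustic Model

This model is a modification of speech_model-05, which builds the speech recognition acoustic model with DCNN-CTC. The model for the STcmds dataset is adapted from the same script. The experimental results are shown in the figures above.
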
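
To make the CTC part of the model name concrete, here is a minimal sketch of greedy (best-path) CTC decoding, which turns per-frame network outputs into a label sequence by merging repeated labels and dropping the blank. The function name `ctc_greedy_decode`, the toy probabilities, and the choice of class 2 as the blank are assumptions for illustration; the decoding used in this repository's scripts may differ.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    decoded: List[int] = []
    previous: Optional[int] = None
    for probs in frame_probs:
        # Index of the most probable class for this frame.
        label = max(range(len(probs)), key=probs.__getitem__)
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical toy input: 6 frames over 3 classes, with class 2 acting as the blank.
frames = [
    [0.8, 0.1, 0.1],  # class 0
    [0.7, 0.2, 0.1],  # class 0 again (merged with the previous frame)
    [0.1, 0.1, 0.8],  # blank
    [0.6, 0.3, 0.1],  # class 0 (a new occurrence, because a blank came in between)
    [0.2, 0.7, 0.1],  # class 1
    [0.1, 0.1, 0.8],  # blank
]
print(ctc_greedy_decode(frames, blank=2))  # prints [0, 0, 1]
```

The blank between the two runs of class 0 is what lets CTC output the same syllable twice in a row.
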
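#### 2) MCNN-CTC Acoustic Model

This model is implemented in the speech_model_10 script. Its results are shown in figure 2) above; overall, MCNN-CTC outperforms DCNN-CTC.

#### 3) DenseNet-CTC Acoustic Model

This model is based on DenseNet. On the Thchs30 dataset it reaches a CER of roughly 30%. Feel free to run the experiment yourself.
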
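
The README does not say exactly how its CER is computed. A common definition is the edit distance between the reference and the hypothesis, divided by the reference length. The sketch below applies that definition to syllable sequences; the function names and the example pinyin sequences are assumptions for illustration, not taken from the repository.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one deleted syllable and one wrong tone, so 2 errors out of 6.
reference = ["jin1", "tian1", "tian1", "qi4", "hen3", "hao3"]
hypothesis = ["jin1", "tian1", "qi4", "hen3", "hao4"]
print(round(error_rate(reference, hypothesis), 3))  # prints 0.333
```
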
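#### 4) Attention-CTC Acoustic Model

This model builds on DCNN-CTC and applies an attention operation at the fully connected layer. It may improve on DCNN-CTC; see the speech_model_06 script for details. The core of the algorithm is shown below:
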
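NN(Attention)-CTC:

    # The import path may be tensorflow.keras.layers, depending on the installation.
    from keras.layers import BatchNormalization, Dense, Dropout, multiply

    # `reshape` is presumably the output of the preceding Reshape layer in the script.
    dense1 = Dense(units=512, activation='relu', use_bias=True, kernel_initializer='he_normal')(reshape)
    # Softmax over the 512 units yields one attention weight per feature.
    attention_prob = Dense(units=512, activation='softmax', name='attention_vec')(dense1)
    attention_mul = multiply([dense1, attention_prob])
    dense1 = BatchNormalization(epsilon=0.0002)(attention_mul)
    dense1 = Dropout(0.3)(dense1)
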
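
To show what the attention step above computes for a single frame, here is a small pure-Python sketch: a dense projection followed by a softmax produces one weight per feature, and the features are then scaled element-wise by those weights. The names `softmax` and `attention_gate`, the 3-dimensional feature vector, and the identity weight matrix are assumptions for illustration; in the Keras snippet the weights are learned and the vector has 512 entries.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a vector."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_gate(features: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   bias: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, gated features) for one frame.

    weights[i][j] connects input feature i to output unit j, as in a dense layer.
    The number of output units must equal len(features) for the element-wise product.
    """
    scores = [sum(f * row[j] for f, row in zip(features, weights)) + bias[j]
              for j in range(len(bias))]
    probs = softmax(scores)
    gated = [f * p for f, p in zip(features, probs)]
    return probs, gated


# Hypothetical toy frame with identity weights, so the scores equal the features.
features = [1.0, 2.0, 3.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
probs, gated = attention_gate(features, identity, bias=[0.0, 0.0, 0.0])
print([round(p, 3) for p in probs])
print([round(g, 3) for g in gated])
```

Because the softmax weights sum to 1, the gating suppresses most features and emphasizes the few with the highest scores.
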
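## Transfer Learning

Retraining fine-tunes the initial model further and can improve its accuracy. See the train_modelSpeech script for the training procedure. Here all network layers are fine-tuned, which improves on the initial model; see figure 1) for the results.
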
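
Fine-tuning every layer can be sketched roughly as follows. This is not the train_modelSpeech script. It is a hypothetical helper, `fine_tune_all_layers`, that relies only on the standard Keras model interface (`load_weights`, `layers`, `trainable`, `compile`). The weights path, optimizer, and loss are all supplied by the caller and are assumptions here.

```python
def fine_tune_all_layers(model, weights_path, optimizer, loss):
    """Prepare a Keras-style model for fine-tuning of all layers.

    `model` is assumed to expose load_weights(), layers, and compile(),
    as Keras models do. Nothing here is specific to this repository.
    """
    # Start from the weights of the initial training run.
    model.load_weights(weights_path)
    # Unfreeze every layer so that the whole network is updated.
    for layer in model.layers:
        layer.trainable = True
    # Recompile so the trainable flags and the new optimizer take effect.
    model.compile(optimizer=optimizer, loss=loss)
    return model
```

A typical call passes an optimizer with a smaller learning rate than the one used for the initial run, then continues with the usual `fit` loop. That choice is a common convention rather than something stated in this README.
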

## Citation

W Zhang, M H Zhai, Z L Huang, et al. Towards End-to-End Speech Recognition with Deep Multipath Convolutional Neural Networks[C]. https://doi.org/10.1007/978-3-030-27529-7_29

## Related Projects and Links

- [Personal blog](https://blog.csdn.net/Xwei1226): my recent study notes
- [Reference project](https://github.com/nl8590687/ASRT_SpeechRecognition)
- [ASR_WORD](https://github.com/zw76859420/ASR_WORD): a speech recognition acoustic model that uses characters as the modeling unit