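ascend_community_projects: community code repository for the inference edge developer kit

This sample is provided as a reference for learning the Ascend software stack and is not intended for commercial use.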

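Automatic Speech Recognition Based on WeNet

Function: uses a WeNet model to convert speech (Chinese speech only) to text.
Sample input: an audio file (must be in WAV format).
Sample output: a Chinese text string.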
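Because the sample accepts only WAV input, it can help to inspect an audio file before passing it to the model. The following is a minimal sketch that uses Python's standard wave module; the helper names describe_wav and is_wav are hypothetical and are not part of this sample. It only reports the file's properties and makes no claim about which sample rate or channel count the model expects.

```python
import wave


def describe_wav(path):
    """Return basic properties of a WAV file; raises wave.Error if it is not one."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def is_wav(path):
    """Return True if the file can be opened as a WAV file, False otherwise."""
    try:
        describe_wav(path)
        return True
    except (wave.Error, EOFError, OSError):
        return False
```

For example, is_wav("sample.wav") can be called before running the sample to catch files that merely carry a .wav extension.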

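Prerequisites

This project was tested on an Atlas 200I DK A2 developer board. You need to obtain an Atlas 200I DK A2 yourself and, following the official documentation, flash the SD card with the Ascend base image using the one-click card-making tool. All dependencies of this project are included in the base environment of the base image, which is activated automatically when you log in to the board over SSH.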

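Preparing the Sample

Obtain the source package.

Two download methods are available; choose one of them to prepare the source code.

Download from the command line (the download takes longer, but the steps are simple).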

# In the development environment, run the following commands as the root user to download the source repository.
cd ${HOME}     
git clone https://gitee.com/ascend/ascend_community_projects.git
# Switch to the 310B branch
cd ascend_community_projects
git checkout 310B

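Download as a ZIP archive (the download is faster, but the steps are slightly more involved).

 # 1. In the ascend_community_projects repository, switch to the 310B branch using the selector at the top left, then click [Clone/Download] at the top right and choose [Download ZIP].
 # 2. Upload the ZIP archive to a directory in the development environment (the commands below assume ${HOME}).
 # 3. In the development environment, run the following commands to extract the ZIP archive. If the archive you downloaded has a different file name, use that name instead.
 cd ${HOME}
 unzip ascend-samples-master.zip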

Obtain the original network model required by this application and convert it.

cd ${HOME}/ascend_community_projects/SpeechRecognition    
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Atlas%20200I%20DK%20A2/DevKit/samples/23.0.RC1/base-samples/notebook-demo-datasets/10-speech-recognition/offline_encoder_sim.onnx

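Before running the atc model conversion command on the developer board, you must first configure a swap partition; alternatively, perform the model conversion on a PC.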
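To check whether swap is already configured on the board before running atc, the sketch below reads the SwapTotal field from /proc/meminfo on Linux. The helper names parse_meminfo and swap_total_kib are hypothetical. The source does not state how much swap the conversion needs, so the code only reports the current value.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (for example HugePages_Total)
    are plain counts without a unit.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_total_kib(meminfo_path="/proc/meminfo"):
    """Return the configured swap size in kB, or 0 if no swap is configured."""
    with open(meminfo_path, encoding="ascii") as meminfo:
        return parse_meminfo(meminfo.read()).get("SwapTotal", 0)
```

A result of 0 from swap_total_kib() indicates that no swap is configured yet.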

 atc --model=offline_encoder_sim.onnx --framework=5 --output=offline_encoder --input_format=ND --input_shape="speech:1,1478,80;speech_lengths:1" --log=error --soc_version=Ascend310B1
 # The following warnings appear after conversion; they do not affect the model's inference results.
 # W11001: Op [/encoder/encoders.0/self_attn/Cast] does not hit the high-priority operator information library, which might result in compromised performance.
 # W11001: Op [/Cast] does not hit the high-priority operator information library, which might result in compromised performance.
 # W11001: Op [PartitionedCall_/ReduceSum_ReduceSum_670] does not hit the high-priority operator information library, which might result in compromised performance.

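The conversion parameters have the following meanings:

--model: path of the input model
--framework: framework type of the original network model; 5 means ONNX
--output: path of the output model
--input_format: memory layout of the input tensor
--input_shape: shape of the model input data. The maximum input audio length is fixed when the model is converted. Recommended lengths are 262, 326, 390, 454, 518, 582, 646, 710, 774, 838, 902, 966, 1028, 1284, and 1478; the default length is 1478.
--log: log level
--soc_version: model of the Ascend AI processor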
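If you convert the model for a length other than the default, the --input_shape string has to be rebuilt. The sketch below picks the smallest recommended length that fits a given input and formats the corresponding string. It assumes that the second dimension of speech counts feature frames and that 80 is the feature dimension, as in the command above. The helper names pick_length and build_input_shape are hypothetical, and whether main.py accepts a model converted with a different length is not confirmed by this document.

```python
import bisect

# Recommended lengths listed for --input_shape, in ascending order.
RECOMMENDED_LENGTHS = [262, 326, 390, 454, 518, 582, 646, 710,
                       774, 838, 902, 966, 1028, 1284, 1478]
FEATURE_DIM = 80  # last dimension of "speech" in the atc command


def pick_length(num_frames, lengths=RECOMMENDED_LENGTHS):
    """Return the smallest recommended length that is >= num_frames."""
    index = bisect.bisect_left(lengths, num_frames)
    if index == len(lengths):
        raise ValueError(f"{num_frames} frames exceed the largest length {lengths[-1]}")
    return lengths[index]


def build_input_shape(length, feature_dim=FEATURE_DIM):
    """Format the value passed to atc --input_shape for a given length."""
    return f"speech:1,{length},{feature_dim};speech_lengths:1"
```

For example, build_input_shape(pick_length(300)) yields the shape string for the 326-frame bucket.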

Obtain the configuration file and the sample audio file needed to run the model.

cd ${HOME}/ascend_community_projects/SpeechRecognition
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Atlas%20200I%20DK%20A2/DevKit/samples/23.0.RC1/base-samples/notebooks/10-speech-recognition/vocab.txt
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Atlas%20200I%20DK%20A2/DevKit/samples/23.0.RC1/base-samples/notebooks/10-speech-recognition/sample.wav
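Before running the sample, you may want to confirm that the converted model and the downloaded files are all in place. The sketch below is a hypothetical pre-flight check and is not part of main.py. It assumes that main.py looks for offline_encoder.om, vocab.txt, and sample.wav in the SpeechRecognition directory; these names are taken from the commands and log output in this document.

```python
from pathlib import Path

# File names assumed from the atc output name, the wget commands, and the run log.
REQUIRED_FILES = ("offline_encoder.om", "vocab.txt", "sample.wav")


def missing_files(directory, names=REQUIRED_FILES):
    """Return the names from `names` that are not regular files in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]
```

An empty list from missing_files(".") means all three files were found in the current directory.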

Running the Sample

cd ${HOME}/ascend_community_projects/SpeechRecognition
python main.py

Viewing the Results

When the run finishes, the speech recognition result is printed to the terminal, as shown below:

[INFO] acl init success
[INFO] open device 0 success
[INFO] load model offline_encoder.om success
[INFO] create model description success
2023-06-13 01:52:22.283867: E external/org_tensorflow/tensorflow/core/framework/node_def_util.cc:675] NodeDef mentions attribute input_para_type_list which is not in the op definition: Op<name=Sum; signature=input:T, reduction_indices:Tidx -> output:T; attr=keep_dims:bool,default=false; attr=T:type,allowed=[DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, 6034766930529145842, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]; attr=Tidx:type,default=DT_INT32,allowed=[DT_INT32, DT_INT64]> This may be expected if your graph generating binary is newer  than this binary. Unknown attributes will be ignored. NodeDef: {{node PartitionedCall_/ReduceSum_ReduceSum_670}}
智能语音作为智能时代人机交互的关键接口各行各业爆发式的场景需求驱动行业发展进入黄金期
[INFO] unload model success, model Id is 1
[INFO] end to destroy context
[INFO] end to reset device is 0
[INFO] end to finalize acl
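In the output above, the recognized text is the only line that is neither an [INFO] message nor a timestamped framework log. If you capture the output of main.py and want only the transcript, a filter along the following lines may be enough. It is a sketch based solely on the log format shown here, and the helper name extract_transcript is hypothetical.

```python
import re

# Lines that start with a bracketed level such as [INFO], or with a
# "YYYY-MM-DD HH:MM:SS" timestamp, are treated as log lines.
_LOG_LINE = re.compile(r"^(\[[A-Z]+\]|\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def extract_transcript(output):
    """Return the non-empty, non-log lines of the captured program output."""
    lines = [line.strip() for line in output.splitlines()]
    return [line for line in lines if line and not _LOG_LINE.match(line)]
```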
