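# final_project

**Repository Path**: NFUNM101/final_project

## Basic Information

- **Project Name**: final_project
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2019-12-23
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# 粤难粤说 APP

[40x40-second PowerPoint with voice narration](https://gitee.com/NFUNM101/final_project/blob/master/%E7%B2%A4%E9%9A%BE%E7%B2%A4%E8%AF%B4APP.pptx)

Release date|2019-12-23
--|:--:
Project name|粤难粤说
Project status|In progress
Project owner|张美玲
Developer|张美玲
Tester|张美玲

## 1. Value Proposition Design

#### Background

Guangdong is a remarkable place. It has a pleasant-sounding local language, Cantonese; two first-tier cities, Shenzhen and Guangzhou; people with a strong knack for business; and excellent food. All of this draws many people to settle in or visit Guangdong. For these newcomers, however, Guangdong can also be hard to navigate, because they do not understand Cantonese. The Cantonese apps currently on the market focus mainly on learning, but people also need a "translator" for everyday conversation.

#### Value Proposition Statement

The product combines communication and learning. It is built mainly on speech recognition, which lets users communicate without barriers in daily life and learn Cantonese while they do so.

- Barrier-free communication

  Users who come to Guangdong for work or travel and run into Cantonese they cannot understand can capture the speech with voice input. The app returns a text or spoken translation, so they can communicate without having learned Cantonese.

- Learning Cantonese

  Users who settle in Guangdong long-term and want to learn Cantonese, rather than keep relying on a translator, can study with the app anytime and anywhere.

#### Core Value (Minimum Viable Product): Pain Point

When a non-local who does not speak Cantonese meets a Guangdong local who speaks only Cantonese, communication breaks down. The app translates instantly and removes the dialect barrier.

#### Probabilistic Nature of the AI

With voice input, the returned content depends on the acoustic conditions at the time: a voice that is too quiet, a noisy environment, or unclear articulation all affect it. The app can, however, correct errors based on sentence meaning, segment sentences, and add punctuation automatically. When the spoken input is unclear, it aims to recognize the intended sentence and still return a correct translation.

#### Requirements List

Feature|User scenario|Priority
--|:--:|--
Translation|An out-of-towner who does not speak Cantonese meets an older Guangdong local whose Mandarin is not fluent. To keep the conversation flowing, the local simply speaks Cantonese; the app captures it by voice input and returns a text or spoken translation, so neither side is held back.|High
Learning Cantonese|A user who has just started learning Cantonese wants to talk with locals in Cantonese but worries about mispronouncing words. Using the app as a learning tool, the user speaks the Cantonese sentence they want to say, and the app corrects faulty sentences and pronunciation, so there is no need to fear embarrassing mistakes.|Medium

## 2. Prototype Design

[Product prototype: click here to open](http://nfunm101.gitee.io/yue_produce)

### Interface / Interaction Design

![Translation screen](https://images.gitee.com/uploads/images/2020/0107/003548_7b7fcaa7_1648162.png "翻译.PNG")

![Learning screen](https://images.gitee.com/uploads/images/2020/0107/003624_e7cb577a_1648162.png "学习.PNG")

![Upload screen (Cantonese)](https://images.gitee.com/uploads/images/2020/0107/003641_bea220ad_1648162.png "上传-粤.PNG")

![TVB drama screen](https://images.gitee.com/uploads/images/2020/0107/003921_96bb7fb2_1648162.png "TVB剧.PNG")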

![My profile screen](https://images.gitee.com/uploads/images/2020/0107/003942_57476318_1648162.png "我的.PNG")

### Information Design

![Information design: output](https://images.gitee.com/uploads/images/2020/0107/005712_6eb5253b_1648162.png "信息输出.PNG")

![Information design: learning](https://images.gitee.com/uploads/images/2020/0107/005726_d55e4058_1648162.png "信息学习.PNG")

![Information design: my profile](https://images.gitee.com/uploads/images/2020/0107/005740_8c27c150_1648162.png "信息我的.PNG")

### Product Architecture Diagrams

#### Product Architecture Diagram

![Product architecture diagram](https://images.gitee.com/uploads/images/2020/0107/012406_3993a1a2_1648162.png "粤产品架构图.PNG")

#### Product Functional Structure Diagram

![Product functional structure diagram](https://images.gitee.com/uploads/images/2020/0107/012600_040dbbf8_1648162.png "粤产品功能架构图.PNG")

#### Product Information Architecture Diagram

![Product information architecture diagram](https://images.gitee.com/uploads/images/2020/0107/012449_5a585fad_1648162.png "粤产品信息架构图.PNG")

#### Product Flowchart

![Product flowchart](https://images.gitee.com/uploads/images/2020/0107/012504_cfbe31c9_1648162.png "粤流程图.PNG")

## 3. API Demonstration

#### API Usage and Output

1. Baidu speech recognition API

- Interface description: this request performs speech recognition. Speech goes in, and the text or speech (Cantonese/Mandarin) information for that speech comes out.
- Interface address: http://vop.baidu.com/server_api
- Input

```
import sys
import json
import base64
import time

IS_PY3 = sys.version_info.major == 3

if IS_PY3:
    from urllib.request import urlopen
    from urllib.request import Request
    from urllib.error import URLError
    from urllib.parse import urlencode
    timer = time.perf_counter
else:
    from urllib2 import urlopen
    from urllib2 import Request
    from urllib2 import URLError
    from urllib import urlencode
    if sys.platform == "win32":
        timer = time.clock
    else:
        # On most other platforms the best timer is time.time()
        timer = time.time

API_KEY = 'IEWxeETG7ZP5gkxfXM4FVdbo'
SECRET_KEY = 'eMwmXjPsYZUDuyzITFu0BQbXypBwrabZ'

# File to recognize
AUDIO_FILE = './luyin.m4a'  # only pcm/wav/amr are supported; the pro edition additionally supports m4a
# File format
FORMAT = AUDIO_FILE[-3:]  # the file extension must be pcm/wav/amr; the pro edition additionally supports m4a

CUID = '123456PYTHON'
# Sample rate
RATE = 16000  # fixed value

# Standard edition
# 1537 selects Mandarin with the input-method model; 1536 selects Mandarin with the search model.
# Fill in the PID according to the documentation to choose the language and recognition model
# (1637, used here, is the Cantonese model).
DEV_PID = 1637
ASR_URL = 'http://vop.baidu.com/server_api'
SCOPE = 'audio_voice_assistant_get'  # this scope indicates ASR capability; if it is missing, tick it on the web console (very old applications may not have it)

# To test the self-training platform, enable the settings below. Once your model is online there,
# you will see "Step 2: get the dedicated model parameters pid:8001, modelid:1234";
# use that information to set dev_pid=8001, lm_id=1234
# DEV_PID = 8001 ;
# LM_ID = 1234 ;

# Pro edition: if you uncomment these lines, fill in your own appkey and appSecret and
# enable the pro edition on the web console (charges may apply once it is enabled)
# DEV_PID = 80001
# ASR_URL = 'http://vop.baidu.com/pro_api'
# SCOPE = 'brain_enhanced_asr'  # this scope indicates pro-edition capability; if it is missing, enable the pro edition on the web console

# Skip the scope check; very old applications may not have a scope
# SCOPE = False


class DemoError(Exception):
    pass


""" TOKEN start """

TOKEN_URL = 'http://openapi.baidu.com/oauth/2.0/token'


def fetch_token():
    # Exchange the API key and secret key for an access token
    params = {'grant_type': 'client_credentials',
              'client_id': API_KEY,
              'client_secret': SECRET_KEY}
    post_data = urlencode(params)
    if (IS_PY3):
        post_data = post_data.encode('utf-8')
    req = Request(TOKEN_URL, post_data)
    try:
        f = urlopen(req)
        result_str = f.read()
    except URLError as err:
        print('token http response http code : ' + str(err.code))
        result_str = err.read()
    if (IS_PY3):
        result_str = result_str.decode()

    print(result_str)
    result = json.loads(result_str)
    print(result)
    if ('access_token' in result.keys() and 'scope' in result.keys()):
        print(SCOPE)
        if SCOPE and (not SCOPE in result['scope'].split(' ')):  # SCOPE = False skips the check
            raise DemoError('scope is not correct')
        print('SUCCESS WITH TOKEN: %s EXPIRES IN SECONDS: %s' % (result['access_token'], result['expires_in']))
        return result['access_token']
    else:
        raise DemoError('MAYBE API_KEY or SECRET_KEY not correct: access_token or scope not found in token response')

""" TOKEN end """

if __name__ == '__main__':
    token = fetch_token()

    # Read the audio file and base64-encode it for the JSON request body
    speech_data = []
    with open(AUDIO_FILE, 'rb') as speech_file:
        speech_data = speech_file.read()

    length = len(speech_data)
    if length == 0:
        raise DemoError('file %s length read 0 bytes' % AUDIO_FILE)
    speech = base64.b64encode(speech_data)
    if (IS_PY3):
        speech = str(speech, 'utf-8')
    params = {'dev_pid': DEV_PID,
              # "lm_id": LM_ID,  # enable this when testing the self-training platform
              'format': FORMAT,
              'rate': RATE,
              'token': token,
              'cuid': CUID,
              'channel': 1,
              'speech': speech,
              'len': length
              }
    post_data = json.dumps(params, sort_keys=False)
    # print post_data
    req = Request(ASR_URL, post_data.encode('utf-8'))
    req.add_header('Content-Type', 'application/json')
    try:
        begin = timer()
        f = urlopen(req)
        result_str = f.read()
        print("Request time cost %f" % (timer() - begin))
    except URLError as err:
        print('asr http response http code : ' + str(err.code))
        result_str = err.read()

    if (IS_PY3):
        result_str = str(result_str, 'utf-8')
    print(result_str)
    with open("result.txt", "w") as of:
        of.write(result_str)
```

- Output

```
{"access_token":"24.5dabb257c9097c30aea2caa9748ab63f.2592000.1579784749.282335-17537195","session_key":"9mzdDtbSM2ybtYJW4kdB3g3KbSXHoKZTBJXfNuiIWacLfX50YPfhW1PTIct+xNJsPGk+LFv+PpUCQWEx3i08Yn0+uf8boA==","scope":"audio_voice_assistant_get brain_enhanced_asr audio_tts_post public brain_all_scope picchain_test_picchain_api_scope wise_adapt lebo_resource_base lightservice_public hetu_basic lightcms_map_poi kaidian_kaidian ApsMisTest_Test\u6743\u9650 vis-classify_flower lpq_\u5f00\u653e cop_helloScope ApsMis_fangdi_permission smartapp_snsapi_base iop_autocar oauth_tp_app smartapp_smart_game_openapi oauth_sessionkey smartapp_swanid_verify smartapp_opensource_openapi smartapp_opensource_recapi fake_face_detect_\u5f00\u653eScope vis-ocr_\u865a\u62df\u4eba\u7269\u52a9\u7406 idl-video_\u865a\u62df\u4eba\u7269\u52a9\u7406","refresh_token":"25.f757c46128f7f3ec8093d00758a372e0.315360000.1892552749.282335-17537195","session_secret":"6f94bf1a5a5f77a85a398f1793e37803","expires_in":2592000}
{'access_token': '24.5dabb257c9097c30aea2caa9748ab63f.2592000.1579784749.282335-17537195', 'session_key': '9mzdDtbSM2ybtYJW4kdB3g3KbSXHoKZTBJXfNuiIWacLfX50YPfhW1PTIct+xNJsPGk+LFv+PpUCQWEx3i08Yn0+uf8boA==', 'scope': 'audio_voice_assistant_get brain_enhanced_asr audio_tts_post public brain_all_scope picchain_test_picchain_api_scope wise_adapt lebo_resource_base lightservice_public hetu_basic lightcms_map_poi kaidian_kaidian ApsMisTest_Test权限 vis-classify_flower lpq_开放 cop_helloScope ApsMis_fangdi_permission smartapp_snsapi_base iop_autocar oauth_tp_app
smartapp_smart_game_openapi oauth_sessionkey smartapp_swanid_verify smartapp_opensource_openapi smartapp_opensource_recapi fake_face_detect_开放Scope vis-ocr_虚拟人物助理 idl-video_虚拟人物助理', 'refresh_token': '25.f757c46128f7f3ec8093d00758a372e0.315360000.1892552749.282335-17537195', 'session_secret': '6f94bf1a5a5f77a85a398f1793e37803', 'expires_in': 2592000}
audio_voice_assistant_get
SUCCESS WITH TOKEN: 24.5dabb257c9097c30aea2caa9748ab63f.2592000.1579784749.282335-17537195 EXPIRES IN SECONDS: 2592000
Request time cost 0.114846
{"err_msg":"audio trans failed","err_no":3316,"sn":"558407106251577192749"}
```

In this run the token request succeeded, but the recognition request itself returned `err_no` 3316 (`audio trans failed`). The most likely cause is the input file: `./luyin.m4a` was sent to the standard endpoint, which, according to the comment in the code, accepts only pcm/wav/amr.

2. Speech synthesis

- Interface description: this request performs speech synthesis. Upload a text file or type in text, and the text or speech (Cantonese/Mandarin) information for that text comes out.
- Interface address: http://tsn.baidu.com/text2audio
- Input

```
import sys
import json

IS_PY3 = sys.version_info.major == 3
if IS_PY3:
    from urllib.request import urlopen
    from urllib.request import Request
    from urllib.error import URLError
    from urllib.parse import urlencode
    from urllib.parse import quote_plus
else:
    import urllib2
    from urllib import quote_plus
    from urllib2 import urlopen
    from urllib2 import Request
    from urllib2 import URLError
    from urllib import urlencode

APP_ID = '17537195'
API_KEY = 'IEWxeETG7ZP5gkxfXM4FVdbo'
# SECRET_KEY is used by fetch_token() but was missing from this listing;
# the value is copied from the recognition demo above, which uses the same API_KEY.
SECRET_KEY = 'eMwmXjPsYZUDuyzITFu0BQbXypBwrabZ'

# Sample text to synthesize ("So much homework, I'm so happy")
TEXT = "作业好多我好开心"

# Voice selection. Basic voices: 0 = 度小美, 1 = 度小宇, 3 = 度逍遥, 4 = 度丫丫.
# Premium voices: 5 = 度小娇, 103 = 度米朵, 106 = 度博文, 110 = 度小童, 111 = 度小萌. Default: 度小美
PER = 4
# Speed, 0-15; default 5 (medium)
SPD = 5
# Pitch, 0-15; default 5 (medium)
PIT = 5
# Volume, 0-9; default 5 (medium)
VOL = 5
# Download file format: 3 = mp3 (default), 4 = pcm-16k, 5 = pcm-8k, 6 = wav
AUE = 3

FORMATS = {3: "mp3", 4: "pcm", 5: "pcm", 6: "wav"}
FORMAT = FORMATS[AUE]

CUID = "123456PYTHON"

TTS_URL = 'http://tsn.baidu.com/text2audio'


class DemoError(Exception):
    pass


""" TOKEN start """

TOKEN_URL = 'http://openapi.baidu.com/oauth/2.0/token'
SCOPE = 'audio_tts_post'  # this scope indicates TTS capability; if it is missing, tick it on the web console


def fetch_token():
    # Exchange the API key and secret key for an access token
    print("fetch token begin")
    params = {'grant_type': 'client_credentials',
              'client_id': API_KEY,
              'client_secret': SECRET_KEY}
    post_data = urlencode(params)
    if (IS_PY3):
        post_data = post_data.encode('utf-8')
    req = Request(TOKEN_URL, post_data)
    try:
        f = urlopen(req, timeout=5)
        result_str = f.read()
    except URLError as err:
        print('token http response http code : ' + str(err.code))
        result_str = err.read()
    if (IS_PY3):
        result_str = result_str.decode()

    print(result_str)
    result = json.loads(result_str)
    print(result)
    if ('access_token' in result.keys() and 'scope' in result.keys()):
        if not SCOPE in result['scope'].split(' '):
            raise DemoError('scope is not correct')
        print('SUCCESS WITH TOKEN: %s ; EXPIRES IN SECONDS: %s' % (result['access_token'], result['expires_in']))
        return result['access_token']
    else:
        raise DemoError('MAYBE API_KEY or SECRET_KEY not correct: access_token or scope not found in token response')


""" TOKEN end """

if __name__ == '__main__':
    token = fetch_token()
    tex = quote_plus(TEXT)  # TEXT must be urlencoded twice (urlencode below is the second pass)
    print(tex)
    params = {'tok': token, 'tex': tex, 'per': PER, 'spd': SPD, 'pit': PIT, 'vol': VOL, 'aue': AUE, 'cuid': CUID,
              'lan': 'zh', 'ctp': 1}  # lan and ctp are fixed parameters

    data = urlencode(params)
    print('test on Web Browser' + TTS_URL + '?' + data)

    req = Request(TTS_URL, data.encode('utf-8'))
    has_error = False
    try:
        f = urlopen(req)
        result_str = f.read()

        headers = dict((name.lower(), value) for name, value in f.headers.items())

        # A successful call returns audio; any other content type is an error message
        has_error = ('content-type' not in headers.keys() or headers['content-type'].find('audio/') < 0)
    except URLError as err:
        print('asr http response http code : ' + str(err.code))
        result_str = err.read()
        has_error = True

    save_file = "error.txt" if has_error else 'result.' + FORMAT
    with open(save_file, 'wb') as of:
        of.write(result_str)

    if has_error:
        if (IS_PY3):
            result_str = str(result_str, 'utf-8')
        print("tts api error:" + result_str)

    print("result saved as :" + save_file)
```

- Output

```
fetch token begin
{"access_token":"24.e64058ba1ed4ead39d6f245d5d593457.2592000.1579784867.282335-17537195","session_key":"9mzdXvNzuOQwnUZMt42nMIxHa3K7+L8OGSNdaiI6852guxyBRlbAfUIUslatsnTrCdM1tKAVeDKlu3E6PzA+CQDJ1uscCA==","scope":"audio_voice_assistant_get brain_enhanced_asr audio_tts_post public brain_all_scope picchain_test_picchain_api_scope wise_adapt lebo_resource_base lightservice_public hetu_basic lightcms_map_poi kaidian_kaidian ApsMisTest_Test\u6743\u9650 vis-classify_flower lpq_\u5f00\u653e cop_helloScope ApsMis_fangdi_permission smartapp_snsapi_base iop_autocar oauth_tp_app smartapp_smart_game_openapi oauth_sessionkey smartapp_swanid_verify smartapp_opensource_openapi smartapp_opensource_recapi fake_face_detect_\u5f00\u653eScope vis-ocr_\u865a\u62df\u4eba\u7269\u52a9\u7406 idl-video_\u865a\u62df\u4eba\u7269\u52a9\u7406","refresh_token":"25.04aa8a8bed93e026c3f4444a4934f9d0.315360000.1892552867.282335-17537195","session_secret":"46628ed024217dbbb9b1ce9d5aadf84e","expires_in":2592000}
{'access_token': '24.e64058ba1ed4ead39d6f245d5d593457.2592000.1579784867.282335-17537195', 'session_key': '9mzdXvNzuOQwnUZMt42nMIxHa3K7+L8OGSNdaiI6852guxyBRlbAfUIUslatsnTrCdM1tKAVeDKlu3E6PzA+CQDJ1uscCA==', 'scope': 'audio_voice_assistant_get brain_enhanced_asr audio_tts_post public brain_all_scope
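picchain_test_picchain_api_scope wise_adapt lebo_resource_base lightservice_public hetu_basic lightcms_map_poi kaidian_kaidian ApsMisTest_Test权限 vis-classify_flower lpq_开放 cop_helloScope ApsMis_fangdi_permission smartapp_snsapi_base iop_autocar oauth_tp_app smartapp_smart_game_openapi oauth_sessionkey smartapp_swanid_verify smartapp_opensource_openapi smartapp_opensource_recapi fake_face_detect_开放Scope vis-ocr_虚拟人物助理 idl-video_虚拟人物助理', 'refresh_token': '25.04aa8a8bed93e026c3f4444a4934f9d0.315360000.1892552867.282335-17537195', 'session_secret': '46628ed024217dbbb9b1ce9d5aadf84e', 'expires_in': 2592000}
SUCCESS WITH TOKEN: 24.e64058ba1ed4ead39d6f245d5d593457.2592000.1579784867.282335-17537195 ; EXPIRES IN SECONDS: 2592000
%E4%BD%9C%E4%B8%9A%E5%A5%BD%E5%A4%9A%E6%88%91%E5%A5%BD%E5%BC%80%E5%BF%83
test on Web Browserhttp://tsn.baidu.com/text2audio?tok=24.e64058ba1ed4ead39d6f245d5d593457.2592000.1579784867.282335-17537195&tex=%25E4%25BD%259C%25E4%25B8%259A%25E5%25A5%25BD%25E5%25A4%259A%25E6%2588%2591%25E5%25A5%25BD%25E5%25BC%2580%25E5%25BF%2583&per=4&spd=5&pit=5&vol=5&aue=3&cuid=123456PYTHON&lan=zh&ctp=1
result saved as :result.mp3
```

#### Post-Use Risk Report

- > **Baidu speech recognition API error-rate assessment**
  > Baidu developed a technique for holistic modeling of Chinese initials and finals based on a multi-layer unidirectional LSTM (long short-term memory) network, and successfully embedded connectionist temporal classification (CTC) training into the traditional speech recognition modeling framework. The technique lowers the relative error rate of machine speech recognition by 15% and brings the accuracy of Mandarin recognition in quiet environments close to 97%. [Click here to read the original article](http://news.zol.com.cn/549/5499324.html)

- Possible errors
    1. The Cantonese voice input **luoyou** (meaning "buttocks") is output in Mandarin as **柚子** (pomelo; the Cantonese pronunciation of "pomelo" is "luyou").
    2. With one-tap translation of uploaded text, a rare character such as **叅** cannot be translated accurately.
    3. The user selects Cantonese-to-Mandarin translation but then speaks Mandarin. Because the wrong mode is selected, the system "translates" the Mandarin input into Mandarin.

- Product pricing

![Pricing](https://images.gitee.com/uploads/images/2019/1224/223329_1af6f36e_1648162.png "费用.PNG")

## One-Sentence Version

Become a Guangdong local with one tap.

## One-Minute Version

Cantonese is well loved, partly for how pleasant it sounds, and Guangdong's economy has long ranked near the top, so many people choose to work there. Living in Guangdong long-term takes at least some Cantonese to truly experience the local culture and customs. This app gives non-locals living in Guangdong a chance to learn Cantonese. Its core feature is a translator that returns translations in real time, converting between Mandarin and Cantonese. Beyond that, it offers learning features such as watching dramas, correcting pronunciation, and listening to Cantonese songs. The main API it relies on is speech recognition, which turns voice input into a translation.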
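
Since speech recognition is the main API the product relies on, it may help to guard the recognition call shown earlier, whose sample output ends with `err_no` 3316 for an `.m4a` file. The sketch below is not part of the original project. It assumes two hypothetical helpers, `check_audio_format` and `parse_asr_response`. The format sets come from the comment in the recognition demo (pcm/wav/amr, plus m4a for the pro edition). The rule that `err_no == 0` means success and the `result` field name are assumptions, since the sample output only shows `err_no`, `err_msg`, and `sn`.

```python
import json
import os

# Formats taken from the comment in the recognition demo: the standard
# endpoint accepts pcm/wav/amr, and the pro edition additionally accepts m4a.
STANDARD_FORMATS = {"pcm", "wav", "amr"}
PRO_FORMATS = STANDARD_FORMATS | {"m4a"}


def check_audio_format(audio_file, use_pro_api=False):
    """Hypothetical helper: return the lowercase extension or raise ValueError."""
    ext = os.path.splitext(audio_file)[1].lstrip(".").lower()
    allowed = PRO_FORMATS if use_pro_api else STANDARD_FORMATS
    if ext not in allowed:
        raise ValueError(
            "format %r is not accepted here; allowed: %s"
            % (ext, ", ".join(sorted(allowed)))
        )
    return ext


def parse_asr_response(result_str):
    """Hypothetical helper: return the recognized texts or raise RuntimeError.

    Assumes err_no == 0 means success and that the texts arrive in a
    "result" list; only err_no, err_msg and sn appear in the sample output.
    """
    response = json.loads(result_str)
    err_no = response.get("err_no", 0)
    if err_no != 0:
        raise RuntimeError(
            "recognition failed: err_no=%s, err_msg=%s"
            % (err_no, response.get("err_msg"))
        )
    return response.get("result", [])


if __name__ == "__main__":
    # The demo's own input file is rejected before any network call is made.
    try:
        check_audio_format("./luyin.m4a")
    except ValueError as exc:
        print("pre-flight check:", exc)

    # The error payload from the sample output is turned into an exception.
    failed = '{"err_msg":"audio trans failed","err_no":3316,"sn":"558407106251577192749"}'
    try:
        parse_asr_response(failed)
    except RuntimeError as exc:
        print("response check:", exc)
```

Checking the extension before uploading avoids spending a request on a file the endpoint would reject, and raising on a non-zero `err_no` keeps an error payload from being written to `result.txt` as if it were a transcript.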