36 Star 242 Fork 75

PaddlePaddle/DeepSpeech

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
文件
克隆/下载
贡献代码
同步代码
取消
提示: 由于 Git 不支持空文件夾,创建文件夹后会生成空的 .keep 文件
Loading...
README

MagicData

MAGICDATA Mandarin Chinese Read Speech Corpus was developed by MAGIC DATA Technology Co., Ltd. and freely published for non-commercial use. The contents and the corresponding descriptions of the corpus include:

  • The corpus contains 755 hours of speech data, which is mostly mobile recorded data.
  • 1080 speakers from different accent areas in China are invited to participate in the recording.
  • The sentence transcription accuracy is higher than 98%.
  • Recordings are conducted in a quiet indoor environment.
  • The database is divided into training set, validation set, and testing set in a ratio of 51: 1: 2.
  • Detail information such as speech data coding and speaker information is preserved in the metadata file.
  • The domain of recording texts is diversified, including interactive Q&A, music search, SNS messages, home command and control, etc.
  • Segmented transcripts are also provided.

The corpus aims to support researchers in speech recognition, machine translation, speaker recognition, and other speech-related fields. Therefore, the corpus is totally free for academic use.

马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
Python
1
https://gitee.com/paddlepaddle/DeepSpeech.git
git@gitee.com:paddlepaddle/DeepSpeech.git
paddlepaddle
DeepSpeech
DeepSpeech
develop

搜索帮助