This is a collection of commonly used Chinese datasets. It is updated continuously, and contributions to the list are welcome.
In addition to open-source data, users can also generate their own data with synthesis tools. Currently available synthesis tools include text_renderer, SynthText, TextRecognitionDataGenerator, etc.
Data source: https://ai.baidu.com/broad/introduction?dataset=lsvt
Introduction: A total of 450,000 Chinese street view images, including 50,000 fully labeled images (20,000 test + 30,000 training, with text coordinates and text content) and 400,000 weakly labeled images (text content only), as shown in the following figure:
(a) Fully labeled data
(b) Weakly labeled data
Download link: https://ai.baidu.com/broad/download?dataset=lsvt
Data source: https://aistudio.baidu.com/aistudio/competition/detail/8
Introduction: A total of 290,000 images are included, of which 210,000 form the training set (with labels) and 80,000 form the test set (without labels). The dataset is collected from Chinese street views and is formed by cropping the text line areas (such as shop signs, landmarks, etc.) out of the street view images. All images are preprocessed: an affine transform maps each text area proportionally to an image with a height of 48 pixels, as shown in the figure:
(a) Label: 魅派集成吊顶
(b) Label: 母婴用品连锁
Download link: https://aistudio.baidu.com/aistudio/datasetdetail/8429
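The proportional mapping to a height of 48 pixels described for this dataset can be illustrated with a small sketch. It is not the dataset authors' actual preprocessing code, which is not published here. It covers only the uniform-scaling part of an affine transform and assumes the text line is already cropped and axis-aligned. The function names `scaled_size` and `scaling_affine_matrix` are hypothetical.

```python
def scaled_size(width, height, target_height=48):
    """Return (new_width, new_height) after scaling to target_height
    while keeping the aspect ratio. Hypothetical helper, for illustration."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target_height / height
    # Keep at least one pixel of width for very narrow crops.
    new_width = max(1, round(width * scale))
    return new_width, target_height


def scaling_affine_matrix(height, target_height=48):
    """Return a 2x3 affine matrix for uniform scaling only
    (no rotation, shear, or translation). Assumes an axis-aligned crop."""
    if height <= 0:
        raise ValueError("height must be positive")
    scale = target_height / height
    return [[scale, 0.0, 0.0],
            [0.0, scale, 0.0]]


if __name__ == "__main__":
    # A 480x96 text line crop is halved in both dimensions.
    print(scaled_size(480, 96))
    print(scaling_affine_matrix(96))
```

A 2x3 matrix of this form is the layout that common image libraries accept for affine warps, so the same matrix could be passed to such a routine together with the output size from `scaled_size`.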