
PaddlePaddle / PaddleOCR

datasets_en.md 4.03 KB
xxxpsyduck committed on 2020-06-24 17:30 . update docs

DATASET

This is a collection of commonly used Chinese datasets. It is updated continuously, and contributions to the list are welcome.

In addition to open-source data, users can synthesize data themselves with synthesis tools. Currently available synthesis tools include text_renderer, SynthText, TextRecognitionDataGenerator, etc.

1. ICDAR2019-LSVT

2. ICDAR2017-RCTW-17

  • Data source: https://rctw.vlrlab.net/
  • Introduction: It contains more than 12,000 images, most of which were collected in the wild with mobile cameras; some are screenshots. The images show a variety of scenes, including street views, posters, menus, indoor scenes, and screenshots of mobile applications.
  • Download link: https://rctw.vlrlab.net/dataset/

3. Chinese Street View Text Recognition

  • Data source: https://aistudio.baidu.com/aistudio/competition/detail/8

  • Introduction: A total of 290,000 images are included, of which 210,000 form the training set (with labels) and 80,000 form the test set (without labels). The dataset is collected from Chinese street views and is formed by cropping the text line areas (such as shop signs and landmarks) out of the street view images. All images are preprocessed: using an affine transform, the text area is proportionally mapped to an image with a height of 48 pixels, as shown in the figure:


    (a) Label: 魅派集成吊顶

    (b) Label: 母婴用品连锁

  • Download link: https://aistudio.baidu.com/aistudio/datasetdetail/8429
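
The proportional mapping to a height of 48 pixels described in the introduction can be illustrated with a small sketch. This is not the dataset's actual preprocessing code, which is not given here; it only assumes the simplest case of an affine transform, a uniform scale with no rotation or translation. The function names `scale_to_height` and `scaling_affine_matrix` are hypothetical and chosen for illustration.

```python
def scale_to_height(width, height, target_height=48):
    """Return (new_width, new_height, scale) for a proportional resize.

    Assumption: the text crop is already axis-aligned, so the affine
    transform reduces to a uniform scale.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target_height / height
    # Keep the aspect ratio; never let the width collapse to zero.
    new_width = max(1, round(width * scale))
    return new_width, target_height, scale


def scaling_affine_matrix(scale):
    """Return the 2x3 affine matrix for a uniform scale, no translation."""
    return [[scale, 0.0, 0.0],
            [0.0, scale, 0.0]]


if __name__ == "__main__":
    new_w, new_h, s = scale_to_height(480, 96)
    print(new_w, new_h, s)
    print(scaling_affine_matrix(s))
```

A matrix of this 2x3 shape is the form that image libraries commonly accept for affine warps, so the same values could be passed to such a routine together with the computed output size.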

4. Chinese Document Text Recognition

5. ICDAR2019-ArT
