# Draft
**Repository Path**: cuicheng01/draft
## Basic Information
- **Project Name**: Draft
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 1
- **Created**: 2025-05-19
- **Last Updated**: 2025-05-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
English | [简体中文](./readme_c.md) | [日本語](./README_ja.md)
[GitHub](https://github.com/PaddlePaddle/PaddleOCR) | [License](./LICENSE) | [PyPI](https://pypi.org/project/PaddleOCR/) | [Discord](https://discord.gg/z9xaRVjdbD) | [X](https://x.com/PaddlePaddle)

[Official website](https://www.paddleocr.ai/) | [AI Studio demo](https://aistudio.baidu.com/community/app/91660/webUI) | [Hugging Face Space](https://huggingface.co/spaces/PaddlePaddle/PaddleOCR) | [ModelScope](https://www.modelscope.cn/organization/PaddlePaddle) | [arXiv paper](https://arxiv.org/pdf/2206.03001)
## 🚀 Introduction
Built on years of foundational research and real-world industry practice, PaddleOCR offers state-of-the-art solutions including the [PP-OCR](https://github.com/PaddlePaddle/PaddleOCR/blob/v2.7.0/doc/doc_ch/ppocr_introduction.md) series of models, the document parsing system [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/v2.7.0/ppstructure/README_ch.md), and the key information extraction tool [PP-ChatOCR](https://aistudio.baidu.com/aistudio/projectdetail/6488689), all powered by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle). Our models and tools are continuously updated to ensure **high accuracy**, **flexibility**, and **ease of use**. Additionally, users can annotate their own images using [PPOCRLabelv2](https://github.com/PFCCLab/PPOCRLabel) and [fine-tune](https://github.com/PaddlePaddle/PaddleOCR/blob/v2.7.0/doc/doc_ch/finetune.md) models with just a single command.
You can jump straight to the [Quick Start](#-quick-start), find comprehensive documentation in the [PaddleOCR Docs](https://paddlepaddle.github.io/PaddleOCR/main/index.html), get support via [GitHub Issues](https://github.com/PaddlePaddle/PaddleOCR/issues), and explore our [OCR courses on AIStudio](https://aistudio.baidu.com/course/introduce/25207).
## 🌐 Architecture Overview
**PaddleOCR** is a modular OCR toolkit that offers ready-to-use models and solutions for OCR and document parsing. The latest offerings include:
- 🖼️ **PP-OCRv5**: High-Precision Text Recognition Model for All Scenarios - Instant Text from Images/PDFs.
- 🧮 **PP-StructureV3**: High-Precision Document Parsing Solution - Unleash SOTA Images/PDFs Parsing for Real-World Scenarios!
- 📈 **PP-ChatOCRv4**: Intelligent Key Information Extraction Solution - Extract Key Information, Not Just Text, from Images/PDFs.
## 📣 Recent updates
🔥🔥2025.05.30: Release of **PaddleOCR v3.0**, including:
- **PP-OCRv5**: High-Precision Text Recognition Model for All Scenarios - Instant Text from Images/PDFs.
    1. 🌐 Simultaneous support for **5** types of text - Seamlessly processes **Simplified Chinese, Traditional Chinese, Simplified Chinese Pinyin, English** and **Japanese** within a single model.
    2. 🎯 Elevated overall text recognition accuracy - Achieves SOTA precision across diverse use cases.
    3. ✍️ Revolutionized **handwritten text recognition** - Delivers breakthrough performance for irregular, cursive, and complex scripts.
- **PP-StructureV3**: High-Precision Document Parsing Solution - Unleash SOTA Images/PDFs Parsing for Real-World Scenarios!
    1. 🧮 Enables multi-scenario high-precision PDF parsing, achieving SOTA accuracy on the OmniDocBench benchmark among open-source solutions.
    2. ⚡ Supports **multi-GPU parallel inference** and multi-GPU instance service deployment:
        - High-precision configuration achieves **XXX** QPS on 4×V100 GPUs.
        - High-efficiency configuration achieves **XXX** QPS on 4×V100 GPUs.
    3. 🧠 Advanced capabilities include **seal recognition, chart-to-table conversion, table recognition with nested formulas/images, vertical text document parsing, and complex table structure analysis**.
- **PP-ChatOCRv4**: Intelligent Key Information Extraction Solution - Extract Key Information, Not Just Text, from Images/PDFs.
    1. 🔥 Delivers high-accuracy key information extraction from document files, including PDF and PNG formats, surpassing PP-ChatOCRv3 by **15.7%** in accuracy.
    2. 🤝 Integrates with [PP-DocBeeV2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee) to support extracting key information from charts and images within documents.
    3. 💻 Supports local offline deployment of LLMs/MLLMs, and allows large language models deployed via tools such as [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP), Ollama, and vLLM to be integrated seamlessly into PP-ChatOCRv4.
The history of updates:
- 🔥🔥2025.03.07: Release of **PaddleOCR v2.10**, including:
    - **12 new self-developed models:**
        - **[Layout Detection series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/layout_detection.html)** (3 models): PP-DocLayout-L, M, and S, capable of detecting 23 common layout types across diverse document formats (papers, reports, exams, books, magazines, contracts, etc.) in English and Chinese. Achieves up to **90.4% mAP@0.5**, and the lightweight models can process over 100 pages per second.
        - **[Formula Recognition series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/formula_recognition.html)** (2 models): PP-FormulaNet-L and S, supporting recognition of 50,000+ LaTeX expressions and handling both printed and handwritten formulas. PP-FormulaNet-L offers **6% higher accuracy** than comparable models; PP-FormulaNet-S is 16x faster while maintaining similar accuracy.
        - **[Table Structure Recognition series](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_structure_recognition.html)** (2 models): SLANeXt_wired and SLANeXt_wireless, newly developed models with a **6% accuracy improvement** over SLANet_plus in complex table recognition.
        - **[Table Classification](https://paddlepaddle.github.io/PaddleX/latest/en/module_usage/tutorials/ocr_modules/table_classification.html)** (1 model): PP-LCNet_x1_0_table_cls, an ultra-lightweight classifier for wired and wireless tables.
[Learn more](https://paddlepaddle.github.io/PaddleOCR/latest/en/update.html)
## ⚡ Quick Start
### 1. Run online demo without installation
- [PaddleOCR official website](https://www.paddleocr.ai/)
- [AI Studio web UI](https://aistudio.baidu.com/community/app/91660/webUI)
- [Hugging Face Space](https://huggingface.co/spaces/PaddlePaddle/PaddleOCR)
- [ModelScope](https://www.modelscope.cn/organization/PaddlePaddle)
### 2. Installation
First, please install PaddlePaddle using the official [Installation Guide](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/develop/install/pip/linux-pip_en.html).
Then, install the PaddleOCR toolkit.
#### 2.1 CPU Environment
```bash
# 1. Install PaddlePaddle
pip install paddlepaddle
# 2. Install PaddleOCR
pip install paddleocr
# 3. Self-check after installation is complete
paddleocr --version
```
#### 2.2 NVIDIA GPU Environment
```bash
# 1. Install the CUDA 11.8 version of paddlepaddle-gpu
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
# Or install the CUDA 12.6 version of paddlepaddle-gpu
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
# 2. Install PaddleOCR
pip install paddleocr
# 3. Self-check after installation is complete
paddleocr --version
```
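Beyond `paddleocr --version`, the installation in either environment can also be checked from Python. The snippet below is a minimal sketch: `check_install` is a hypothetical helper (not part of PaddleOCR) that only tests whether the `paddle` and `paddleocr` packages are importable, and then asks PaddlePaddle whether it was built with CUDA support.

```python
import importlib.util


def check_install(packages=("paddle", "paddleocr")):
    """Return a mapping of top-level package name -> whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    status = check_install()
    for name, ok in status.items():
        print(f"{name}: {'found' if ok else 'missing'}")
    if status["paddle"]:
        import paddle

        # True only when a CUDA-enabled build of PaddlePaddle is installed.
        print("CUDA build:", paddle.device.is_compiled_with_cuda())
```

If `paddle` is reported as missing, revisit step 1 of the matching environment above before installing PaddleOCR.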
#### 2.3 More AI Accelerators
[Huawei Ascend](README_en.md) | [Kunlunxin](README.md) | More to come
### 3. Run inference by CLI
```bash
# Run PP-OCRv5 inference
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png
# Run PP-StructureV3 inference
paddleocr pp_structurev3 -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png
# Run PP-ChatOCRv4 inference
paddleocr pp_chatocrv4_doc -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png -k 驾驶室准乘人数 --qianfan_api_key your_api_key
# Get more information about "paddleocr ocr"
paddleocr ocr --help
```
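The same CLI can be driven from a script, for example in a batch job. The following is a rough sketch that assumes the `paddleocr` executable is on `PATH`; it uses only the `ocr` subcommand and the `-i` flag shown above, and `build_ocr_command` and `run_ocr_cli` are hypothetical helper names.

```python
import shlex
import subprocess


def build_ocr_command(image, extra_args=()):
    """Build the argument list for `paddleocr ocr -i <image>`."""
    return ["paddleocr", "ocr", "-i", image, *extra_args]


def run_ocr_cli(image):
    """Run the CLI on one image and return whatever it wrote to stdout."""
    cmd = build_ocr_command(image)
    print("Running:", shlex.join(cmd))
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    completed = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return completed.stdout
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.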
### 4. Run inference by API
#### 4.1 PP-OCRv5 Example
```python
from paddleocr import PaddleOCR
# Initialize PaddleOCR instance
ocr = PaddleOCR()
# Run OCR inference on a sample image
result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")
# Visualize the results and save the JSON results
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")
```
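A common follow-up step is to discard low-confidence lines before using the recognized text. The sketch below works on plain Python lists; the field names `rec_texts` and `rec_scores` in the sample dictionary are an assumption about the saved JSON layout and should be checked against your own output files. `filter_by_score` is a hypothetical helper, not a PaddleOCR function.

```python
def filter_by_score(texts, scores, min_score=0.8):
    """Keep only (text, score) pairs whose score is at least min_score."""
    if len(texts) != len(scores):
        raise ValueError("texts and scores must have the same length")
    return [(text, score) for text, score in zip(texts, scores) if score >= min_score]


# Hand-written sample data standing in for one saved result (assumed field names).
sample = {"rec_texts": ["Hello", "W0rld"], "rec_scores": [0.98, 0.41]}
kept = filter_by_score(sample["rec_texts"], sample["rec_scores"])
print(kept)
```

The threshold of 0.8 is arbitrary; a suitable value depends on the documents being processed.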
#### 4.2 PP-StructureV3 Example
```python
from pathlib import Path
from paddleocr import PPStructureV3
pipeline = PPStructureV3()
# For Image
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png")
# Print the results and save them as JSON and Markdown
for res in output:
    res.print()
    res.save_to_json(save_path="output")
    res.save_to_markdown(save_path="output")
# For PDF File
input_file = "./your_pdf_file.pdf"
output_path = Path("./output")
output = pipeline.predict(input_file)
# Collect the Markdown data and embedded images of every page
markdown_list = []
markdown_images = []
for res in output:
    md_info = res.markdown
    markdown_list.append(md_info)
    markdown_images.append(md_info.get("markdown_images", {}))
# Merge all pages into a single Markdown document and write it to disk
markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)
mkd_file_path = output_path / f"{Path(input_file).stem}.md"
mkd_file_path.parent.mkdir(parents=True, exist_ok=True)
with open(mkd_file_path, "w", encoding="utf-8") as f:
    f.write(markdown_texts)
# Save the images referenced by the Markdown next to it
for item in markdown_images:
    if item:
        for path, image in item.items():
            file_path = output_path / path
            file_path.parent.mkdir(parents=True, exist_ok=True)
            image.save(file_path)
```
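When several PDFs are processed in one run, it can help to decide up front where each Markdown file will go. The sketch below applies the same naming rule as the example above (`<output dir>/<input stem>.md`) to a list of inputs; `markdown_targets` is a hypothetical helper and does not call PaddleOCR.

```python
from pathlib import Path


def markdown_targets(input_files, output_dir):
    """Map each input file to the Markdown path it would be written to."""
    output_dir = Path(output_dir)
    return {str(f): output_dir / f"{Path(f).stem}.md" for f in input_files}


targets = markdown_targets(["docs/report.pdf", "docs/invoice.pdf"], "output")
for source, target in targets.items():
    print(f"{source} -> {target}")
```

Note that two inputs with the same file name in different directories would map to the same target, so a real batch script may need to add a disambiguating prefix.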
#### 4.3 PP-ChatOCRv4 Example
```python
from paddleocr import PPChatOCRv4Doc
chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "ernie-3.5-8k",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "openai",
    "api_key": "api_key",  # your api_key
}
retriever_config = {
    "module_name": "retriever",
    "model_name": "embedding-v1",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "qianfan",
    "api_key": "api_key",  # your api_key
}
mllm_chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "PP-DocBee",
    "base_url": "http://127.0.0.1:8080/",  # your local mllm service url
    "api_type": "openai",
    "api_key": "api_key",  # your api_key
}
pipeline = PPChatOCRv4Doc()
visual_predict_res = pipeline.visual_predict(
    input="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/vehicle_certificate-1.png",
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_common_ocr=True,
    use_seal_recognition=True,
    use_table_recognition=True,
)
visual_info_list = []
for res in visual_predict_res:
    visual_info_list.append(res["visual_info"])
    layout_parsing_result = res["layout_parsing_result"]
vector_info = pipeline.build_vector(
    visual_info_list, flag_save_bytes_vector=True, retriever_config=retriever_config
)
# The key "驾驶室准乘人数" means "number of passengers permitted in the cab"
mllm_predict_res = pipeline.mllm_pred(
    input="vehicle_certificate-1.png",
    key_list=["驾驶室准乘人数"],
    mllm_chat_bot_config=mllm_chat_bot_config,
)
mllm_predict_info = mllm_predict_res["mllm_res"]
chat_result = pipeline.chat(
    key_list=["驾驶室准乘人数"],
    visual_info=visual_info_list,
    vector_info=vector_info,
    mllm_predict_info=mllm_predict_info,
    chat_bot_config=chat_bot_config,
    retriever_config=retriever_config,
)
print(chat_result)
```
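The configuration dictionaries above contain a placeholder `api_key`. One way to avoid committing a real key is to read it from the environment. This is a minimal sketch: the variable name `QIANFAN_API_KEY` and the helper `qianfan_config` are assumptions made for illustration, while the dictionary keys and the base URL are copied from the example above.

```python
import os


def qianfan_config(module_name, model_name, api_type, env_var="QIANFAN_API_KEY"):
    """Build a config dict like the ones above, taking the API key from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "module_name": module_name,
        "model_name": model_name,
        "base_url": "https://qianfan.baidubce.com/v2",
        "api_type": api_type,
        "api_key": api_key,
    }
```

With the variable exported in your shell, `qianfan_config("chat_bot", "ernie-3.5-8k", "openai")` yields a dictionary with the same shape as `chat_bot_config`, and `qianfan_config("retriever", "embedding-v1", "qianfan")` one with the same shape as `retriever_config`.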
## 📚 Get Started From OCR Courses:
- [AI Fast Track 2020: PaddleOCR (AI快车道2020-PaddleOCR)](https://aistudio.baidu.com/course/introduce/1519)
## 😃 Awesome Projects Leveraging PaddleOCR
💗 PaddleOCR wouldn’t be where it is today without its incredible community! A massive 🙌 thank you 🙌 to all our longtime partners, new collaborators, and everyone who’s poured their passion into PaddleOCR — whether we’ve named you or not. Your support fuels our fire! 🔥
| Project Name | Description |
| ------------ | ----------- |
| [RAGFlow](https://github.com/infiniflow/ragflow) | RAG engine based on deep document understanding. |
| [MinerU](https://github.com/opendatalab/MinerU) | Multi-type document to Markdown conversion tool. |
| [Umi-OCR](https://github.com/hiroi-sora/Umi-OCR) | Free, open-source, batch offline OCR software. |
| [OmniParser](https://github.com/microsoft/OmniParser) | Screen parsing tool for pure-vision-based GUI agents. |
| [QAnything](https://github.com/netease-youdao/QAnything) | Question and answer based on anything. |
| [PDF-Extract-Kit](https://github.com/opendatalab/PDF-Extract-Kit) | A powerful open-source toolkit designed to efficiently extract high-quality content from complex and diverse PDF documents. |
| [Dango-Translator](https://github.com/PantsuDango/Dango-Translator) | Recognizes text on the screen, translates it, and shows the translation results in real time. |
| [Learn more projects](./awesome_projects.md) | [More projects based on PaddleOCR](./awesome_projects.md) |
## 👩‍👩‍👧‍👦 Community
* 👫 Join the [PaddlePaddle Community](https://github.com/PaddlePaddle/community), where you can engage with [PaddlePaddle developers](https://www.paddlepaddle.org.cn/developercommunity), researchers, and enthusiasts from around the world.
* 🎓 Learn from experts through workshops, tutorials, and Q&A sessions [hosted by the AI Studio](https://aistudio.baidu.com/learn/center).
* 🏆 Participate in [hackathons, challenges, and competitions](https://aistudio.baidu.com/competition) to showcase your skills and win exciting prizes.
* 📣 Stay updated with the latest news, announcements, and events by following our [Twitter](https://x.com/PaddlePaddle) and [WeChat](https://mp.weixin.qq.com/s/MAdo7fZ6dfeGcCQUtRP2ag).
Let’s build the future of AI together! 🚀
## 📄 License
This project is released under the [Apache License Version 2.0](./LICENSE).