# Add OCR Text

**Repository Path**: ylxdxx/Add_OCR_Text

## Basic Information

- **Project Name**: Add OCR Text
- **Description**: Add OCR Text
- **Primary Language**: Unknown
- **License**: GPL-3.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-12-13
- **Last Updated**: 2025-01-09

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Add OCR Text


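## Introduction

Most OCR engines can return the recognized text of an image together with its coordinates. This program wraps the images into a PDF document and adds the OCR text to them, producing a searchable PDF file. It is written in `C++` for speed.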


## Usage

Basic usage:

``` 
add_ocr_text -i ocr.txt -f OpenSans-Regular.ttf
```

At least two arguments are required:

- `ocr.txt` is a text file containing the image, text, and coordinate information, in the following format:

  ```
  22.png	Graduate	263,172	337,172	337,159	263,159
  22.png	School	345,172	398,172	398,159	345,159
  22.png	of	404,176	420,176	420,159	404,159
  22.png	Science,	426,174	489,174	489,159	426,159
  33.png	C++教程	94,273	168,273	168,254	94,254
  33.png	HTML/CSS	1537,275	1625,275	1625,258	1537,258
  33.png	C++简介	92,315	168,315	168,296	92,296
  33.png	学习园地	462,313	521,313	521,298	462,298
  ```

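  There are 6 columns, separated by `Tab` characters.

  The first column is the image. If the image is not in the directory the program is run from, give its relative or absolute path. Supported image formats are `jpg`, `png`, and `tif`. The second column is the corresponding text. Columns three to six are the coordinates of the four corners of the text's bounding rectangle, in the order lower-left, lower-right, upper-right, upper-left. Each coordinate pair is written as `x,y`, separated by an ASCII comma.

- `OpenSans-Regular.ttf` is the font file used to write the text into the PDF. The size and complexity of the font file directly affect how fast the PDF is generated.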
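To illustrate the input format, here is a minimal C++17 sketch that parses one line of `ocr.txt` into an image path, a text string, and four corner points. The names `Point`, `OcrEntry`, `split`, `parsePoint`, and `parseLine` are hypothetical and not taken from the program's source; the sketch assumes the text column itself contains no tabs, and it tolerates Windows (CRLF) line endings.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct Point {
    double x = 0;
    double y = 0;
};

// One line of ocr.txt. Corner order follows the README:
// lower-left, lower-right, upper-right, upper-left.
struct OcrEntry {
    std::string image;
    std::string text;
    std::array<Point, 4> corners;
};

// Splits a string on a single delimiter character.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> fields;
    std::istringstream in(s);
    std::string field;
    while (std::getline(in, field, delim)) fields.push_back(field);
    return fields;
}

// Parses an "x,y" pair (ASCII comma) into a point.
std::optional<Point> parsePoint(const std::string& s) {
    std::istringstream in(s);
    Point p;
    char comma = 0;
    if (!(in >> p.x >> comma >> p.y) || comma != ',') return std::nullopt;
    return p;
}

// Parses one tab-separated line that must have exactly six columns.
std::optional<OcrEntry> parseLine(std::string line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
    const std::vector<std::string> fields = split(line, '\t');
    if (fields.size() != 6) return std::nullopt;

    OcrEntry entry;
    entry.image = fields[0];
    entry.text = fields[1];
    for (std::size_t i = 0; i < 4; ++i) {
        const std::optional<Point> p = parsePoint(fields[i + 2]);
        if (!p) return std::nullopt;
        entry.corners[i] = *p;
    }
    return entry;
}
```

`parseLine` returns `std::nullopt` for lines that do not have exactly six columns or whose coordinates are not in `x,y` form, so malformed lines can be reported rather than placed at a wrong position.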

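Note that different OCR engines use different coordinate origins. By default the program follows the Baidu OCR service and takes the upper-left corner as the origin, i.e. the default value of `-l, --location` is `4`. If your OCR engine uses a different origin, specify it with `-l, --location`: `1` for lower-left, `2` for lower-right, `3` for upper-right, `4` for upper-left.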
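The origin handling can be sketched as a conversion into PDF user space, whose origin is the lower-left corner with y pointing up. `toLowerLeft` is a hypothetical helper, not the program's code; it assumes that, from each origin corner, both axes point into the image, and that `width` and `height` are the image dimensions in pixels.

```cpp
#include <stdexcept>

struct Point {
    double x;
    double y;
};

// Converts a point measured from the corner given by the -l/--location code
// (1 = lower-left, 2 = lower-right, 3 = upper-right, 4 = upper-left) into
// lower-left-origin coordinates with y pointing up, as PDF user space uses.
// Assumption: from each origin corner, both axes point into the image.
Point toLowerLeft(Point p, int location, double width, double height) {
    switch (location) {
        case 1: return {p.x, p.y};                   // already lower-left
        case 2: return {width - p.x, p.y};           // mirror x
        case 3: return {width - p.x, height - p.y};  // mirror x and y
        case 4: return {p.x, height - p.y};          // mirror y (default, Baidu-style)
        default: throw std::invalid_argument("location must be 1, 2, 3 or 4");
    }
}
```

With the default `4`, only y is flipped: in a 400-pixel-high image, the point `263,159` maps to `263,241`.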

The full list of arguments:

```
Usage: add_ocr_text [--help] [--version] --input VAR --font VAR [--output VAR] [--location VAR] [--page VAR] [--box] [--tuning VAR...]

Optional arguments:
  -h, --help      shows help message and exits 
  -v, --version   prints version information and exits 
  -i, --input     specify the input text file. [required]
  -f, --font      specify the font file. [required]
  -o, --output    specify the output PDF file. [default: "out.pdf"]
  -l, --location  Specify the origin location of the image.
                  lower-left:1  lower-right:2  upper-right:3  upper-left:4   [default: 4]
  -p, --page      Specify page size, such as A3, A4, A5, B3, B4, B5. [default: "A4"]
  -b, --box       Draw a rectangular text box to fine tune its size. 
  -t, --tuning    Fine tune the position of the text box in pixels, x y. [nargs: 2] [default: {0 0}]

```
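PDF page dimensions are expressed in points (1/72 inch), so a page name passed to `-p, --page` has to be converted from millimetres. The sketch below assumes ISO 216 dimensions for the A and B series (the README does not say whether `B3`/`B4`/`B5` are ISO or JIS sizes) and adds a hypothetical `fitScale` helper that shrinks an image uniformly to fit the page; how the program actually places images is not documented here.

```cpp
#include <algorithm>
#include <map>
#include <optional>
#include <string>
#include <utility>

struct PageSize {
    double width;   // PDF points (1/72 inch)
    double height;  // PDF points
};

constexpr double kPointsPerMm = 72.0 / 25.4;

// Portrait page sizes. Assumption: ISO 216 A and B series
// (JIS B sizes differ, e.g. JIS B5 is 182 x 257 mm).
std::optional<PageSize> pageSizeInPoints(const std::string& name) {
    static const std::map<std::string, std::pair<double, double>> kSizesMm = {
        {"A3", {297, 420}}, {"A4", {210, 297}}, {"A5", {148, 210}},
        {"B3", {353, 500}}, {"B4", {250, 353}}, {"B5", {176, 250}},
    };
    const auto it = kSizesMm.find(name);
    if (it == kSizesMm.end()) return std::nullopt;
    return PageSize{it->second.first * kPointsPerMm, it->second.second * kPointsPerMm};
}

// Hypothetical helper: uniform scale factor that fits an image (pixels)
// inside a page (points) while preserving its aspect ratio.
double fitScale(double imgWidth, double imgHeight, const PageSize& page) {
    return std::min(page.width / imgWidth, page.height / imgHeight);
}
```

For example, `A4` comes out at about 595.28 × 841.89 points.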


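## Dependencies

PDF handling relies on [PDF-Writer](https://github.com/galkahana/PDF-Writer) (version [v4.6.7](https://github.com/galkahana/PDF-Writer/releases/tag/v4.6.7)), and argument parsing relies on [argparse](https://github.com/p-ranav/argparse) (version [v3.1](https://github.com/p-ranav/argparse/tree/v3.1)).

All dependencies are already included in the project under the `local/` directory, so nothing needs to be installed separately; the program uses them directly.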


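## TODO

- Support standard input and output, and add standard error output
- Handle non-horizontal text by adding a rotation matrix
- More common paper and screen sizes
- Custom page sizes
- Page size that adapts to the image
- Page margin support
- More image alignment options (top, middle, bottom; left, center, right)
- Export images back out of the PDF
- Export text + coordinates back out of the PDF
- Add a configuration file
- Add a subset font (English + 7,000 common characters)
- Add support for common OCR APIs
  - Baidu
  - Intsig (合合)
  - Youdao (good value for money)
  - Alibaba
  - Some common local OCR services
- Support non-embedded fonts
- Optimize write speed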
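For the rotation-matrix item above, one possible approach (a sketch, not a committed design) is to take the baseline angle from the lower-left and lower-right corners and build the six operands `a b c d e f` of the PDF `Tm` text-matrix operator, where a rotation by θ is `cos θ, sin θ, -sin θ, cos θ` and `e f` is the translation. The corners are assumed to be already converted to a y-up (lower-left origin) system; with y pointing down the sign of the angle would be reversed. `baselineAngle` and `rotatedTextMatrix` are hypothetical names.

```cpp
#include <array>
#include <cmath>

struct Point {
    double x;
    double y;
};

// Operands [a b c d e f] of the PDF Tm operator.
using TextMatrix = std::array<double, 6>;

// Angle (radians, counterclockwise) of the baseline running from the
// lower-left to the lower-right corner. Points must be in a y-up system.
double baselineAngle(Point lowerLeft, Point lowerRight) {
    return std::atan2(lowerRight.y - lowerLeft.y, lowerRight.x - lowerLeft.x);
}

// Rotation by the baseline angle, then translation to the lower-left corner.
// Font size is left to the Tf operator, so no scaling is applied here.
TextMatrix rotatedTextMatrix(Point lowerLeft, Point lowerRight) {
    const double theta = baselineAngle(lowerLeft, lowerRight);
    const double c = std::cos(theta);
    const double s = std::sin(theta);
    return {c, s, -s, c, lowerLeft.x, lowerLeft.y};
}
```

For horizontal text, as in the sample `ocr.txt`, this reduces to the identity rotation plus a translation.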