kernelstudio/Dify API With OCR

Dify Document Extractor with Self-Hosted OCR Service Support

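By default, the document extractor in Dify workflows cannot extract text from scanned PDF files. To keep the data secure, you need to deploy an OCR document recognition service locally and modify the relevant Dify source code.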

Note: for now, only the following versions are supported

1. Add the OCR configuration

First create OcrConfig, then modify api/configs/app_config.py:

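# Import the OCR config at the top of the file
from .ocr import OcrConfig


# Modify the class definition as follows
class DifyConfig(
    # Add the OCR config
    OcrConfig,
    # Other existing configs
):
    ...  # the existing class body stays unchanged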

2. Modify the document extractor code

Add the following method for calling the OCR service:

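import requests


def _ocr_extract_pdf(file_content: bytes) -> str | None:
    """Send the PDF to the OCR service; return None if OCR is disabled or fails."""
    if dify_config.OCR_SERVICE_ENABLED and dify_config.OCR_SERVICE_URL:
        logger.info("Extracting PDF text via the OCR service")
        try:
            doc_file = io.BytesIO(file_content)
            files = {'file': (
                'ocr.pdf',  # file name
                doc_file,  # file stream
                'application/pdf',  # Content-Type of this multipart part
                {'Expires': '0'})  # extra headers for this part
            }
            # The 60-second timeout is an assumed value; tune it for your OCR service.
            response = requests.post(dify_config.OCR_SERVICE_URL, files=files, timeout=60)
            response.raise_for_status()
            return response.json().get('text')
        except Exception:
            # Log the traceback and let the caller fall back to regular extraction.
            logger.exception("OCR service request failed")
            return None
    return None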
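To try the method above without a real OCR backend, a stub service can be useful. The sketch below uses only the standard library and assumes the contract implied by the code: a POST with the PDF in the request body, answered with a JSON object that has a `text` key. The names `StubOcrHandler` and `start_stub_ocr_server` and the fixed reply are hypothetical; the stub does no recognition at all.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubOcrHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the OCR service: answers every POST with fixed text."""

    def do_POST(self):
        # Drain the uploaded body; a real service would run OCR on it.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = json.dumps({"text": "stub ocr text"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_stub_ocr_server(port: int = 0) -> HTTPServer:
    """Start the stub on 127.0.0.1 in a daemon thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), StubOcrHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Point OCR_SERVICE_URL at `http://127.0.0.1:<port>/` (the port is available as `server.server_address[1]`) and call `server.shutdown()` when done.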

Find the _extract_text_from_pdf method and change it to the following:

def _extract_text_from_pdf(file_content: bytes) -> str:
    # Try the OCR service first
    text = _ocr_extract_pdf(file_content)
    if text is not None:
        return text
    else:
        # OCR is disabled or failed: fall back to the PDF's embedded text layer
        try:
            pdf_file = io.BytesIO(file_content)
            pdf_document = pypdfium2.PdfDocument(pdf_file, autoclose=True)
            text = ""
            for page in pdf_document:
                text_page = page.get_textpage()
                text += text_page.get_text_range()
                text_page.close()
                page.close()
            return text
        except Exception as e:
            raise TextExtractionError(f"Failed to extract text from PDF: {str(e)}") from e
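The control flow above (OCR first, text layer only when OCR returns None) can be checked in isolation. The following sketch restates it with the two extractors passed in as plain callables; `extract_with_fallback` and its parameter names are hypothetical and not part of Dify.

```python
from typing import Callable, Optional


def extract_with_fallback(
    file_content: bytes,
    ocr: Callable[[bytes], Optional[str]],
    text_layer: Callable[[bytes], str],
) -> str:
    """Return the OCR result unless it is None, else the text-layer result."""
    text = ocr(file_content)
    if text is not None:
        # Note: an empty string still counts as a successful OCR result.
        return text
    return text_layer(file_content)
```

One consequence worth knowing: because the check is `is not None`, an OCR service that answers with an empty `text` value suppresses the fallback. If that is not wanted, the check could be loosened to a truthiness test.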

3. Modify .env

Find the .env file in the dify/docker directory and add the following settings at the beginning of the file:

# Enable the OCR service
OCR_SERVICE_ENABLED=true
# OCR service URL
OCR_SERVICE_URL=http://ocr-service/api/v1/open/service/ocr
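Because the OCR call is silently skipped when the URL is empty, it can help to sanity-check these two settings before restarting. This is a small sketch under assumptions: `check_ocr_env` is a hypothetical helper, and the set of accepted truthy strings is a guess rather than Dify's exact parsing rules.

```python
from typing import List, Mapping
from urllib.parse import urlparse

# Assumed set of strings treated as "enabled".
TRUTHY = {"1", "true", "yes", "on"}


def check_ocr_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems in the OCR settings (empty if none are found)."""
    problems = []
    enabled = env.get("OCR_SERVICE_ENABLED", "false").strip().lower() in TRUTHY
    url = env.get("OCR_SERVICE_URL", "").strip()
    if enabled and not url:
        problems.append("OCR_SERVICE_ENABLED is true but OCR_SERVICE_URL is empty")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"OCR_SERVICE_URL is not a valid http(s) URL: {url}")
    return problems
```

For example, `check_ocr_env(os.environ)` inside the api container should return an empty list once the settings have been picked up.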

4. Modify docker-compose.yaml

Edit dify/docker/docker-compose.yaml.

Add the following entries at the top of the file:

x-shared-env: &shared-api-worker-env
  OCR_SERVICE_ENABLED: ${OCR_SERVICE_ENABLED:-false}
  OCR_SERVICE_URL: ${OCR_SERVICE_URL:-}

In the same file, change image: langgenius/dify-api:1.3.1 to the name of the customized image, image: langgenius/dify-api-with-ocr:1.3.1

5. Build the image

sh build.sh

6. Restart the services

# Switch to your Dify docker directory
cd dify/docker

docker-compose down api worker
docker-compose up -d api worker

MIT License

Copyright (c) 2025 Kernel Studio

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
