PDF to Text Converter

This project is a Python tool that converts PDF files into clean, readable text. It extracts text from both local and remote PDFs, post-processes it to improve readability, and saves the result to .txt files. It can also download PDFs from URLs and clean up the extracted text to fix stray line breaks and irregular spacing.


Features

  1. Text Extraction from Local and Remote PDFs:
    • Supports PDF files stored locally and PDFs available via URL.
  2. Text Cleaning and Formatting:
    • Removes unwanted line breaks and excessive spacing.
    • Preserves paragraphs and maintains the original structure.
  3. Saving Extracted Text as .txt Files:
    • The extracted text can be saved as a .txt file with the same base name as the original PDF (for example, report.pdf becomes report.txt).
  4. Automatic Output Folder Creation:
    • Organizes generated text files into an output_texts folder for easy navigation and future use.
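
The cleaning and saving features above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the function names `clean_text` and `save_text` are hypothetical, and the rule that a blank line marks a paragraph break is an assumption about how the extracted text is laid out.

```python
import re
from pathlib import Path


def clean_text(raw: str) -> str:
    """Join hard-wrapped lines within a paragraph; keep paragraph breaks.

    Assumption: paragraphs are separated by at least one blank line.
    """
    # Normalise line endings so the paragraph split sees only "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = []
    # A newline, optional whitespace, then another newline marks a paragraph break.
    for block in re.split(r"\n\s*\n", text):
        # Collapse every whitespace run (including single newlines) to one space.
        line = re.sub(r"\s+", " ", block).strip()
        if line:
            paragraphs.append(line)
    return "\n\n".join(paragraphs)


def save_text(text: str, pdf_name: str, out_dir: str = "output_texts") -> Path:
    """Write text to <out_dir>/<pdf base name>.txt, creating the folder if needed."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / (Path(pdf_name).stem + ".txt")
    target.write_text(text, encoding="utf-8")
    return target
```

For example, `save_text(clean_text(raw), "report.pdf")` would write `output_texts/report.txt`, creating the folder on first use.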

Requirements

Make sure to have the following libraries installed:

  • requests
  • PyPDF2

If you do not have them yet, install them using:

pip install requests PyPDF2
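
With both libraries installed, the extraction step for local and remote PDFs might look like the sketch below. The helper names (`is_remote`, `load_pdf_bytes`, `extract_text`) and the 30-second timeout are illustrative assumptions, not taken from the project. The sketch also assumes PyPDF2 2.0 or later, which provides `PdfReader`.

```python
import io
from urllib.parse import urlparse


def is_remote(source: str) -> bool:
    """Return True when source looks like an http(s) URL rather than a local path."""
    return urlparse(source).scheme in ("http", "https")


def load_pdf_bytes(source: str) -> bytes:
    """Read a PDF from disk, or download it, and return the raw bytes."""
    if is_remote(source):
        # Third-party import kept local so local-only use does not need requests.
        import requests

        response = requests.get(source, timeout=30)  # timeout value is an assumption
        response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
        return response.content
    with open(source, "rb") as handle:
        return handle.read()


def extract_text(source: str) -> str:
    """Extract text page by page and join pages with a blank line."""
    from PyPDF2 import PdfReader  # assumes PyPDF2 2.0 or later

    reader = PdfReader(io.BytesIO(load_pdf_bytes(source)))
    # extract_text() can return an empty result for image-only pages.
    return "\n\n".join(page.extract_text() or "" for page in reader.pages)
```

Loading the whole file into memory keeps the local and remote paths identical from the reader's point of view. Very large PDFs might call for streaming to a temporary file instead.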