A Python 2.7 and 3.4+ module that wraps pdftoppm and pdftocairo to convert PDFs to PIL Image objects.
pdftoppm and pdftocairo are the pieces of software that do the actual magic. They are distributed as part of a larger package called poppler.
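To make the division of labour concrete, the sketch below builds a pdftoppm command line of the kind a wrapper might run. It is an illustration only, not pdf2image's actual implementation: the helper name build_pdftoppm_command and its parameters are hypothetical, while -r, -f, -l, -cropbox and -png are standard pdftoppm options.

```python
def build_pdftoppm_command(pdf_path, output_prefix, dpi=200,
                           first_page=None, last_page=None,
                           use_cropbox=False, png=False):
    """Return the argument list for a pdftoppm call (hypothetical helper)."""
    command = ["pdftoppm", "-r", str(dpi)]  # -r sets the resolution in DPI
    if first_page is not None:
        command += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        command += ["-l", str(last_page)]   # last page to convert
    if use_cropbox:
        command.append("-cropbox")          # use the crop box, not the media box
    if png:
        command.append("-png")              # write PNG instead of the default PPM
    # pdftoppm expects its options first, then the PDF file, then the output prefix
    command += [pdf_path, output_prefix]
    return command


print(build_pdftoppm_command("example.pdf", "page", dpi=150, last_page=2))
```

Passing the resulting list to subprocess.run would perform the conversion, provided pdftoppm is on PATH.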
pip
Windows users will have to install poppler for Windows, then add the bin/
folder to PATH.
Mac users will have to install poppler for Mac.
Linux users will have both tools pre-installed on Ubuntu 16.04+ and Arch Linux. If they are not, run sudo apt install poppler-utils
conda
conda install -c conda-forge poppler
pip install pdf2image
Install Pillow, if you don't have it already, with pip install pillow
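After installing, a quick way to confirm that the poppler tools are reachable is to look them up on PATH. The following is a minimal sketch using only the standard library (shutil.which, available in Python 3.3+). The function name missing_poppler_tools is hypothetical and not part of pdf2image.

```python
import shutil


def missing_poppler_tools(tools=("pdftoppm", "pdftocairo")):
    """Return the names of the given tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_poppler_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("poppler tools found")
```

If a tool is reported missing on Windows, the usual cause is that poppler's bin/ folder has not been added to PATH.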
from pdf2image import convert_from_path, convert_from_bytes
from pdf2image.exceptions import (
    PDFInfoNotInstalledError,
    PDFPageCountError,
    PDFSyntaxError
)
Then simply do:
images = convert_from_path('/home/belval/example.pdf')
OR
images = convert_from_bytes(open('/home/belval/example.pdf', 'rb').read())
OR better yet
import tempfile
with tempfile.TemporaryDirectory() as path:
    images_from_path = convert_from_path('/home/belval/example.pdf', output_folder=path)
    # Do something here
images will be a list of PIL Image objects, one for each page of the PDF document.
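A common next step is writing each page to disk. The sketch below assumes only that every item in the list has a save(path) method, as PIL Image objects do. The helper name save_pages and its file-naming scheme are hypothetical choices for illustration.

```python
import os


def save_pages(images, output_dir, stem="page", extension="png"):
    """Save each image as <stem>_<n>.<extension> and return the file paths."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for number, image in enumerate(images, start=1):
        path = os.path.join(output_dir, "{}_{}.{}".format(stem, number, extension))
        image.save(path)  # PIL infers the image format from the file extension
        paths.append(path)
    return paths


# Usage, assuming pdf2image and poppler are installed:
#     from pdf2image import convert_from_path
#     save_pages(convert_from_path('/home/belval/example.pdf'), 'pages')
```

When output_folder points at a temporary directory, calling such a helper inside the with block is the safer choice, since the images may still refer to files in that directory.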
Here are the definitions:
convert_from_path(pdf_path, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None)
convert_from_bytes(pdf_file, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None)
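As an illustration of how these keyword arguments combine, the sketch below converts a page range with a chosen DPI, format and thread count. The helpers conversion_options and convert_page_range are hypothetical, and the validation rules are assumptions of this example rather than checks pdf2image is known to perform. Only convert_from_path and the parameter names come from the definitions above.

```python
def conversion_options(first_page, last_page, dpi=200, fmt="ppm", thread_count=1):
    """Validate a page range and return keyword arguments for convert_from_path."""
    if first_page < 1:
        raise ValueError("pages are numbered from 1")
    if last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")
    if thread_count < 1:
        raise ValueError("thread_count must be at least 1")
    return {
        "dpi": dpi,
        "first_page": first_page,
        "last_page": last_page,
        "fmt": fmt,
        "thread_count": thread_count,
    }


def convert_page_range(pdf_path, first_page, last_page, **kwargs):
    """Convert only the requested pages (requires pdf2image and poppler)."""
    # Imported lazily so conversion_options can be used without pdf2image installed.
    from pdf2image import convert_from_path

    options = conversion_options(first_page, last_page, **kwargs)
    return convert_from_path(pdf_path, **options)


# Example: pages 1 to 3 as JPEG at 300 DPI using 4 threads.
#     images = convert_page_range('/home/belval/example.pdf', 1, 3,
#                                 dpi=300, fmt='jpeg', thread_count=4)
```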
single_file parameter allows you to convert only the first page of the PDF, without digits appended to the output_file name
poppler_path parameter allows you to specify where poppler is installed
convert_from_bytes() (Thank you @FabianUken)
fmt='tiff' parameter allows you to create .tiff files (you need pdftocairo for this)
transparent parameter allows you to generate images with no background instead of the usual white one (you need pdftocairo for this)
strict parameter allows you to catch pdftoppm syntax errors with a custom exception type, PDFSyntaxError
use_cropbox parameter allows you to use the crop box instead of the media box when converting (-cropbox in pdftoppm's CLI)
Run python tests.py to get timings.
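The strict parameter and the exception types imported earlier can be combined into a single error-handling wrapper. The following is a minimal sketch: the function name convert_or_report, its return convention and the error messages are hypothetical. It imports pdf2image lazily so that a missing installation is reported rather than raised.

```python
def convert_or_report(pdf_path, **options):
    """Return (images, None) on success or (None, message) on failure."""
    try:
        from pdf2image import convert_from_path
        from pdf2image.exceptions import (
            PDFInfoNotInstalledError,
            PDFPageCountError,
            PDFSyntaxError,
        )
    except ImportError:
        return None, "pdf2image is not installed"

    try:
        # strict=True asks pdf2image to raise PDFSyntaxError on syntax errors
        images = convert_from_path(pdf_path, strict=True, **options)
    except PDFInfoNotInstalledError:
        return None, "poppler was not found; is it installed and on PATH?"
    except PDFPageCountError:
        return None, "could not read the page count; is this a valid PDF?"
    except PDFSyntaxError:
        return None, "the PDF contains syntax errors"
    return images, None


# Usage:
#     images, error = convert_or_report('/home/belval/example.pdf', dpi=200)
#     print(error if error else "{} page(s) converted".format(len(images)))
```

Returning a message instead of raising keeps the calling code short. Callers that need the original exception can catch the three types directly.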