Samples for the Adobe PDFTools Extract Java SDK

This sample project helps you get started with the PDFTools extract SDK.

The sample classes show how to use the SDK to extract the content of a PDF into a structured, user-friendly format.

Prerequisites

The sample application has the following requirements:

  • Java JDK: version 8 or above.
  • Build tool: Maven must be installed. See the official Maven installation instructions.

Authentication Setup

The API credentials file and the corresponding private key file for the samples are pdftools-api-credentials.json and private.key, respectively. Before running the samples, replace both files with the ones in the zip file received through the Beta Program Access workflow.

The SDK also supports providing the authentication credentials at runtime, without storing them in a config file. See the sample "Extract Text Elements (By providing in-memory Authentication credentials)" under Running the samples for details.
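
Before running any sample, it can help to confirm that both credential files are actually in place. The following is a minimal sketch using only the Java standard library; it does not call the SDK. The class name CredentialFilesCheck and the choice of directory to look in are assumptions made for illustration, so pass the directory that holds your files as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for the credential files named in this README. */
final class CredentialFilesCheck {

    // File names as given in the Authentication Setup section.
    static final String CREDENTIALS_FILE = "pdftools-api-credentials.json";
    static final String PRIVATE_KEY_FILE = "private.key";

    /** Returns the names of required files that are missing or empty in baseDir. */
    static List<String> missingFiles(Path baseDir) throws IOException {
        List<String> missing = new ArrayList<>();
        for (String name : new String[] {CREDENTIALS_FILE, PRIVATE_KEY_FILE}) {
            Path file = baseDir.resolve(name);
            if (!Files.isRegularFile(file) || Files.size(file) == 0) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the files live in the given directory (default: current directory).
        Path baseDir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> missing = missingFiles(baseDir);
        if (missing.isEmpty()) {
            System.out.println("Both credential files found in " + baseDir.toAbsolutePath());
        } else {
            System.out.println("Missing or empty: " + missing);
        }
    }
}
```

The check only verifies that the files exist and are non-empty; it does not validate their contents.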

Build with Maven

Run the following command to build the project:

mvn clean install

Note that the PDFTools Extract SDK is listed as a dependency in the pom.xml and will be downloaded automatically.

A Note on Logging

For logging, this SDK uses the slf4j API with a log4j2-slf4j binding. The logging configurations are provided in src/main/resources/log4j2.properties. Alternate bindings, if required, can be specified in pom.xml.

Structured Information Output Format

The output of the SDK extract operation is a zip package. The zip package contains the following:

  • The structuredData.json file with the extracted content and PDF element structure. See the JSON schema.
  • One renditions folder for each element type selected as input. The folder name is either “tables” or “figures”, depending on the specified element type. Each folder contains renditions whose filenames correspond to the element information in the JSON file.
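
To make the package layout concrete, here is a small sketch that opens an output zip with the standard java.util.zip API and groups its entries by the categories listed above. It assumes that structuredData.json sits at the root of the zip and that the “tables” and “figures” folders are top-level; the class name ExtractOutputInspector and the "other" bucket are illustrative and not part of the SDK.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper that groups the entries of an extract output zip. */
final class ExtractOutputInspector {

    /** Maps an entry name to "structuredData", "tables", "figures", or "other". */
    static String classify(String entryName) {
        // Assumption: the JSON file is at the zip root and folders are top-level.
        if (entryName.equals("structuredData.json")) return "structuredData";
        if (entryName.startsWith("tables/")) return "tables";
        if (entryName.startsWith("figures/")) return "figures";
        return "other";
    }

    /** Groups all file entries (directories are skipped) of the zip by category. */
    static Map<String, List<String>> groupEntries(Path zipPath) throws IOException {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String key : new String[] {"structuredData", "tables", "figures", "other"}) {
            groups.put(key, new ArrayList<>());
        }
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory()) {
                    groups.get(classify(entry.getName())).add(entry.getName());
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("Usage: java ExtractOutputInspector.java <output.zip>");
            return;
        }
        groupEntries(Paths.get(args[0]))
                .forEach((category, names) -> System.out.println(category + ": " + names));
    }
}
```

Anything that lands in the "other" group indicates that the layout assumption does not hold for that package and the prefixes need adjusting.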

Running the samples

The following sub-sections describe how to run the samples. Prior to running the samples, check that the credentials file is set up as described above and that the project has been built.

The code itself is in the com.adobe.platform.operation.samples.extractpdf package under the src/main/java/ folder. Test files used by the samples can be found in src/main/resources/. When executed, all samples create an output child folder under the working directory to store their results.

Extract PDF Elements from PDF Document

These samples illustrate how to extract PDF elements from a PDF document. Refer to the documentation of ExtractPDFOperation.java for the list of inputs.

Extract Text Elements

The sample class ExtractTextInfoFromPDF.java extracts text elements from a PDF document.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextInfoFromPDF

Extract Text, Table Elements

The sample class ExtractTextTableInfoFromPDF extracts text and table elements from a PDF document.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoFromPDF

Extract Text, Table Elements with Renditions of Table Elements

The sample class ExtractTextTableInfoWithRenditionsFromPDF extracts text and table elements, along with table renditions, from a PDF document. Note that the output is a zip containing the structured information along with renditions, as described in the Structured Information Output Format section.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoWithRenditionsFromPDF
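
Once a sample has produced its zip, the structured information and renditions usually need to be unpacked before they can be inspected. The sketch below unpacks a zip into a target directory using only the Java standard library; it is not part of the SDK. The class name ExtractOutputUnpacker is hypothetical, and both paths are supplied by the caller because this README does not state the output file name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper that unpacks an extract output zip into a directory. */
final class ExtractOutputUnpacker {

    /** Unpacks zipPath into targetDir and returns the number of files written. */
    static int unpack(Path zipPath, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        int written = 0;
        try (InputStream in = Files.newInputStream(zipPath);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path destination = root.resolve(entry.getName()).normalize();
                // Reject entries such as "../x" that would land outside targetDir.
                if (!destination.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(destination);
                } else {
                    Files.createDirectories(destination.getParent());
                    // Copies the bytes of the current entry only; the stream stays open.
                    Files.copy(zip, destination, StandardCopyOption.REPLACE_EXISTING);
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Usage: java ExtractOutputUnpacker.java <output.zip> <targetDir>");
            return;
        }
        int count = unpack(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Unpacked " + count + " file(s) into " + args[1]);
    }
}
```

The path check guards against archive entries that point outside the target directory, which is worth keeping even when the zip comes from a trusted service.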

Extract Text, Table Elements with Renditions of Figure, Table Elements

The sample class ExtractTextTableInfoWithFiguresTablesRenditionsFromPDF extracts text and table elements, along with renditions of figure and table elements, from a PDF document. Note that the output is a zip containing the structured information along with renditions, as described in the Structured Information Output Format section.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoWithFiguresTablesRenditionsFromPDF

Extract Text Elements (By providing in-memory Authentication credentials)

The sample class ExtractTextInfoFromPDFWithInMemoryAuthCredentials.java extracts text elements from a PDF document. It shows how to provide in-memory authentication credentials for an operation, which lets clients fetch the credentials from a secret server at runtime instead of storing them in a file.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextInfoFromPDFWithInMemoryAuthCredentials

Extract Text Elements and bounding boxes for Characters present in text blocks

The sample class ExtractTextInfoWithCharBoundsFromPDF extracts text elements and the bounding boxes of the characters present in text blocks. Note that the output is a zip containing the structured information along with renditions, as described in the Structured Information Output Format section.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextInfoWithCharBoundsFromPDF

Extract Text, Table Elements and bounding boxes for Characters present in text blocks with Renditions of Table Elements

The sample class ExtractTextTableInfoWithCharBoundsFromPDF extracts text elements, table elements, the bounding boxes of the characters present in text blocks, and renditions of table elements from a PDF document. Note that the output is a zip containing the structured information along with renditions, as described in the Structured Information Output Format section.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoWithCharBoundsFromPDF

Extract Text, Table Elements with Renditions and CSVs of Table Elements

The sample class ExtractTextTableInfoWithTableStructureFromPdf extracts text elements, table elements, table structures as CSV, and renditions of table elements from a PDF document. Note that the output is a zip containing the structured information along with renditions, as described in the Structured Information Output Format section.

mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.platform.operation.samples.extractpdf.ExtractTextTableInfoWithTableStructureFromPdf
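
All of the commands above follow one pattern: the same Maven invocation with a different main class from the com.adobe.platform.operation.samples.extractpdf package. As a convenience, the sketch below prints the command for every sample listed in this README. The helper class SampleCommands is hypothetical and only builds strings; it does not invoke Maven or the SDK.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that prints the Maven command for each sample in this README. */
final class SampleCommands {

    // Package name as given in the "Running the samples" section.
    static final String SAMPLES_PACKAGE = "com.adobe.platform.operation.samples.extractpdf";

    // Sample class names exactly as they appear in the commands above.
    static final List<String> SAMPLE_CLASSES = Arrays.asList(
            "ExtractTextInfoFromPDF",
            "ExtractTextTableInfoFromPDF",
            "ExtractTextTableInfoWithRenditionsFromPDF",
            "ExtractTextTableInfoWithFiguresTablesRenditionsFromPDF",
            "ExtractTextInfoFromPDFWithInMemoryAuthCredentials",
            "ExtractTextInfoWithCharBoundsFromPDF",
            "ExtractTextTableInfoWithCharBoundsFromPDF",
            "ExtractTextTableInfoWithTableStructureFromPdf");

    /** Builds the command line used in this README for the given sample class. */
    static String mavenCommand(String sampleClass) {
        return "mvn -f pom.xml exec:java -Dexec.mainClass=" + SAMPLES_PACKAGE + "." + sampleClass;
    }

    public static void main(String[] args) {
        for (String sampleClass : SAMPLE_CLASSES) {
            System.out.println(mavenCommand(sampleClass));
        }
    }
}
```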

Contributing

Contributions are welcome! Read the Contributing Guide for more information.

Licensing

This project is licensed under the MIT License. See LICENSE for more information.

MIT License

© Copyright 2020 Adobe. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
