Translated by AI
Dify × PDF: A Simple Solution for Enabling AI to Read Scanned PDFs
What I Created
I've created a Dify plugin called PDF to Images Converter Plugin!
It has been officially registered on the marketplace, so I'm writing this article to introduce the plugin and to walk through the issues with the conventional approach that couldn't be fully covered on the marketplace page.
Introduction
This plugin is intended for scenarios in Dify where you load a PDF and pass its content to an AI model for some kind of processing.
In such cases, PDFs can be broadly classified into the following two types:
(This distinction matters whenever you handle PDFs in a system or program, not just in Dify, for example when using Python libraries.)
- Text-embedded PDF
  A PDF with text information embedded inside; essentially, one where you can select and copy text with the mouse.
  (You can select text, as shown in the "Write for yourself" section below.)

  The "What is Zenn?" page saved as "What is Zenn?.pdf"

- Scanned PDF (also called an image-based PDF, non-text-embedded PDF, etc.)
  A PDF with no text information inside; each page is stored as a single image. Since the text cannot be read directly, OCR or image recognition is required.
  (I couldn't find an established name for this even after asking an AI, so I'll call it a "Scanned PDF" here.)
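To tell the two types apart programmatically, a simple heuristic is to check whether any text can be extracted from the pages at all. Here is a minimal sketch in Python; the function name is my own, and the `pypdf` usage in the comment assumes that library is installed:

```python
def classify_pages(page_texts):
    """Classify a PDF from its per-page extracted text.

    A page that yields no extractable text is almost certainly a
    scanned (image-only) page; if every page is like that, treat
    the whole file as a Scanned PDF.
    """
    if all(not text.strip() for text in page_texts):
        return "scanned"
    return "text-embedded"

# With pypdf (assumed installed), the page texts could be gathered like:
#   from pypdf import PdfReader
#   reader = PdfReader("whats-zenn.pdf")
#   kind = classify_pages(page.extract_text() or "" for page in reader.pages)
```

Note that this is only a heuristic: a PDF with a mix of scanned and text pages, or with a thin OCR text layer, will still be classified as text-embedded.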
Issues with the Conventional Approach
Before discussing the solution, let me share the issues encountered with each pattern.
When processing PDFs in Dify, the standard approach is to use the "File Upload"[1] feature and the "Document Extractor"[2] node.
For Text-embedded PDFs, standard features are sufficient
Below is a simple Dify workflow for reading "What is Zenn?.pdf".
As shown in the "text" section at the bottom center, you can see that the PDF content is being read successfully.

If you'd like to try this minimal configuration yourself, import the following file.
test.yml
```yaml
app:
  description: ''
  icon: 🤖
  icon_background: '#FFEAD5'
  mode: advanced-chat
  name: test
  use_icon_as_answer_icon: false
dependencies:
- current_identifier: null
  type: marketplace
  value:
    marketplace_plugin_unique_identifier: langgenius/openai:0.2.3@5a7f82fa86e28332ad51941d0b491c1e8a38ead539656442f7bf4c6129cd15fa
kind: app
version: 0.3.1
workflow:
  conversation_variables: []
  environment_variables: []
  features:
    file_upload:
      allowed_file_extensions:
      - .JPG
      - .JPEG
      - .PNG
      - .GIF
      - .WEBP
      - .SVG
      allowed_file_types:
      - image
      - document
      allowed_file_upload_methods:
      - local_file
      enabled: true
      fileUploadConfig:
        audio_file_size_limit: 50
        batch_count_limit: 5
        file_size_limit: 15
        image_file_size_limit: 10
        video_file_size_limit: 100
        workflow_file_upload_limit: 10
      image:
        enabled: false
        number_limits: 3
        transfer_methods:
        - local_file
        - remote_url
      number_limits: 3
    opening_statement: ''
    retriever_resource:
      enabled: true
    sensitive_word_avoidance:
      enabled: false
    speech_to_text:
      enabled: false
    suggested_questions: []
    suggested_questions_after_answer:
      enabled: false
    text_to_speech:
      enabled: false
      language: ''
      voice: ''
  graph:
    edges:
    - data:
        isInIteration: false
        isInLoop: false
        sourceType: start
        targetType: document-extractor
      id: 1754876217921-source-1756044569689-target
      source: '1754876217921'
      sourceHandle: source
      target: '1756044569689'
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        isInLoop: false
        sourceType: document-extractor
        targetType: llm
      id: 1756044569689-source-1756044579087-target
      source: '1756044569689'
      sourceHandle: source
      target: '1756044579087'
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        isInLoop: false
        sourceType: llm
        targetType: answer
      id: 1756044579087-source-1756044632703-target
      source: '1756044579087'
      sourceHandle: source
      target: '1756044632703'
      targetHandle: target
      type: custom
      zIndex: 0
    nodes:
    - data:
        desc: ''
        selected: false
        title: 開始
        type: start
        variables: []
      height: 54
      id: '1754876217921'
      position:
        x: 429.3783617376292
        y: 23.778184703671002
      positionAbsolute:
        x: 429.3783617376292
        y: 23.778184703671002
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        desc: ''
        is_array_file: true
        selected: false
        title: テキスト抽出
        type: document-extractor
        variable_selector:
        - sys
        - files
      height: 94
      id: '1756044569689'
      position:
        x: 476.8571428571429
        y: 105.35714285714283
      positionAbsolute:
        x: 476.8571428571429
        y: 105.35714285714283
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        context:
          enabled: false
          variable_selector: []
        desc: ''
        model:
          completion_params:
            temperature: 0.7
          mode: chat
          name: gpt-4o-mini
          provider: langgenius/openai/openai
        prompt_template:
        - id: 4a44c71b-bde2-4500-998d-7a159fb9c46d
          role: system
          text: ''
        - id: 949bf395-1a93-4c3c-b7c8-58040033ede2
          role: user
          text: '{{#sys.query#}}\n\n\n {{#1756044569689.text#}}'
        selected: false
        title: LLM
        type: llm
        variables: []
        vision:
          enabled: false
      height: 90
      id: '1756044579087'
      position:
        x: 554.4313111598581
        y: 225.6386357299544
      positionAbsolute:
        x: 554.4313111598581
        y: 225.6386357299544
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        answer: '{{#1756044579087.text#}}'
        desc: ''
        selected: false
        title: 回答
        type: answer
        variables: []
      height: 105
      id: '1756044632703'
      position:
        x: 609.7149753261024
        y: 338.058419526812
      positionAbsolute:
        x: 609.7149753261024
        y: 338.058419526812
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    viewport:
      x: -451.4914715123374
      y: 204.4258469818957
      zoom: 1.0051611574364074
```
Scanned PDFs cannot be read with standard features
On the other hand, with "What is Zenn?_scan.pdf", which was saved as a scan type of the same page, the "text" field is blank, showing that information could not be retrieved. Given the specifications of document extraction, this is unavoidable.

Content of the scanned PDF. Text cannot be copied here.

Text cannot be extracted from the scanned PDF using the Document Extractor node.
Solution
Here is how it looks using the Dify plugin I created, PDF to Images Converter Plugin.
The content of the scanned PDF "What is Zenn?_scan.pdf" is passed to the AI (LLM node) as an image, and it correctly provides an answer about the content.

The converted image "What is Zenn?_scan_page_1.png" is being used as input for the AI.
In the "LLM" node for AI processing, the image file created by the plugin is specified in the "Vision" section. By doing this, rather than the developer extracting text from the image, the AI's own OCR capabilities are utilized to read the content.

The Vision input is configured at the bottom, in the format "(x) pdf conversion tool / (x) files Array[Files]".
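Conceptually, what the LLM node does here is attach the converted page image to the chat request so the model can read the text itself. A minimal sketch of an OpenAI-style vision payload (the helper function is illustrative, not the plugin's or Dify's actual code):

```python
import base64

def build_vision_message(question, image_bytes, mime="image/png"):
    """Build an OpenAI-style chat message that pairs a text question
    with an inline page image, so the model's vision capability can
    act as the OCR step."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

The point is that no text extraction happens on the developer's side at all; the raw page image goes to the model, and the model's own reading of it comes back as the answer.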
How to Use
You can easily install and use it from the marketplace by following these steps:
- Log in to the Dify Cloud version
- Install pdf-to-images from the Plugin Marketplace
- Create a workflow by referring to the image and yml file below

Image from GitHub
By importing the following file after creating the flow, you can easily replicate the node layout shown in the GitHub image, so please give it a try!
This goes a step beyond what was shown in the Solution section, providing a workflow where the AI can read and process the content regardless of whether it's an image, a text-embedded PDF, or a scanned PDF!
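The branching logic of that extended workflow can be sketched in plain Python (the function and branch names are illustrative, not the actual Dify node names):

```python
def choose_branch(extension, extracted_text):
    """Mimic the workflow's routing: images go straight to the vision
    input, PDFs with extractable text go to the text prompt, and
    scanned PDFs are converted to images first."""
    if extension.lower() in {".png", ".jpg", ".jpeg", ".gif", ".webp"}:
        return "vision"
    if extracted_text and extracted_text.strip():
        return "text-llm"
    return "pdf-to-images"
```

In the actual workflow, this corresponds to checking the uploaded file's type and then whether the Document Extractor's output is empty.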
Summary
- PDFs can be divided into "Text-embedded PDFs" and "Scanned PDFs" *My own terminology
- Text-embedded PDFs can be easily processed with the "Document Extractor node"
- Scanned PDFs use the pdf-to-images plugin + LLM's vision feature as an OCR alternative
With this, the AI can read the content no matter what kind of PDF you upload! 🥳 🙌