Building a Custom Block to Read PDF Files in Dify


Hello everyone, are you using Dify? Dify is extremely convenient: it lets you build apps such as agents and RAG pipelines with low-code tooling, and it makes it easy to monitor how the apps are used and what happens inside them!
One limitation is that, as of this writing (June 2024), it does not accept PDF files as input. Analyzing PDF files is a common need in business settings.
However, with a bit of ingenuity, you can build a block that reads PDF files from a local or shared folder!

How it works

The mechanism is as follows:

  1. Input the file path into Dify
  2. Send the file path to a self-hosted PDF reading server
  3. The server analyzes the file, extracts the text, and returns it
  4. Start text analysis!

Let's get started!

0. Set up Dify locally

First, you need to set up Dify locally!
The following articles cover this process:
https://zenn.dev/karaage0703/articles/6bcbaf37d6607d
https://zenn.dev/acntechjp/articles/79e4b4abfb2112
The file path must be accessible from the PC where Dify is hosted, so the PDF needs to be on that PC or in a shared folder it can reach. I set mine up using WSL.
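
With a WSL setup like this, a PDF stored on the Windows side is seen under /mnt/<drive letter>/ rather than under its Windows path. The following is a minimal sketch of that conversion, assuming WSL's default automount root (/mnt). The helper name windows_to_wsl_path and the sample path are hypothetical and only serve as an illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl_path(windows_path: str) -> str:
    """Convert a Windows drive path to the /mnt/<drive>/... form seen inside WSL.

    Hypothetical helper: assumes the default WSL automount root (/mnt).
    """
    win = PureWindowsPath(windows_path)
    drive = win.drive  # e.g. 'C:'
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"Not a drive-letter path: {windows_path!r}")
    # win.parts[0] is the drive anchor; the remaining parts are folder and file names
    return str(PurePosixPath("/mnt", drive[0].lower(), *win.parts[1:]))


# Hypothetical sample path
print(windows_to_wsl_path(r"C:\Users\me\Documents\report.pdf"))
```

The converted string is what you would type into the workflow's path field when the PDF lives on a Windows drive.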

1. Input the file path into Dify

Once Dify is set up, create a new app of the workflow type.

In the Start block, add an input field with the field type set to short text and the variable name 'path'.

2. Send the file path to a self-hosted server

To send the file path to the server, use a Code block.

The input variable is the 'path' specified earlier.
Select requests in the advanced dependencies so that the code can import it.
The code for Python 3 is as follows.
This code sends the path to the server; if the reading is successful on the server side, it retrieves the PDF text, and if it fails, it returns an error message.

import requests


def main(path: str) -> dict:
    # Set the server URL (replace with the address of your PDF reading server)
    server_url = 'http://XXX.XXX.XXX.XXX:5000/'

    try:
        # Send a POST request to the server and pass the file path as form data
        response = requests.post(server_url, data={'path': path}, timeout=60)
    except requests.RequestException as e:
        # The server could not be reached (wrong address, server not running, timeout)
        return {'result': f"Failed to reach the PDF server: {e}"}

    # Check if the HTTP status code is 200 (Success)
    if response.status_code == 200:
        response_data = response.json()
        # Check if the status is 'success'
        if response_data.get('status') == 'success':
            # Retrieve the text as the result
            result = response_data['text']
        else:
            # Retrieve the error message as the result
            result = f"Error: {response_data.get('message', 'unknown error')}"
    else:
        # Error message for failed HTTP request
        result = f"Failed to read the file. Status code: {response.status_code}"

    # Return the result in dictionary format
    return {
        'result': result
    }
 

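Set server_url to the IP address of the PDF reading server (explained in the next section). If Dify runs in Docker, 127.0.0.1 inside a container refers to the container itself, so use the host machine's IP address instead.
The output variable is result.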

3. Server receives the file path, analyzes the file, and returns the text

You can set up the PDF reading server on the same PC where Dify is running.

Install the packages:

pip install flask pypdf2

We will build a Flask server that reads PDFs using PyPDF2.
The code is as follows:

from flask import Flask, request, jsonify
from PyPDF2 import PdfReader

# Create an instance of the Flask application
app = Flask(__name__)

# Define the root endpoint and handle POST requests
@app.route('/', methods=['POST'])
def upload_file():
    response_data = {}
    # Get the path of the PDF file sent from the client (None if the field is missing)
    pdf_path = request.form.get('path')

    if pdf_path:  # Check if a PDF file path is provided
        try:
            with open(pdf_path, 'rb') as pdf_file:  # Open the PDF file at the specified path
                reader = PdfReader(pdf_file)
                text = ""
                for page in reader.pages:  # Extract text from all pages of the PDF
                    # Fall back to an empty string for pages that yield no text
                    text += page.extract_text() or ""
            response_data['status'] = 'success'  # Set the status if processing is successful
            response_data['text'] = text
        except Exception as e:  # Error handling (missing file, unreadable PDF, etc.)
            response_data['status'] = 'error'
            response_data['message'] = str(e)
    else:
        response_data['status'] = 'error'  # Set the status if no file path is provided
        response_data['message'] = 'No file path provided'

    return jsonify(response_data)  # Return response data in JSON format

if __name__ == '__main__':
    # Listen on all network interfaces (0.0.0.0) on port 5000.
    # debug is off because Flask's debugger should not be exposed on a network-reachable server.
    app.run(host='0.0.0.0', port=5000, debug=False)
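
Because the server opens whatever path it receives and listens on all interfaces, anyone who can reach port 5000 can ask it to read any file the process has access to. One possible guard, sketched here as an assumption rather than as part of the original setup, is to accept only .pdf paths inside a single folder. The names ALLOWED_ROOT and is_allowed_pdf_path and the folder /mnt/c/shared_pdfs are hypothetical.

```python
from pathlib import Path

# Hypothetical setting: the only folder the server is allowed to read from
ALLOWED_ROOT = Path("/mnt/c/shared_pdfs")


def is_allowed_pdf_path(raw_path: str, allowed_root: Path = ALLOWED_ROOT) -> bool:
    """Return True only for .pdf paths that resolve to a location inside allowed_root."""
    candidate = Path(raw_path).resolve()  # collapses '..' and follows symlinks
    root = allowed_root.resolve()
    if candidate.suffix.lower() != ".pdf":
        return False
    try:
        candidate.relative_to(root)  # raises ValueError when candidate is outside root
    except ValueError:
        return False
    return True
```

In the Flask handler, this check could run before open(), returning the 'error' status with a message such as 'Path not allowed' when it fails.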

Save the code to a file and run the server (XXX.py stands for your file name)!

python XXX.py
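
Before wiring the server into Dify, it can help to confirm that it answers on its own. The sketch below sends the same kind of form-encoded POST as the Code block, using only the standard library. It assumes the server is running on the same machine at port 5000. The names SERVER_URL, build_request, and fetch_pdf_text are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the PDF reading server is running on this machine
SERVER_URL = "http://127.0.0.1:5000/"


def build_request(path: str, server_url: str = SERVER_URL) -> urllib.request.Request:
    """Build a form-encoded POST request carrying the file path in the 'path' field."""
    body = urllib.parse.urlencode({"path": path}).encode("utf-8")
    return urllib.request.Request(server_url, data=body, method="POST")


def fetch_pdf_text(path: str, server_url: str = SERVER_URL) -> dict:
    """Send the request and decode the server's JSON reply into a dict."""
    with urllib.request.urlopen(build_request(path, server_url), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (requires the server to be running and the PDF to exist):
# print(fetch_pdf_text("/path/to/sample.pdf"))
```

A reply whose status is 'success' means the server side works, so any remaining problem is likely in the connection between Dify and the server.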

4. Start text analysis!

From here, it's up to you to build the blocks however you like! Whether it's for text extraction or having an LLM read it... being able to load PDF files significantly expands the possibilities for your apps!
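
If the next block is an LLM, the text of a long PDF may not fit into a single prompt. One possible follow-up Code block, sketched here under assumptions, splits the text into overlapping chunks. The function names, the default sizes, and the idea of declaring the output variable chunks as an array of strings are illustrative choices, not something prescribed by Dify or by the setup above.

```python
from typing import List


def split_into_chunks(text: str, max_chars: int = 4000, overlap: int = 200) -> List[str]:
    """Split text into pieces of at most max_chars characters.

    Each piece after the first starts `overlap` characters before the end of the previous one.
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this chunk already reaches the end of the text
        start += max_chars - overlap
    return chunks


def main(text: str) -> dict:
    # 'text' would be the result variable from the PDF reading block
    return {'chunks': split_into_chunks(text)}
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of some repeated text.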

Summary

Thank you for reading this far. Handling PDF files is a common scenario in business. By applying the method shown here, you can extend Dify to work with a wider variety of data sources. Please give it a try, and let's keep learning how to build apps efficiently with Dify.

I hope this blog post helps your projects. If you have any questions or feedback, please let me know in the comments. Happy developing!
