Analyzing Dify with Cline

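Dify is an open-source platform that lets you develop and operate AI applications with no code or low code.

The API side is written in Python (Flask) and the front end in TypeScript (Next.js), and the codebase comes to more than 700,000 lines (measured with cloc on the main branch as of 2025-03-31).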
https://github.com/langgenius/dify

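I was curious how far a repository above a certain size can be analyzed into class diagrams, ER diagrams, state transition diagrams, and so on. I also wanted to learn how an AI application is put together by analyzing Dify's system, so I gave it a try.

For the model I used the recently released Gemini 2.5 Pro. At the time of writing it accepts up to 1 million input tokens, and I wanted to see how far it could go in analyzing a large codebase.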

The initial prompt

First, I ask for system diagrams in mermaid format with a prompt like the following.

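Check the contents of the directories and files under the api directory, then create the directory structure and the following files in mermaid format in the api/doc directory.

・Class diagram
・Component diagram (C4)
・Data flow diagram (flowchart)
・State transition diagram
・ER diagram

Also, scan the files in app/controllers and output the API definitions in swagger format.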

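This produced the following files in api/doc.
Note that from here on I basically trust the generated files, but they are only the AI's interpretation, and their accuracy needs to be verified separately.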

  • class_diagram_extended.md
  • component_diagram_c4_context.md
  • directory_structure.md
  • er_diagram.md
  • state_diagram_workflow.md
  • main_api_spec.yaml

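To get an overall picture of the system, I start with er_diagram.md.

It is easier to follow along with the app running, so move to the docker directory and start the app with docker compose up.

As shipped, the DB port is not exposed locally, so you cannot inspect the tables or their data. Add the following port mapping to the db service in docker-compose.yml before starting.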

    ports:
      - "5432:5432"
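
Once the port is published, any PostgreSQL client on the host can connect. The sketch below only builds a connection URI and a table-listing query that you could pass to psql. The user name, password, and database name are assumptions for illustration and are not taken from this article, so check the .env file in your docker directory for the real values.

```python
from urllib.parse import quote


def build_dsn(user: str, password: str, host: str = "localhost",
              port: int = 5432, database: str = "dify") -> str:
    """Build a PostgreSQL connection URI, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Standard information_schema query that lists the tables in the public schema.
LIST_TABLES_SQL = (
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = 'public' ORDER BY table_name;"
)

if __name__ == "__main__":
    # Placeholder credentials (assumption): replace with the values from your .env.
    dsn = build_dsn("postgres", "change-me")
    print(f'psql "{dsn}" -c "{LIST_TABLES_SQL}"')
```

Running the printed command should list the tables that the ER diagram refers to, which makes it easy to compare the generated diagram against the real schema.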

ER diagram

With tenant at the top, entities such as app, account, and dataset appear to hang off it.

In Dify, the main things you can create from the UI are workflows, chatbots, agents, and knowledge.

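  • Workflow:
    • Lets you freely design and build complex AI processing flows by visually connecting "nodes" such as multiple AI models, logic (conditional branches and the like), external tools (API calls and the like), and knowledge retrieval. It is used for more advanced and flexible AI applications that go beyond a single prompt and response, for example automating a series of tasks or producing a final output after several steps.
  • Chatbot:
    • Lets you create AI applications that interact with users in a conversational format. You can build many kinds of conversational AI with relatively little effort, such as basic Q&A, conversations on a specific topic, or answers grounded in a specific information source through the Knowledge feature described below, and then use them by embedding them in a website, for example.
  • Agent:
    • Aims to give an AI (an LLM in particular) a specific goal along with the ability to plan autonomously toward it and to select and run the tools it needs (API integrations, web search, calculation, and so on). It is used to build more proactive, autonomous AI applications that take a user's instruction and carry out the task through multiple steps and tool calls.
  • Knowledge:
    • Lets you upload and register your own documents (PDFs, text files, websites, and so on) and structured data (CSV and the like) to build a knowledge base that the AI can refer to. Because the AI searches and references this knowledge when generating an answer, it can respond based on domain-specific knowledge or up-to-date information that is not in its general training data, a technique known as Retrieval-Augmented Generation (RAG).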
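
To make the RAG idea in the Knowledge item concrete, here is a minimal sketch of the retrieve-then-prompt step. It is not Dify's implementation: the bag-of-words "embedding" is a deliberate stand-in for a real embedding model, and every function name here is hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (real systems use a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, documents: list[str]) -> str:
    """Put the retrieved context in front of the question for the LLM."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "Dify stores workflows in a graph",
        "Celery runs background jobs",
    ]
    print(build_prompt("how are workflows stored", docs))
```

The point is the shape of the flow: retrieval narrows the knowledge base down to a few relevant passages, and only those passages are handed to the model alongside the question.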

Workflows and chatbots are stored in the workflows table.

Workflow example:

The execution flow is laid out as nodes, and this appears to be saved as node data in workflows.graph.
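
As a rough illustration of what working with such a graph could look like, the sketch below parses a JSON document with nodes and edges and derives an execution order. The field names (id, source, target) and the sample data are assumptions made for this example, not the confirmed schema of workflows.graph.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical sample of what a workflows.graph value might contain.
SAMPLE_GRAPH = json.dumps({
    "nodes": [
        {"id": "start", "data": {"type": "start"}},
        {"id": "llm", "data": {"type": "llm"}},
        {"id": "end", "data": {"type": "end"}},
    ],
    "edges": [
        {"source": "start", "target": "llm"},
        {"source": "llm", "target": "end"},
    ],
})


def execution_order(graph_json: str) -> list[str]:
    """Return node ids ordered so that every node follows its predecessors."""
    graph = json.loads(graph_json)
    sorter = TopologicalSorter()
    for node in graph["nodes"]:
        sorter.add(node["id"])
    for edge in graph["edges"]:
        # add(node, *predecessors): the target depends on the source.
        sorter.add(edge["target"], edge["source"])
    return list(sorter.static_order())


if __name__ == "__main__":
    print(execution_order(SAMPLE_GRAPH))
```

A real workflow with branches would have several valid orders, so an actual engine needs more than a topological sort, but this is enough to inspect a stored graph by hand.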

Agents were saved in app_model_configs, and knowledge in datasets.

Once you have an outline of the ER diagram, you can get a reasonable bird's-eye view of the system.

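Component diagram (C4)

The diagram shows a User operating the Web UI, which goes through the Dify API to reach the Database, the Worker, and the various Providers.

A few of these items deserve some additional notes.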

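  • Vector Database:
    • A database that stores text and other data as "vectors" that can be searched by meaning.
      In Dify it is used mainly by the knowledge base feature to make document search fast and accurate. (e.g. Weaviate, Qdrant)
  • Celery Worker (distributed task queue):
    • A component that runs time-consuming work (such as building the knowledge index) asynchronously in the background.
  • Providers:
    • The external services that Dify integrates with. Dify calls the APIs of these services to provide its more advanced features.
      • LLM Provider: text generation (e.g. OpenAI, Anthropic)
      • Embedding Provider: text vectorization
      • Moderation Provider: content filtering
      • TTS Provider: text-to-speech
      • ASR Provider: speech recognition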
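
The role of the worker is easier to see with a small example. The sketch below uses a standard-library thread pool as a stand-in for Celery, purely to show the pattern of enqueueing slow work and returning right away. It is not how Dify wires up Celery, and index_document and handle_upload are hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def index_document(text: str, chunk_size: int = 20) -> int:
    """Stand-in for slow indexing work; returns the number of chunks produced."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return len(chunks)


# A small pool plays the part of the background worker processes.
executor = ThreadPoolExecutor(max_workers=2)


def handle_upload(text: str) -> Future:
    """What an API handler might do: enqueue the job and return immediately."""
    return executor.submit(index_document, text)


if __name__ == "__main__":
    job = handle_upload("x" * 50)
    print("request returned; indexing continues in the background")
    print("chunks indexed:", job.result())
```

With a real task queue the job would be serialized to a broker and picked up by a separate process, which is what lets the API stay responsive while large documents are indexed.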

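State transition diagram

This diagram shows how a workflow is created and executed.
A draft is created and edited first, then published. When the published workflow is executed it moves to a processing state, and from there to states such as succeeded, failed, stopped, or partially succeeded.
There are surely other state transitions as well, but this was the only one generated this time.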
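
The transitions described above can be written down as a small state machine. This is a sketch of the diagram as summarized in this article, not code from Dify, and the state names are my own labels for the states mentioned.

```python
from enum import Enum


class WorkflowState(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    STOPPED = "stopped"
    PARTIAL_SUCCEEDED = "partial-succeeded"


# Allowed transitions; states missing from this map are terminal.
ALLOWED = {
    # Editing keeps the workflow a draft; publishing moves it on.
    WorkflowState.DRAFT: {WorkflowState.DRAFT, WorkflowState.PUBLISHED},
    WorkflowState.PUBLISHED: {WorkflowState.RUNNING},
    WorkflowState.RUNNING: {
        WorkflowState.SUCCEEDED,
        WorkflowState.FAILED,
        WorkflowState.STOPPED,
        WorkflowState.PARTIAL_SUCCEEDED,
    },
}


def transition(current: WorkflowState, target: WorkflowState) -> WorkflowState:
    """Return the target state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    state = WorkflowState.DRAFT
    for nxt in (WorkflowState.PUBLISHED, WorkflowState.RUNNING, WorkflowState.SUCCEEDED):
        state = transition(state, nxt)
        print(state.value)
```

Writing the table out like this also makes it easy to spot what the generated diagram leaves out, such as re-running a published workflow after a run has finished.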

API definition

Swagger definition

Copy the following into a tool such as Swagger Editor to view it.

openapi: 3.0.2
info:
  title: Dify API (Main Endpoints)
  version: '1.0'
  description: A selection of main API endpoints for Dify. This is not exhaustive.
servers:
  - url: /v1 # Assuming API base path is /v1, adjust if necessary
tags:
  - name: Authentication (Console)
    description: Console user authentication and related operations.
  - name: Apps (Console)
    description: Application management operations in the console.
  - name: App Execution (Web)
    description: Running applications via the web API.
  - name: Workflow (Console)
    description: Workflow creation, publishing, and debugging operations.
  - name: Datasets (Console)
    description: Dataset (Knowledge Base) management operations.

paths:
  /login:
    post:
      summary: Login
      description: Authenticate user with email and password.
      tags:
        - Authentication (Console)
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                email:
                  type: string
                  format: email
                  description: User's email address.
                password:
                  type: string
                  format: password
                  description: User's password.
                remember_me:
                  type: boolean
                  default: false
                invite_token:
                  type: string
                  nullable: true
                language:
                  type: string
                  default: "en-US"
              required:
                - email
                - password
      responses:
        '200':
          description: Login successful, returns access and refresh tokens.
          content:
            application/json:
              schema:
                type: object
                properties:
                  result:
                    type: string
                    example: success
                  data:
                    type: object
                    properties:
                      access_token:
                        type: string
                      refresh_token:
                        type: string
        '400':
          description: Bad Request (e.g., invalid email format, missing fields).
        '401':
          description: Unauthorized (e.g., invalid password, account banned, account not found and registration disabled).
        '429':
          description: Too Many Requests (Login rate limit exceeded).
  /logout:
    get:
      summary: Logout
      description: Logout the current user.
      tags:
        - Authentication (Console)
      responses:
        '200':
          description: Logout successful.
  /email-code-login:
    post:
      summary: Send Email Login Code
      description: Sends a verification code to the user's email for login.
      tags:
        - Authentication (Console)
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                email:
                  type: string
                  format: email
                language:
                  type: string
                  default: "en-US"
              required:
                - email
      responses:
        '200':
          description: Email sent successfully, returns a temporary token.
        '400':
          description: Bad Request.
        '404':
          description: Account not found (if registration is disabled).
        '429':
          description: Too Many Requests (Email sending rate limit exceeded).
  /email-code-login/validity:
    post:
      summary: Login with Email Code
      description: Logs in or registers a user using the email and verification code.
      tags:
        - Authentication (Console)
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                email:
                  type: string
                  format: email
                code:
                  type: string
                  description: Verification code sent via email.
                token:
                  type: string
                  description: Temporary token received from /email-code-login.
              required:
                - email
                - code
                - token
      responses:
        '200':
          description: Login/Registration successful, returns access and refresh tokens.
        '400':
          description: Bad Request (e.g., invalid code, invalid token, invalid email).
  /reset-password:
    post:
      summary: Send Reset Password Email
      description: Sends an email with instructions to reset the password.
      tags:
        - Authentication (Console)
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                email:
                  type: string
                  format: email
                language:
                  type: string
                  default: "en-US"
              required:
                - email
      responses:
        '200':
          description: Email sent successfully, returns a temporary token.
        '400':
          description: Bad Request.
        '404':
          description: Account not found (if registration is disabled).
        '429':
          description: Too Many Requests (Password reset rate limit exceeded).
  /refresh-token:
    post:
      summary: Refresh Access Token
      description: Obtain a new access token using a refresh token.
      tags:
        - Authentication (Console)
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                refresh_token:
                  type: string
              required:
                - refresh_token
      responses:
        '200':
          description: Token refreshed successfully.
        '401':
          description: Unauthorized (Invalid refresh token).

  /apps:
    get:
      summary: Get App List
      description: Retrieves a paginated list of applications for the current tenant.
      tags:
        - Apps (Console)
      parameters:
        - name: page
          in: query
          schema:
            type: integer
            default: 1
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
        - name: mode
          in: query
          schema:
            type: string
            enum: [chat, workflow, agent-chat, channel, all]
            default: all
        - name: name
          in: query
          schema:
            type: string
        - name: tag_ids
          in: query
          schema:
            type: string
            description: Comma-separated list of UUIDs.
        - name: is_created_by_me
          in: query
          schema:
            type: boolean
      responses:
        '200':
          description: A list of applications.
        '401':
          description: Unauthorized.
    post:
      summary: Create App
      description: Creates a new application.
      tags:
        - Apps (Console)
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
                description:
                  type: string
                  nullable: true
                mode:
                  type: string
                  enum: [chat, agent-chat, advanced-chat, workflow, completion]
                icon:
                  type: string
                  nullable: true
                icon_background:
                  type: string
                  nullable: true
              required:
                - name
                - mode
      responses:
        '201':
          description: Application created successfully.
        '400':
          description: Bad Request (e.g., missing mode).
        '403':
          description: Forbidden (User does not have permission).

  /apps/{app_id}:
    parameters:
      - name: app_id
        in: path
        required: true
        schema:
          type: string
          format: uuid
    get:
      summary: Get App Detail
      description: Retrieves details for a specific application.
      tags:
        - Apps (Console)
      responses:
        '200':
          description: Application details.
        '401':
          description: Unauthorized.
        '404':
          description: App not found.
    put:
      summary: Update App
      description: Updates details for a specific application.
      tags:
        - Apps (Console)
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
                description:
                  type: string
                  nullable: true
                icon:
                  type: string
                  nullable: true
                icon_background:
                  type: string
                  nullable: true
              required:
                - name
      responses:
        '200':
          description: Application updated successfully.
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: App not found.
    delete:
      summary: Delete App
      description: Deletes a specific application.
      tags:
        - Apps (Console)
      responses:
        '204':
          description: Application deleted successfully.
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: App not found.

  /apps/{app_id}/copy:
    post:
      summary: Copy App
      description: Copies an existing application to create a new one.
      tags:
        - Apps (Console)
      parameters:
        - name: app_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
                  nullable: true
                description:
                  type: string
                  nullable: true
                icon:
                  type: string
                  nullable: true
                icon_background:
                  type: string
                  nullable: true
      responses:
        '201':
          description: Application copied successfully.
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: Source app not found.

  /apps/{app_id}/export:
    get:
      summary: Export App DSL
      description: Exports the application configuration in DSL (YAML) format.
      tags:
        - Apps (Console)
      parameters:
        - name: app_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
        - name: include_secret
          in: query
          schema:
            type: boolean
            default: false
      responses:
        '200':
          description: Application DSL.
          content:
            application/json: # Or potentially application/yaml
              schema:
                type: object
                properties:
                  data:
                    type: string # YAML content as a string
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: App not found.

  /apps/{app_id}/name:
    post:
      summary: Update App Name
      description: Updates the name of a specific application.
      tags:
        - Apps (Console)
      parameters:
        - name: app_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
              required:
                - name
      responses:
        '200':
          description: App name updated successfully.
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: App not found.

  /apps/{app_id}/icon:
    post:
      summary: Update App Icon
      description: Updates the icon of a specific application.
      tags:
        - Apps (Console)
      parameters:
        - name: app_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                icon:
                  type: string
                  nullable: true
                icon_background:
                  type: string
                  nullable: true
      responses:
        '200':
          description: App icon updated successfully.
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: App not found.

  /apps/{app_id}/site-enable:
    post:
      summary: Update App Site Status
      description: Enables or disables the public web app site for an application.
      tags:
        - Apps (Console)
      parameters:
        - name: app_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                enable_site:
                  type: boolean
              required:
                - enable_site
      responses:
        '200':
          description: App site status updated successfully.
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: App not found.

  /apps/{app_id}/api-enable:
    post:
      summary: Update App API Status
      description: Enables or disables API access for an application.
      tags:
        - Apps (Console)
      parameters:
        - name: app_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                enable_api:
                  type: boolean
              required:
                - enable_api
      responses:
        '200':
          description: App API status updated successfully.
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: App not found.

  /completion-messages:
    post:
      summary: Run Completion App
      description: Executes a completion mode application.
      tags:
        - App Execution (Web)
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                inputs:
                  type: object
                  description: Input variables for the app.
                query:
                  type: string
                  description: User query (optional, depends on app config).
                files:
                  type: array
                  items:
                    type: object # Define file object structure if known
                  nullable: true
                response_mode:
                  type: string
                  enum: [blocking, streaming]
                  description: |
                    'blocking' waits for the full response.
                    'streaming' returns chunks.
                user:
                  type: string
                  description: End user identifier.
              required:
                - inputs
                - user # Assuming user identifier is always required for web APIs
      responses:
        '200':
          description: Execution result (blocking or streaming).
        '400':
          description: Bad Request.
        '401':
          description: Unauthorized (Invalid API Key or user).
        '404':
          description: App not found or not a completion app.
        '500':
          description: Internal Server Error during execution.

  /completion-messages/{task_id}/stop:
    post:
      summary: Stop Completion Task
      description: Stops a running completion task.
      tags:
        - App Execution (Web)
      parameters:
        - name: task_id
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Stop signal sent successfully.
        '401':
          description: Unauthorized.
        '404':
          description: App not found or not a completion app.

  /chat-messages:
    post:
      summary: Run Chat/Agent/Workflow App
      description: Sends a message to a chat, agent, or advanced-chat mode application.
      tags:
        - App Execution (Web)
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                inputs:
                  type: object
                  description: Input variables for the app.
                query:
                  type: string
                  description: User's message/query.
                files:
                  type: array
                  items:
                    type: object # Define file object structure if known
                  nullable: true
                response_mode:
                  type: string
                  enum: [blocking, streaming]
                conversation_id:
                  type: string
                  format: uuid
                  nullable: true
                  description: Existing conversation ID to continue, or null/omit to start new.
                user:
                  type: string
                  description: End user identifier.
              required:
                - inputs
                - query
                - user # Assuming user identifier is always required for web APIs
      responses:
        '200':
          description: Execution result (blocking or streaming).
        '400':
          description: Bad Request.
        '401':
          description: Unauthorized (Invalid API Key or user).
        '404':
          description: App/Conversation not found or not a chat-based app.
        '500':
          description: Internal Server Error during execution.

  /chat-messages/{task_id}/stop:
    post:
      summary: Stop Chat Task
      description: Stops a running chat/agent/workflow task.
      tags:
        - App Execution (Web)
      parameters:
        - name: task_id
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Stop signal sent successfully.
        '401':
          description: Unauthorized.
        '404':
          description: App not found or not a chat-based app.

  /apps/{app_id}/workflows/draft:
    parameters:
      - name: app_id
        in: path
        required: true
        schema:
          type: string
          format: uuid
    get:
      summary: Get Draft Workflow
      description: Retrieves the draft version of the workflow for an app.
      tags:
        - Workflow (Console)
      responses:
        '200':
          description: Draft workflow details.
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: App or draft workflow not found.
    post:
      summary: Sync Draft Workflow
      description: Saves or updates the draft workflow definition.
      tags:
        - Workflow (Console)
      requestBody:
        required: true
        content:
          application/json: # or text/plain containing JSON
            schema:
              type: object
              properties:
                graph:
                  type: object
                  description: Workflow graph structure (nodes, edges).
                features:
                  type: object
                  description: Workflow features configuration.
                environment_variables:
                  type: array
                  items:
                    type: object # Define variable structure
                  nullable: true
                conversation_variables:
                  type: array
                  items:
                    type: object # Define variable structure
                  nullable: true
                hash:
                  type: string
                  nullable: true
                  description: Optional hash for optimistic locking.
              required:
                - graph
                - features
      responses:
        '200':
          description: Draft workflow synced successfully.
        '400':
          description: Bad Request or Draft workflow conflict (hash mismatch).
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: App not found.
        '415':
          description: Unsupported Media Type (if not JSON or text/plain with JSON).

  /apps/{app_id}/workflows/publish:
    parameters:
      - name: app_id
        in: path
        required: true
        schema:
          type: string
          format: uuid
    get:
      summary: Get Published Workflow
      description: Retrieves the currently published workflow for an app.
      tags:
        - Workflow (Console)
      responses:
        '200':
          description: Published workflow details.
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: App or published workflow not found.
    post:
      summary: Publish Workflow
      description: Publishes the current draft workflow.
      tags:
        - Workflow (Console)
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                marked_name:
                  type: string
                  maxLength: 20
                  nullable: true
                marked_comment:
                  type: string
                  maxLength: 100
                  nullable: true
      responses:
        '200':
          description: Workflow published successfully.
        '400':
          description: Bad Request (e.g., validation error).
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: App or draft workflow not found.

  /apps/{app_id}/workflows:
    get:
      summary: Get Published Workflow History
      description: Retrieves a paginated list of published workflow versions.
      tags:
        - Workflow (Console)
      parameters:
        - name: app_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
        - name: page
          in: query
          schema:
            type: integer
            default: 1
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
        - name: user_id
          in: query
          schema:
            type: string
            format: uuid
            nullable: true
        - name: named_only
          in: query
          schema:
            type: boolean
            default: false
      responses:
        '200':
          description: List of published workflows.
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: App not found.

  /apps/{app_id}/workflows/{workflow_id}:
    parameters:
      - name: app_id
        in: path
        required: true
        schema:
          type: string
          format: uuid
      - name: workflow_id
        in: path
        required: true
        schema:
          type: string
          format: uuid # Or potentially a version string
    patch:
      summary: Update Workflow Attributes
      description: Updates attributes (like marked name/comment) of a specific workflow version.
      tags:
        - Workflow (Console)
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                marked_name:
                  type: string
                  maxLength: 20
                  nullable: true
                marked_comment:
                  type: string
                  maxLength: 100
                  nullable: true
      responses:
        '200':
          description: Workflow updated successfully.
        '400':
          description: Bad Request (e.g., validation error).
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: Workflow not found.
    delete:
      summary: Delete Workflow Version
      description: Deletes a specific published workflow version (cannot delete draft this way).
      tags:
        - Workflow (Console)
      responses:
        '204':
          description: Workflow deleted successfully.
        '400':
          description: Bad Request (e.g., trying to delete draft or workflow in use).
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: Workflow not found.

  /apps/{app_id}/convert-to-workflow:
    post:
      summary: Convert App to Workflow Mode
      description: Converts a basic chat or completion app to advanced-chat or workflow mode.
      tags:
        - Workflow (Console)
      parameters:
        - name: app_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
                  nullable: true
                icon:
                  type: string
                  nullable: true
                icon_background:
                  type: string
                  nullable: true
      responses:
        '200':
          description: Conversion successful, returns the new app ID.
          content:
            application/json:
              schema:
                type: object
                properties:
                  new_app_id:
                    type: string
                    format: uuid
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: App not found or already in workflow mode.

  /datasets:
    get:
      summary: Get Dataset List
      description: Retrieves a paginated list of datasets (knowledge bases).
      tags:
        - Datasets (Console)
      parameters:
        - name: page
          in: query
          schema:
            type: integer
            default: 1
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
        - name: ids
          in: query
          schema:
            type: array
            items:
              type: string
              format: uuid
          style: form
          explode: false
        - name: keyword
          in: query
          schema:
            type: string
        - name: tag_ids
          in: query
          schema:
            type: array
            items:
              type: string
              format: uuid
          style: form
          explode: false
        - name: include_all
          in: query
          schema:
            type: boolean
            default: false
      responses:
        '200':
          description: A list of datasets.
        '401':
          description: Unauthorized.
    post:
      summary: Create Dataset
      description: Creates a new empty dataset (knowledge base).
      tags:
        - Datasets (Console)
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
                  minLength: 1
                  maxLength: 40
                description:
                  type: string
                  maxLength: 400
                  nullable: true
                indexing_technique:
                  type: string
                  enum: [high_quality, economy, null]
                  nullable: true
                # Add fields for external knowledge if needed
              required:
                - name
      responses:
        '201':
          description: Dataset created successfully.
        '400':
          description: Bad Request (e.g., name validation failed, duplicate name).
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.

  /datasets/{dataset_id}:
    parameters:
      - name: dataset_id
        in: path
        required: true
        schema:
          type: string
          format: uuid
    get:
      summary: Get Dataset Detail
      description: Retrieves details for a specific dataset.
      tags:
        - Datasets (Console)
      responses:
        '200':
          description: Dataset details.
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: Dataset not found.
    patch:
      summary: Update Dataset
      description: Updates details for a specific dataset.
      tags:
        - Datasets (Console)
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
                  minLength: 1
                  maxLength: 40
                  nullable: true
                description:
                  type: string
                  maxLength: 400
                  nullable: true
                indexing_technique:
                  type: string
                  enum: [high_quality, economy, null]
                  nullable: true
                permission:
                  type: string
                  enum: [only_me, all_team_members, partial_members]
                  nullable: true
                embedding_model:
                  type: string
                  nullable: true
                embedding_model_provider:
                  type: string
                  nullable: true
                retrieval_model:
                  type: object # Define retrieval model structure
                  nullable: true
                partial_member_list:
                  type: array
                  items:
                    type: string # User IDs?
                  nullable: true
      responses:
        '200':
          description: Dataset updated successfully.
        '400':
          description: Bad Request (e.g., validation failed).
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: Dataset not found.
    delete:
      summary: Delete Dataset
      description: Deletes a specific dataset.
      tags:
        - Datasets (Console)
      responses:
        '204':
          description: Dataset deleted successfully.
        '400':
          description: Bad Request (Dataset is in use).
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: Dataset not found.

  /datasets/{dataset_id}/use-check:
    get:
      summary: Check Dataset Usage
      description: Checks if a dataset is currently being used by any applications.
      tags:
        - Datasets (Console)
      parameters:
        - name: dataset_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        '200':
          description: Usage status.
          content:
            application/json:
              schema:
                type: object
                properties:
                  is_using:
                    type: boolean
        '401':
          description: Unauthorized.
        '404':
          description: Dataset not found.

  /datasets/{dataset_id}/related-apps:
    get:
      summary: Get Related Apps
      description: Retrieves a list of applications that use a specific dataset.
      tags:
        - Datasets (Console)
      parameters:
        - name: dataset_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        '200':
          description: List of related applications.
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: Dataset not found.

  /datasets/indexing-estimate:
    post:
      summary: Estimate Indexing Cost
      description: Estimates the cost (e.g., tokens) for indexing provided documents.
      tags:
        - Datasets (Console)
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              # Define the complex request body structure here based on DocumentService.estimate_args_validate
              properties:
                info_list:
                  type: object
                process_rule:
                  type: object
                indexing_technique:
                  type: string
                  enum: [high_quality, economy, null]
                  nullable: true
              required:
                - info_list
                - process_rule
                - indexing_technique
      responses:
        '200':
          description: Indexing cost estimation.
        '400':
          description: Bad Request (Invalid arguments or estimation error).
        '401':
          description: Unauthorized.

  /datasets/api-keys:
    get:
      summary: List Dataset API Keys
      description: Retrieves all API keys specifically for dataset access.
      tags:
        - Datasets (Console)
      responses:
        '200':
          description: List of API keys.
        '401':
          description: Unauthorized.
    post:
      summary: Create Dataset API Key
      description: Creates a new API key for dataset access.
      tags:
        - Datasets (Console)
      responses:
        '200':
          description: Newly created API key.
        '400':
          description: Bad Request (e.g., maximum key limit reached).
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.

  /datasets/api-keys/{api_key_id}:
    delete:
      summary: Delete Dataset API Key
      description: Deletes a specific dataset API key.
      tags:
        - Datasets (Console)
      parameters:
        - name: api_key_id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        '204':
          description: API key deleted successfully.
        '401':
          description: Unauthorized.
        '403':
          description: Forbidden.
        '404':
          description: API key not found.

components: {} # Schemas, SecuritySchemes etc. would go here in a full spec
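
The `ids` and `tag_ids` query parameters of `GET /datasets` are declared with `style: form` and `explode: false`, which in OpenAPI 3.0 means an array is sent as one comma-separated value (`ids=a,b`) rather than as repeated keys. Below is a minimal sketch of building such a URL. The function name and the base URL are assumptions made for illustration, and since the spec itself is an AI-generated interpretation, whether the real server parses this form has not been verified here.

```python
from urllib.parse import urlencode


def build_dataset_list_url(base_url, page=1, limit=20, ids=None,
                           keyword=None, tag_ids=None, include_all=False):
    """Build a GET /datasets URL following the generated spec (hypothetical helper)."""
    params = {"page": page, "limit": limit}
    if ids:
        # style: form + explode: false -> one comma-separated value
        params["ids"] = ",".join(ids)
    if keyword:
        params["keyword"] = keyword
    if tag_ids:
        params["tag_ids"] = ",".join(tag_ids)
    if include_all:
        params["include_all"] = "true"
    # safe="," keeps the commas readable instead of encoding them as %2C
    return f"{base_url}/datasets?{urlencode(params, safe=',')}"


# The base URL is an assumption; adjust it to your own environment.
print(build_dataset_list_url("http://localhost/console/api", ids=["a", "b"]))
```

If the server turns out to expect repeated keys (`ids=a&ids=b`), the spec should be changed to `explode: true` instead.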

Broadly speaking, the APIs seem to fall into four groups: authentication, apps, workflows, and datasets.
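
One way to check this kind of grouping without reading the whole file is to collect the operations per tag. The following is a minimal sketch that works on a spec already loaded into a Python dict. The function name is hypothetical, and the sample only contains the dataset paths shown above, so it is an illustration rather than the full picture.

```python
from collections import defaultdict

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def group_operations_by_tag(spec):
    """Return {tag: ["METHOD /path", ...]} for an OpenAPI document given as a dict."""
    groups = defaultdict(list)
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            for tag in operation.get("tags", ["(untagged)"]):
                groups[tag].append(f"{method.upper()} {path}")
    return dict(groups)


# A small excerpt of the generated spec, written out by hand as a dict.
sample_spec = {
    "paths": {
        "/datasets": {
            "get": {"tags": ["Datasets (Console)"]},
            "post": {"tags": ["Datasets (Console)"]},
        },
        "/datasets/{dataset_id}": {
            "parameters": [{"name": "dataset_id", "in": "path"}],
            "get": {"tags": ["Datasets (Console)"]},
            "patch": {"tags": ["Datasets (Console)"]},
            "delete": {"tags": ["Datasets (Console)"]},
        },
    }
}

for tag, operations in group_operations_by_tag(sample_spec).items():
    print(tag, operations)
```

To run it against the real `main_api_spec.yaml`, the file would first have to be parsed with a YAML library, which is left out here to keep the sketch standard-library only.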

Class diagram

Comparing the class diagram with the API definition above should make relationships clearer, such as which service is responsible for which API.

Summary

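That was a rough walk through the system overview. What do you think?
Personally, I found it a fairly good onboarding experience for an OSS project, and I feel it has become easier to get a rough idea of where to look when contributing.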

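One way to improve an AI agent's understanding of a system is to have it read the system's specification documents, so it might also be interesting to feed it visualized specifications like these.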

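From the standpoint of understanding AI applications, the concept of a vector database is not something I usually work with, and my impression is that understanding it is the key to understanding Dify as a system. I would like to dig deeper into that area when I find the time.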
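
As a starting point for that, the core idea of a vector database can be sketched in a few lines: each text chunk is stored with an embedding vector, and a search returns the chunks whose vectors are closest to the query vector. The sketch below uses cosine similarity over hand-written three-dimensional vectors. All the data and function names are made up for illustration, and this is not how Dify or any particular vector database implements it (real systems use model-generated embeddings and approximate indexes).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed_chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vector), text)
              for text, vector in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical chunks with toy embeddings (a real model outputs hundreds of dimensions).
chunks = [
    ("How to reset a password", [0.9, 0.1, 0.0]),
    ("Pricing plans overview", [0.1, 0.9, 0.1]),
    ("Account login troubleshooting", [0.8, 0.2, 0.1]),
]

query_vector = [1.0, 0.0, 0.0]  # stands in for the embedding of a user question
print(top_k(query_vector, chunks))
# prints ['How to reset a password', 'Account login troubleshooting']
```

In a RAG setup, the retrieved chunks would then be passed to the LLM as context for generating the answer.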

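Also, this time I only analyzed the server-side api directory and did not get to the frontend in the web directory at all. If I find the time, I would like to look into that as well, for example how the rendering of workflow nodes is implemented.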
