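Updating llms.txt on Every Merge to main

Published on 2025/04/13
In the previous article, I showed how to generate llms.txt from repository information.

However, if llms.txt is going to be used for information retrieval, it has to be kept up to date.

This article shows how to update llms.txt from the changes merged into the main branch.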

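Method

We use GitHub Actions.

It is not an elegant setup, but for now the workflow definition and the script it runs both live in the target repository, and the workflow runs whenever changes are merged into the main branch.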

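Implementation

The workflow is defined inside the repository whose llms.txt should stay current, so that llms.txt is updated whenever changes are merged into the main branch.

This requires two files:
- The GitHub Actions workflow definition
- A file that defines the processing to run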

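GitHub Actions

The workflow runs the following steps.
1. Get the diff when changes are merged into the main branch.

   (A direct single-commit push and a pull request merged with a merge commit or a squash merge each show up on main as one commit, so even a branch with several commits is handled as a single update. A rebase merge or a push containing several commits adds more than one commit to main, and in that case git diff HEAD^ HEAD only covers the last one.)
2. Run the llms.txt update, which works as follows.
   1. Use the diff to fetch the contents of the changed files.
   2. Fetch the current llms.txt.
   3. Update llms.txt based on the fetched file contents and the current llms.txt.
   4. Push the updated llms.txt.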
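
For the two fetch steps above, it helps to know that the GitHub contents API returns a file body base64-encoded. The sketch below shows one way to decode such a response. The names decodeContent and ContentsResponse are made up for this illustration, and returning an empty string for a missing file is an assumption that mirrors how the full code at the end falls back when llms.txt does not exist yet.

```typescript
// Fields used from a GitHub "get repository content" response.
interface ContentsResponse {
  content: string; // file body, base64-encoded by the API
  encoding: string; // "base64" for regular files
}

// Hypothetical helper: decode the body to UTF-8 text.
// Returns "" when the file was not found (null), so a first run can start from nothing.
function decodeContent(file: ContentsResponse | null): string {
  if (file === null) {
    return "";
  }
  return Buffer.from(file.content, file.encoding as BufferEncoding).toString("utf-8");
}

// Build a fake API response instead of calling GitHub.
const sample: ContentsResponse = {
  content: Buffer.from("# my-repo").toString("base64"),
  encoding: "base64",
};

console.log(decodeContent(sample)); // # my-repo
console.log(decodeContent(null) === ""); // true
```

Keeping the null case inside the helper means the caller does not need a separate branch for the very first run.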

First, we define the GitHub Actions workflow that carries out the steps above.

To define a workflow, create a ".github/workflows" folder in the repository and put a YAML file in it.

Defining the trigger

First, define the trigger so that the workflow runs when changes are merged into the main branch.
on:
  push:
    branches:
      - main

Checking out the repository

Next, check out the repository.
jobs:
  post-diff:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 2  # needed to diff against the previous commit
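First, a job named post-diff is defined.

The job runs on a GitHub-hosted Ubuntu runner.

Under permissions, contents: read allows the job to read the repository contents.

(A token matching these permissions is issued automatically, so normally no token setup is needed. However, the automatically generated token cannot operate on other repositories. Because this workflow pushes to a different repository, a separate Personal Access Token (PAT) has to be configured.)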
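
As a rough illustration of how the PAT ends up on each request, the full code at the end sends it in an Authorization header together with GitHub's JSON media type. The sketch below collects that into one place. The function name githubHeaders is hypothetical, and "example-token" is a placeholder, not a real token.

```typescript
// Hypothetical helper: the headers sent with every GitHub REST API call.
function githubHeaders(token: string | undefined): { Authorization: string; Accept: string } {
  if (!token) {
    // Fail early instead of sending an unauthenticated request.
    throw new Error("GH_TOKEN environment variable is not set.");
  }
  return {
    Authorization: `token ${token}`,
    Accept: "application/vnd.github+json",
  };
}

console.log(githubHeaders("example-token").Authorization); // token example-token
```

A helper like this would also remove the repeated token check from each API function.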

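steps defines the sequence of steps that run inside the job.

Here actions/checkout@v4 brings the repository contents into the workspace. Because we want the diff before and after the merge, fetch-depth is set to 2 so that the two most recent commits are fetched.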

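Getting the diff before and after the merge

Get the changed file names and the diff before and after the update.
      - name: Get commit diff
        id: diff
        run: |
          echo "## Merge Diff" > diff.md
          git log -2 --pretty=format:"### %h - %s (%an)" >> diff.md
          echo "" >> diff.md
          echo '```diff' >> diff.md
          git diff HEAD^ HEAD >> diff.md
          echo '```' >> diff.md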
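
The script later reads diff.md and picks the changed file paths out of the "diff --git" header lines. The sketch below shows that idea in isolation. The function name extractChangedPaths and the sample diff are made up for this illustration, and unlike the full code at the end it anchors the pattern to the start of a line, which is my own assumption about what is safest.

```typescript
// Hypothetical helper: collect the post-change path of every file in `git diff` output.
function extractChangedPaths(diff: string): string[] {
  const paths: string[] = [];
  // Each file section starts with a line of the form "diff --git a/<old> b/<new>".
  const header = /^diff --git a\/(.*) b\/(.*)$/gm;
  let match: RegExpExecArray | null;
  while ((match = header.exec(diff)) !== null) {
    paths.push(match[2]); // the "b/" side is the path after the change
  }
  return paths;
}

// A shortened, made-up diff with two changed files.
const sampleDiff = [
  "## Merge Diff",
  "diff --git a/src/index.ts b/src/index.ts",
  "--- a/src/index.ts",
  "+++ b/src/index.ts",
  "@@ -1 +1 @@",
  "-console.log('old');",
  "+console.log('new');",
  "diff --git a/README.md b/README.md",
].join("\n");

console.log(extractChangedPaths(sampleDiff)); // ["src/index.ts", "README.md"]
```

Deleted files also appear in these header lines, so the caller still has to cope with a path whose content can no longer be fetched.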

Installing libraries

Install the libraries needed to run the JavaScript in the following step.
      - name: install libraries
        run: |
          npm install dotenv axios @langchain/core @langchain/openai

Updating llms.txt

Next, update llms.txt. This step runs a JavaScript script; the script itself is explained later.
      - name: Push new LLMs.txt
        uses: actions/github-script@v6
        env:
          GH_TOKEN: ${{ secrets.GH_PAT }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          MODEL_NAME: ${{ secrets.MODEL_NAME }}
          PUSH_REPOSITORY: ${{ secrets.PUSH_REPOSITORY }}
        with:
          script: |
            const fs = require('fs');
            
            // Use require to import from the JS file
            const { main } = require('./.github/workflows/create_and_update_repository_info.js');
            const repoOwner = context.repo.owner;
            const repoName = context.repo.repo;
            const pushRepository = process.env.PUSH_REPOSITORY;
            const baseBranch = 'main';
            const diff = fs.readFileSync('diff.md', 'utf8');
            console.log("repoOwner: ", repoOwner);
            console.log("repoName: ", repoName);
            console.log("pushRepository: ", pushRepository);
            console.log("baseBranch: ", baseBranch);
            console.log("diff: ", diff);
            
            await main(
              repoOwner,
              repoName,
              `https://github.com/${repoOwner}/${repoName}`,
              pushRepository,
              baseBranch,
              diff,
              process.env.OPENAI_API_KEY,
              process.env.MODEL_NAME
            );
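First, actions/github-script@v6 is used so that JavaScript can be written in the step.

env passes in the GitHub Actions secrets.
- GH_PAT is the Personal Access Token, exposed to the script as GH_TOKEN. (It needs the same permissions as in the previous article.)
- OPENAI_API_KEY is the OpenAI API key.
- MODEL_NAME is the name of the model to use.
- PUSH_REPOSITORY is the name of the repository to push to.
script loads the hand-written JavaScript file and the parameters it needs. The diff.md created earlier is read in as the diff information.

This information is then used to update llms.txt.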
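
A secret that has not been registered reaches the step as an empty string, so it can be worth checking the values before calling main. The sketch below is one way to do that. The helper name requireEnv and the repository name "llms-txt-store" are made up for this illustration; inside the workflow the first argument would be process.env.

```typescript
// Hypothetical helper: read a required setting and fail fast with a clear message.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`${name} is not set`);
  }
  return value;
}

// A plain object stands in for process.env so the example runs anywhere.
const env = { PUSH_REPOSITORY: "llms-txt-store", MODEL_NAME: "" };

console.log(requireEnv(env, "PUSH_REPOSITORY")); // llms-txt-store
try {
  requireEnv(env, "MODEL_NAME");
} catch (error) {
  console.log((error as Error).message); // MODEL_NAME is not set
}
```

Failing here gives a shorter error than one raised later from inside an API call.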

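The script

The script is written in TypeScript and then converted to JavaScript with the tsc command.

It does the following.
1. Fetch the contents of the changed files based on the file diff.
2. Update llms.txt based on the diff, the changed file contents, and the current llms.txt.
3. Push the updated llms.txt.
The processing is basically the same as in the previous article. (The code is listed at the end.)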
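
In the update step, the prompt asks the model to wrap the new llms.txt in <output> tags, and the script then pulls the text out from between them. The sketch below isolates that extraction using the same pattern as the full code. The function name extractOutput and the sample reply are made up for this illustration.

```typescript
// Hypothetical helper: return the text inside <output>...</output>.
function extractOutput(response: string): string {
  const match = response.match(/<output>\n*([\s\S]*?)\n*<\/output>/);
  // Fall back to the whole response when the model left the tags out.
  return match ? match[1] : response;
}

// A made-up model reply with some text before the tagged section.
const reply = "Here is the file.\n<output>\n# my-repo[https://github.com/example/my-repo]\n</output>";

console.log(extractOutput(reply)); // # my-repo[https://github.com/example/my-repo]
console.log(extractOutput("no tags")); // no tags
```

The fallback keeps the run going, at the cost of possibly writing extra commentary into llms.txt when the model ignores the format.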
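Once the code is written, convert the script to JavaScript with tsc.
tsc create_and_update_repository_info.ts
Place the generated JavaScript file in .github/workflows.
Pushing the JavaScript and YAML files triggers the workflow. (After that, it runs on every merge or push to the main branch.)
When the workflow succeeds, llms.txt is pushed to the repository specified by PUSH_REPOSITORY. (In the code at the end, the push goes to a new llms-txt-<timestamp> branch and a pull request against main is opened.)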
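
The full code at the end commits llms.txt to a branch named after the current time before opening a pull request. The sketch below shows that naming on its own. The function name buildBranchName is hypothetical, and taking the date as a parameter is my own choice to make the result reproducible.

```typescript
// Hypothetical helper: build a branch name of the form "llms-txt-YYYYMMDD-HHMMSS".
function buildBranchName(now: Date): string {
  const pad = (n: number): string => n.toString().padStart(2, "0");
  const date = `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}`;
  const time = `${pad(now.getHours())}${pad(now.getMinutes())}${pad(now.getSeconds())}`;
  return `llms-txt-${date}-${time}`;
}

// Local time 13 April 2025, 09:30:05 (months are zero-based in the Date constructor).
console.log(buildBranchName(new Date(2025, 3, 13, 9, 30, 5))); // llms-txt-20250413-093005
```

Because the name only has one-second resolution, two runs in the same second would collide on the branch name.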

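Summary

This article showed how to update llms.txt when changes are merged into the main branch.

With GitHub Actions, updating llms.txt can be automated.

However, as it stands the workflow has to be defined in every repository, so I would like to keep looking for a better approach. If you know of one, I would appreciate hearing about it.
Addendum: A template repository apparently copies the required files when a new repository is created from it. I give a short explanation in this scrap.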

Full code

yaml
name: Post Merge Diff to Issue

on:
  push:
    branches:
      - main

jobs:
  post-diff:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 2  # needed to diff against the previous commit

      - name: Get commit diff
        id: diff
        run: |
          echo "## Merge Diff" > diff.md
          git log -2 --pretty=format:"### %h - %s (%an)" >> diff.md
          echo "" >> diff.md
          echo '```diff' >> diff.md
          git diff HEAD^ HEAD >> diff.md
          echo '```' >> diff.md

      - name: install libraries
        run: |
          npm install dotenv axios @langchain/core @langchain/openai

      - name: Push new LLMs.txt
        uses: actions/github-script@v6
        env:
          GH_TOKEN: ${{ secrets.GH_PAT }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          MODEL_NAME: ${{ secrets.MODEL_NAME }}
          PUSH_REPOSITORY: ${{ secrets.PUSH_REPOSITORY }}
        with:
          script: |
            const fs = require('fs');
            
            // Use require to import from the JS file
            const { main } = require('./.github/workflows/create_and_update_repository_info.js');
            const repoOwner = context.repo.owner;
            const repoName = context.repo.repo;
            const pushRepository = process.env.PUSH_REPOSITORY;
            const baseBranch = 'main';
            const diff = fs.readFileSync('diff.md', 'utf8');
            console.log("repoOwner: ", repoOwner);
            console.log("repoName: ", repoName);
            console.log("pushRepository: ", pushRepository);
            console.log("baseBranch: ", baseBranch);
            console.log("diff: ", diff);
            
            await main(
              repoOwner,
              repoName,
              `https://github.com/${repoOwner}/${repoName}`,
              pushRepository,
              baseBranch,
              diff,
              process.env.OPENAI_API_KEY,
              process.env.MODEL_NAME
            );
typescript
import "dotenv/config";
import axios from 'axios';

import { ChatOpenAI } from '@langchain/openai';
import { BaseChatModel } from '@langchain/core/language_models/chat_models';
import { PromptTemplate } from '@langchain/core/prompts';


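const PROMPT_TEMPLATE = `Based on the current llms.txt, the file diff, and the contents of the modified files, output a new llms.txt.

Current llms.txt:
The current llms.txt is shown below. Using it as the base, add to and update llms.txt according to the pull request information.
\`\`\`
{current_llms_txt}
\`\`\`

File diff:
\`\`\`
{diff_contents}
\`\`\`

Repository information:
\`\`\`
Repository name: {repository_name}
Repository URL: {repository_url}
\`\`\`

Contents of the modified files:
\`\`\`\`
{file_contents}
\`\`\`\`


llms.txt output format:
Write the required information inside <output> tags as shown below.
Starting from the current llms.txt and using the diff, revise the information so that it matches the current repository structure. (Do not list the changes themselves; focus on the current state of the repository.)
If there is no current llms.txt, create a new one.
Output the full text of the revised llms.txt.
<output>
# Repository name[Repository URL]

> Project overview
 
Detailed project description (at most 500 characters)

## File list
- File name 1[File path 1]: Overview of file 1 (at most 300 characters)
- File name 2[File path 2]: Overview of file 2 (at most 300 characters)
...
</output>

Now begin the task.
`;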


async function getFileContent(organization: string, repository: string, path: string) {
  // Read the token from the GH_TOKEN environment variable
  const token = process.env.GH_TOKEN;
  if (!token) {
    throw new Error('GH_TOKEN environment variable is not set.');
  }

  try {
    const response = await axios.get(`https://api.github.com/repos/${organization}/${repository}/contents/${path}`, {
      headers: {
        'Authorization': `token ${token}`,
        'Accept': 'application/vnd.github+json'
      }
    });
    return response.data;
  } catch (error) {
    // console.error('Failed to fetch the file content:', error);
    return null;
  }
}

async function encodingFileContent(content: any) {
  try {
    return Buffer.from(content.content, content.encoding).toString('utf-8');
  }
  catch (error) {
    console.error('Failed to decode the file content:', error);
    return "";
  }
}

async function modifyLLMsTxt(llm: BaseChatModel, repository_name: string, repository_url: string, file_contents: string, current_llms_txt: string, diff: string): Promise<string> {
  const prompt = new PromptTemplate({
    template: PROMPT_TEMPLATE,
    inputVariables: ['repository_name', 'repository_url', 'file_contents', 'current_llms_txt', 'diff_contents'],
  });
  const chain = prompt.pipe(llm);
  const result = await chain.invoke({
    repository_name: repository_name,
    repository_url: repository_url,
    file_contents: file_contents,
    current_llms_txt: current_llms_txt,
    diff_contents: diff,
  });
  const result_str = result.content as string;
  const result_match = result_str.match(/<output>\n*([\s\S]*?)\n*<\/output>/);
  if (result_match) {
    return result_match[1];
  } else {
    return result_str;
  }
}

async function getLatestCommitSha(organization: string, repository: string, branch: string): Promise<string> {
  // Read the token from the environment variable
  const token = process.env.GH_TOKEN;
  if (!token) {
    throw new Error('GH_TOKEN environment variable is not set.');
  }
  try {
    const response = await axios.get(`https://api.github.com/repos/${organization}/${repository}/git/ref/heads/${branch}`, {
      headers: {
        'Authorization': `token ${token}`,
        'Accept': 'application/vnd.github+json'
      }
    });
    return response.data.object.sha;
  } catch (error: any) {
    console.error('An error occurred:', error.response?.data || error.message);
    throw error;
  }
}

async function createBranch(organization: string, repository: string, newBranch: string, sha: string): Promise<void> {
  // Read the token from the environment variable
  const token = process.env.GH_TOKEN;
  if (!token) {
    throw new Error('GH_TOKEN environment variable is not set.');
  }
  try {
    await axios.post(`https://api.github.com/repos/${organization}/${repository}/git/refs`, {
      ref: `refs/heads/${newBranch}`,
      sha,
    }, {
      headers: {
        'Authorization': `token ${token}`,
        'Accept': 'application/vnd.github+json'
      }
    });
  } catch (error: any) {
    console.error('An error occurred:', error.response?.data || error.message);
    throw error;
  }
}

async function updateFile(organization: string, repository: string, branch: string, filePath: string, content: string, message: string): Promise<void> {
  // Read the token from the environment variable
  const token = process.env.GH_TOKEN;
  if (!token) {
    throw new Error('GH_TOKEN environment variable is not set.');
  }
  try {
    const base64Content = Buffer.from(content).toString('base64');
    
    // Fetch information about the existing file
    let sha = '';
    try {
      const response = await axios.get(`https://api.github.com/repos/${organization}/${repository}/contents/${filePath}?ref=${branch}`, {
        headers: {
          'Authorization': `token ${token}`,
          'Accept': 'application/vnd.github+json'
        }
      });
      sha = response.data.sha;
    } catch (error: any) {
      // Ignore the error when the file does not exist, because it will be created
      if (error.response?.status !== 404) {
        throw error;
      }
    }

    // Create or update the file
    const payload: any = {
      message,
      content: base64Content,
      branch
    };

    // Add the sha when the file already exists
    if (sha) {
      payload.sha = sha;
    }

    await axios.put(`https://api.github.com/repos/${organization}/${repository}/contents/${filePath}`, 
      payload, 
      {
        headers: {
          'Authorization': `token ${token}`,
          'Accept': 'application/vnd.github+json'
        }
      }
    );
  } catch (error: any) {
    console.error('Failed to update the file:', error.response?.data || error.message);
    throw error;
  }
}

async function createPullRequest(organization: string, repository: string, title: string, body: string, head: string, base: string): Promise<void> {
  // Read the token from the environment variable
  const token = process.env.GH_TOKEN;
  if (!token) {
    throw new Error('GH_TOKEN environment variable is not set.');
  }
  try {
    await axios.post(`https://api.github.com/repos/${organization}/${repository}/pulls`, {
      title,
      body,
      head,
      base,
    }, {
      headers: {
        'Authorization': `token ${token}`,
        'Accept': 'application/vnd.github+json'
      }
    });
  } catch (error: any) {
    console.error('An error occurred:', error.response?.data || error.message);
    throw error;
  }
}

async function pushLLMsTxt(organization: string, repository: string, repository_url: string, push_repository: string, base_branch: string, diff: string, llm: BaseChatModel) {
  // Read the token from the GH_TOKEN environment variable
  const token = process.env.GH_TOKEN;
  if (!token) {
    throw new Error('GH_TOKEN environment variable is not set.');
  }

  try{ 
    const modify_file_match = diff.matchAll(/diff --git a\/(.*) b\/(.*)/g);
    let file_contents = "";
    for (const match of Array.from(modify_file_match)) {
      const new_file = match[2];
      const file_content = await getFileContent(organization, repository, new_file);
      if (file_content) {
        const file_content_str = await encodingFileContent(file_content);
        file_contents += (
          `File name: ${file_content.name}\n`
          + `File path: ${file_content.path}\n`
          + `File content: \n\`\`\`\n${file_content_str}\n\`\`\`\n`
          + `\n`
        );
      }
      console.log(file_contents);
      console.log("--------------------------------");
    }

    // Fetch the current llms.txt
    const llms_txt_content = await getFileContent(organization, push_repository, `${repository}/llms.txt`);
    const llms_txt_str = await encodingFileContent(llms_txt_content);
    console.log(llms_txt_str);
    console.log("--------------------------------");

    // Update llms.txt
    const updated_llms_txt = await modifyLLMsTxt(llm, repository, repository_url, file_contents, llms_txt_str, diff);
    console.log(updated_llms_txt);
    console.log("--------------------------------");

    // Push llms.txt
    // Get the latest commit SHA of the base branch
    const baseSha = await getLatestCommitSha(organization, push_repository, base_branch);
    console.log(`baseSha: ${baseSha}`);
    // Create a new branch
    const now = new Date();
    const formattedDate = `${now.getFullYear()}${(now.getMonth() + 1).toString().padStart(2, '0')}${now.getDate().toString().padStart(2, '0')}-${now.getHours().toString().padStart(2, '0')}${now.getMinutes().toString().padStart(2, '0')}${now.getSeconds().toString().padStart(2, '0')}`;
    const newBranch = `llms-txt-${formattedDate}`;
    await createBranch(organization, push_repository, newBranch, baseSha);
    console.log(`newBranch: ${newBranch}`);
    // Commit llms.txt to the new branch
    const filePath = `${repository}/llms.txt`;
    const commitMessage = `Update llms.txt`;
    await updateFile(organization, push_repository, newBranch, filePath, updated_llms_txt, commitMessage);
    console.log(`filePath: ${filePath}`);
    // Create a pull request
    const prTitle = `Update llms.txt`;
    const prBody = `Update llms.txt`;
    await createPullRequest(organization, push_repository, prTitle, prBody, newBranch, base_branch);
    console.log(`PR_TITLE: ${prTitle}`);
    console.log('Pull request created successfully.');
  } catch (error) {
    console.error('Failed to update llms.txt:', error);
    // Rethrow so that the workflow step is marked as failed
    throw error;
  }
}

export async function main(organizationName: string, repositoryName: string, repositoryUrl: string, pushRepositoryName: string, baseBranch: string, diff: string, apiKey: string, modelName: string) {
    const llm = new ChatOpenAI({
        apiKey: apiKey,
        model: modelName,
    });

    await pushLLMsTxt(organizationName, repositoryName, repositoryUrl, pushRepositoryName, baseBranch, diff, llm);
}