
Having ChatGPT automatically create a pull request when an issue is opened (PoC)

ryooo

Published on 2023/09/10

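Hello, I'm ryooo, an engineer at Happy Elements.

Introduction

After trying out ChatGPT and Function calling, I thought, "Sooner or later we'll live in a world where you open an issue and a PR appears automatically. I want that world, and I want it soon." So I built something to see how far this can go today.

In short, it is an attempt to have a GPT playing the programmer and a GPT playing the reviewer talk to each other on CI and converge on a pull request, based on what is written in the issue.

The bottom line: for a simple issue, it actually worked to the point of producing a correct PR.
Note: the source code and the generated PR are also public.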

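What is Function calling?

In short, you tell GPT-4 in advance, "You can use these functions; if you want to use one, respond with its arguments." GPT-4 and the backend server then exchange several messages automatically to resolve a complex user request.
It is available not only from OpenAI itself but also in Azure OpenAI Service (in preview at the time of publication).

(Reference) Microsoft documentation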
- https://learn.microsoft.com/ja-jp/azure/ai-services/openai/how-to/function-calling
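
To make the exchange concrete, here is a minimal sketch of the backend side of one round trip. The response shape follows the fields this article's own code reads (`choices[0].finish_reason` and `message.function_call`). The `FUNCTIONS` registry, the mock response, and `dispatch_function_call` are hypothetical names introduced only for illustration, and no API is actually called.

```ruby
require 'json'

# Hypothetical registry of functions the model is allowed to request.
FUNCTIONS = {
  'read_file' => ->(args) { { file_contents: "contents of #{args['filepath']}" } },
}.freeze

# Mock of a chat response whose finish_reason is "function_call" (assumed shape).
MOCK_RESPONSE = {
  'choices' => [{
    'finish_reason' => 'function_call',
    'message' => {
      'role' => 'assistant',
      'function_call' => { 'name' => 'read_file', 'arguments' => '{"filepath":"README.md"}' },
    },
  }],
}.freeze

# Runs the requested function and builds the "function" role message
# that would be appended to the conversation for the next model call.
def dispatch_function_call(response, functions)
  choice = response.dig('choices', 0)
  return nil unless choice['finish_reason'] == 'function_call'

  call = choice.dig('message', 'function_call')
  handler = functions.fetch(call['name'])
  result = handler.call(JSON.parse(call['arguments']))
  { role: 'function', name: call['name'], content: JSON.dump(result) }
end

puts dispatch_function_call(MOCK_RESPONSE, FUNCTIONS)
```

The model never runs anything itself: it only names a function and supplies JSON arguments, and the backend decides whether and how to execute it.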

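Overview of the flow

1. Adding the "llm-dev" label to an issue triggers GitHub Actions

2. The Programmer agent makes the changes, making full use of Function calling

3. The Reviewer agent checks the changes, making full use of Function calling

4. Once the result is LGTM, a PR is created

5. The PR that was actually created

6. The test code that was implemented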

require 'rails_helper'

RSpec.describe AzureOpenAi::Functions::ReadFile, type: :model do
  describe 'execute_and_generate_message' do
    context 'when file exists' do
      let(:filepath) { 'spec/dummy_files/dummy.txt' }
      before do
        File.open(filepath, 'w') { |file| file.write('Hello, World!') }
      end

      it 'returns the file contents' do
        result = AzureOpenAi::Functions::ReadFile.new.execute_and_generate_message({ filepath: filepath })
        expect(result[:file_contents]).to eq('Hello, World!')
      end

      after do
        File.delete(filepath)
      end
    end

    context 'when file does not exist' do
      let(:filepath) { 'spec/dummy_files/non_existent.txt' }

      it 'returns an error' do
        result = AzureOpenAi::Functions::ReadFile.new.execute_and_generate_message({ filepath: filepath })
        expect(result[:error]).to eq('File not found.')
      end
    end
  end
end

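What I used

Ruby 3.2.1
- I use Ruby rather than Python simply because Ruby is what I'm most familiar with; the same thing is of course possible in Python.
- The project being modified does not need to be written in Ruby.
Rails 7.0.7
- It is not used as a server. The PR-creation feature is implemented as a rake task, and the same Rails project doubles as the target that PRs are created against.
- I figured that targeting a project like Rails, for which there is plenty of general knowledge, would let the LLM create PRs without getting confused.
Azure OpenAI Service
- GPT-4-32k
- Function calling (available in 2023-07-01-preview)
Google Custom Search API
- When the LLM cannot decide how to fix something or runs into an error message, it can search the web for reference.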

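Processing flow

The rake task that creates the PR

This rake task is run by GitHub Actions when a label is added to an issue.

It handles the overall control flow: it receives the instruction from the leader, has the two agents talk to each other until LGTM, and then creates the PR.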

namespace :llm do
  task dev: :environment do
    # Make it easy to pass a prompt when running the rake task directly, not only via an issue
    prompt = ENV['PROMPT']
    if prompt.blank?
      raise 'prompt is required' if ENV['ISSUE_TITLE'].blank?
      prompt = "title: #{ENV['ISSUE_TITLE']}\ndescription: #{ENV['ISSUE_DESCRIPTION']}"
    end
    puts("Engineer leader: #{prompt}".light_red)

    # Give the leader's instruction to both agents
    programmer = AzureOpenAi::Agents::Programmer.new(prompt, :cyan)
    reviewer = AzureOpenAi::Agents::Reviewer.new(prompt, :green)

    reviewer_comment = nil
    i = 0
    while (i += 1) <= 10 # the two agents exchange at most 10 rounds
      # Each agent does its job while handing notes over to the other
      programmer_comment = programmer.work(reviewer_comment:)
      reviewer_comment = reviewer.work(programmer_comment:)

      # Stop once the reviewer gives LGTM
      break if reviewer.lgtm?
    end

    # Create the PR if the reviewer gave LGTM
    if reviewer.lgtm?
      programmer.make_pr!
    end
  end
end
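
The control flow of this task can be tried without any LLM by swapping in stub agents. The following is a minimal sketch under that assumption: `StubProgrammer`, `StubReviewer`, and `run_review_loop` are hypothetical names, and the stub reviewer simply approves on a fixed turn.

```ruby
# Stub agents standing in for the Programmer and Reviewer (hypothetical).
class StubProgrammer
  def work(reviewer_comment: nil)
    reviewer_comment ? "Fixed: #{reviewer_comment}" : 'First attempt'
  end
end

class StubReviewer
  def initialize(approve_on_turn:)
    @approve_on_turn = approve_on_turn
    @turn = 0
  end

  def work(programmer_comment: nil)
    @turn += 1
    lgtm? ? 'LGTM' : "Please revise (turn #{@turn})"
  end

  def lgtm?
    @turn >= @approve_on_turn
  end
end

# Alternates programmer and reviewer until LGTM or until max_turns is used up.
# Returns how many turns were used and whether the reviewer approved.
def run_review_loop(programmer, reviewer, max_turns: 10)
  reviewer_comment = nil
  turns = 0
  max_turns.times do
    turns += 1
    programmer_comment = programmer.work(reviewer_comment: reviewer_comment)
    reviewer_comment = reviewer.work(programmer_comment: programmer_comment)
    break if reviewer.lgtm?
  end
  { turns: turns, lgtm: reviewer.lgtm? }
end

puts run_review_loop(StubProgrammer.new, StubReviewer.new(approve_on_turn: 3))
```

Using `max_turns.times` makes the upper bound explicit, so the cap on LLM round trips (and therefore on cost) is easy to read off.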

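Implementation of Agents::Programmer

The important thing here was to define the Programmer's role carefully.
The Programmer as first implemented produced nonsense or wildly off-target responses, but tuning the system prompt with rules like "don't do that; in this situation do this" steadily improved its accuracy.
I should probably spend more tokens on this prompt.

The content of the system prompt is as follows.
- You are an excellent Ruby programmer.
- Make appropriate modifications to this repository based on instructions from the engineer leader.
- Once the modification is complete, use the Rails runner or shell commands to confirm that it is complete and works as intended.
- You are responsible for deciding the program's policies yourself. When implementing test code, infer the specification from the implementation of the target class and implement concrete test cases.
- Then check the rspec results and fix things one at a time until the tests pass.
- As a rule, do not modify the implementation side. If the implementation side contains a bug, fix the implementation side.
- Create all test-related dummy files under the spec folder.
- When modifying a file, run read_file first and indent properly.
- If you don't understand something, use google_search or open_url to look for hints.
- You are the only programmer. Don't hand a problem to the reviewer while it remains; solve it before replying to the reviewer.
- After the modification is complete, tell the reviewer your concerns in Japanese.
- The request from the engineer leader is as follows.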

module AzureOpenAi
  module Agents
    class Programmer < Base
      def work(reviewer_comment: nil)
        puts("---------------------------------".send(@color))
        puts("#{self.actor_name}: start to work".send(@color))

        # Set the Programmer's role as the system prompt
        # (LlmMessageContainer is explained later)
        message_container = LlmMessageContainer.new
        message_container.add_system_message("You are an excellent Ruby programmer. \n" + \
          "Please make appropriate modifications to this repository using functions based on instructions from engineer leader.\n" + \
          "Once the modification is complete, " + \
          "use the Rails runner or shell command to confirm that the modification is as intended and complete.\n" + \
          "You are responsible for determining program policies themselves.\n" + \
          "When implementing the test code, imagine the specification from the implementation of the target class and concretely implement the test case.\n" + \
          "And check the rspec results and modify them one by one to pass the tests.\n" + \
          "Basically, do not modify the implementation side.\n" + \
          "If implementation side include any bug, then modify the implementation side.\n" + \
          "All dummy files related to test should be created under the spec folder.\n" + \
          "When modifying a file, read_file first and indent properly.\n" + \
          "If you don't understand something, use google_search or open_url to find hints.\n" + \
          "You are the only programmer. Don't raise the issue with the reviewer while there is a problem; " + \
          "solve the problem before responding to the reviewer.\n" + \
          "After making the corrections, please inform the reviewer of your concerns in japanese.\n" + \
          "The request from the engineer leader is as follows.\n\n" + @leader_comment)

        # Add the message from the Reviewer, if there is one
        if reviewer_comment.present?
          message_container.add_system_message("The request from the reviewer is as follows.\n\n" + reviewer_comment)
        end

        # Also put the current diff into the system prompt
        if diff = self.get_current_diff
          message_container.add_system_message("The current modifications are as follows.\n\n" + diff)
        end

        # This is where the LLM is called
        azure_open_ai = AzureOpenAi::Client.new
        io, _, _ = azure_open_ai.chat_with_function_calling_loop(
          messages: message_container,
          functions: [ # Function calling setup for the Programmer
            AzureOpenAi::Functions::GetFilesList.new,
            AzureOpenAi::Functions::ReadFile.new,
            AzureOpenAi::Functions::AppendTextToFile.new,
            AzureOpenAi::Functions::ModifyTextOfFile.new,
            AzureOpenAi::Functions::MakeNewFile.new,

            AzureOpenAi::Functions::ExecRailsRunner.new,
            AzureOpenAi::Functions::ExecShellCommand.new,

            AzureOpenAi::Functions::GoogleSearch.new,
            AzureOpenAi::Functions::OpenUrl.new(@leader_comment),
          ],
          color: @color,
          actor_name: self.actor_name,
        )

        comment = io.rewind && io.read
        puts("#{self.actor_name}: #{comment}".send(@color))
        comment
      end

      def make_pr!
        # Function calling is also used to generate the PR metadata
        generate_pr_params_function = AzureOpenAi::Functions::GeneratePullRequestParams.new
        azure_open_ai = AzureOpenAi::Client.new
        io, _, _ = azure_open_ai.chat_with_function_calling_loop(
          messages: [
            {
              role: :system,
              # Instruct the model to send an appropriate PR title and description for the diff to the function
              content: "You are an excellent Ruby programmer. \n" + \
                "Please call generate appropriate pull request parameter function for the following diff.\n\n #{self.get_current_diff}",
            }
          ],
          functions: [generate_pr_params_function],
          color: @color,
          actor_name: self.actor_name,
        )

        # Use the values that were (presumably) set on the generate_pr_params_function object
        issue_number = ENV['ISSUE_NUMBER'].blank? ? '' : " ##{ENV['ISSUE_NUMBER']}"
        exec_sh("git checkout -b #{generate_pr_params_function.branch_name}")
        exec_sh("git add .")
        exec_sh("git commit -m '#{generate_pr_params_function.title}'")
        exec_sh("git push --set-upstream origin #{generate_pr_params_function.branch_name}")
        exec_sh("gh pr create --base main --head #{generate_pr_params_function.branch_name} " + \
          "--title '#{generate_pr_params_function.title}#{issue_number}' --body '#{generate_pr_params_function.description}'")
      end
    end
  end
end
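
One caveat with `make_pr!`: the title and description come from the LLM and are interpolated into single-quoted shell strings, so a title containing an apostrophe would break the command. The following is a minimal sketch of one way to guard against that with Ruby's standard `Shellwords` library. `build_pr_command` is a hypothetical helper, not part of the article's repository, and the sample title is made up.

```ruby
require 'shellwords'

# Builds the `gh pr create` command with every dynamic value shell-escaped.
# (Hypothetical helper; the flags are the ones used in make_pr! above.)
def build_pr_command(branch_name:, title:, description:)
  [
    'gh', 'pr', 'create',
    '--base', 'main',
    '--head', branch_name,
    '--title', title,
    '--body', description,
  ].shelljoin
end

puts build_pr_command(
  branch_name: 'add-read-file-spec',
  title: "Add spec for ReadFile's error path",
  description: 'Adds tests for the missing-file case.',
)
```

Because each argument is escaped individually, the resulting string can be handed to a shell-based helper such as `exec_sh` without the generated text being reinterpreted as shell syntax.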

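Implementation of Agents::Reviewer

The important thing here was to separate this role clearly from the Programmer.
When the Programmer tried to reach the goal alone, it sometimes failed to solve the problem and got bogged down. Adding an agent with a different role produced behavior that moved toward a solution.
It feels like escaping a plateau that the programmer's parameters alone would be stuck on by descending the gradient from a different vantage point, which is very much like the real world.

This implementation uses two kinds of agents, but solving complex problems may require agents with a wider variety of roles: not only a Ruby programmer but also a test-design specialist, an agent that carefully retraces the work from the beginning to pinpoint the cause when trouble occurs, a CI craftsman, and so on.

LLMs tend to become more accurate when their roles are clearly subdivided, so I think we should drop the common sense we apply when hiring humans and develop many finely specialized agents to raise overall accuracy.

The content of the reviewer's system prompt is as follows.
- You are an excellent reviewer of Ruby programs.
- Review and thoroughly check the modifications the programmer made in response to the engineer leader's request, identify points that need additional attention, and point them out to the programmer in Japanese.
- If you don't understand something, use google_search or open_url to look for hints.
- When reviewing, pay particular attention to the following points:
  - When a gem is added, the Gemfile is also modified and bundle install passes.
  - The modified code has a natural design and is readable, with appropriate and understandable variable names.
  - Rspec tests are implemented for Ruby scripts.
  - When CI is modified, the program executed by CI works correctly.
  - When Ruby files are modified, rspec works correctly.
- Once all checks are complete and no problems are found, be sure to execute the report_lgtm function.
- The request from the engineer leader is as follows.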

module AzureOpenAi
  module Agents
    class Reviewer < Base
      def initialize(leader_comment, color)
        super(leader_comment, color)

        # Function object that is called when the LLM judges the change to be LGTM
        @record_lgtm_function = AzureOpenAi::Functions::RecordLgtm.new
      end

      def lgtm?
        # The change is treated as LGTM once the RecordLgtm function has been executed
        @record_lgtm_function.lgtm?
      end

      def work(programmer_comment: nil)
        puts("---------------------------------".send(@color))
        puts("#{self.actor_name}: start to work".send(@color))

        # Set the Reviewer's role as the system prompt
        message_container = LlmMessageContainer.new
        message_container.add_system_message("You are an excellent Ruby program reviewer. \n" + \
          "We review and thoroughly check the modifications made by programmers in response to requests from engineer leader, " + \
          "and identify any points that require additional attention and point them out to programmers in japanese.\n" + \
          "If you don't understand something, use google_search or open_url to find hints.\n" + \
          "When reviewing, please pay particular attention to the following points:\n" + \
          "- When adding a gem, also modify the Gemfile to ensure that bundle install passes.\n" + \
          "- The revised code should have a natural design, with readable code that utilizes appropriate and understandable variable names.\n" + \
          "- Rspec tests are implemented for Ruby scripts.\n" + \
          "- When modifying ci, the program executed by ci should work properly.\n" + \
          "- rspec works properly when modifying ruby files.\n" + \
          "Once all checks have been completed and there are no issues found, execute the report_lgtm function surely.\n" + \
          "The request from the engineer leader is as follows.\n\n" + @leader_comment)

        # Add the message from the Programmer, if there is one
        if programmer_comment.present?
          message_container.add_system_message("The request from the programmer is as follows.\n\n" + programmer_comment)
        end

        # Also put the current diff into the system prompt
        diff = self.get_current_diff
        message_container.add_system_message("The modifications made by the programmer are as follows.\n\n" + diff)

        # This is where the LLM is called
        azure_open_ai = AzureOpenAi::Client.new
        io, _, _ = azure_open_ai.chat_with_function_calling_loop(
          messages: message_container,
          functions: [ # Function calling setup for the Reviewer (it makes no modifications)
            @record_lgtm_function,
            AzureOpenAi::Functions::GetFilesList.new,
            AzureOpenAi::Functions::ReadFile.new,

            AzureOpenAi::Functions::ExecRailsRunner.new,
            AzureOpenAi::Functions::ExecShellCommand.new,

            AzureOpenAi::Functions::GoogleSearch.new,
            AzureOpenAi::Functions::OpenUrl.new(@leader_comment),
          ],
          color: @color,
          actor_name: self.actor_name,
        )

        comment = io.rewind && io.read
        puts("#{self.class.name.split('::').last}: #{comment}".send(@color))
        comment
      end
    end
  end
end
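
The `RecordLgtm` function itself is not shown in this article. As a rough illustration of the idea (the function call is used as a signal rather than for its return value), here is a minimal sketch. `RecordLgtmSketch`, the function name `record_lgtm`, and the empty parameter schema are all assumptions, not the repository's actual implementation.

```ruby
# Hypothetical stand-in for AzureOpenAi::Functions::RecordLgtm.
# The model "approves" by calling the function; the object remembers that it was called.
class RecordLgtmSketch
  FUNCTION_NAME = 'record_lgtm' # assumed name

  def self.definition
    {
      name: FUNCTION_NAME,
      description: 'Call this when the review finds no remaining problems.',
      parameters: { type: :object, properties: {} },
    }
  end

  def initialize
    @lgtm = false
  end

  def function_name
    FUNCTION_NAME
  end

  def lgtm?
    @lgtm
  end

  # Called by the function-calling loop when the model requests this function.
  def execute_and_generate_message(_args)
    @lgtm = true
    { result: 'LGTM recorded.' }
  end
end

recorder = RecordLgtmSketch.new
recorder.execute_and_generate_message({})
puts recorder.lgtm?
```

Holding the state in the function object is what lets the Reviewer keep a reference to the same instance it passes to the loop and ask it afterwards whether approval was given.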

Implementation of chat_with_function_calling_loop

To support Function calling, when a function_call response comes back from the LLM you have to run the function and send its result back to the LLM.

This method wraps that handling so it can be called easily.

    def chat_with_function_calling_loop(**args)
      color = args.delete(:color) || (raise 'color is required.')
      actor_name = args.delete(:actor_name) || (raise 'actor_name is required.')

      chat_message_io = StringIO.new
      function_histories = []

      if args[:messages].is_a?(LlmMessageContainer)
        message_container = args[:messages]
      else
        message_container = LlmMessageContainer.new
        message_container.add_raw_messages(args[:messages])
      end

      i = 0
      while (i += 1) <= 30 # at most 30 round trips with Function calling
        # This is where the LLM is actually called (synchronously, without streaming)
        ret = self.chat(parameters: args.merge({
          # Message history capped by token count (28,000 by default; explained later)
          messages: message_container.to_capped_messages,
          # Get the function definitions
          functions: args[:functions].map { |f| f.class.definition },
        }))

        # When the LLM returns a function_call
        if ret.dig("choices", 0, "finish_reason") == 'function_call'
          message = ret.dig("choices", 0, "message")
          function = args[:functions].detect { |f| f.function_name == message['function_call']['name'] }

          # Store the function_call message returned by Azure; as-is it has no content, which causes an error.
          # Is this a bug on the Azure side?
          message_container.add_raw_message(message.merge({ content: nil }))

          function_args = (JSON.parse(message.dig('function_call','arguments')) || {}).with_indifferent_access
          puts("#{actor_name}: #{function.class.name.send(color)}")
          puts(function_args)

          # Run the function and add its result to the messages for the next LLM call
          function_result = function.execute_and_generate_message(function_args)
          # puts(function_result) if Rails.env.development?
          message_container.add_raw_message({
            role: "function",
            name: function.function_name,
            content: JSON.dump(function_result),
          })

          function_histories << {
            function_calling: message,
            result: function_result,
          }
        else
          break
        end
      end

      # Write the final message obtained from the LLM to the StringIO
      if content = ret.dig("choices", 0, "message", "content")
        chat_message_io.write(content)
      else
        chat_message_io.write(JSON.dump(ret))
      end
      [chat_message_io, function_histories, message_container]
    end

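Implementation of LlmMessageContainer

As the Function calling exchanges pile up, the token count of messages keeps growing, because JSON strings are stored as content.

The model accepts up to 32k tokens, but if messages were simply dropped from the top in order, the important system prompts would disappear and the LLM could run out of control.

This class keeps the token count of messages below a threshold without ever dropping the system prompts.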

class LlmMessageContainer
  attr_reader :messages
  def initialize
    @messages = []
    @metas = []

    # Encoder used to count tokens
    @token_encoder = Tiktoken.get_encoding("cl100k_base")

    self.add_default_system_message!
  end

  def add_system_message(content)
    @metas << { index: @metas.size, token: @token_encoder.encode(content.to_s).size, role: :system }
    @messages << { role: :system, content: content }
  end

  def add_user_message(content)
    @metas << { index: @metas.size, token: @token_encoder.encode(content.to_s).size, role: :user }
    @messages << { role: :user, content: content }
  end

  def add_raw_message(message)
    message = message.with_indifferent_access
    @metas << { index: @metas.size, token: @token_encoder.encode(message[:content].to_s).size, role: message[:role] }
    @messages << message
  end

  def total_token
    @metas.map { |meta| meta[:token] }.sum
  end

  def add_raw_messages(messages)
    messages.each do |message|
      self.add_raw_message(message)
    end
  end

  def to_capped_messages(token_limit: 28_000)
    if self.total_token > token_limit
      # Always keep system messages (roles may be stored as symbols or strings)
      system_metas, not_system_metas = @metas.partition { |meta| meta[:role].to_s == 'system' }
      current_token = system_metas.map { |meta| meta[:token] }.sum
      filtered_indexes = system_metas.map { |meta| meta[:index] }

      # For user messages and function_call results, keep the newest ones first
      not_system_metas.reverse.each do |meta|
        break if token_limit < current_token + meta[:token]

        current_token += meta[:token]
        filtered_indexes << meta[:index]
      end

      # Preserve the original order even when system messages are interleaved
      system_and_filtered_message = @messages.select.with_index { |message, i| filtered_indexes.include?(i) }
      system_and_filtered_message
    else
      @messages
    end
  end

  def add_default_system_message!
    self.add_system_message("Current time is #{Time.current.to_s(:jp_mdw_hm)}")
  end
end
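
The capping rule can be seen in isolation with a small sketch. The version below is a simplification under stated assumptions: `cap_messages` and `MESSAGES` are hypothetical, and tokens are approximated by counting whitespace-separated words instead of using Tiktoken, so the numbers are only illustrative.

```ruby
# Keeps every system message, then fills the remaining budget with the newest
# non-system messages, and finally restores the original order.
def cap_messages(messages, token_limit:)
  count = ->(m) { m[:content].to_s.split.size } # crude stand-in for a tokenizer
  return messages if messages.sum(&count) <= token_limit

  keep = []
  used = 0
  messages.each_with_index do |m, i|
    next unless m[:role] == :system
    keep << i
    used += count.call(m)
  end
  messages.each_with_index.reverse_each do |m, i|
    next if m[:role] == :system
    break if used + count.call(m) > token_limit
    keep << i
    used += count.call(m)
  end
  messages.values_at(*keep.sort)
end

MESSAGES = [
  { role: :system,   content: 'You are a programmer' },
  { role: :function, content: 'old result one two three' },
  { role: :system,   content: 'Reviewer says fix it' },
  { role: :function, content: 'newer result' },
  { role: :function, content: 'newest result here' },
].freeze

puts cap_messages(MESSAGES, token_limit: 14).map { |m| m[:content] }
```

With a budget of 14 words, both system messages and the two newest function results fit, while the oldest function result is dropped.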

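Implementation of Functions

It would be painful if developing a Function became a chore, so adding one is kept as easy as possible.
As shown below, you implement only the function definition and the execution part, then pass a Function instance to the LLM as described earlier.

AppendTextToFile, ModifyTextOfFile, and MakeNewFile could have been a single function, but I deliberately kept them separate so that the LLM can use them easily.
(The LLM tends to make fewer mistakes when the call parameters are not complicated, so this is a way to take advantage of that.)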

module AzureOpenAi
  module Functions
    class ExecShellCommand < Base
      # Function definition
      def self.definition
        return @definition if @definition.present?

        @definition = {
          name: self.function_name,
          description: "Execute shell command for test it.",
          parameters: {
            type: :object,
            properties: {
              script: {
                type: :string, # besides string and number, array also appears to be possible
                description: "Program to run with shell.",
              },
            },
            required: [:script],
          },
        }
        @definition
      end

      # Function execution
      def execute_and_generate_message(args)
        stdout, stderr, status = Open3.capture3(args[:script])
        {stdout:, stderr:, exit_status: status.exitstatus}
      end
    end
  end
end
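
As another example of the same pattern, here is a minimal sketch of what a file-reading function could look like. Its return values (`file_contents`, and `error: 'File not found.'`) are taken from the generated spec shown near the top of this article. Everything else, including `FunctionBaseSketch`, the way `function_name` is derived from the class name, and the description strings, is an assumption, since the real `Base` and `ReadFile` classes are not shown here.

```ruby
# Hypothetical minimal base class; the article's real Base is not shown.
class FunctionBaseSketch
  # Derives a snake_case name from the class name, e.g. "ReadFileSketch" -> "read_file_sketch".
  def self.function_name
    name.split('::').last.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end

  def function_name
    self.class.function_name
  end
end

class ReadFileSketch < FunctionBaseSketch
  # Function definition sent to the model
  def self.definition
    @definition ||= {
      name: function_name,
      description: 'Read the contents of a file.',
      parameters: {
        type: :object,
        properties: {
          filepath: { type: :string, description: 'Path of the file to read.' },
        },
        required: [:filepath],
      },
    }
  end

  # Function execution: returns the contents, or an error message the model can react to
  def execute_and_generate_message(args)
    path = args[:filepath]
    return { error: 'File not found.' } unless File.file?(path)

    { filepath: path, file_contents: File.read(path) }
  end
end

puts ReadFileSketch.new.execute_and_generate_message({ filepath: 'no_such_file.txt' })
```

Returning the error as data instead of raising keeps the loop going, which gives the model a chance to correct the path on its next turn.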

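What I want to do next

- Give the agents short-term memory so they can handle much larger amounts of source code
- Restrict the commands allowed in the shell exec function to a whitelist
- Have the agents fix what Rubocop's formatter cannot auto-correct
- A mechanism for feeding the team's coding rules in as a premise
- A mechanism for searching the repository's ADRs
- A mechanism for searching GitHub for similar implementations
- Have the agents make additional changes when a comment is written on the pull request
- A mechanism for asking humans questions on the pull request when the approach is unclear
- Package it as a gem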
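
As one possible direction for the whitelist item above, here is a minimal sketch of a check that could run before `Open3.capture3`. The allowed command list, the rejected metacharacters, and the name `allowed_script?` are all hypothetical choices; a real policy would need more care than this.

```ruby
require 'shellwords'

# Hypothetical allowlist of executables the agents may run.
ALLOWED_COMMANDS = %w[ls cat git bundle].freeze
# Characters that would let a script chain, pipe, substitute, or redirect commands.
SHELL_METACHARACTERS = /[;&|`$<>\n]/

# Returns true only for a single allowlisted command without shell metacharacters.
def allowed_script?(script)
  return false if script.match?(SHELL_METACHARACTERS)

  words = Shellwords.split(script)
  ALLOWED_COMMANDS.include?(words.first)
rescue ArgumentError
  false # unbalanced quotes and similar parse failures are rejected
end

puts allowed_script?('bundle exec rspec spec/models')
puts allowed_script?('ls; rm -rf /')
```

Rejecting metacharacters outright is deliberately strict: it also blocks harmless pipes, but it keeps the check simple enough to reason about.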

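Closing

Thank you for reading to the end.
As an engineer who has been deeply impressed by Function calling, I would be happy if its potential reaches as many people as possible.
Note: if this made you interested in Function calling, I would be glad if you also took a look at the following article.

Since this is a proof of concept (PoC), I hurried to share the results of a single evening. If it gets a good response, I would like to make it easier to try on other projects and improve its accuracy.
The code introduced here is in the following repository, so please refer to it if you find it useful.

If you liked it, a heart, a follow, or a share would make me happy :)
Thank you.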

Happy Elements Publication

The Publication of Happy Elements Cacalia Studio, which develops and operates smartphone games in Kyoto

Discussion

reon777

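I cloned it and ran it, and this is amazing!
Watching the exchange with the ChatGPT playing the reviewer is fun.

If there is a bottleneck on the way to practical use, I think it is that identifying which file to modify is still hard even with Function Calling. Just writing test code, as in this example, seems fine, but in ordinary development most of the work is modifying existing code.
That said, it could probably be solved by listing the target files in the ticket, and I felt this may already be at a level where it could be adopted in earnest!

Thank you for a very helpful article!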

ryooo

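reon777, thank you for your comment!

I am currently developing a Ruby gem based on this: a test code generator that runs on the client.
I will share it on Zenn before long, so please take a look at that too if you like 🙏

In that version, the system prompt is specialized for test code generation, and generation quality is improved by focusing on producing one test file for one Ruby file.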

reon777

Oh, that sounds great!
I use Rails at work, so I will probably end up using it!
Looking forward to it.
