
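Generating Test Code from Documents with Azure OpenAI Service Using an Optimized Prompt #1

Published on 2023/04/12

Generating test code from OpenAPI and PlantUML with Azure OpenAI Service, using a prompt optimized with the PromptGenerator method

Last time, I described my attempt to generate test code from OpenAPI and PlantUML using Azure OpenAI Service.
One thing I did not get to then was prompt tuning.
This time, I use PromptGenerator, a method in which ChatGPT itself generates the prompt, to produce an optimized prompt. I then use that prompt to generate test code with the same approach as last time and evaluate the results.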

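Goals

Use the PromptGenerator method to generate an optimized prompt that replaces the previous one.
Use that prompt to generate test code from the documents and evaluate the results.

Disclaimer

The content of this article reflects the situation at the time of writing. Because LLMs change over time and their output varies from run to run, reproducibility is not guaranteed.
The content of this article is at the proof-of-concept level. It does not describe a complete method.
The PlantUML used in this article is not worked out in detail and may contain errors in the finer points.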

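About PromptGenerator

The original is a post by Bret Littlefield.
For explanations in Japanese, see the following articles:
- 【プロンプト勉強】サイコーのプロンプトを導くための方法 ("[Prompt study] How to arrive at the best possible prompt")
- ChatGPTのプロンプト作りの悩みを解消する最強メソッド！ ("The ultimate method for solving your ChatGPT prompt-writing problems!")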

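Generating a prompt with a prompt

I generated the prompt through the following exchange with ChatGPT (GPT-4).
The prompt went through 15 iterations, which also incorporate the improvement points I raised last time.

Initial prompt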

I want you to be my prompt engineer. Your goal is to help me craft the best prompt for my needs. That prompt will be used by you in ChatGPT.

Follow the process below.

1. Your first response is to ask me what the prompt should be about.
I'll provide my answer, but it should go through the next steps and improve through continuous iterations.

2. Generate two sections based on my input.
a) Revised Prompt (Provide a rewritten prompt. It should be clear, concise, and easy for you to understand.)
b) suggestions (suggesting details to include in the prompt to improve it);
c) Questions (ask any relevant questions about any additional information you need from me to improve the prompt)

3. This iterative process continues until I provide you with additional information, you update the prompts in the Revised Prompts section, and I am done.

Answer

It's about test code generation in Python from materials such as OpenAI Specifications, PlantUML sequence diagrams, PlantUML class diagrams, and so on.

Generated prompt #1

Answer

As my answer for your question 1, we should use unittest module in the test code.
As my answer for your question 2, I am an expert of Python programmer. I have experienced Python for more than a decade.
As my answer for your question 3, the use case of this prompt is enterprise software development. They need to create not only software based on test codes but also documents that explain the software. They will write test codes after writing the documents such as OpenAPI Specifications, PlantUML sequence diagrams and PlantUML class diagrams. So, if they get test codes using the prompt based on the documents, they will be happy to save a lot of effort to write test codes.

Generated prompt #2

Answer

As my answer for your question 1, I would like you to focus on generate MECE (mutually exclusive and collectively exhaustive) test cases, and organized, structured and clean test codes. So you should consider not only normal test cases but also error cases including corner or edge cases. To realize that, you need to understand how the software is used from the documents and make some assumptions for test data which you need to prepare and use in the test cases.
As my answer for your question 2, there are no additional Python libraries or tools. But, to use test data in the test codes, you will use patch module of unittest.mock.

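Note: Generated prompts #3 to #13 are omitted here because they are long and written in English. They are included in the Appendix at the end of this article; refer to them as needed.

Generated prompt #14

Answer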

As my answer for your question 1, I would like you to organize and bulletize the revised prompt and suggestions. As my answer for your question 2, I am still not satisfied with the revised prompt and suggestions, so please make further adjustments.

Generated prompt #15

Having conveyed everything I wanted to through the prompts, I ended the session here.

I don't think I need to answer those questions because we are done!

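This article adopts prompt #14. In #15, the relationship between the lead sentence beginning with "Focus on" and the individual items (in particular, the scope of "while") appears to have changed from #14.

For reference, a Japanese translation produced by a machine-translation service is shown below.

Token consumption up to #14 was as follows: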

Token breakdown:
- Message history: 4868
- User query: 0
- System Message: 19
- Few-shot examples: 1
- Max response: 1600
- Estimated max tokens: 6488
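The "Estimated max tokens" line is simply the sum of the other components, including the 1,600 tokens reserved for the response (4,868 + 0 + 19 + 1 + 1,600 = 6,488). The sketch below reproduces that arithmetic and checks the total against context-window sizes. The window sizes are the figures OpenAI published around the time of writing (4,097 tokens for text-davinci-003, 8,192 for GPT-4, 32,768 for GPT-4-32k); they are an assumption here, not something measured in this article.

```python
# Token breakdown reported up to prompt #14 (values taken from the list above).
breakdown = {
    "message_history": 4868,
    "user_query": 0,
    "system_message": 19,
    "few_shot_examples": 1,
    "max_response": 1600,
}

# Context-window sizes as OpenAI documented them around spring 2023.
# Assumption: verify against the current model documentation before relying on them.
CONTEXT_WINDOWS = {
    "text-davinci-003": 4097,
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
}


def estimated_max_tokens(parts):
    """Sum all components; the reserved response budget counts against the window."""
    return sum(parts.values())


def fits(model, parts):
    """True if the estimated total stays within the model's context window."""
    return estimated_max_tokens(parts) <= CONTEXT_WINDOWS[model]


total = estimated_max_tokens(breakdown)
print(total)  # 6488
for model in CONTEXT_WINDOWS:
    print(model, fits(model, breakdown))
# text-davinci-003 False
# gpt-4 True
# gpt-4-32k True
```

By this estimate, a conversation of this length would not fit in text-davinci-003's window but fits within GPT-4's.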

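Preparation

I use the same OpenAPI specification, PlantUML sequence diagram, and PlantUML class diagram as last time.

Prompt

The prompt is shown below. This time, the full text is included in the article.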

/*
How can one generate MECE (mutually exclusive and collectively exhaustive) test code in Python for various types of enterprise software using materials like OpenAPI Specifications (in JSON or YAML), PlantUML sequence diagrams (in text), and PlantUML class diagrams (in text), focusing on utilizing the unittest module and patch module of unittest.mock, while generating test cases and preparing test data step-by-step, and allowing the user to specify test targets based on the actors of the sequence diagrams? The test cases should be based on class diagrams first, then considering sequence diagrams, and finally taking into account the OpenAPI specifications when the test target is an API server. The test data should be immutable, defined in the setUp method of the test class that inherits unittest.TestCase, and additional test data can be specified in the test cases if the setUp test data is not sufficient. When using the patch module of unittest.mock for mocking, the @patch('...') decorator should be used instead of the with patch('...') as mock_foo: statement. Additionally, the spy test double pattern should be applied using the assert_called_once_with method of patch.object when applicable for the test case, and the with patch.object(...) as mock_foo: statement can be used in such cases. Each test case should follow the AAA (Arrange-Act-Assert) style. Be mindful of token limitations and avoid exceeding them when generating test code, and omit comments to save the number of characters.

* Test target: Pet Store API

* OpenAPI Specifications:
"""
{
  "swagger": "2.0",
  "info": {
    "version": "1.0.0",
    "title": "Swagger Petstore",
    "description": "A sample API that uses a petstore as an example to demonstrate features in the swagger-2.0 specification",
    "termsOfService": "http://swagger.io/terms/",
    "contact": {
      "name": "Swagger API Team"
    },
    "license": {
      "name": "MIT"
    }
  },
  "host": "petstore.swagger.io",
  "basePath": "/api",
  "schemes": [
    "http"
  ],
  "consumes": [
    "application/json"
  ],
  "produces": [
    "application/json"
  ],
  "paths": {
    "/pets": {
      "get": {
        "description": "Returns all pets from the system that the user has access to",
        "operationId": "findPets",
        "produces": [
          "application/json",
          "application/xml",
          "text/xml",
          "text/html"
        ],
        "parameters": [
          {
            "name": "tags",
            "in": "query",
            "description": "tags to filter by",
            "required": false,
            "type": "array",
            "items": {
              "type": "string"
            },
            "collectionFormat": "csv"
          },
          {
            "name": "limit",
            "in": "query",
            "description": "maximum number of results to return",
            "required": false,
            "type": "integer",
            "format": "int32"
          }
        ],
        "responses": {
          "200": {
            "description": "pet response",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Pet"
              }
            }
          },
          "default": {
            "description": "unexpected error",
            "schema": {
              "$ref": "#/definitions/ErrorModel"
            }
          }
        }
      },
      "post": {
        "description": "Creates a new pet in the store.  Duplicates are allowed",
        "operationId": "addPet",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "pet",
            "in": "body",
            "description": "Pet to add to the store",
            "required": true,
            "schema": {
              "$ref": "#/definitions/NewPet"
            }
          }
        ],
        "responses": {
          "200": {
            "description": "pet response",
            "schema": {
              "$ref": "#/definitions/Pet"
            }
          },
          "default": {
            "description": "unexpected error",
            "schema": {
              "$ref": "#/definitions/ErrorModel"
            }
          }
        }
      }
    },
    "/pets/{id}": {
      "get": {
        "description": "Returns a user based on a single ID, if the user does not have access to the pet",
        "operationId": "findPetById",
        "produces": [
          "application/json",
          "application/xml",
          "text/xml",
          "text/html"
        ],
        "parameters": [
          {
            "name": "id",
            "in": "path",
            "description": "ID of pet to fetch",
            "required": true,
            "type": "integer",
            "format": "int64"
          }
        ],
        "responses": {
          "200": {
            "description": "pet response",
            "schema": {
              "$ref": "#/definitions/Pet"
            }
          },
          "default": {
            "description": "unexpected error",
            "schema": {
              "$ref": "#/definitions/ErrorModel"
            }
          }
        }
      },
      "delete": {
        "description": "deletes a single pet based on the ID supplied",
        "operationId": "deletePet",
        "parameters": [
          {
            "name": "id",
            "in": "path",
            "description": "ID of pet to delete",
            "required": true,
            "type": "integer",
            "format": "int64"
          }
        ],
        "responses": {
          "204": {
            "description": "pet deleted"
          },
          "default": {
            "description": "unexpected error",
            "schema": {
              "$ref": "#/definitions/ErrorModel"
            }
          }
        }
      }
    }
  },
  "definitions": {
    "Pet": {
      "type": "object",
      "allOf": [
        {
          "$ref": "#/definitions/NewPet"
        },
        {
          "required": [
            "id"
          ],
          "properties": {
            "id": {
              "type": "integer",
              "format": "int64"
            }
          }
        }
      ]
    },
    "NewPet": {
      "type": "object",
      "required": [
        "name"
      ],
      "properties": {
        "name": {
          "type": "string"
        },
        "tag": {
          "type": "string"
        }
      }
    },
    "ErrorModel": {
      "type": "object",
      "required": [
        "code",
        "message"
      ],
      "properties": {
        "code": {
          "type": "integer",
          "format": "int32"
        },
        "message": {
          "type": "string"
        }
      }
    }
  }
}
"""

* PlantUML sequence diagrams:
"""
@startuml

title Pets
header %page% of %lastpage%
footer Copyright(c) All rights reserved.

autoactivate on
autonumber "<b>[00]"

actor User as user
entity "Pet Store UI" as ui
entity "Pet Store API" as api
database Database as db

== Get the list of pets ==

user -> ui : Click the pet list button
ui -> api : findPets
api -> db : SELECT * FROM pets
return
return 200, pet response
return Display the list of pets

== Get pet details ==

user -> ui : Click the pet details button
ui -> api : findPetById
api -> db : SELECT * FROM pets WHERE id = ${pet_id}
return
return 200, pet response
return Display pet details

== Purchase a pet ==

user -> ui : Click the pet purchase button
ui -> api : addPet
group transaction
    api -> db : transaction
    api -> db : INSERT INTO orders VALUES (${user_id}, ${pet_id}, ${datetime}, ${created_at}, ${updated_at})
    api -> db : DELETE FROM pets WHERE pet_id = ${pet_id}
    api -> db : commit
    return
    return
    return
    return
end
return 200, pet response
return Display the purchase-complete screen

== Delete a pet registration ==

user -> ui : Click the delete-registration button
ui -> api : deletePet
group transaction
    api -> db : DELETE FROM pets WHERE pet_id = (SELECT id FROM orders WHERE pet_id = ${pet_id} AND user_id = ${user_id})
    api -> db : DELETE FROM orders WHERE pet_id = ${pet_id} AND user_id = ${user_id}
    return
    return
end
return 204, pet deleted
return Display the deletion-complete screen

== Pet purchase fails ==

user -> ui : Click the pet purchase button
ui -> api : addPet
group transaction
    api -> db : transaction
    api -> db : INSERT INTO orders VALUES (${user_id}, ${pet_id}, ${datetime}, ${created_at}, ${updated_at})
    api -> db !! : DELETE FROM pets WHERE pet_id = ${pet_id}
    api -> db : rollback
    return
    return
    return
end
return 500, unexpected error
return Display the purchase-failed screen

@enduml
"""

* PlantUML class diagrams:
"""
@startuml

class User {
    - id
    - name
}

class Pet {
    - id
    - name
    - type
    - tag
}

class Order {
    - id
    - user_id
    - pet_id
}

Order "1" -- "*" User
Order "1" -- "*" Pet

@enduml
"""
*/
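To make the prompt's conventions concrete, here is a minimal sketch of a test module that follows them: immutable test data in `setUp` (a `MappingProxyType`, i.e. a read-only view of a dict), a stub installed with a patch decorator, a spy built with `patch.object(..., wraps=...)` in a `with` statement and checked with `assert_called_once_with`, and Arrange/Act/Assert sections. `InMemoryPetRepository`, `PetStoreAPI`, and `find_pet_by_id` are hypothetical names made up for illustration; they are not part of the article's inputs. The sketch uses the `@patch.object(...)` decorator instead of the string form `@patch('...')` only because a string target needs an importable module path, and it places the doubles on the repository the API depends on, not on the API under test.

```python
import unittest
from types import MappingProxyType
from unittest.mock import patch


class InMemoryPetRepository:
    """Hypothetical stand-in for the database layer."""

    def __init__(self, pets):
        self._pets = {pet["id"]: pet for pet in pets}

    def find_by_id(self, pet_id):
        return self._pets.get(pet_id)


class PetStoreAPI:
    """Hypothetical test target: the API server in the sequence diagram."""

    def __init__(self, repository):
        self.repository = repository

    def find_pet_by_id(self, pet_id):
        pet = self.repository.find_by_id(pet_id)
        if pet is None:
            # 404 is an illustrative assumption; the spec only defines 200 and default.
            return 404, {"code": 404, "message": "pet not found"}
        return 200, dict(pet)


class TestPetStoreAPI(unittest.TestCase):
    def setUp(self):
        # Immutable test data: MappingProxyType is a read-only view of a dict.
        self.pet = MappingProxyType({"id": 1, "name": "Fido", "tag": "dog"})
        self.repository = InMemoryPetRepository([self.pet])
        self.api = PetStoreAPI(self.repository)

    # Stub: the decorator replaces the dependency's method for this test only.
    @patch.object(InMemoryPetRepository, "find_by_id", return_value=None)
    def test_find_pet_by_id_not_found(self, mock_find_by_id):
        # Arrange: extra test data specific to this case.
        missing_id = 999
        # Act
        status, body = self.api.find_pet_by_id(missing_id)
        # Assert
        self.assertEqual(status, 404)
        self.assertEqual(body["code"], 404)
        mock_find_by_id.assert_called_once_with(missing_id)

    def test_find_pet_by_id_found(self):
        # Arrange: a spy wraps the real method, so the real lookup still runs.
        with patch.object(
            self.repository, "find_by_id", wraps=self.repository.find_by_id
        ) as spy:
            # Act
            status, body = self.api.find_pet_by_id(self.pet["id"])
        # Assert
        self.assertEqual((status, body), (200, dict(self.pet)))
        spy.assert_called_once_with(self.pet["id"])
```

Saved as a module, it can be run with `python -m unittest <module name>`.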

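Model

As before, I use text-davinci-003.

Input/output interface

As before, I use the Playground in Azure OpenAI Studio.

Execution steps

Same as last time.

Results

I ran the above prompt five times and collected each result. The table below checks, for each result, whether its form follows each instruction in the prompt (✓: followed, -: not followed). To enumerate the instruction items, I used generated prompt #15.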

#	Instruction	Result 1	Result 2	Result 3	Result 4	Result 5
1	MECE (mutually exclusive and collectively exhaustive) test code	✓	✓	✓	✓	✓
2	in Python	✓	✓	✓	✓	✓
3	utilizing the unittest module and patch module of unittest.mock	✓	✓	✓	✓	✓
4	Allowing the user to specify test targets based on the actors of the sequence diagrams	✓	✓	✓	✓	✓
5	Basing test cases on class diagrams first,	✓	-	✓	✓	✓
6	then sequence diagrams,	✓	✓	✓	✓	✓
7	and finally OpenAPI specifications for API servers	✓	✓	✓	✓	-
8	Defining immutable test data in the setUp method of the test class that inherits unittest.TestCase	✓	✓	✓	✓	✓
9	Specifying additional test data in test cases if needed	-	-	✓	-	-
10	Using the `@patch('...')` decorator for mocking instead of `with patch('...') as mock_foo:` statement	✓	-	-	-	-
11	Applying the spy test double pattern using the `assert_called_once_with` method of `patch.object` when applicable	✓	✓	✓	✓	✓
12	Using the `with patch.object(...) as mock_foo:` statement in spy test double pattern cases	-	✓	✓	-	✓
13	Following the AAA (Arrange-Act-Assert) style for each test case	✓	✓	✓	✓	✓
14	Being mindful of token limitations and avoiding exceeding them when generating test code	✓	-	✓	✓	✓
15	Omitting comments to save the number of characters	-	-	-	-	✓
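As a quick cross-check of the table, the sketch below transcribes it and counts ✓ marks per result. On raw counts Result 3 comes out ahead (13 of 15, versus 12 for Results 1 and 5, 11 for Result 4, and 10 for Result 2), so ranking Result 1 first rests on weighting some instructions more heavily than others; the sketch applies no weights.

```python
# Format-check table transcribed from above: "✓" = followed, "-" = not followed.
# Each string lists Results 1..5 for one instruction, in table order (#1..#15).
TABLE = [
    "✓✓✓✓✓",  # 1  MECE test code
    "✓✓✓✓✓",  # 2  in Python
    "✓✓✓✓✓",  # 3  unittest + unittest.mock.patch
    "✓✓✓✓✓",  # 4  test targets from sequence-diagram actors
    "✓-✓✓✓",  # 5  class diagrams first
    "✓✓✓✓✓",  # 6  then sequence diagrams
    "✓✓✓✓-",  # 7  finally OpenAPI for API servers
    "✓✓✓✓✓",  # 8  immutable data in setUp
    "--✓--",  # 9  additional data in test cases
    "✓----",  # 10 @patch decorator instead of with patch
    "✓✓✓✓✓",  # 11 spy via assert_called_once_with
    "-✓✓-✓",  # 12 with patch.object(...) for spies
    "✓✓✓✓✓",  # 13 AAA style
    "✓-✓✓✓",  # 14 token limits respected
    "----✓",  # 15 comments omitted
]


def tally(rows):
    """Count followed instructions per result (results numbered from 1)."""
    counts = {n: 0 for n in range(1, len(rows[0]) + 1)}
    for row in rows:
        for n, mark in enumerate(row, start=1):
            if mark == "✓":
                counts[n] += 1
    return counts


print(tally(TABLE))  # {1: 12, 2: 10, 3: 13, 4: 11, 5: 12}
```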

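Every result satisfies most of the instructions.
Weighting the instruction items by importance, Result 1 is a cut above the rest.

Below are the individual results and my evaluation of each. Points already covered by the format check above are not repeated in the individual evaluations.

Result 1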

import unittest
from unittest.mock import patch, patch.object

class TestPetStoreAPI(unittest.TestCase):
    def setUp(self):
        self.pet_data = {
            'id': 1,
            'name': 'Fido',
            'type': 'dog',
            'tag': 'cute'
        }
        self.order_data = {
            'id': 1,
            'user_id': 1,
            'pet_id': 1
        }

    @patch('petstore.api.findPets')
    def test_find_pets(self, mock_find_pets):
        # Arrange
        expected_response = {
            'status': 200,
            'data': [self.pet_data]
        }

        # Act
        mock_find_pets.return_value = expected_response
        response = petstore.api.findPets()

        # Assert
        self.assertEqual(response, expected_response)
        mock_find_pets.assert_called_once_with()

    @patch('petstore.api.findPetById')
    def test_find_pet_by_id(self, mock_find_pet_by_id):
        # Arrange
        expected_response = {
            'status': 200,
            'data': self.pet_data
        }

        # Act
        mock_find_pet_by_id.return_value = expected_response
        response = petstore.api.findPetById(self.pet_data['id'])

        # Assert
        self.assertEqual(response, expected_response)
        mock_find_pet_by_id.assert_called_once_with(self.pet_data['id'])

    @patch('petstore.api.addPet')
    def test_add_pet(self, mock_add_pet):
        # Arrange
        expected_response = {
            'status': 200,
            'data': self.pet_data
        }

        # Act
        mock_add_pet.return_value = expected_response
        response = petstore.api.addPet(self.pet_data)

        # Assert
        self.assertEqual(response, expected_response)
        mock_add_pet.assert_called_once_with(self.pet_data)

    @patch('petstore.api.deletePet')
    @patch.object(petstore.api, 'findPetById', return_value={'status': 200, 'data': self.pet_data})
    def test_delete_pet(self, mock_find_pet_by_id, mock_delete_pet):
        # Arrange
        expected_response = {
            'status': 204,
            'data': None
        }

        # Act
        mock_delete_pet.return_value = expected_response
        response = petstore.api.deletePet(self.pet_data['id'])

        # Assert
        self.assertEqual(response, expected_response)
        mock_find_pet_by_id.assert_called_once_with(self.pet_data['id'])
        mock_delete_pet.assert_called_once_with(self.pet_data['id'])

    @patch('petstore.api.addPet')
    @patch.object(petstore.api, 'findPetById', return_value={'status': 200, 'data': self.pet_data})
    def test_add_pet_failure(self, mock_find_pet_by_id, mock_add_pet):
        # Arrange
        expected_response = {
            'status': 500,
            'data': None
        }

        # Act
        mock_add_pet.return_value = expected_response
        response = petstore.api.addPet(self.pet_data)

        # Assert
        self.assertEqual(response, expected_response)
        mock_find_pet_by_id.assert_called_once_with(self.pet_data['id'])
        mock_add_pet.assert_called_once_with(self.pet_data)

Good points

Nothing in particular worth mentioning.

Points for improvement

Test doubles are applied to the test target itself.
The test data self.order_data is not used.
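Separately from the evaluation above, Result 1 would not get as far as running its tests. The import line `from unittest.mock import patch, patch.object`, which appears in all five results, is a SyntaxError because an import list takes names rather than dotted paths (`patch.object` is reached as an attribute of `patch`). `petstore` is never imported, and the `@patch.object(..., return_value={... self.pet_data})` decorators would raise NameError anyway, because decorator arguments are evaluated while the class body runs, before any `self` exists. The sketch below is a cheap automated pre-check one could run on generated test code before reviewing it by hand; `compiles` and `defines_cleanly` are helper names made up for this example, and `exec` should only be used on code you are prepared to run.

```python
import textwrap


def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError:
        return False
    return True


def defines_cleanly(source: str) -> bool:
    """Return True if executing `source` (e.g. a class body) raises no NameError."""
    try:
        exec(source, {})
    except NameError:
        return False
    return True


# The import line emitted by all five results, and a corrected version.
bad_import = "from unittest.mock import patch, patch.object"
good_import = "from unittest.mock import patch"

# Result 1's pattern: a decorator argument refers to `self`, which does not
# exist yet when the class body (and hence the decorator call) is evaluated.
decorator_uses_self = textwrap.dedent("""
    import types
    from unittest.mock import patch
    api = types.SimpleNamespace(findPetById=None)  # hypothetical stand-in

    class TestPetStoreAPI:
        @patch.object(api, "findPetById", return_value={"data": self.pet_data})
        def test_delete_pet(self, mock_find_pet_by_id):
            pass
""")

print(compiles(bad_import), compiles(good_import))  # False True
print(defines_cleanly(decorator_uses_self))  # False
```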

Result 2

import unittest
from unittest.mock import patch, patch.object

class TestPetStoreAPI(unittest.TestCase):
    def setUp(self):
        self.pet_list = [
            {
                'id': 1,
                'name': 'Fido',
                'type': 'dog',
                'tag': 'friendly'
            },
            {
                'id': 2,
                'name': 'Fluffy',
                'type': 'cat',
                'tag': 'playful'
            }
        ]
        self.user_id = 1
        self.pet_id = 1
        self.datetime = '2020-01-01 00:00:00'
        self.created_at = '2020-01-01 00:00:00'
        self.updated_at = '2020-01-01 00:00:00'

    def test_find_pets(self):
        # Arrange
        expected_response = {
            'status': 200,
            'data': self.pet_list
        }

        # Act
        with patch('petstore.findPets') as mock_find_pets:
            mock_find_pets.return_value = expected_response
            response = petstore.findPets()

        # Assert
        self.assertEqual(response, expected_response)
        mock_find_pets.assert_called_once_with()

    def test_find_pet_by_id(self):
        # Arrange
        expected_response = {
            'status': 200,
            'data': self.pet_list[0]
        }

        # Act
        with patch('petstore.findPetById') as mock_find_pet_by_id:
            mock_find_pet_by_id.return_value = expected_response
            response = petstore.findPetById(self.pet_id)

        # Assert
        self.assertEqual(response, expected_response)
        mock_find_pet_by_id.assert_called_once_with(self.pet_id)

    def test_add_pet_success(self):
        # Arrange
        expected_response = {
            'status': 200,
            'data': self.pet_list[0]
        }

        # Act
        with patch('petstore.addPet') as mock_add_pet:
            with patch.object(db, 'transaction') as mock_transaction:
                with patch.object(db, 'INSERT INTO orders VALUES') as mock_insert_orders:
                    with patch.object(db, 'DELETE FROM pets WHERE pet_id =') as mock_delete_pets:
                        with patch.object(db, 'commit') as mock_commit:
                            mock_add_pet.return_value = expected_response
                            response = petstore.addPet(self.user_id, self.pet_id, self.datetime, self.created_at, self.updated_at)

        # Assert
        self.assertEqual(response, expected_response)
        mock_add_pet.assert_called_once_with(self.user_id, self.pet_id, self.datetime, self.created_at, self.updated_at)
        mock_transaction.assert_called_once_with()
        mock_insert_orders.assert_called_once_with(self.user_id, self.pet_id, self.datetime, self.created_at, self.updated_at)
        mock_delete_pets.assert_called_once_with(self.pet_id)
        mock_commit.assert_called_once_with()

    def test_add_pet_failure(self):
        # Arrange
        expected_response = {
            'status': 500,
            'data': {
                'code': 500,
                'message': 'unexpected error'
            }
        }

        # Act
        with patch('petstore.addPet') as mock_add_pet:
            with patch.object(db, 'transaction') as mock_transaction:
                with patch.object(db, 'INSERT INTO orders VALUES') as mock_insert_orders:
                    with patch.object(db, 'DELETE FROM pets WHERE pet_id =') as mock_delete_pets:
                        with patch.object(db, 'rollback') as mock_rollback:
                            mock_add_pet.return_value = expected_response
                            response = petstore.addPet(self.user_id, self.pet_id, self.datetime, self.created_at, self.updated_at)

        # Assert
        self.assertEqual(response, expected_response)
        mock_add_pet.assert_called_once_with(self.user_id, self.pet_id, self.datetime, self.created_at, self.updated_at)
        mock_transaction.assert_called_once_with()
        mock_insert_orders.assert_called_once_with(self.user_id, self.pet_id, self.datetime, self.created_at, self.updated_at)
        mock_delete_pets.assert_called_once_with(self.pet_id)
        mock_rollback.assert_called_once_with()

    def test_delete_pet(self):
        # Arrange
        expected_response = {
            'status': 204,
            'data':

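Good points

Picks up datetime, created_at, and updated_at from the SQL in the sequence diagram and fills them in.
The transaction is tested in fine detail.

Points for improvement

Test doubles are applied to the test target itself.
The with statements are deeply nested.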
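Result 2's nesting comes from patching each database step separately. Assuming `db` were an ordinary database object, those patches would also fail at runtime: `patch.object(db, 'INSERT INTO orders VALUES')` targets an attribute named after SQL text, and `patch.object` raises AttributeError for a missing attribute unless `create=True` is passed. One way to flatten this, sketched below under the assumption that the API receives its database object as a constructor argument, is to pass a single `Mock` as the database: it records every method call in `mock_calls`, so the order from the sequence diagram (begin, INSERT, DELETE, commit) can be asserted in one place. `PurchaseService` and its `begin`/`execute`/`commit`/`rollback` calls are hypothetical, not an API from the article.

```python
import unittest
from unittest.mock import Mock


class PurchaseService:
    """Hypothetical test target for the 'purchase a pet' sequence."""

    def __init__(self, db):
        self.db = db  # injected dependency: the only thing the test replaces

    def purchase(self, user_id, pet_id, now):
        self.db.begin()
        try:
            self.db.execute(
                "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
                (user_id, pet_id, now, now, now),
            )
            self.db.execute("DELETE FROM pets WHERE id = ?", (pet_id,))
        except Exception:
            self.db.rollback()
            return 500
        self.db.commit()
        return 200


class TestPurchaseService(unittest.TestCase):
    def setUp(self):
        self.user_id, self.pet_id, self.now = 1, 1, "2020-01-01 00:00:00"

    def test_purchase_runs_steps_in_order(self):
        # Arrange: one Mock stands in for the whole database and records every call.
        db = Mock()
        service = PurchaseService(db)
        # Act
        status = service.purchase(self.user_id, self.pet_id, self.now)
        # Assert: one ordered check replaces five nested `with patch.object(...)` blocks.
        self.assertEqual(status, 200)
        self.assertEqual(
            [c[0] for c in db.mock_calls],  # mock_calls entries are (name, args, kwargs)
            ["begin", "execute", "execute", "commit"],
        )
        db.execute.assert_any_call("DELETE FROM pets WHERE id = ?", (self.pet_id,))
        db.rollback.assert_not_called()
```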

Result 3

import unittest
from unittest.mock import patch, patch.object

class TestPetStoreAPI(unittest.TestCase):
    def setUp(self):
        self.test_pets = [
            {
                "id": 1,
                "name": "Cat",
                "type": "cat",
                "tag": "feline"
            },
            {
                "id": 2,
                "name": "Dog",
                "type": "dog",
                "tag": "canine"
            }
        ]
        self.test_user = {
            "id": 1,
            "name": "John Doe"
        }
        self.test_order = {
            "id": 1,
            "user_id": 1,
            "pet_id": 1
        }

    def test_find_pets(self):
        # Arrange
        expected_pets = self.test_pets
        tags = ["feline", "canine"]
        limit = 2

        # Act
        with patch('petstore.findPets') as mock_find_pets:
            mock_find_pets.return_value = expected_pets
            actual_pets = petstore.findPets(tags, limit)

        # Assert
        self.assertEqual(expected_pets, actual_pets)
        mock_find_pets.assert_called_once_with(tags, limit)

    def test_find_pet_by_id(self):
        # Arrange
        expected_pet = self.test_pets[0]
        pet_id = 1

        # Act
        with patch('petstore.findPetById') as mock_find_pet_by_id:
            mock_find_pet_by_id.return_value = expected_pet
            actual_pet = petstore.findPetById(pet_id)

        # Assert
        self.assertEqual(expected_pet, actual_pet)
        mock_find_pet_by_id.assert_called_once_with(pet_id)

    def test_add_pet_success(self):
        # Arrange
        expected_pet = self.test_pets[0]
        pet = {
            "name": "Cat",
            "type": "cat",
            "tag": "feline"
        }

        # Act
        with patch('petstore.addPet') as mock_add_pet:
            mock_add_pet.return_value = expected_pet
            with patch.object(petstore, 'transaction') as mock_transaction:
                mock_transaction.return_value = None
                actual_pet = petstore.addPet(pet)

        # Assert
        self.assertEqual(expected_pet, actual_pet)
        mock_add_pet.assert_called_once_with(pet)
        mock_transaction.assert_called_once()

    def test_add_pet_failure(self):
        # Arrange
        expected_error = {
            "code": 500,
            "message": "unexpected error"
        }
        pet = {
            "name": "Cat",
            "type": "cat",
            "tag": "feline"
        }

        # Act
        with patch('petstore.addPet') as mock_add_pet:
            mock_add_pet.return_value = expected_error
            with patch.object(petstore, 'transaction') as mock_transaction:
                mock_transaction.return_value = None
                actual_error = petstore.addPet(pet)

        # Assert
        self.assertEqual(expected_error, actual_error)
        mock_add_pet.assert_called_once_with(pet)
        mock_transaction.assert_called_once()

    def test_delete_pet(self):
        # Arrange
        expected_response = {
            "code": 204,
            "message": "pet deleted"
        }
        pet_id = 1
        user_id = 1

        # Act
        with patch('petstore.deletePet') as mock_delete_pet:
            mock_delete_pet.return_value = expected_response
            with patch.object(petstore, 'transaction') as mock_transaction:
                mock_transaction.return_value = None
                actual_response = petstore.deletePet(pet_id, user_id)

        # Assert
        self.assertEqual(expected_response, actual_response)
        mock_delete_pet.assert_called_once_with(pet_id, user_id)
        mock_transaction.assert_called_once()

Good points

Infers a value for limit and fills it in.
The transaction is tested.

Points for improvement

Test doubles are applied to the test target itself.
The test data self.test_user and self.test_order are not used.
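Result 3 inferred values for `tags` and `limit`. In the OpenAPI definition, both are query parameters of `GET /pets`: `tags` is an array serialized with `collectionFormat: csv` (comma-separated) and `limit` is an `int32`. A sketch of what that implies for the request, using only the standard library; `find_pets_query` is a hypothetical helper, and the int32 range check is one possible edge case to test rather than behavior the spec requires of the server.

```python
from urllib.parse import urlencode


def find_pets_query(tags=None, limit=None):
    """Build the query string for GET /pets as the spec describes it.

    `tags` is an array sent as comma-separated values (collectionFormat "csv");
    `limit` is an int32.
    """
    params = {}
    if tags:
        params["tags"] = ",".join(tags)
    if limit is not None:
        if not (-2**31 <= limit <= 2**31 - 1):
            raise ValueError("limit must fit in int32")
        params["limit"] = limit
    # safe="," keeps the csv separator readable instead of encoding it as %2C.
    return urlencode(params, safe=",")


print(find_pets_query(["feline", "canine"], 2))  # tags=feline,canine&limit=2
```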

Result 4

import unittest
from unittest.mock import patch, patch.object

class TestPetStoreAPI(unittest.TestCase):
    def setUp(self):
        self.test_pets = [
            {
                "id": 1,
                "name": "Fido",
                "type": "dog",
                "tag": "friendly"
            },
            {
                "id": 2,
                "name": "Whiskers",
                "type": "cat",
                "tag": "curious"
            }
        ]
        self.test_user = {
            "id": 1,
            "name": "John Doe"
        }
        self.test_order = {
            "id": 1,
            "user_id": 1,
            "pet_id": 1
        }

    def test_find_pets(self):
        # Arrange
        expected_response = {
            "status": 200,
            "data": self.test_pets
        }

        # Act
        with patch('petstore.findPets') as mock_find_pets:
            mock_find_pets.return_value = expected_response
            response = petstore.findPets()

        # Assert
        self.assertEqual(response, expected_response)
        mock_find_pets.assert_called_once_with()

    def test_find_pet_by_id(self):
        # Arrange
        pet_id = 1
        expected_response = {
            "status": 200,
            "data": self.test_pets[0]
        }

        # Act
        with patch('petstore.findPetById') as mock_find_pet_by_id:
            mock_find_pet_by_id.return_value = expected_response
            response = petstore.findPetById(pet_id)

        # Assert
        self.assertEqual(response, expected_response)
        mock_find_pet_by_id.assert_called_once_with(pet_id)

    def test_add_pet_success(self):
        # Arrange
        pet = self.test_pets[0]
        expected_response = {
            "status": 200,
            "data": pet
        }

        # Act
        with patch('petstore.addPet') as mock_add_pet:
            mock_add_pet.return_value = expected_response
            response = petstore.addPet(pet)

        # Assert
        self.assertEqual(response, expected_response)
        mock_add_pet.assert_called_once_with(pet)

    def test_add_pet_failure(self):
        # Arrange
        pet = self.test_pets[0]
        expected_response = {
            "status": 500,
            "data": None
        }

        # Act
        with patch('petstore.addPet') as mock_add_pet:
            mock_add_pet.return_value = expected_response
            response = petstore.addPet(pet)

        # Assert
        self.assertEqual(response, expected_response)
        mock_add_pet.assert_called_once_with(pet)

    def test_delete_pet(self):
        # Arrange
        pet_id = 1
        expected_response = {
            "status": 204,
            "data": None
        }

        # Act
        with patch('petstore.deletePet') as mock_delete_pet:
            mock_delete_pet.return_value = expected_response
            response = petstore.deletePet(pet_id)

        # Assert
        self.assertEqual(response, expected_response)
        mock_delete_pet.assert_called_once_with(pet_id)

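Good points

Nothing in particular worth mentioning.

Points for improvement

Test doubles are applied to the test target itself.
The test data self.test_user and self.test_order are not used.

Result 5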

import unittest
from unittest.mock import patch, patch.object

class TestPetStoreAPI(unittest.TestCase):
    def setUp(self):
        self.pet_data = {
            'id': 1,
            'name': 'Fido',
            'type': 'dog',
            'tag': 'friendly'
        }
        self.order_data = {
            'id': 1,
            'user_id': 1,
            'pet_id': 1
        }

    def test_find_pets(self):
        with patch('petstore.api.find_pets') as mock_find_pets:
            mock_find_pets.return_value = [self.pet_data]
            result = petstore.api.find_pets()
            self.assertEqual(result, [self.pet_data])
            mock_find_pets.assert_called_once_with()

    def test_find_pet_by_id(self):
        with patch('petstore.api.find_pet_by_id') as mock_find_pet_by_id:
            mock_find_pet_by_id.return_value = self.pet_data
            result = petstore.api.find_pet_by_id(1)
            self.assertEqual(result, self.pet_data)
            mock_find_pet_by_id.assert_called_once_with(1)

    def test_add_pet(self):
        with patch('petstore.api.add_pet') as mock_add_pet:
            mock_add_pet.return_value = self.pet_data
            result = petstore.api.add_pet(self.pet_data)
            self.assertEqual(result, self.pet_data)
            mock_add_pet.assert_called_once_with(self.pet_data)

    def test_delete_pet(self):
        with patch('petstore.api.delete_pet') as mock_delete_pet:
            mock_delete_pet.return_value = None
            result = petstore.api.delete_pet(1)
            self.assertIsNone(result)
            mock_delete_pet.assert_called_once_with(1)

    def test_add_pet_transaction_success(self):
        with patch.object(petstore.api, 'add_pet_transaction') as mock_add_pet_transaction:
            mock_add_pet_transaction.return_value = self.pet_data
            result = petstore.api.add_pet_transaction(self.pet_data, self.order_data)
            self.assertEqual(result, self.pet_data)
            mock_add_pet_transaction.assert_called_once_with(self.pet_data, self.order_data)

    def test_add_pet_transaction_failure(self):
        with patch.object(petstore.api, 'add_pet_transaction') as mock_add_pet_transaction:
            mock_add_pet_transaction.side_effect = Exception('Transaction failed')
            with self.assertRaises(Exception):
                petstore.api.add_pet_transaction(self.pet_data, self.order_data)
            mock_add_pet_transaction.assert_called_once_with(self.pet_data, self.order_data)

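Good points

The transaction is tested.
It anticipates that an exception is raised when the transaction fails.

Points for improvement

Test doubles are applied to the test target itself.
Nothing else worth mentioning.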
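Result 5 expects a failed transaction to surface as an exception, whereas the sequence diagram has the API roll back and answer `500, unexpected error`. A sketch that tests that failure path while keeping the double on the database rather than on the handler: giving `side_effect` a list makes the first `execute` call succeed and the second raise, mirroring the failing DELETE in the diagram. `purchase_pet` and the `begin`/`execute`/`rollback`/`commit` methods are hypothetical names for illustration.

```python
import unittest
from unittest.mock import Mock


def purchase_pet(db, user_id, pet_id, now):
    """Hypothetical handler for the purchase sequence.

    Commits and returns 200 on success; rolls back and returns an
    ErrorModel-shaped 500 body when any statement fails.
    """
    db.begin()
    try:
        db.execute(
            "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
            (user_id, pet_id, now, now, now),
        )
        db.execute("DELETE FROM pets WHERE id = ?", (pet_id,))
    except Exception:
        db.rollback()
        return 500, {"code": 500, "message": "unexpected error"}
    db.commit()
    return 200, {"id": pet_id}


class TestPurchasePetFailure(unittest.TestCase):
    def setUp(self):
        self.user_id, self.pet_id, self.now = 1, 1, "2020-01-01 00:00:00"

    def test_rolls_back_when_delete_fails(self):
        # Arrange: the INSERT succeeds, the DELETE raises (the `!!` step in the diagram).
        db = Mock()
        db.execute.side_effect = [None, RuntimeError("delete failed")]
        # Act
        status, body = purchase_pet(db, self.user_id, self.pet_id, self.now)
        # Assert
        self.assertEqual(status, 500)
        self.assertEqual(body, {"code": 500, "message": "unexpected error"})
        db.rollback.assert_called_once_with()
        db.commit.assert_not_called()
```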

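Overall assessment

Compared with last time, the results follow the prompt more faithfully and more consistently.

The problem common to all results is that test doubles are applied to the test target itself.
I don't think a human programmer would do this, but only afterward did I realize that an AI can.
This suggests that the AI knows how to write test doubles but does not understand what they are for (even if it might be able to explain their purpose in words).
I should have supplied this point as input up front, when generating the prompt with PromptGenerator.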
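The difference can be made concrete with a small experiment. Below, a hypothetical `PetStoreAPI.find_pets` deliberately ignores `limit`. The first test mocks the method under test, as the five results did, and passes anyway, because the real code never runs. The second replaces only the repository the API depends on, so the real code runs and the bug is caught; it is marked with `unittest.expectedFailure` so that the sketch itself runs green. All class and method names here are invented for illustration.

```python
import unittest
from unittest.mock import Mock, patch


class PetStoreAPI:
    """Hypothetical test target with a deliberate bug: `limit` is ignored."""

    def __init__(self, repository):
        self.repository = repository

    def find_pets(self, limit=None):
        return list(self.repository.all())  # bug: should slice to `limit`


class TestWhereTheDoubleGoes(unittest.TestCase):
    def setUp(self):
        self.pets = ({"id": 1, "name": "Fido"}, {"id": 2, "name": "Whiskers"})

    def test_double_on_target(self):
        # Arrange: the anti-pattern; the method under test itself is replaced.
        api = PetStoreAPI(Mock())
        with patch.object(api, "find_pets", return_value=[self.pets[0]]) as mock_find_pets:
            # Act
            result = api.find_pets(limit=1)
        # Assert: passes despite the bug, because the real code never ran.
        self.assertEqual(result, [self.pets[0]])
        mock_find_pets.assert_called_once_with(limit=1)

    @unittest.expectedFailure  # the real find_pets runs, so the bug is caught
    def test_double_on_dependency(self):
        # Arrange: only the repository the API depends on is replaced.
        repository = Mock()
        repository.all.return_value = list(self.pets)
        api = PetStoreAPI(repository)
        # Act
        result = api.find_pets(limit=1)
        # Assert
        self.assertEqual(result, [self.pets[0]])
```

Only the second test exercises code that could be wrong, which is the point of placing the double on a dependency.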

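Assuming this problem gets solved in the future, I think nearly every result could be adopted more or less as-is, despite some differences between them, except result 2, which hit the token limit.
Among them, result 1 follows the prompt most faithfully, and personally I would choose result 1 as the test code.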
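The following is a minimal sketch of the alternative, written for illustration rather than taken from any generated result: the code under test runs for real, and only its collaborator is replaced with a test double. `PetService` and its `repository` collaborator are hypothetical stand-ins, not part of the generated `petstore.api` module.

```python
import unittest
from unittest.mock import Mock


class PetService:
    """Hypothetical code under test: validates input, then delegates storage."""

    def __init__(self, repository):
        self.repository = repository

    def add_pet(self, pet):
        if not pet.get("name"):
            raise ValueError("name is required")
        return self.repository.save(pet)


class TestPetService(unittest.TestCase):
    def setUp(self):
        self.pet_data = {"id": 1, "name": "doggie", "status": "available"}
        # The test double replaces the collaborator, not PetService itself.
        self.repository = Mock()
        self.repository.save.return_value = self.pet_data
        self.service = PetService(self.repository)

    def test_add_pet_delegates_to_repository(self):
        result = self.service.add_pet(self.pet_data)
        self.assertEqual(result, self.pet_data)
        self.repository.save.assert_called_once_with(self.pet_data)

    def test_add_pet_rejects_missing_name(self):
        # The real validation in PetService.add_pet runs, so this test can fail meaningfully.
        with self.assertRaises(ValueError):
            self.service.add_pet({"id": 2})
        self.repository.save.assert_not_called()
```

Run it with `python -m unittest <file>`. Because `add_pet` itself is not mocked, the second test exercises real logic, which a test that patches the target could never do.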

Summary

I tuned the prompt with PromptGenerator, a method that has ChatGPT generate prompts, then used that prompt to generate test code with the same approach as last time and evaluated the results.

There are many ways to tune prompts, but in this experiment, simply adopting this method for prompt tuning dramatically improved the quality of the output.

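Last time I concluded that the output was only usable as "code snippets", but this time the output is very nearly proper test code. I was surprised that zero-shot prompting can get this far.
However, applying test doubles even to the test target is a problem. It has to be solved at the PromptGenerator stage.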

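The input documents were only at sample level; refining them should make it possible to generate more practical test code.
In that case the token limit will become a problem again, so it will probably be necessary to use GPT-4.
Also, to further improve reproducibility, it will likely be necessary later on to try bringing Temperature closer to 0 for both prompt generation and test code generation.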

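With this, I will wrap up, for now, this effort to generate test code from documents (an OpenAPI specification, plus PlantUML sequence and class diagrams) with Azure OpenAI Service and to evaluate the results, at least as far as GPT-3 is concerned.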

Appendix: Generated prompts #2 to #13

Generated prompt #3

Answer

As my answer for your question 1, there are no examples or scenarios.
As my answer for your question 2, I would like you to focus on a general approach that can be applied across various types of software.

Generated prompt #4

Answer

As my answer for your question 1, I would like you to generate test codes step-by-step. You will think of what test case you should generate next for each test case. You will prepare necessary test data step-by-step as well.
As my answer for your question 2, no, I am still not. In the prompt, I will give you OpenAPI Specifications in JSON or YAML, PlantUML sequence diagrams in text, and PlantUML class diagrams in text as well. I would like you to prepare fields for these input materials that you will receive.

Generated prompt #5

Answer

As my answer for your question 1, I would like you to let me specify the test target, meaning what to test. It is usually chosen from actors of the sequence diagrams. As my answer for your question 2, I am still not satisfied with the revised prompt and suggestions. Please make further adjustments.

Generated prompt #6

Answer

As my answer for your question 1, I would like to emphasize that test cases should be based on the class diagrams first. And then, you consider the sequence diagrams. Do not forget considering the OpenAPI specifications when the test target is an API server. As my answer for your question 2, I am still not satisfied with the revised prompt and suggestions, so please make further adjustments.

Generated prompt #7

Answer

As my answer for your question 1, I would like you to make the test data immutable. To achieve that, you can define them in setUp method of the test class which inherits unittest.TestCase. As my answer for your question 2, I am still not satisfied with the revised prompt and suggestions, so I expect you to make further adjustments.
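A minimal sketch of what this answer asks for, assuming dictionaries like the `pet_data` in the generated code. Because `unittest` calls `setUp` before every test method, each test receives freshly built data, so a test that mutates it cannot affect another. Strictly speaking this isolates the data rather than making it immutable; `types.MappingProxyType` is shown as one optional way to get a genuinely read-only view. The test class name is hypothetical.

```python
import unittest
from types import MappingProxyType


class TestPetData(unittest.TestCase):
    def setUp(self):
        # setUp runs before every test method, so each test gets its own copy.
        self.pet_data = {"id": 1, "name": "doggie", "status": "available"}
        # Optional: a read-only view for data that tests must never modify.
        self.frozen_pet_data = MappingProxyType({"id": 1, "name": "doggie"})

    def test_mutation_stays_local(self):
        self.pet_data["status"] = "sold"
        self.assertEqual(self.pet_data["status"], "sold")

    def test_sees_original_data(self):
        # Unaffected by test_mutation_stays_local, whatever the execution order.
        self.assertEqual(self.pet_data["status"], "available")

    def test_frozen_data_is_read_only(self):
        # Item assignment on a mappingproxy raises TypeError.
        with self.assertRaises(TypeError):
            self.frozen_pet_data["name"] = "cat"
```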

Generated prompt #8

Answer

As my answer for your question 1, if the test data which you defined in the setUp method of the test class do not make sense for additional test cases you considered based on the documents, you can specify additional test data in the test cases. As my answer for your question 2, I am still not satisfied with the revised prompt and suggestions, so I would like you to make further adjustments.

Generated prompt #9

Answer

As my answer for your question 1, when you use patch module of unittest.mock for mocking, I would like you to use decorator like @patch('...') rather than with statement like with patch('...') as mock_foo:. As my answer for your question 2, I am still not satisfied with the revised prompt and suggestions, so please make further adjustments.
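To show the two styles this answer contrasts, here is a minimal sketch with a hypothetical `elapsed_since` function that depends on `time.time()`. With the decorator form, `patch` passes the mock into the test method as an extra argument; the `with` form binds it via `as`. Both patch the same target and behave the same way.

```python
import time
import unittest
from unittest.mock import patch


def elapsed_since(start):
    """Hypothetical function under test: depends on time.time()."""
    return time.time() - start


class TestElapsedSince(unittest.TestCase):
    # Decorator form requested in the answer: the mock arrives as an argument.
    @patch("time.time", return_value=105.0)
    def test_with_decorator(self, mock_time):
        self.assertEqual(elapsed_since(100.0), 5.0)
        mock_time.assert_called_once_with()

    # Equivalent with-statement form, for comparison.
    def test_with_statement(self):
        with patch("time.time", return_value=105.0) as mock_time:
            self.assertEqual(elapsed_since(100.0), 5.0)
        mock_time.assert_called_once_with()
```

The decorator form keeps the test body free of extra nesting, which is a likely reason to prefer it when a test needs several patches.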

Generated prompt #10

Answer

As my answer for your question 1, I would like you to apply spy test double pattern using assert_called_once_with method of patch.object when the pattern is applicable for the test case. If you use the spy pattern, you can use patch.object under with statement like with patch.object(...) as mock_foo:. As my answer for your question 2, I am still not satisfied with the revised prompt and suggestions, so I would like you to make further adjustments.
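A minimal sketch of the spy pattern this answer describes. Note that `assert_called_once_with` is a method of the mock object that `patch.object` creates, not of `patch.object` itself. Passing `wraps=` makes that mock a spy: the real method still runs while the call is recorded. `Notifier` and `OrderService` are hypothetical classes used only for illustration.

```python
import unittest
from unittest.mock import patch


class Notifier:
    """Hypothetical collaborator whose calls we want to observe."""

    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)
        return True


class OrderService:
    """Hypothetical code under test."""

    def __init__(self, notifier):
        self.notifier = notifier

    def place_order(self, pet_id):
        self.notifier.send(f"order placed for pet {pet_id}")
        return {"petId": pet_id, "status": "placed"}


class TestOrderService(unittest.TestCase):
    def test_place_order_notifies_once(self):
        notifier = Notifier()
        service = OrderService(notifier)
        # wraps= turns the mock into a spy: the real send() still runs.
        with patch.object(notifier, "send", wraps=notifier.send) as spy_send:
            order = service.place_order(1)
        self.assertEqual(order["status"], "placed")
        spy_send.assert_called_once_with("order placed for pet 1")
        # Because the real method ran, its side effect is visible as well.
        self.assertEqual(notifier.sent, ["order placed for pet 1"])
```

Here only the collaborator is spied on; `OrderService.place_order`, the actual test target, is never patched.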

Generated prompt #11

Answer

As my answer for your question 1, I would like you to write each test case in AAA (Arrange-Act-Assert) style. As my answer for your question 2, I am still not satisfied with the revised prompt and suggestions, so please make further adjustments.
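A minimal sketch of the AAA (Arrange-Act-Assert) style requested here, using a hypothetical `mark_as_sold` function. Each test prepares its inputs, calls the code under test once, then verifies the outcome; when the expected outcome is an exception, Act and Assert naturally merge into one `assertRaises` block.

```python
import unittest


def mark_as_sold(pet):
    """Hypothetical function under test: returns a copy with status 'sold'."""
    if pet.get("status") != "available":
        raise ValueError("only available pets can be sold")
    return {**pet, "status": "sold"}


class TestMarkAsSold(unittest.TestCase):
    def test_marks_available_pet_as_sold(self):
        # Arrange: prepare the input.
        pet = {"id": 1, "name": "doggie", "status": "available"}

        # Act: call the code under test exactly once.
        sold = mark_as_sold(pet)

        # Assert: verify the result and that the input was not mutated.
        self.assertEqual(sold["status"], "sold")
        self.assertEqual(pet["status"], "available")

    def test_rejects_pet_that_is_not_available(self):
        # Arrange
        pet = {"id": 2, "name": "kitty", "status": "pending"}

        # Act and Assert are combined when the outcome is an exception.
        with self.assertRaises(ValueError):
            mark_as_sold(pet)
```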

Generated prompt #12

Answer

As my answer for your question 1, if you are aware of current token count, be careful not to exceed the limitation of tokens especially if you are a model of GPT-3. The token length limit is 4096 for GPT-3, 8192 for GPT-4, and 32768 for GPT-4-32k. Not to exceed the limitation, you do not need to output comments. If you are about to exceed the limitation, you can stop outputting next test case and finish the test class. As my answer for your question 2, I am still not satisfied with the revised prompt and suggestions, so I would like you to make further adjustments.

Generated prompt #13

Answer

As my answer for your question 1, you do not have to write comments to save the number of characters and not to exceed the token limitation. As my answer for your question 2, I am still not satisfied with the revised prompt and suggestions, so please make further adjustments.