
Trying out cover-agent

mizchi

What is CoverAgent?

https://www.freecodecamp.org/news/automated-unit-testing-with-testgen-llm-and-cover-agent/?ref=dailydev

https://zenn.dev/mizchi/scraps/12196df022c45a

A CLI tool for LLM-based automatic unit test generation.

Installation

$ rye tools install cover-agent --git https://github.com/Codium-ai/cover-agent.git

Set up the sample project with rye and run it

$ git clone https://github.com/Codium-ai/cover-agent
$ cd cover-agent
$ rye init -r requirements.txt
# run the tests
$ rye test
$ cover-agent \
  --source-file-path "app.py" \
  --test-file-path "test_app.py" \
  --code-coverage-report-path "coverage.xml" \
  --test-command "rye test -- --cov=. --cov-report=xml --cov-report=term" \
  --test-command-dir "." \
  --coverage-type "cobertura" \
  --desired-coverage 70 \
  --max-iterations 10

2024-06-10 19:30:46,995 - cover_agent.UnitTestGenerator - INFO - Running build/test command to generate coverage report: "rye test -- --cov=. --cov-report=xml --cov-report=term"
Streaming results from LLM model...
```yaml
language: python
testing_framework: pytest
number_of_tests: 2
test_headers_indentation: 0
```

Streaming results from LLM model...
```yaml
language: python
testing_framework: pytest
number_of_tests: 2
relevant_line_number_to_insert_after: 22
```

2024-06-10 19:30:51,850 - cover_agent.CoverAgent - INFO - Current Coverage: 62.79%
2024-06-10 19:30:51,850 - cover_agent.CoverAgent - INFO - Desired Coverage: 70%
Streaming results from LLM model...
```yaml
language: python
existing_test_function_signature: |
  def test_root():
new_tests:
- test_behavior: |
    Test the add endpoint by sending a GET request to "/add/{num1}/{num2}" and checking the response status code and JSON body.
  test_name: |
    test_add
  test_code: |
    def test_add():
        response = client.get("/add/3/4")
        assert response.status_code == 200
        assert response.json() == {"result": 7}
  new_imports_code: |
    ""
  test_tags: happy path

- test_behavior: |
    Test the subtract endpoint by sending a GET request to "/subtract/{num1}/{num2}" and checking the response status code and JSON body.
  test_name: |
    test_subtract
  test_code: |
    def test_subtract():
        response = client.get("/subtract/10/4")
        assert response.status_code == 200
        assert response.json() == {"result": 6}
  new_imports_code: |
    ""
  test_tags: happy path

- test_behavior: |
    Test the multiply endpoint by sending a GET request to "/multiply/{num1}/{num2}" and checking the response status code and JSON body.
  test_name: |
    test_multiply
  test_code: |
    def test_multiply():
        response = client.get("/multiply/3/4")
        assert response.status_code == 200
        assert response.json() == {"result": 12}
  new_imports_code: |
    ""
  test_tags: happy path

- test_behavior: |
    Test the divide endpoint by sending a GET request to "/divide/{num1}/{num2}" and checking the response status code and JSON body.
  test_name: |
    test_divide
  test_code: |
    def test_divide():
        response = client.get("/divide/8/2")
        assert response.status_code == 200
        assert response.json() == {"result": 4.0}
  new_imports_code: |
    ""
  test_tags: happy path
```

2024-06-10 19:31:03,382 - cover_agent.UnitTestGenerator - INFO - Total token used count for LLM model gpt-4o: 2422
2024-06-10 19:31:03,385 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 19:31:04,992 - cover_agent.UnitTestGenerator - INFO - Test passed and coverage increased. Current coverage: 65.12%
2024-06-10 19:31:04,992 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 19:31:06,607 - cover_agent.UnitTestGenerator - INFO - Test passed and coverage increased. Current coverage: 67.44%
2024-06-10 19:31:06,607 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 19:31:08,219 - cover_agent.UnitTestGenerator - INFO - Test passed and coverage increased. Current coverage: 69.77%
2024-06-10 19:31:08,220 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 19:31:09,828 - cover_agent.UnitTestGenerator - INFO - Test passed and coverage increased. Current coverage: 74.42%
2024-06-10 19:31:09,828 - cover_agent.CoverAgent - INFO - Reached above target coverage of 70% (Current Coverage: 74.42%) in 1 iterations.

Test coverage rose to 74%.


I bumped the target coverage to 80% and ran another round.

2024-06-10 20:19:12,589 - cover_agent.UnitTestGenerator - INFO - Running build/test command to generate coverage report: "rye test -- --cov=. --cov-report=xml --cov-report=term"
Streaming results from LLM model...
```yaml
language: python
testing_framework: pytest
number_of_tests: 6
test_headers_indentation: 0
```

Streaming results from LLM model...
```yaml
language: python
testing_framework: pytest
number_of_tests: 6
relevant_line_number_to_insert_after: 45
```

2024-06-10 20:19:18,240 - cover_agent.CoverAgent - INFO - Current Coverage: 74.42%
2024-06-10 20:19:18,240 - cover_agent.CoverAgent - INFO - Desired Coverage: 80%
Streaming results from LLM model...
```yaml
language: python
existing_test_function_signature: |
  def test_root():
new_tests:
- test_behavior: |
    Test the divide endpoint with a zero denominator to ensure it raises an HTTPException with status code 400.
  test_name: |
    test_divide_by_zero
  test_code: |
    def test_divide_by_zero():
        response = client.get("/divide/8/0")
        assert response.status_code == 400
        assert response.json() == {"detail": "Cannot divide by zero"}
  new_imports_code: |
    ""
  test_tags: edge case

- test_behavior: |
    Test the square endpoint to ensure it returns the correct square of a number.
  test_name: |
    test_square
  test_code: |
    def test_square():
        response = client.get("/square/5")
        assert response.status_code == 200
        assert response.json() == {"result": 25}
  new_imports_code: |
    ""
  test_tags: happy path

- test_behavior: |
    Test the sqrt endpoint with a negative number to ensure it raises an HTTPException with status code 400.
  test_name: |
    test_sqrt_negative
  test_code: |
    def test_sqrt_negative():
        response = client.get("/sqrt/-4")
        assert response.status_code == 400
        assert response.json() == {"detail": "Cannot take square root of a negative number"}
  new_imports_code: |
    import math
  test_tags: edge case

- test_behavior: |
    Test the is_palindrome endpoint to ensure it correctly identifies a palindrome.
  test_name: |
    test_is_palindrome
  test_code: |
    def test_is_palindrome():
        response = client.get("/is-palindrome/radar")
        assert response.status_code == 200
        assert response.json() == {"is_palindrome": True}
  new_imports_code: |
    ""
  test_tags: happy path

- test_behavior: |
    Test the days_until_new_year endpoint to ensure it returns the correct number of days until the next New Year.
  test_name: |
    test_days_until_new_year
  test_code: |
    def test_days_until_new_year():
        response = client.get("/days-until-new-year")
        assert response.status_code == 200
        assert "days_until_new_year" in response.json()
  new_imports_code: |
    ""
  test_tags: happy path

- test_behavior: |
    Test the echo endpoint to ensure it returns the same message that is sent to it.
  test_name: |
    test_echo
  test_code: |
    def test_echo():
        response = client.get("/echo/hello")
        assert response.status_code == 200
        assert response.json() == {"message": "hello"}
  new_imports_code: |
    ""
  test_tags: happy path
```

2024-06-10 20:19:41,982 - cover_agent.UnitTestGenerator - INFO - Total token used count for LLM model gpt-4o: 2736
2024-06-10 20:19:41,984 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 20:19:43,631 - cover_agent.UnitTestGenerator - INFO - Test passed and coverage increased. Current coverage: 76.74%
2024-06-10 20:19:43,631 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 20:19:45,264 - cover_agent.UnitTestGenerator - INFO - Test passed and coverage increased. Current coverage: 79.07%
2024-06-10 20:19:45,265 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 20:19:46,939 - cover_agent.UnitTestGenerator - INFO - Test passed and coverage increased. Current coverage: 83.72%
2024-06-10 20:19:46,939 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 20:19:48,628 - cover_agent.UnitTestGenerator - INFO - Skipping a generated test that failed
ERROR:root:Error message:
...
E         {'is_palindrome': True}
E         Right contains 1 more item:
E         {'result': 7}
E         Use -v to get more diff

test_app.py:53: AssertionError

---------- coverage: platform darwin, python 3.12.3-final-0 ----------
Name          Stmts   Miss  Cover
---------------------------------
app.py           43      6    86%
test_app.py      46      0   100%
---------------------------------
TOTAL            89      6    93%
Coverage XML written to file coverage.xml
2024-06-10 20:19:48,629 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
INFO:cover_agent.UnitTestGenerator:Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 20:19:50,277 - cover_agent.UnitTestGenerator - INFO - Skipping a generated test that failed
INFO:cover_agent.UnitTestGenerator:Skipping a generated test that failed
ERROR:root:Error message:
...
E         {'days_until_new_year': 205}
E         Right contains 1 more item:
E         {'result': 7}
E         Use -v to get more diff

test_app.py:53: AssertionError

---------- coverage: platform darwin, python 3.12.3-final-0 ----------
Name          Stmts   Miss  Cover
---------------------------------
app.py           43      3    93%
test_app.py      46      0   100%
---------------------------------
TOTAL            89      3    97%
Coverage XML written to file coverage.xml
2024-06-10 20:19:50,278 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
INFO:cover_agent.UnitTestGenerator:Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 20:19:51,954 - cover_agent.UnitTestGenerator - INFO - Skipping a generated test that failed
INFO:cover_agent.UnitTestGenerator:Skipping a generated test that failed
ERROR:root:Error message:
...
E         {'message': 'hello'}
E         Right contains 1 more item:
E         {'result': 7}
E         Use -v to get more diff

test_app.py:53: AssertionError

---------- coverage: platform darwin, python 3.12.3-final-0 ----------
Name          Stmts   Miss  Cover
---------------------------------
app.py           43      6    86%
test_app.py      46      0   100%
---------------------------------
TOTAL            89      6    93%
Coverage XML written to file coverage.xml
2024-06-10 20:19:51,955 - cover_agent.CoverAgent - INFO - Reached above target coverage of 80% (Current Coverage: 83.72%) in 1 iterations.
INFO:cover_agent.CoverAgent:Reached above target coverage of 80% (Current Coverage: 83.72%) in 1 iterations.

83%


One more round: 90%.

97%


Next I told it to target 100%, but no matter what, it topped out at 97%.
Still, this is impressive.


So what actually happened?

app.py, the file under test:

from fastapi import FastAPI, HTTPException
from datetime import date
import math

app = FastAPI()


@app.get("/")
async def root():
    """
    A simple function that serves as the root endpoint for the FastAPI application.
    No parameters are passed into the function.
    Returns a dictionary with a welcome message.
    """
    return {"message": "Welcome to the FastAPI application!"}


@app.get("/current-date")
async def current_date():
    """
    Get the current date as an ISO-formatted string.
    """
    return {"date": date.today().isoformat()}


@app.get("/add/{num1}/{num2}")
async def add(num1: int, num2: int):
    """
    An asynchronous function that takes two integer parameters 'num1' and 'num2', and returns a dictionary containing the result of adding 'num1' and 'num2' under the key 'result'.
    """
    return {"result": num1 + num2}


@app.get("/subtract/{num1}/{num2}")
async def subtract(num1: int, num2: int):
    """
    A function that subtracts two numbers and returns the result as a dictionary.

    Parameters:
        num1 (int): The first number to be subtracted.
        num2 (int): The second number to subtract from the first.

    Returns:
        dict: A dictionary containing the result of the subtraction.
    """
    return {"result": num1 - num2}


@app.get("/multiply/{num1}/{num2}")
async def multiply(num1: int, num2: int):
    """
    Multiply two numbers and return the result as a dictionary.

    Parameters:
    - num1 (int): The first number to be multiplied.
    - num2 (int): The second number to be multiplied.

    Returns:
    - dict: A dictionary containing the result of the multiplication.
    """
    return {"result": num1 * num2}


@app.get("/divide/{num1}/{num2}")
async def divide(num1: int, num2: int):
    """
    An asynchronous function that handles a GET request to divide two numbers.
    Parameters:
    - num1: an integer representing the numerator
    - num2: an integer representing the denominator
    Returns:
    - A dictionary containing the result of the division
    Raises:
    - HTTPException with status code 400 if num2 is 0
    """
    if num2 == 0:
        raise HTTPException(status_code=400, detail="Cannot divide by zero")
    return {"result": num1 / num2}


@app.get("/square/{number}")
async def square(number: int):
    """
    Return the square of a number.
    """
    return {"result": number**2}


@app.get("/sqrt/{number}")
async def sqrt(number: float):
    """
    Return the square root of a number. Returns an error for negative numbers.
    """
    if number < 0:
        raise HTTPException(
            status_code=400, detail="Cannot take square root of a negative number"
        )
    return {"result": math.sqrt(number)}


@app.get("/is-palindrome/{text}")
async def is_palindrome(text: str):
    """
    Check if a string is a palindrome.
    """
    return {"is_palindrome": text == text[::-1]}


@app.get("/days-until-new-year")
async def days_until_new_year():
    """
    Calculates the number of days until the next New Year.
    """
    today = date.today()
    next_new_year = date(today.year + 1, 1, 1)
    delta = next_new_year - today
    return {"days_until_new_year": delta.days}


@app.get("/echo/{message}")
async def echo(message: str):
    """
    Returns the same message that is sent to it.
    """
    return {"message": message}

It defines a FastAPI server with the four arithmetic operations, plus a few utility endpoints (square root, palindrome check, echo, date calculations).

test_app.py: before

from datetime import date


from fastapi.testclient import TestClient
from app import app

client = TestClient(app)

def test_root():
    """
    Test the root endpoint by sending a GET request to "/" and checking the response status code and JSON body.
    """
    response = client.get("/")
    assert response.status_code == 200
    assert response.json() == {"message": "Welcome to the FastAPI application!"}


def test_current_date():
    response = client.get("/current-date")
    assert response.status_code == 200
    assert "date" in response.json()
    assert response.json()["date"] == date.today().isoformat()

test_app.py: after

These are results from my own local run, so they're probably not reproducible.

from datetime import date

from fastapi.testclient import TestClient
from app import app

client = TestClient(app)

def test_root():
    """
    Test the root endpoint by sending a GET request to "/" and checking the response status code and JSON body.
    """
    response = client.get("/")
    assert response.status_code == 200
    assert response.json() == {"message": "Welcome to the FastAPI application!"}


def test_current_date():
    response = client.get("/current-date")
    assert response.status_code == 200
    assert "date" in response.json()
    assert response.json()["date"] == date.today().isoformat()


def test_divide():
    response = client.get("/divide/8/2")
    assert response.status_code == 200
    assert response.json() == {"result": 4.0}


def test_multiply():
    response = client.get("/multiply/3/4")
    assert response.status_code == 200
    assert response.json() == {"result": 12}


def test_subtract():
    response = client.get("/subtract/10/4")
    assert response.status_code == 200
    assert response.json() == {"result": 6}


def test_add():
    response = client.get("/add/3/4")
    assert response.status_code == 200
    assert response.json() == {"result": 7}

Let's look at the logs in detail.

2024-06-10 19:30:51,850 - cover_agent.CoverAgent - INFO - Current Coverage: 62.79%
2024-06-10 19:30:51,850 - cover_agent.CoverAgent - INFO - Desired Coverage: 70%

It takes the current test coverage and the desired target coverage.
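
The coverage number comes from the cobertura-format coverage.xml passed via --code-coverage-report-path. A minimal stdlib sketch of reading the overall line rate out of such a report (my own illustration, not cover-agent's actual parser; the XML snippet is a hand-made stand-in):

```python
import xml.etree.ElementTree as ET

def overall_line_rate(xml_text: str) -> float:
    """Return total line coverage as a percentage from a Cobertura report.

    Cobertura stores the overall ratio (0.0-1.0) in the root element's
    line-rate attribute.
    """
    root = ET.fromstring(xml_text)
    return float(root.get("line-rate")) * 100.0

# Hand-made stand-in for a real coverage.xml
sample = '<coverage line-rate="0.6279" branch-rate="0.0"></coverage>'
print(f"Current Coverage: {overall_line_rate(sample):.2f}%")  # Current Coverage: 62.79%
```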

Next, it emits data that looks like request patterns for raising coverage, generated as YAML.

language: python
existing_test_function_signature: |
  def test_root():
new_tests:
- test_behavior: |
    Test the add endpoint by sending a GET request to "/add/{num1}/{num2}" and checking the response status code and JSON body.
  test_name: |
    test_add
  test_code: |
    def test_add():
        response = client.get("/add/3/4")
        assert response.status_code == 200
        assert response.json() == {"result": 7}
  new_imports_code: |
    ""
  test_tags: happy path

- test_behavior: |
    Test the subtract endpoint by sending a GET request to "/subtract/{num1}/{num2}" and checking the response status code and JSON body.
  test_name: |
    test_subtract
  test_code: |
    def test_subtract():
        response = client.get("/subtract/10/4")
        assert response.status_code == 200
        assert response.json() == {"result": 6}
  new_imports_code: |
    ""
  test_tags: happy path

- test_behavior: |
    Test the multiply endpoint by sending a GET request to "/multiply/{num1}/{num2}" and checking the response status code and JSON body.
  test_name: |
    test_multiply
  test_code: |
    def test_multiply():
        response = client.get("/multiply/3/4")
        assert response.status_code == 200
        assert response.json() == {"result": 12}
  new_imports_code: |
    ""
  test_tags: happy path

- test_behavior: |
    Test the divide endpoint by sending a GET request to "/divide/{num1}/{num2}" and checking the response status code and JSON body.
  test_name: |
    test_divide
  test_code: |
    def test_divide():
        response = client.get("/divide/8/2")
        assert response.status_code == 200
        assert response.json() == {"result": 4.0}
  new_imports_code: |
    ""
  test_tags: happy path

It generates the tests one by one, and the log shows coverage climbing.

2024-06-10 19:31:03,382 - cover_agent.UnitTestGenerator - INFO - Total token used count for LLM model gpt-4o: 2422
2024-06-10 19:31:03,385 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 19:31:04,992 - cover_agent.UnitTestGenerator - INFO - Test passed and coverage increased. Current coverage: 65.12%
2024-06-10 19:31:04,992 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 19:31:06,607 - cover_agent.UnitTestGenerator - INFO - Test passed and coverage increased. Current coverage: 67.44%
2024-06-10 19:31:06,607 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 19:31:08,219 - cover_agent.UnitTestGenerator - INFO - Test passed and coverage increased. Current coverage: 69.77%
2024-06-10 19:31:08,220 - cover_agent.UnitTestGenerator - INFO - Running test with the following command: "rye test -- --cov=. --cov-report=xml --cov-report=term"
2024-06-10 19:31:09,828 - cover_agent.UnitTestGenerator - INFO - Test passed and coverage increased. Current coverage: 74.42%
2024-06-10 19:31:09,828 - cover_agent.CoverAgent - INFO - Reached above target coverage of 70% (Current Coverage: 74.42%) in 1 iterations.

Probably because adding every test at once makes it unclear which ones actually succeeded, emitting them as patch data and applying them one by one seems like a generally useful technique.
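
That keep-only-what-passes loop can be sketched like this. It's a simplified model with injected callables, not cover-agent's real code; accept_tests_incrementally and fake_run_suite are names I made up:

```python
from typing import Callable, List, Tuple

def accept_tests_incrementally(
    candidates: List[str],
    run_suite: Callable[[List[str]], Tuple[bool, float]],
    baseline_coverage: float,
) -> Tuple[List[str], float]:
    """Try each candidate test on top of the accepted set, one at a time.

    A candidate is kept only if the whole suite still passes and coverage
    did not drop; a bad test is discarded without blocking the rest.
    """
    accepted: List[str] = []
    coverage = baseline_coverage
    for test in candidates:
        passed, new_coverage = run_suite(accepted + [test])
        if passed and new_coverage >= coverage:
            accepted.append(test)
            coverage = new_coverage
    return accepted, coverage

# Toy stand-in for "run pytest with coverage": the test named "bad" fails.
def fake_run_suite(tests: List[str]) -> Tuple[bool, float]:
    if "bad" in tests:
        return False, 0.0
    return True, 60.0 + 5.0 * len(tests)

kept, cov = accept_tests_incrementally(
    ["test_add", "bad", "test_divide"], fake_run_suite, baseline_coverage=60.0
)
print(kept, cov)  # ['test_add', 'test_divide'] 70.0
```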


Reading the implementation to see what prompts it sends.

The prompt lives here:

https://github.com/Codium-ai/cover-agent/blob/main/cover_agent/settings/test_generation_prompt.toml

user="""\
## Overview
You are a code assistant that accepts a {{ language }} source file, and a {{ language }} test file.
Your goal is to generate additional unit tests to complement the existing test suite, in order to increase the code coverage against the source file.

Additional guidelines:
- Carefully analyze the provided code. Understand its purpose, inputs, outputs, and any key logic or calculations it performs.
- Brainstorm a list of test cases you think will be necessary to fully validate the correctness of the code and achieve 100% code coverage.
- After each individual test has been added, review all tests to ensure they cover the full range of scenarios, including how to handle exceptions or errors.
- If the original test file contains a test suite, assume that each generated test will be a part of the same suite. Ensure that the new tests are consistent with the existing test suite in terms of style, naming conventions, and structure.

## Source File
Here is the source file that you will be writing tests against, called `{{ source_file_name }}`.
Note that we have manually added line numbers for each line of code, to help you understand the code coverage report.
Those numbers are not a part of the original code.
=========
{{ source_file_numbered|trim }}
=========


## Test File
Here is the file that contains the existing tests, called `{{ test_file_name }}`.
=========
{{ test_file| trim }}
=========


{%- if additional_includes_section|trim %}


{{ additional_includes_section|trim }}
{% endif %}


{%- if failed_tests_section|trim  %}

{{ failed_tests_section|trim }}

{% endif %}

{%- if additional_instructions_text|trim  %}

{{ additional_instructions_text|trim }}
{% endif %}


## Code Coverage
The following is the existing code coverage report. Use this to determine what tests to write, as you should only write tests that increase the overall coverage:
=========
{{ code_coverage_report|trim }}
=========


## Response
The output must be a YAML object equivalent to type $NewTests, according to the following Pydantic definitions:
=====
class SingleTest(BaseModel):
    test_behavior: str = Field(description="Short description of the behavior the test covers")
{%- if language in ["python","java"] %}
    lines_to_cover: str = Field(description="A list of line numbers, currently uncovered, that this specific new test aims to cover")
    test_name: str = Field(description=" A short test name, in snake case, that reflects the behaviour to test")
{%- else %}
    test_name: str = Field(description=" A short unique test name, that should reflect the test objective")
{%- endif %}
    test_code: str = Field(description="A single test function, that tests the behavior described in 'test_behavior'. The test should be a written like its a part of the existing test suite, if there is one, and it can use existing helper functions, setup, or teardown code.")
    new_imports_code: str = Field(description="New imports that are required to run the new test function, and are not already imported in the test file. Give an empty string if no new imports are required. If relevant, add new imports as  'import ...' lines.")
    test_tags: str = Field(description="A single label that best describes the test, out of: ['happy path', 'edge case','other']")

class NewTests(BaseModel):
    language: str = Field(description="The programming language of the source code")
    existing_test_function_signature: str = Field(description="A single line repeating a signature header of one of the existing test functions")
    new_tests: List[SingleTest] = Field(min_items=1, max_items={{ max_tests }}, description="A list of new test functions to append to the existing test suite, aiming to increase the code coverage. Each test should run as-is, without requiring any additional inputs or setup code. Don't introduce new dependencies")
=====

Example output:
```yaml
language: {{ language }}
existing_test_function_signature: |
  ...
new_tests:
- test_behavior: |
    Test that the function returns the correct output for a single element list
{%- if language in ["python","java"] %}
  lines_to_cover: |
    [1,2,5, ...]
  test_name: |
    test_single_element_list
{%- else %}
  test_name: |
    ...
{%- endif %}
  test_code: |
{%- if language in ["python"] %}
    def ...
{%- else %}
    ...
{%- endif %}
  new_imports_code: |
    ""
  test_tags: happy path
    ...
```

Use block scalar('|') to format each YAML output.


{%- if additional_instructions_text|trim  %}

{{ additional_instructions_text|trim }}
{% endif %}


Response (should be a valid YAML, and nothing else):
```yaml
"""

https://github.com/Codium-ai/cover-agent/blob/main/cover_agent/PromptBuilder.py

MAX_TESTS_PER_RUN = 4

# Markdown text used as conditional appends
ADDITIONAL_INCLUDES_TEXT = """
## Additional Includes
The following is a set of included files used as context for the source code above. This is usually included libraries needed as context to write better tests:
======
{included_files}
======
"""

ADDITIONAL_INSTRUCTIONS_TEXT = """
## Additional Instructions
======
{additional_instructions}
======
"""

FAILED_TESTS_TEXT = """
## Previous Iterations Failed Tests
Below is a list of failed tests that you generated in previous iterations. Do not generate the same tests again, and take the failed tests into account when generating new tests.
======
{failed_test_runs}
======
"""


Reading through the prompt, section by section.

You are a code assistant that accepts a {{ language }} source file, and a {{ language }} test file.
Your goal is to generate additional unit tests to complement the existing test suite, in order to increase the code coverage against the source file.

Additional guidelines:

- Carefully analyze the provided code. Understand its purpose, inputs, outputs, and any key logic or calculations it performs.
- Brainstorm a list of test cases you think will be necessary to fully validate the correctness of the code and achieve 100% code coverage.
- After each individual test has been added, review all tests to ensure they cover the full range of scenarios, including how to handle exceptions or errors.
- If the original test file contains a test suite, assume that each generated test will be a part of the same suite. Ensure that the new tests are consistent with the existing test suite in terms of style, naming conventions, and structure.

Instructions on how to interpret the source code. It's interesting that it explicitly has the model aim for 100% coverage. It also tells the model to infer the existing coding conventions automatically.

Here is the source file that you will be writing tests against, called {{ source_file_name }}.
Note that we have manually added line numbers for each line of code, to help you understand the code coverage report. Those numbers are not a part of the original code.

Instructions about the source file.
I've wrestled with this myself before: AI is bad at counting lines and words, so line-number data is added as an aid.
That said, I don't really remember this working for me. Does it actually improve accuracy?
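
The numbering trick itself is trivial to reproduce; presumably something like this (my own sketch, not cover-agent's implementation) is what fills the source_file_numbered template variable:

```python
def number_lines(source: str) -> str:
    """Prefix every line with its 1-based number so the model can match
    coverage-report line references against the prompt text."""
    return "\n".join(
        f"{i} {line}" for i, line in enumerate(source.splitlines(), start=1)
    )

print(number_lines("from fastapi import FastAPI\n\napp = FastAPI()"))
# 1 from fastapi import FastAPI
# 2
# 3 app = FastAPI()
```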

Here is the file that contains the existing tests, called {{ test_file_name }}.

Instructions for the test code.

The output must be a YAML object equivalent to type $NewTests, according to the following Pydantic definitions.

The Pydantic schema.
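
To make the expected structure concrete, the same shape can be mirrored with stdlib dataclasses (Pydantic adds validation and field descriptions on top; this mirror is just my illustration):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SingleTest:
    test_behavior: str
    lines_to_cover: str   # present for python/java per the template
    test_name: str
    test_code: str
    new_imports_code: str
    test_tags: str        # one of: 'happy path', 'edge case', 'other'

@dataclass
class NewTests:
    language: str
    existing_test_function_signature: str
    new_tests: List[SingleTest]

# What one parsed LLM response might look like
result = NewTests(
    language="python",
    existing_test_function_signature="def test_root():",
    new_tests=[
        SingleTest(
            test_behavior="Test the add endpoint",
            lines_to_cover="[27]",
            test_name="test_add",
            test_code='def test_add():\n    response = client.get("/add/3/4")\n    assert response.json() == {"result": 7}',
            new_imports_code="",
            test_tags="happy path",
        )
    ],
)
print(result.new_tests[0].test_name)  # test_add
```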

The following is the existing code coverage report. Use this to determine what tests to write, as you should only write tests that increase the overall coverage.

The coverage report for this run.

language: {{ language }}
existing_test_function_signature: |
  ...
new_tests:
- test_behavior: |
    Test that the function returns the correct output for a single element list
{%- if language in ["python","java"] %}
  lines_to_cover: |
    [1,2,5, ...]
  test_name: |
    test_single_element_list
{%- else %}
  test_name: |
    ...
{%- endif %}
  test_code: |
{%- if language in ["python"] %}
    def ...
{%- else %}
    ...
{%- endif %}
  new_imports_code: |
    ""
  test_tags: happy path
    ...

The one-shot prompt for the output.


Impressions

It isn't doing anything especially complicated, so I could probably build an equivalent myself.
In my experience, getting an AI to rewrite existing code is quite hard, but generating newly added code and checking whether its tests pass strikes the right balance for what current LLMs can actually do.

This run uses coverage as its metric, but the same approach seems feasible with, say, having it rewrite code until mypy reports zero type errors. Rewriting existing lines still feels hard, though.
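
That generalization can be sketched as a driver loop where the metric is an error count to minimize rather than coverage to maximize. Everything here is hypothetical; the callables stand in for "run mypy and count errors" and "ask the LLM for a patch":

```python
from typing import Callable

def drive_metric_to_zero(
    count_errors: Callable[[], int],
    apply_candidate_patch: Callable[[], None],
    undo_patch: Callable[[], None],
    max_iterations: int = 10,
) -> int:
    """Apply LLM-proposed patches one at a time, keeping a patch only if
    it lowers the error count, until zero errors or the budget runs out."""
    errors = count_errors()
    for _ in range(max_iterations):
        if errors == 0:
            break
        apply_candidate_patch()
        new_errors = count_errors()
        if new_errors < errors:
            errors = new_errors   # keep the patch
        else:
            undo_patch()          # revert, like skipping a failed test
    return errors

# Toy simulation: each good "patch" fixes one error; every third one is bad.
state = {"errors": 3, "attempts": 0}
def fake_count() -> int:
    return state["errors"]
def fake_apply() -> None:
    state["attempts"] += 1
    if state["attempts"] % 3 != 0:
        state["errors"] -= 1  # a bad patch changes nothing and gets reverted
def fake_undo() -> None:
    pass

print(drive_metric_to_zero(fake_count, fake_apply, fake_undo))  # 0
```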

I'd like to experiment a bit more to see whether attaching line-number data makes partial code rewrites work.