
Visual Regression Testing with Ebitengine


This is a note I made while considering how to properly test screen rendering operations when creating a game using Ebitengine (formerly Ebiten), a game engine for Go ✍

https://ebitengine.org/ja/

Note that since I am developing on Ubuntu 22.04, there might be different pitfalls in other environments like Windows or macOS.
Please bear with me 🦵

What I Investigated

Visual Regression Testing is a type of automated testing where you save the output results before a code change (such as the DOM structure for web apps or PDFs for report outputs) in a repository and compare them with the output from the implementation after the change to ensure there are no differences.

https://zenn.dev/roki_na/articles/6e17079d91f82f

To implement this test, you need to generate a PNG image from ebiten.Image, which represents an image or the screen to be displayed in Ebitengine. However, if you simply run that process from an ordinary unit test, a panic: buffered: the command queue is not available yet at ~ error occurs when accessing the pixels of the Image.

As mentioned in the issue below, the solution seems to be calling ebiten.RunGame() within TestMain() and performing the processing inside it.

https://github.com/hajimehoshi/ebiten/issues/1264

Also, since Ebitengine currently lacks a headless mode, just following the above consideration will cause a fatal error: X11/Xcursor/Xcursor.h: No such file or directory ~ error in CI environments like GitHub Actions.

https://github.com/hajimehoshi/ebiten/issues/353

For this, it seems good to start Xvfb, a virtual display that allows you to run applications headlessly on CI.
The following link provides a clear explanation of how to use Xvfb:

https://blog.amedama.jp/entry/2016/01/03/115602
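For trying this out locally without a visible window, a rough sketch, assuming Xvfb is installed and display :99 is free:

```shell
# Start a virtual framebuffer on display :99 (screen 0, 1280x1024, 24-bit color)
Xvfb :99 -screen 0 1280x1024x24 &

# Point X11 clients at the virtual display, then run the tests as usual
DISPLAY=:99 go test -v ./...
```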

Test Implementation

Below is a sample implementation based on the considerations above.

First, prepare a game struct for testing.

test/game.go
package test

import (
	"errors"
	"os"
	"testing"

	"github.com/hajimehoshi/ebiten/v2"
)

var regularTermination = errors.New("regular termination")

type game struct {
	m    *testing.M
	code int
}

func (g *game) Update() error {
	g.code = g.m.Run()
	return regularTermination
}

func (*game) Draw(*ebiten.Image) {
}

func (g *game) Layout(int, int) (int, int) {
	return 1, 1
}

func RunTestGame(m *testing.M) {
	ebiten.SetWindowSize(128, 72)
	ebiten.SetInitFocused(false)
	ebiten.SetWindowTitle("Testing...")

	g := &game{
		m: m,
	}
	if err := ebiten.RunGame(g); err != nil && err != regularTermination {
		panic(err)
	}
	os.Exit(g.code)
}

Next, define a test function that takes an ebiten.Image, creates a snapshot PNG file, and checks for differences.

Regarding the generation of difference images, someone created image-diff, which is perfect for this use case, so I used it here.

test/snapshot.go
package test

import (
	"errors"
	"fmt"
	"image"
	"image/png"
	"log"
	"os"
	"path"
	"runtime"
	"strconv"
	"strings"
	"testing"

	"github.com/hajimehoshi/ebiten/v2"
	diff "github.com/olegfedoseev/image-diff"
)

const (
	SnapshotErrorThreshold = 0.0
)

func CheckSnapshot(t *testing.T, actualImage *ebiten.Image) error {
	_, callerSourceFileName, _, ok := runtime.Caller(1)
	if !ok {
		log.Fatalf("failed to read filename: %v", t.Name())
	}

	basePath := path.Join(path.Dir(callerSourceFileName), "snapshot")
	baseFileName := strings.ReplaceAll(t.Name(), "/", "_")
	expectedFilePath := path.Join(basePath, fmt.Sprintf("%v.png", baseFileName))
	actualFilePath := path.Join(basePath, fmt.Sprintf("%v_actual.png", baseFileName))
	diffFilePath := path.Join(basePath, fmt.Sprintf("%v_diff.png", baseFileName))

	err := os.MkdirAll(basePath, os.ModePerm)
	if err != nil {
		log.Fatal(err)
	}

	var expectedImage image.Image
	foundExpectedImage := false
	expectedFile, err := os.Open(expectedFilePath)
	if err == nil {
		expectedImage, _, err = image.Decode(expectedFile)
		if closeErr := expectedFile.Close(); closeErr != nil {
			log.Fatal(closeErr)
		}
		if err != nil {
			log.Fatal(err)
		}
		foundExpectedImage = true
	} else if !errors.Is(err, os.ErrNotExist) {
		log.Fatal(err)
	}

	_ = os.Remove(diffFilePath)
	_ = os.Remove(actualFilePath)

	updateSnapshot, _ := strconv.ParseBool(os.Getenv("UPDATE_SNAPSHOT"))
	if foundExpectedImage && !updateSnapshot {
		diffImage, percent, err := diff.CompareImages(actualImage, expectedImage)
		if err != nil {
			log.Fatal(err)
		}

		if percent > SnapshotErrorThreshold {
		f, err := os.Create(diffFilePath)
		if err != nil {
			log.Fatal(err)
		}
			defer func(f *os.File) {
				err := f.Close()
				if err != nil {
					log.Fatal(err)
				}
			}(f)

			err = png.Encode(f, diffImage)
			if err != nil {
				log.Fatal(err)
			}

		f, err = os.Create(actualFilePath)
		if err != nil {
			log.Fatal(err)
		}
			defer func(f *os.File) {
				err := f.Close()
				if err != nil {
					log.Fatal(err)
				}
			}(f)

			err = png.Encode(f, actualImage)
			if err != nil {
				log.Fatal(err)
			}

			return fmt.Errorf(
				"snapshot test failed: diff = %v > %v, file = %v",
				percent,
				SnapshotErrorThreshold,
				diffFilePath)
		}
	}

	f, err := os.Create(expectedFilePath)
	if err != nil {
		log.Fatal(err)
	}
	defer func(f *os.File) {
		err := f.Close()
		if err != nil {
			log.Fatal(err)
		}
	}(f)

	err = png.Encode(f, actualImage)
	if err != nil {
		log.Fatal(err)
	}

	return nil
}

https://github.com/olegfedoseev/image-diff

The test function looks something like this:
One caveat: tests that execute RunGame currently cannot avoid briefly showing a window. Additionally, since the test behavior itself may be unstable depending on the environment, I've added a termtests build tag so that these tests only run when explicitly requested.

https://devlights.hatenablog.com/entry/2020/09/24/011502

To run it, you can do the following:

go test -tags=termtests -v ./...
example_test.go
//go:build termtests
// +build termtests

package test_test

import (
	"fmt"
	"github.com/hajimehoshi/ebiten/v2"
	"github.com/hajimehoshi/ebiten/v2/ebitenutil"
	"image/color"
	"log"
	"testing"
	// Change this to match your package
	"github.com/org/repo/test"
)

func TestExample_PrintMessage(t *testing.T) {
	const (
		Width  = 128
		Height = 72
	)

	tests := []struct {
		text string
	}{
		{text: ""},
		{text: "TestABC"},
	}

	for i, tt := range tests {
		t.Run(fmt.Sprintf("text_%v", i), func(t *testing.T) {
			// Create test image
			image := ebiten.NewImage(Width, Height)
			image.Fill(color.Black)

			// Call the process to be tested
			PrintMessage(image, tt.text)

			// Check result
			err := test.CheckSnapshot(t, image)
			if err != nil {
				t.Error(err)
			}

		})
	}
}

func TestMain(m *testing.M) {
	test.RunTestGame(m)
}

func PrintMessage(image *ebiten.Image, str string) {
	ebitenutil.DebugPrint(image, str)
}

When you run the above test, a directory named snapshot/ is created in the same directory as the test code, and the image files output from ebiten.Image after the test execution are stored there.

$ ls -l snapshot/
total 24
-rw-rw-r-- 1 tkhs tkhs 229 Oct  3 15:25 TestExample_PrintMessage_text_0.png
-rw-rw-r-- 1 tkhs tkhs 330 Oct  3 15:25 TestExample_PrintMessage_text_1.png

The content looks like this:


TestExample_PrintMessage_text_1.png

After generating the images, if you modify the function content and run the test again, an error will occur because the content of the actual image differs from the expected image.
For example, let's say you change the function as follows:

func PrintMessage(image *ebiten.Image, str string) {
	ebitenutil.DebugPrint(image, "Hello, "+str)
}

The test fails:

=== RUN   TestExample_PrintMessage/text_0
    example_test.go:40: snapshot test failed: diff = 0.8572048611111112 > 0, file = /your-path/test/snapshot/TestExample_PrintMessage_text_0_diff.png
=== RUN   TestExample_PrintMessage/text_1
    example_test.go:40: snapshot test failed: diff = 2.528211805555556 > 0, file = /your-path/test/snapshot/TestExample_PrintMessage_text_1_diff.png
--- FAIL: TestExample_PrintMessage (0.02s)
    --- FAIL: TestExample_PrintMessage/text_0 (0.01s)
    --- FAIL: TestExample_PrintMessage/text_1 (0.01s)

As a result, new images named *_actual.png and *_diff.png are output under the snapshot/ directory.
The contents look something like this:


TestExample_PrintMessage_text_1_actual.png


TestExample_PrintMessage_text_1_diff.png

Since these images are for visual verification and should not be committed to the repository, it's a good idea to ignore them.

.gitignore
**/snapshot/*_actual.png
**/snapshot/*_diff.png

Once you confirm that the difference is as expected (in this case, it seems fine since we added Hello, to the beginning of the string), run the test again with a truthy value set in the UPDATE_SNAPSHOT environment variable, such as UPDATE_SNAPSHOT=1 go test ~.
The expected images will be updated, so you can commit them to the repository to complete the test.

From now on, if a change to seemingly unrelated code alters the rendered output, the test will fail, which makes refactoring and similar work much safer 🍮

Running on GitHub Actions

When running tests on GitHub Actions, I was able to achieve it using the method introduced below:

https://stackoverflow.com/questions/63125480/running-a-gui-application-on-a-ci-service-without-x11

I believe the configuration would look something like this:

check.yml
name: Check

on: push

jobs:
  test:
    timeout-minutes: 5
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - uses: actions/setup-go@v3
        with:
          go-version: 1.19

      - run: |
          # https://ebitengine.org/ja/documents/install.html#Debian_/_Ubuntu
          sudo apt install -y libc6-dev libglu1-mesa-dev libgl1-mesa-dev libxcursor-dev libxi-dev libxinerama-dev libxrandr-dev libxxf86vm-dev libasound2-dev pkg-config

      - run: |
          # https://stackoverflow.com/questions/63125480/running-a-gui-application-on-a-ci-service-without-x11
          export DISPLAY=:99
          sudo Xvfb -ac :99 -screen 0 1280x1024x24 > /dev/null 2>&1 &
          go test -tags=termtests -v ./...

What I learned from trying this is that the "modify → launch → check visually" cycle is quite a lot of work, especially for a single component of a game. Being able to notice right away when existing behavior breaks provides real peace of mind.

Also, this is a benefit of Test-Driven Development (TDD), but the cycle of "tweak implementation → test breaks → fix it" feels like a game itself and is fun, so I recommend it also because it makes you feel like it's okay even if the game itself never gets finished 😊
