
[Python] Reading information from a game screen with Tesseract

Published 2021/02/08

Python

OCR

tesseract

tech

Introduction

As a way of learning Python, I am trying my hand at OCR.
This article assumes Python is already installed.

Environment

Windows 8.1
Python 3.9.1
Tesseract 5.0.0(alpha)
PyOCR 0.8

Setting up Tesseract

Download the precompiled installer from the linked page,
then follow the on-screen instructions to complete the installation.

PyOCR

Setting up PyOCR

pip install pyocr

Basic PyOCR usage

Boiled down to the essentials, the flow is as follows:

tools = pyocr.get_available_tools()
text = tools[0].image_to_string(image, lang="jpn", builder=builder)

Reading the game screen

Let's try reading an actual stash screen from a game.
Here is the target image:

Stash screen from a certain game

ocr.py

import os
import sys

import pyocr
import pyocr.builders
from PIL import Image

def setup_path():
    # Add the Tesseract install directory to PATH so PyOCR can find the executable
    path = "C:\\Program Files\\Tesseract-OCR"  # Tesseract install path
    path_list = os.environ["PATH"].split(os.pathsep)
    if path not in path_list:
        os.environ["PATH"] += os.pathsep + path

def check_stash():
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print("No OCR tool")
        sys.exit(1)
    text = tools[0].image_to_string(
        Image.open("img/debug/_test_stash_1.png"),  # path to the image
        lang="jpn",
        builder=pyocr.builders.TextBuilder(tesseract_layout=6))
    print(text)

if __name__ == "__main__":
    setup_path()
    check_stash()

Running this produced the following output.

4        なあ            記 全部     で 消耗品 | MV 貴重品    素材
NE       に    人 /            /                          ュー/   Wa E/
ド/           1061             にKi0)             1                                         ua)
   に<)
/       m
NN        KN 1      べ         )
        が
2る ) (細            ジレ
ョ1295          ュ             19           566                                                     7 1
      NN     ングの      条K <     LN     NN     / だ)へ                 /)  WW
oc 、計2S    (人 (給(2う       N
-壮     ss 証          グ リックマ/   \          US い\い  か  0 / -ず
|    Ni5    1 、   0)       <  をい     Nc 引 |   N     3     Ne マ MM                    4<12)

Not good at all.
The kanji in the upper right are recognized, but there is a lot of extraneous noise.

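Processing the image to make it easier to recognize

The icons and the item artwork themselves are being recognized as text. Let's process the image so that it is easier to read.

I referred to the official documentation for guidance on how to improve the results.

Splitting the image into individual items

The items are laid out at regular intervals, so we can cut them apart in a loop: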

def get_items():
    stash = Image.open("img/debug/_test_stash_1.png")
    item_size = 180
    start_x, start_y = 60, 140
    gap = 15
    items = []
    x, y = start_x, start_y
    while x < stash.width:
        items.append(stash.crop((x, y, x + item_size, y + item_size)))
        x += item_size + gap
        # TODO: only handles a single horizontal row for now
    return items
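
The TODO above leaves the remaining rows unhandled. One possible way to extend it is to compute the crop boxes for the whole grid first and crop afterwards. The sketch below is only an illustration: `item_boxes` is a hypothetical helper, it assumes the vertical gap equals the horizontal gap (the article does not state this), and it skips items that would extend past the image edge.

```python
def item_boxes(width, height, item_size=180, start_x=60, start_y=140, gap=15):
    """Return (left, upper, right, lower) boxes for every fully visible item.

    Assumption: rows are spaced the same way as columns (item_size + gap).
    """
    boxes = []
    y = start_y
    while y + item_size <= height:
        x = start_x
        while x + item_size <= width:
            boxes.append((x, y, x + item_size, y + item_size))
            x += item_size + gap
        y += item_size + gap
    return boxes

# Possible usage with the stash image from get_items():
#   items = [stash.crop(box) for box in item_boxes(stash.width, stash.height)]
```

Keeping the box calculation separate from the cropping makes it easy to check the coordinates without loading an image.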

Adjusting the colors

While tesseract version 3.05 (and older) handle inverted image (dark background and light text) without problem, for 4.x version use dark text on light background.

It looks like we need black text on a white background.
Let's make everything except the digits white and the digits themselves black.

import numpy as np

def optimize(image):
    border = 220  # brightness threshold for each RGB channel
    # Convert to RGBA so that the 4-element pixel assignments below always fit
    arr = np.array(image.convert("RGBA"))
    for i in range(len(arr)):
        for j in range(len(arr[i])):
            pix = arr[i][j]
            if i < 130 or j < 90 or i > 150 or j > 150:  # coordinates outside the digits
                arr[i][j] = [255, 255, 255, 255]
            elif pix[0] < border or pix[1] < border or pix[2] < border:  # darker colors become white
                arr[i][j] = [255, 255, 255, 255]
            else:  # every channel >= border: white text becomes black
                arr[i][j] = [0, 0, 0, 255]
    return Image.fromarray(arr)
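
The per-pixel loop is easy to follow but slow in pure Python. As a rough sketch, the same rule can be expressed with NumPy boolean masks. `optimize_array` is a hypothetical name, it assumes the input is an RGBA `uint8` array of shape (height, width, 4), and the region bounds are the same hard-coded values as in `optimize`.

```python
import numpy as np

def optimize_array(arr, border=220, rows=(130, 150), cols=(90, 150)):
    """Return a new RGBA array: black where a pixel is bright and inside the
    digit region, white everywhere else."""
    # A pixel counts as "bright" only if all of R, G and B are >= border
    bright = (arr[..., :3] >= border).all(axis=-1)
    # Mark the digit region (bounds are inclusive, as in the loop version)
    in_region = np.zeros(arr.shape[:2], dtype=bool)
    in_region[rows[0]:rows[1] + 1, cols[0]:cols[1] + 1] = True
    out = np.full_like(arr, 255)                 # start from an all-white image
    out[bright & in_region] = (0, 0, 0, 255)     # paint the digits black
    return out

# Possible usage with Pillow:
#   result = Image.fromarray(optimize_array(np.array(image.convert("RGBA"))))
```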

Conversion result

Result

Finally, we run recognition on the converted image.

def recognize(image):
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print("No OCR tool")
        sys.exit(1)
    return tools[0].image_to_string(
        image,
        lang="jpn",
        builder=pyocr.builders.DigitBuilder(tesseract_layout=10))

Execution result
Looks good! (๑•̀ㅂ•́)و✧
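
Since `recognize` returns a string, a natural next step is turning it into a number. The helper below is a minimal sketch and not part of the original script: `to_count` is a hypothetical name, and it simply drops every non-digit character, which means stray noise between digits would be silently merged.

```python
import re

def to_count(text):
    """Convert OCR output such as ' 1061\n' to an int, or None if no digits."""
    digits = re.sub(r"\D", "", text)  # strip whitespace, commas and other noise
    return int(digits) if digits else None

# Possible usage:
#   counts = [to_count(recognize(optimize(item))) for item in get_items()]
```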

Supplementary notes

What is tesseract_layout?

It corresponds to the --psm N option in the Tesseract manual:

--psm N

0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR. (not implemented)
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
11 = Sparse text. Find as much text as possible in no particular order.
12 = Sparse text with OSD.
13 = Raw line. Treat the image as a single text line,
     bypassing hacks that are Tesseract-specific.

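I expected each image to contain a single word, so I used 10. Note, however, that in the list above 10 is the single-character mode and 8 is the single-word mode, so 8 (or 7 for a single line) may match that intent more closely.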
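
Which mode works best depends on the image, so it can help to run the same crop through a few candidate modes and compare the outputs. The following is only a sketch: `compare_layouts` and its `recognize_with` callback are hypothetical names, and the candidate list (7, 8, 10) is an assumption based on the mode descriptions above.

```python
def compare_layouts(image, recognize_with, layouts=(7, 8, 10)):
    """Run one image through several page segmentation modes.

    recognize_with(image, psm) is a caller-supplied function that performs
    the actual OCR; the result maps each mode number to its output.
    """
    return {psm: recognize_with(image, psm) for psm in layouts}

# Possible wiring with PyOCR, using the same calls as recognize():
#   tool = pyocr.get_available_tools()[0]
#   results = compare_layouts(
#       image,
#       lambda img, psm: tool.image_to_string(
#           img, lang="jpn",
#           builder=pyocr.builders.DigitBuilder(tesseract_layout=psm)))
```

Passing the OCR call in as a function keeps the comparison logic independent of PyOCR, so it can be tried out with a stub before Tesseract is set up.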