
Applying Google Cloud Vision to image files to create an IIIF manifest and a TEI/XML file

Published 2024/08/08

Overview

I created a library that applies Google Cloud Vision to image files and generates an IIIF manifest and a TEI/XML file.

https://github.com/nakamura196/iiif_tei_py

This article explains how to use the library.

Usage

The documentation below covers usage and other details.

https://nakamura196.github.io/iiif_tei_py/

Installing the library

Install the library from the GitHub repository.

pip install git+https://github.com/nakamura196/iiif_tei_py

Creating a Google Cloud service account

Download a Google Cloud (GC) service account key (a JSON file), using an article such as the following as a guide.

https://book.st-hakky.com/data-science/data-science-gcp-vision-api-setting/

Then create a .env file like the following.

.env
GOOGLE_APPLICATION_CREDENTIALS=your-google-credentials.json
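
For readers unfamiliar with .env files, the following is a minimal sketch of what reading one involves, using only the standard library. The helper name `read_env_file` is hypothetical and is not part of iiif_tei_py; the library's own `load_env` may work differently.

```python
import os
import tempfile
from pathlib import Path


def read_env_file(env_path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demonstration with a throwaway .env file (the key file name is a placeholder).
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = Path(tmp_dir) / ".env"
    env_path.write_text(
        "GOOGLE_APPLICATION_CREDENTIALS=your-google-credentials.json\n",
        encoding="utf-8",
    )
    env_values = read_env_file(env_path)

cred_path = env_values["GOOGLE_APPLICATION_CREDENTIALS"]
# Google client libraries look up the key file through this environment variable.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path
print(cred_path)
```

The value is a path to the downloaded JSON key, so a relative path like the one above is resolved from the directory where the script is run.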

Running

As a sample input image, we use the following image, which is also used in the IIIF Cookbook.

https://iiif.io/api/presentation/2.1/example/fixtures/resources/page1-full.png

Create a file like the following and run it.

main.py
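from iiif_tei_py.core import CoreClient

# Load the .env file and get the path to the service account key.
cred_path = CoreClient.load_env()

# Input image URL and the destination path of the TEI/XML file.
url = "https://iiif.io/api/presentation/2.1/example/fixtures/resources/page1-full.png"
output_tei_xml_file_path = "./tmp/01/output.xml"

# Run OCR with Google Cloud Vision and write the output files.
CoreClient.create_tei_xml_with_gocr(url, output_tei_xml_file_path, cred_path, title="Sample")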

In the example above, the IIIF manifest is written to ./tmp/01/output.json and the TEI/XML file to ./tmp/01/output.xml.
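
The two output paths differ only in their extension. Assuming the manifest is always placed next to the TEI/XML file with a .json suffix (inferred from this one example, not confirmed against the library's source), the manifest path could be derived as in this sketch. The helper `manifest_path_for` is hypothetical.

```python
from pathlib import Path


def manifest_path_for(tei_xml_path):
    """Return the sibling .json path for a given .xml output path (assumed naming)."""
    return Path(tei_xml_path).with_suffix(".json")


output_tei_xml_file_path = "./tmp/01/output.xml"
manifest_path = manifest_path_for(output_tei_xml_file_path)
print(manifest_path.as_posix())
```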

Checking the results

IIIF

The following is an example of the IIIF manifest displayed in Mirador.

The contents of the JSON file are as follows.

./tmp/01/output.json
{
    "@context": "http://iiif.io/api/presentation/3/context.json",
    "id": "http://example.org/iiif/abc/manifest",
    "label": {
        "none": [
            "Sample"
        ]
    },
    "type": "Manifest",
    "items": [
        {
            "id": "http://example.org/iiif/abc/canvas/p1",
            "type": "Canvas",
            "label": {
                "none": [
                    "[1]"
                ]
            },
            "height": 1800,
            "width": 1200,
            "items": [
                {
                    "id": "http://example.org/iiif/abc/annotation/p0001-image",
                    "type": "AnnotationPage",
                    "items": [
                        {
                            "body": {
                                "id": "https://iiif.io/api/presentation/2.1/example/fixtures/resources/page1-full.png",
                                "type": "Image",
                                "format": "image/jpeg",
                                "height": 1800,
                                "width": 1200
                            },
                            "id": "http://example.org/iiif/abc/annotation/p0001-image/anno",
                            "type": "Annotation",
                            "motivation": "painting",
                            "target": "http://example.org/iiif/abc/canvas/p1"
                        }
                    ]
                }
            ],
            "annotations": [
                {
                    "id": "http://example.org/iiif/abc/canvas/p1/curation",
                    "type": "AnnotationPage",
                    "items": [
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00001] Top",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=245/69/94/52",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=245,69,94,52"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00002] of",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=355/69/49/52",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=355,69,49,52"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00003] First",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=420/69/112/54",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=420,69,112,54"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00004] Page",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=547/70/134/53",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=547,70,134,53"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00005] to",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=697/71/50/52",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=697,71,50,52"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00006] Display",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=763/71/189/54",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=763,71,189,54"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00007] Middle",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=296/593/163/164",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=296,593,163,164"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00008] of",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=433/733/76/76",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=433,733,76,76"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00009] First",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=484/786/123/124",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=484,786,123,124"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00010] Page",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=584/889/128/129",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=584,889,128,129"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00011] on",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=691/998/80/80",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=691,998,80,80"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00012] Angle",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=749/1057/148/149",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=749,1057,148,149"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00013] Bottom",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=203/1686/175/55",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=203,1686,175,55"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00014] of",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=398/1689/51/53",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=398,1689,51,53"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00015] First",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=466/1689/109/54",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=466,1689,109,54"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00016] Page",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=593/1690/130/54",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=593,1690,130,54"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00017] to",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=740/1692/51/54",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=740,1692,51,54"
                        },
                        {
                            "body": {
                                "type": "TextualBody",
                                "value": "[00018] Display",
                                "format": "text/plain"
                            },
                            "id": "http://example.org/iiif/abc/canvas/p1#xywh=808/1693/190/54",
                            "type": "Annotation",
                            "motivation": "commenting",
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=808,1693,190,54"
                        }
                    ]
                }
            ]
        }
    ]
}
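
To reuse the OCR results, the word text and its region can be read back from the manifest. The following is a minimal sketch that works on a trimmed copy of the manifest above (two annotations only). It assumes every annotation value starts with a bracketed sequence number such as `[00001]` and that the `target` carries a comma-separated `#xywh=` fragment, as in this output. The function name `extract_words` is hypothetical.

```python
import re

# A trimmed copy of the manifest above, keeping only two annotations.
manifest = {
    "type": "Manifest",
    "items": [
        {
            "id": "http://example.org/iiif/abc/canvas/p1",
            "type": "Canvas",
            "annotations": [
                {
                    "type": "AnnotationPage",
                    "items": [
                        {
                            "body": {"type": "TextualBody", "value": "[00001] Top"},
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=245,69,94,52",
                        },
                        {
                            "body": {"type": "TextualBody", "value": "[00002] of"},
                            "target": "http://example.org/iiif/abc/canvas/p1#xywh=355,69,49,52",
                        },
                    ],
                }
            ],
        }
    ],
}

XYWH_PATTERN = re.compile(r"#xywh=(\d+),(\d+),(\d+),(\d+)$")
LABEL_PATTERN = re.compile(r"^\[\d+\]\s*")


def extract_words(manifest):
    """Yield (text, (x, y, w, h)) for every annotation that targets a region."""
    for canvas in manifest.get("items", []):
        for page in canvas.get("annotations", []):
            for annotation in page.get("items", []):
                match = XYWH_PATTERN.search(annotation["target"])
                if match is None:
                    continue  # Skip annotations that target the whole canvas.
                box = tuple(int(n) for n in match.groups())
                # Drop the leading "[00001] " style sequence number.
                text = LABEL_PATTERN.sub("", annotation["body"]["value"])
                yield text, box


words = list(extract_words(manifest))
print(words)
```

For the real file, the dictionary would be loaded with `json.load` from ./tmp/01/output.json instead of being written inline.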

TEI

The following is an example of the TEI/XML file displayed in Oxygen XML Editor.

The contents of the XML file are as follows.

./tmp/01/output.xml
<?xml version="1.0" ?>
<?xml-model href="https://tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt>
				<title>Sample</title>
			</titleStmt>
			<publicationStmt>
				<p>Example</p>
			</publicationStmt>
			<sourceDesc>
				<p>Example</p>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<creation>
				<listChange>
					<change/>
				</listChange>
			</creation>
		</profileDesc>
	</teiHeader>
	<facsimile>
		<surface>
			<graphic url="https://iiif.io/api/presentation/2.1/example/fixtures/resources/page1-full.png"/>
			<zone lrx="339" lry="121" ulx="245" uly="69" xml:id="a_0000">
				<seg>Top</seg>
			</zone>
			<zone lrx="404" lry="121" ulx="355" uly="69" xml:id="a_0001">
				<seg>of</seg>
			</zone>
			<zone lrx="532" lry="123" ulx="420" uly="69" xml:id="a_0002">
				<seg>First</seg>
			</zone>
			<zone lrx="681" lry="123" ulx="547" uly="70" xml:id="a_0003">
				<seg>Page</seg>
			</zone>
			<zone lrx="747" lry="123" ulx="697" uly="71" xml:id="a_0004">
				<seg>to</seg>
			</zone>
			<zone lrx="952" lry="125" ulx="763" uly="71" xml:id="a_0005">
				<seg>Display</seg>
			</zone>
			<zone lrx="459" lry="757" ulx="296" uly="593" xml:id="a_0006">
				<seg>Middle</seg>
			</zone>
			<zone lrx="509" lry="809" ulx="433" uly="733" xml:id="a_0007">
				<seg>of</seg>
			</zone>
			<zone lrx="607" lry="910" ulx="484" uly="786" xml:id="a_0008">
				<seg>First</seg>
			</zone>
			<zone lrx="712" lry="1018" ulx="584" uly="889" xml:id="a_0009">
				<seg>Page</seg>
			</zone>
			<zone lrx="771" lry="1078" ulx="691" uly="998" xml:id="a_0010">
				<seg>on</seg>
			</zone>
			<zone lrx="897" lry="1206" ulx="749" uly="1057" xml:id="a_0011">
				<seg>Angle</seg>
			</zone>
			<zone lrx="378" lry="1741" ulx="203" uly="1686" xml:id="a_0012">
				<seg>Bottom</seg>
			</zone>
			<zone lrx="449" lry="1742" ulx="398" uly="1689" xml:id="a_0013">
				<seg>of</seg>
			</zone>
			<zone lrx="575" lry="1743" ulx="466" uly="1689" xml:id="a_0014">
				<seg>First</seg>
			</zone>
			<zone lrx="723" lry="1744" ulx="593" uly="1690" xml:id="a_0015">
				<seg>Page</seg>
			</zone>
			<zone lrx="791" lry="1746" ulx="740" uly="1692" xml:id="a_0016">
				<seg>to</seg>
			</zone>
			<zone lrx="998" lry="1747" ulx="808" uly="1693" xml:id="a_0017">
				<seg>Display</seg>
			</zone>
		</surface>
	</facsimile>
</TEI>
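
Each `zone` stores the upper-left (`ulx`, `uly`) and lower-right (`lrx`, `lry`) corners, whereas the IIIF annotations store `x,y,w,h`. The following is a minimal sketch of reading the zones with the standard library and converting them to the `xywh` form, using a trimmed copy of the TEI above (two zones only). The function name `zones_as_xywh` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A trimmed copy of the TEI above, keeping only two zones.
tei_xml = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface>
      <zone lrx="339" lry="121" ulx="245" uly="69" xml:id="a_0000"><seg>Top</seg></zone>
      <zone lrx="404" lry="121" ulx="355" uly="69" xml:id="a_0001"><seg>of</seg></zone>
    </surface>
  </facsimile>
</TEI>"""


def zones_as_xywh(root):
    """Yield (text, (x, y, w, h)) for every zone under the given TEI root."""
    for zone in root.iterfind(".//tei:zone", TEI_NS):
        ulx, uly = int(zone.get("ulx")), int(zone.get("uly"))
        lrx, lry = int(zone.get("lrx")), int(zone.get("lry"))
        seg = zone.find("tei:seg", TEI_NS)
        text = seg.text if seg is not None else ""
        # Width and height are the differences between the two corners.
        yield text, (ulx, uly, lrx - ulx, lry - uly)


root = ET.fromstring(tei_xml)
zones = list(zones_as_xywh(root))
print(zones)
```

For the first zone this gives a width of 339 - 245 = 94 and a height of 121 - 69 = 52, which matches the `xywh=245,69,94,52` target of the first annotation in the manifest.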

Summary

I hope this serves as a useful reference for tasks such as producing uncorrected draft text with Google Cloud Vision.
