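Applying Google Cloud Vision to Image Files to Create IIIF Manifests and TEI/XML Files
Overview
I created a library that applies Google Cloud Vision to image files and generates an IIIF manifest and a TEI/XML file.
This article explains how to use the library.
Usage
You can check the usage and other details at the following link.
Installing the library
Install the library from the GitHub repository.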
pip install git+https://github.com/nakamura196/iiif_tei_py
Creating a GC service account
Referring to resources such as the following article, download a GC (Google Cloud) service account key (JSON file).
Then create a .env file like the following.
.env
GOOGLE_APPLICATION_CREDENTIALS=your-google-credentials.json
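Before running the library, it can help to confirm that the .env file points at a key file that actually exists. The following is a minimal standard-library sketch and is not part of iiif_tei_py; the helper names `read_env_file` and `check_credentials` are hypothetical, and the parser only handles simple KEY=VALUE lines.

```python
from pathlib import Path


def read_env_file(env_path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_credentials(env_path=".env"):
    """Return the key file path named in .env, or raise if it is unusable."""
    values = read_env_file(env_path)
    cred = values.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not cred:
        raise KeyError("GOOGLE_APPLICATION_CREDENTIALS is not set in " + str(env_path))
    if not Path(cred).is_file():
        raise FileNotFoundError(cred)
    return cred
```

Calling `check_credentials()` in the directory that holds the .env file raises an error early if the variable is missing or the JSON key file cannot be found.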
Running
As a sample input, I use the following image, which is also used in the IIIF Cookbook.
https://iiif.io/api/presentation/2.1/example/fixtures/resources/page1-full.png
Create a file like the following and run it.
main.py
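from iiif_tei_py.core import CoreClient
# Load the .env file and get the path to the service account key
cred_path = CoreClient.load_env()
# URL of the input image
url = "https://iiif.io/api/presentation/2.1/example/fixtures/resources/page1-full.png"
# Destination of the TEI/XML file
output_tei_xml_file_path = "./tmp/01/output.xml"
# Run OCR with Google Cloud Vision and write the output files
CoreClient.create_tei_xml_with_gocr(url, output_tei_xml_file_path, cred_path, title="Sample")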
In the example above, the IIIF manifest file is created at ./tmp/01/output.json and the TEI/XML file at ./tmp/01/output.xml.
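The two output paths in this example differ only in their extension. As a sketch of that relationship (an assumption based on the paths shown here, not a description of the library's internals), the manifest path can be derived from the TEI/XML path. The helper names `manifest_path_for` and `ensure_output_dir` are hypothetical.

```python
from pathlib import Path


def manifest_path_for(tei_xml_path):
    """Return the sibling .json path for a given TEI/XML output path."""
    return Path(tei_xml_path).with_suffix(".json")


def ensure_output_dir(tei_xml_path):
    """Create the parent directory of the output file if it is missing."""
    Path(tei_xml_path).parent.mkdir(parents=True, exist_ok=True)


output_tei_xml_file_path = "./tmp/01/output.xml"
manifest_path = manifest_path_for(output_tei_xml_file_path)
print(manifest_path.as_posix())
```

Calling `ensure_output_dir(output_tei_xml_file_path)` beforehand is a harmless precaution in case the output directory does not exist yet.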
Checking the results
IIIF
The following shows the IIIF manifest file displayed in Mirador.
The content of the JSON file is as follows.
./tmp/01/output.json
{
"@context": "http://iiif.io/api/presentation/3/context.json",
"id": "http://example.org/iiif/abc/manifest",
"label": {
"none": [
"Sample"
]
},
"type": "Manifest",
"items": [
{
"id": "http://example.org/iiif/abc/canvas/p1",
"type": "Canvas",
"label": {
"none": [
"[1]"
]
},
"height": 1800,
"width": 1200,
"items": [
{
"id": "http://example.org/iiif/abc/annotation/p0001-image",
"type": "AnnotationPage",
"items": [
{
"body": {
"id": "https://iiif.io/api/presentation/2.1/example/fixtures/resources/page1-full.png",
"type": "Image",
"format": "image/jpeg",
"height": 1800,
"width": 1200
},
"id": "http://example.org/iiif/abc/annotation/p0001-image/anno",
"type": "Annotation",
"motivation": "painting",
"target": "http://example.org/iiif/abc/canvas/p1"
}
]
}
],
"annotations": [
{
"id": "http://example.org/iiif/abc/canvas/p1/curation",
"type": "AnnotationPage",
"items": [
{
"body": {
"type": "TextualBody",
"value": "[00001] Top",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=245/69/94/52",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=245,69,94,52"
},
{
"body": {
"type": "TextualBody",
"value": "[00002] of",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=355/69/49/52",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=355,69,49,52"
},
{
"body": {
"type": "TextualBody",
"value": "[00003] First",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=420/69/112/54",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=420,69,112,54"
},
{
"body": {
"type": "TextualBody",
"value": "[00004] Page",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=547/70/134/53",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=547,70,134,53"
},
{
"body": {
"type": "TextualBody",
"value": "[00005] to",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=697/71/50/52",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=697,71,50,52"
},
{
"body": {
"type": "TextualBody",
"value": "[00006] Display",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=763/71/189/54",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=763,71,189,54"
},
{
"body": {
"type": "TextualBody",
"value": "[00007] Middle",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=296/593/163/164",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=296,593,163,164"
},
{
"body": {
"type": "TextualBody",
"value": "[00008] of",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=433/733/76/76",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=433,733,76,76"
},
{
"body": {
"type": "TextualBody",
"value": "[00009] First",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=484/786/123/124",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=484,786,123,124"
},
{
"body": {
"type": "TextualBody",
"value": "[00010] Page",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=584/889/128/129",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=584,889,128,129"
},
{
"body": {
"type": "TextualBody",
"value": "[00011] on",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=691/998/80/80",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=691,998,80,80"
},
{
"body": {
"type": "TextualBody",
"value": "[00012] Angle",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=749/1057/148/149",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=749,1057,148,149"
},
{
"body": {
"type": "TextualBody",
"value": "[00013] Bottom",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=203/1686/175/55",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=203,1686,175,55"
},
{
"body": {
"type": "TextualBody",
"value": "[00014] of",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=398/1689/51/53",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=398,1689,51,53"
},
{
"body": {
"type": "TextualBody",
"value": "[00015] First",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=466/1689/109/54",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=466,1689,109,54"
},
{
"body": {
"type": "TextualBody",
"value": "[00016] Page",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=593/1690/130/54",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=593,1690,130,54"
},
{
"body": {
"type": "TextualBody",
"value": "[00017] to",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=740/1692/51/54",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=740,1692,51,54"
},
{
"body": {
"type": "TextualBody",
"value": "[00018] Display",
"format": "text/plain"
},
"id": "http://example.org/iiif/abc/canvas/p1#xywh=808/1693/190/54",
"type": "Annotation",
"motivation": "commenting",
"target": "http://example.org/iiif/abc/canvas/p1#xywh=808,1693,190,54"
}
]
}
]
}
]
}
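To reuse the OCR result outside a viewer, the word-level annotations can be read back from the manifest. The following is a minimal sketch using only the standard library; it embeds a trimmed copy of the manifest above (two annotations), and the function name `extract_words` is hypothetical. It assumes every annotation target ends with a `#xywh=x,y,w,h` fragment, as in this sample.

```python
import json

# A trimmed copy of the manifest above (one canvas, two annotations)
manifest_json = """
{
  "type": "Manifest",
  "items": [
    {
      "id": "http://example.org/iiif/abc/canvas/p1",
      "type": "Canvas",
      "annotations": [
        {
          "type": "AnnotationPage",
          "items": [
            {
              "body": {"type": "TextualBody", "value": "[00001] Top"},
              "target": "http://example.org/iiif/abc/canvas/p1#xywh=245,69,94,52"
            },
            {
              "body": {"type": "TextualBody", "value": "[00002] of"},
              "target": "http://example.org/iiif/abc/canvas/p1#xywh=355,69,49,52"
            }
          ]
        }
      ]
    }
  ]
}
"""


def extract_words(manifest):
    """Yield (text, (x, y, w, h)) for each annotation with an xywh target."""
    for canvas in manifest.get("items", []):
        for page in canvas.get("annotations", []):
            for anno in page.get("items", []):
                target = anno.get("target", "")
                if "#xywh=" not in target:
                    continue
                x, y, w, h = (int(v) for v in target.split("#xywh=")[1].split(","))
                # Drop the "[00001] " style prefix seen in the sample output
                text = anno["body"]["value"].split("] ", 1)[-1]
                yield text, (x, y, w, h)


words = list(extract_words(json.loads(manifest_json)))
print(words)
```

With the real file, `json.load(open("./tmp/01/output.json"))` can be passed to `extract_words` in place of the embedded sample.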
TEI
The following shows the TEI/XML file displayed in Oxygen XML Editor.
The content of the XML file is as follows.
./tmp/01/output.xml
<?xml version="1.0" ?>
<?xml-model href="https://tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Sample</title>
</titleStmt>
<publicationStmt>
<p>Example</p>
</publicationStmt>
<sourceDesc>
<p>Example</p>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<listChange>
<change/>
</listChange>
</creation>
</profileDesc>
</teiHeader>
<facsimile>
<surface>
<graphic url="https://iiif.io/api/presentation/2.1/example/fixtures/resources/page1-full.png"/>
<zone lrx="339" lry="121" ulx="245" uly="69" xml:id="a_0000">
<seg>Top</seg>
</zone>
<zone lrx="404" lry="121" ulx="355" uly="69" xml:id="a_0001">
<seg>of</seg>
</zone>
<zone lrx="532" lry="123" ulx="420" uly="69" xml:id="a_0002">
<seg>First</seg>
</zone>
<zone lrx="681" lry="123" ulx="547" uly="70" xml:id="a_0003">
<seg>Page</seg>
</zone>
<zone lrx="747" lry="123" ulx="697" uly="71" xml:id="a_0004">
<seg>to</seg>
</zone>
<zone lrx="952" lry="125" ulx="763" uly="71" xml:id="a_0005">
<seg>Display</seg>
</zone>
<zone lrx="459" lry="757" ulx="296" uly="593" xml:id="a_0006">
<seg>Middle</seg>
</zone>
<zone lrx="509" lry="809" ulx="433" uly="733" xml:id="a_0007">
<seg>of</seg>
</zone>
<zone lrx="607" lry="910" ulx="484" uly="786" xml:id="a_0008">
<seg>First</seg>
</zone>
<zone lrx="712" lry="1018" ulx="584" uly="889" xml:id="a_0009">
<seg>Page</seg>
</zone>
<zone lrx="771" lry="1078" ulx="691" uly="998" xml:id="a_0010">
<seg>on</seg>
</zone>
<zone lrx="897" lry="1206" ulx="749" uly="1057" xml:id="a_0011">
<seg>Angle</seg>
</zone>
<zone lrx="378" lry="1741" ulx="203" uly="1686" xml:id="a_0012">
<seg>Bottom</seg>
</zone>
<zone lrx="449" lry="1742" ulx="398" uly="1689" xml:id="a_0013">
<seg>of</seg>
</zone>
<zone lrx="575" lry="1743" ulx="466" uly="1689" xml:id="a_0014">
<seg>First</seg>
</zone>
<zone lrx="723" lry="1744" ulx="593" uly="1690" xml:id="a_0015">
<seg>Page</seg>
</zone>
<zone lrx="791" lry="1746" ulx="740" uly="1692" xml:id="a_0016">
<seg>to</seg>
</zone>
<zone lrx="998" lry="1747" ulx="808" uly="1693" xml:id="a_0017">
<seg>Display</seg>
</zone>
</surface>
</facsimile>
</TEI>
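Each `zone` stores the upper-left corner (`ulx`, `uly`) and lower-right corner (`lrx`, `lry`) of a word, so the width and height used in the manifest's `xywh` fragment are `lrx - ulx` and `lry - uly`. The following sketch, which uses only the standard library and a trimmed copy of the TEI/XML above, reads the zones and converts them back to `xywh` values. The function name `zones_to_xywh` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree exposes xml:id under the reserved XML namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# A trimmed copy of the TEI/XML above (two zones only)
tei_xml = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface>
      <zone lrx="339" lry="121" ulx="245" uly="69" xml:id="a_0000">
        <seg>Top</seg>
      </zone>
      <zone lrx="404" lry="121" ulx="355" uly="69" xml:id="a_0001">
        <seg>of</seg>
      </zone>
    </surface>
  </facsimile>
</TEI>"""


def zones_to_xywh(root):
    """Return (xml:id, text, (x, y, w, h)) for each zone in the facsimile."""
    rows = []
    for zone in root.iterfind(".//tei:zone", TEI_NS):
        ulx, uly = int(zone.get("ulx")), int(zone.get("uly"))
        lrx, lry = int(zone.get("lrx")), int(zone.get("lry"))
        seg = zone.find("tei:seg", TEI_NS)
        text = seg.text if seg is not None else ""
        rows.append((zone.get(XML_ID), text, (ulx, uly, lrx - ulx, lry - uly)))
    return rows


rows = zones_to_xywh(ET.fromstring(tei_xml))
print(rows)
```

For the first word this gives `(245, 69, 94, 52)`, the same values as the `xywh` fragment of the first annotation in the manifest. Note that the zone IDs in the sample start at `a_0000`, while the annotation labels start at `[00001]`.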
Summary
I hope this is a useful reference for purposes such as creating pre-proofreading text with Google Cloud Vision.