
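Creating a DTS API for the TEI/XML files published by 校異源氏物語テキストDB (Kōi Genji Monogatari Text DB)

Published 2024/09/04

Overview

I built a DTS (Distributed Text Services) API for the TEI/XML files published by 校異源氏物語テキストDB. This post is a memo of that work.

Background

校異源氏物語テキストDB is available at the following URL.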

https://kouigenjimonogatari.github.io/

The site publishes TEI/XML files.

The DTS API

The DTS API I developed is available at the following URL.

https://dts-typescript.vercel.app/api/dts

It is an Express.js application deployed on Vercel.

For an introduction to DTS, see the following article.

https://zenn.dev/nakamura196/articles/4233fe80b3e76d

The MyCapytain library

In the following article, I introduced a library for using DTS from Python.

https://zenn.dev/nakamura196/articles/1f52f460025274

Below, I use this library to access the DTS API I developed.

Create the resolver

The following lines create the resolver:

from MyCapytain.resolvers.dts.api_v1 import HttpDtsResolver

resolver = HttpDtsResolver("https://dts-typescript.vercel.app/api/dts")

Retrieve metadata: let's visit the catalog

The following code finds every readable text that the API exposes.

# We get the root collection
root = resolver.getMetadata()
# Then we dynamically retrieve all the readableDescendants: this browses the API automatically
# until every text has been visited, so be careful with it on huge repositories
readable_collections = root.readableDescendants
print("We found %s collections that can be parsed" % len(readable_collections))
We found 54 collections that can be parsed

Printing the full tree

# Note that we can also walk the catalog and print it as a tree.
# If you are not familiar with recursion, the next lines might be a bit complicated
def show_tree(collection, char_number=1):
    for subcollection_id, subcollection in collection.children.items():
        print(char_number*"--" + " " + subcollection.id)
        show_tree(subcollection, char_number+1)

print(root.id)
show_tree(root)
default
-- urn:kouigenjimonogatari
---- urn:kouigenjimonogatari.1
---- urn:kouigenjimonogatari.2
---- urn:kouigenjimonogatari.3
---- urn:kouigenjimonogatari.4
---- urn:kouigenjimonogatari.5
---- urn:kouigenjimonogatari.6
---- urn:kouigenjimonogatari.7
---- urn:kouigenjimonogatari.8
---- urn:kouigenjimonogatari.9
---- urn:kouigenjimonogatari.10
---- urn:kouigenjimonogatari.11
---- urn:kouigenjimonogatari.12
---- urn:kouigenjimonogatari.13
---- urn:kouigenjimonogatari.14
---- urn:kouigenjimonogatari.15
---- urn:kouigenjimonogatari.16
---- urn:kouigenjimonogatari.17
---- urn:kouigenjimonogatari.18
---- urn:kouigenjimonogatari.19
---- urn:kouigenjimonogatari.20
---- urn:kouigenjimonogatari.21
---- urn:kouigenjimonogatari.22
---- urn:kouigenjimonogatari.23
---- urn:kouigenjimonogatari.24
---- urn:kouigenjimonogatari.25
---- urn:kouigenjimonogatari.26
---- urn:kouigenjimonogatari.27
---- urn:kouigenjimonogatari.28
---- urn:kouigenjimonogatari.29
---- urn:kouigenjimonogatari.30
---- urn:kouigenjimonogatari.31
---- urn:kouigenjimonogatari.32
---- urn:kouigenjimonogatari.33
---- urn:kouigenjimonogatari.34
---- urn:kouigenjimonogatari.35
---- urn:kouigenjimonogatari.36
---- urn:kouigenjimonogatari.37
---- urn:kouigenjimonogatari.38
---- urn:kouigenjimonogatari.39
---- urn:kouigenjimonogatari.40
---- urn:kouigenjimonogatari.41
---- urn:kouigenjimonogatari.42
---- urn:kouigenjimonogatari.43
---- urn:kouigenjimonogatari.44
---- urn:kouigenjimonogatari.45
---- urn:kouigenjimonogatari.46
---- urn:kouigenjimonogatari.47
---- urn:kouigenjimonogatari.48
---- urn:kouigenjimonogatari.49
---- urn:kouigenjimonogatari.50
---- urn:kouigenjimonogatari.51
---- urn:kouigenjimonogatari.52
---- urn:kouigenjimonogatari.53
---- urn:kouigenjimonogatari.54
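
The walk above descends through every level of the catalog. On a larger repository it may be useful to stop at a fixed depth. The sketch below shows one way to do that. `Node` is a hypothetical stand-in class exposing only the `id` and `children` attributes used above, the catalog data is made up, and `max_depth` is a parameter introduced here for illustration. It does not call MyCapytain.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in exposing only the attributes used above."""
    id: str
    children: Dict[str, "Node"] = field(default_factory=dict)


def tree_lines(collection, max_depth: int, depth: int = 1) -> List[str]:
    """Return indented descendant ids, descending at most max_depth levels."""
    lines: List[str] = []
    if depth > max_depth:
        return lines
    for child in collection.children.values():
        lines.append(depth * "--" + " " + child.id)
        lines.extend(tree_lines(child, max_depth, depth + 1))
    return lines


# Small made-up catalog, only for illustration.
leaf = Node("urn:example.1")
work = Node("urn:example", {leaf.id: leaf})
root = Node("default", {work.id: work})

print("\n".join(tree_lines(root, max_depth=1)))
```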

Printing details about a specific collection

# Let's pick a random one!
from random import randint
# randint includes both endpoints, so the largest valid index is len(...) - 1
rand_index = randint(0, len(readable_collections) - 1)
collection = readable_collections[rand_index]

# Now let's print some information
label = collection.get_label()

text_id = collection.id
print("Treating `"+label+"` with id " + text_id)
Treating `総角` with id urn:kouigenjimonogatari.47
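
As a side note on picking a random element, `random.choice` avoids computing an index at all, so there is no upper bound to get wrong. The sketch below uses a made-up list called `items` as a stand-in; it assumes the real `readable_collections` behaves like a non-empty list, which this article does not verify.

```python
import random

# Hypothetical stand-in for readable_collections; any non-empty list works.
items = ["urn:example.1", "urn:example.2", "urn:example.3"]

# random.choice picks one element uniformly and never indexes out of range.
picked = random.choice(items)
print(picked)
```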

What about more detailed information, such as the citation scheme?

def recursive_printing_citation_scheme(citation, char_number=1):
    for subcitation in citation.children:
        print(char_number*"--" + " " + subcitation.name)
        recursive_printing_citation_scheme(subcitation, char_number+1)

print("Maximum citation depth : ", collection.citation.depth)
print("Citation System")
recursive_printing_citation_scheme(collection.citation)
Maximum citation depth :  1
Citation System
-- line

Let's get some references!

reffs = resolver.getReffs(collection.id)
print(reffs)
# Nice !
<DtsReferenceSet (<DtsReference <https://w3id.org/kouigenjimonogatari/api/items/1587-01.json> [line]>, <DtsReference <https://w3id.org/kouigenjimonogatari/api/items/1587-02.json> [line]>, <DtsReference <https://w3id.org/kouigenjimonogatari/api/items/1587-03.json> [line]>, <DtsReference <https://w3id.org/kouigenjimonogatari/api/items/1587-04.json> [line]>, <DtsReference ...
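
Each reference printed above is a URL ending in a file name such as `1587-01.json`. The sketch below splits such a URL into its two numeric fields using only the standard library. The function name `split_item_ref` and the reading of the fields as page and line are assumptions for illustration; the article only states that `0005-01` is the first line of 桐壺. The function takes a plain string, and how to obtain that string from a `DtsReference` object is not covered here.

```python
import re
from urllib.parse import urlparse

# Matches a trailing "<digits>-<digits>.json" file name.
_ITEM_NAME = re.compile(r"(\d+)-(\d+)\.json$")


def split_item_ref(ref: str) -> tuple:
    """Return the two numeric fields of an item URL as integers.

    Assumption: the fields are a page number and a line number.
    """
    path = urlparse(ref).path
    match = _ITEM_NAME.search(path)
    if match is None:
        raise ValueError("unexpected reference format: " + ref)
    return int(match.group(1)), int(match.group(2))


print(split_item_ref("https://w3id.org/kouigenjimonogatari/api/items/1587-01.json"))
```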

Let's get a random passage!

# Let's pick a random one!
from random import randint
# randint includes both endpoints, so the index must be between 0 and len(reffs) - 1
rand_index = randint(0, len(reffs)-1)
reff = reffs[rand_index]

passage = resolver.getTextualNode(collection.id, reff)
print(passage.id, passage.reference)

# Let's look at the XML.
# For that, we need to request the right mimetype:
from MyCapytain.common.constants import Mimetypes
print(passage.export(Mimetypes.XML.TEI))
urn:kouigenjimonogatari.47 <DtsReference <https://w3id.org/kouigenjimonogatari/api/items/1640-06.json> [line]>
<TEI xmlns="http://www.tei-c.org/ns/1.0"><dts:fragment xmlns:dts="https://w3id.org/dts/api#"><text><body><p>
...
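
To read only the text of a passage, the returned XML can be parsed with the standard library. The sketch below assumes the exported value is an XML string wrapped in the same `TEI` and `dts:fragment` envelope as the output above. `SAMPLE` is a made-up document whose paragraph content is a placeholder, not the real passage, and `fragment_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

DTS_NS = "https://w3id.org/dts/api#"

# Made-up fragment using the same envelope as the output above.
# The <p> content is a placeholder, not text from the database.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    '<dts:fragment xmlns:dts="https://w3id.org/dts/api#">'
    "<text><body><p>sample text</p></body></text>"
    "</dts:fragment></TEI>"
)


def fragment_text(xml_string: str) -> str:
    """Return the concatenated text inside the first dts:fragment element."""
    root = ET.fromstring(xml_string)
    fragment = next(root.iter("{" + DTS_NS + "}fragment"), None)
    if fragment is None:
        raise ValueError("no dts:fragment element found")
    return "".join(fragment.itertext()).strip()


print(fragment_text(SAMPLE))
```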

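Observations

As shown above, I was able to build a DTS API that supports the basic operations of the MyCapytain library.

The examples above use Python, but the API can also be called directly from a browser. The following URL retrieves the first line of 桐壺 (Kiritsubo).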

https://dts-typescript.vercel.app/api/dts/document?id=urn:kouigenjimonogatari.1&ref=https://w3id.org/kouigenjimonogatari/api/items/0005-01.json
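
The same URL can be assembled programmatically from the document identifier and the reference. This is a minimal sketch using only the standard library; `document_url` is a hypothetical helper name, and the `id` and `ref` parameter names are taken from the URL above. Passing `safe=":/"` leaves colons and slashes unescaped so the result matches the URL shown in this article; whether the server also accepts percent-encoded values is not confirmed here.

```python
from urllib.parse import urlencode

BASE = "https://dts-typescript.vercel.app/api/dts"


def document_url(doc_id: str, ref: str, base: str = BASE) -> str:
    """Build a document request URL from a collection id and a reference."""
    # Keep ":" and "/" literal so the URL matches the form used in the article.
    query = urlencode({"id": doc_id, "ref": ref}, safe=":/")
    return base + "/document?" + query


print(document_url(
    "urn:kouigenjimonogatari.1",
    "https://w3id.org/kouigenjimonogatari/api/items/0005-01.json",
))
```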

Caveats

I developed this API using the following API as a reference.

https://texts.alpheios.net/api/dts

However, I have not confirmed whether that API conforms to the latest guidelines below.

https://distributed-text-services.github.io/specifications/

As a result, please note that the DTS API developed here may also fail to conform to those guidelines in places.

Summary

I hope this is helpful for understanding DTS.
