[Azure Document Intelligence] About the Layout model
Written on
2024/11/25
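Goal
I had been using the Read model of Azure Document Intelligence for OCR, but it could not detect checkboxes, and I was unsure how to handle that. (Custom models are a bit expensive...)
While looking around the portal, I came across the Layout model, whose description says it can detect checkboxes. So I decided to try it out.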
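Prerequisites
- An Azure Document Intelligence resource on the S0 pricing tier has already been created
- The azure-ai-formrecognizer Python package is installed (the samples import azure.ai.formrecognizer)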
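The samples in this post hard-code the endpoint and key as placeholder strings. One way to keep the real key out of the source file is to read both values from environment variables. The following is only a minimal sketch: the variable names DOCUMENTINTELLIGENCE_ENDPOINT and DOCUMENTINTELLIGENCE_KEY and the helper load_settings are my own assumptions, not names the SDK requires.

```python
import os


def load_settings(env=os.environ):
    """Return (endpoint, key) read from environment variables.

    The variable names are hypothetical; rename them to suit your setup.
    """
    endpoint = env.get("DOCUMENTINTELLIGENCE_ENDPOINT", "")
    key = env.get("DOCUMENTINTELLIGENCE_KEY", "")
    if not endpoint or not key:
        raise RuntimeError(
            "Set DOCUMENTINTELLIGENCE_ENDPOINT and DOCUMENTINTELLIGENCE_KEY first"
        )
    return endpoint, key
```

With this in place, `endpoint, key = load_settings()` could replace the two placeholder assignments in the samples.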
Running the sample code
I will try out the "Layout" model using the sample code below.
sample code
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

endpoint = "YOUR_FORM_RECOGNIZER_ENDPOINT"
key = "YOUR_FORM_RECOGNIZER_KEY"

# sample document
formUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf"

document_analysis_client = DocumentAnalysisClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)

# Analyze the document at the URL with the prebuilt Layout model
poller = document_analysis_client.begin_analyze_document_from_url("prebuilt-layout", formUrl)
result = poller.result()

# Report whether handwritten content was found
for idx, style in enumerate(result.styles):
    print(
        "Document contains {} content".format(
            "handwritten" if style.is_handwritten else "no handwritten"
        )
    )

# Print the text lines and selection marks (checkboxes) of each page
for page in result.pages:
    for line_idx, line in enumerate(page.lines):
        print(
            "...Line # {} has text content '{}'".format(
                line_idx,
                line.content.encode("utf-8")
            )
        )

    for selection_mark in page.selection_marks:
        print(
            "...Selection mark is '{}' and has a confidence of {}".format(
                selection_mark.state,
                selection_mark.confidence
            )
        )

# Print the size and cells of each table
for table_idx, table in enumerate(result.tables):
    print(
        "Table # {} has {} rows and {} columns".format(
            table_idx, table.row_count, table.column_count
        )
    )
    for cell in table.cells:
        print(
            "...Cell[{}][{}] has content '{}'".format(
                cell.row_index,
                cell.column_index,
                cell.content.encode("utf-8"),
            )
        )

print("----------------------------------------")
The sample code analyzes the sample document at formUrl (sample-layout.pdf).
Output
Document contains handwritten content
...Line # 0 has text content 'b'UNITED STATES''
...Line # 1 has text content 'b'SECURITIES AND EXCHANGE COMMISSION''
...Line # 2 has text content 'b'Washington, D.C. 20549''
...Line # 3 has text content 'b'FORM 10-Q''
...Line # 4 has text content 'b'\xe2\x98\x92''
...Line # 5 has text content 'b'QUARTERLY REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF''
...Line # 6 has text content 'b'1934''
...Line # 7 has text content 'b'For the Quarterly Period Ended March 31, 2020''
...Line # 8 has text content 'b'OR''
...Line # 9 has text content 'b'\xe2\x98\x90''
...Line # 10 has text content 'b'TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF''
...Line # 11 has text content 'b'1934''
...Line # 12 has text content 'b'For the Transition Period From''
...Line # 13 has text content 'b'to''
...Line # 14 has text content 'b'Commission File Number 001-37845''
...Line # 15 has text content 'b'MICROSOFT CORPORATION''
...Line # 16 has text content 'b'WASHINGTON''
...Line # 17 has text content 'b'(STATE OF INCORPORATION)''
...Line # 18 has text content 'b'91-1144442''
...Line # 19 has text content 'b'(I.R.S. ID)''
...Line # 20 has text content 'b'ONE MICROSOFT WAY, REDMOND, WASHINGTON 98052-6399''
...Line # 21 has text content 'b'(425) 882-8080''
...Line # 22 has text content 'b'www.microsoft.com/investor''
...Line # 23 has text content 'b'Securities registered pursuant to Section 12(b) of the Act:''
...Line # 24 has text content 'b'Title of each class''
...Line # 25 has text content 'b'Trading Symbol''
...Line # 26 has text content 'b'Name of exchange on which registered''
...Line # 27 has text content 'b'Common stock, $0.00000625 par value per share''
...Line # 28 has text content 'b'MSFT''
...Line # 29 has text content 'b'NASDAQ''
...Line # 30 has text content 'b'2.125% Notes due 2021''
...Line # 31 has text content 'b'MSFT''
...Line # 32 has text content 'b'NASDAQ''
...Line # 33 has text content 'b'3.125% Notes due 2028''
...Line # 34 has text content 'b'MSFT''
...Line # 35 has text content 'b'NASDAQ''
...Line # 36 has text content 'b'2.625% Notes due 2033''
...Line # 37 has text content 'b'MSFT''
...Line # 38 has text content 'b'NASDAQ''
...Line # 39 has text content 'b'Securities registered pursuant to Section 12(g) of the Act:''
...Line # 40 has text content 'b'NONE''
...Line # 41 has text content 'b'Indicate by check mark whether the registrant (1) has filed all reports required to be filed by Section 13 or 15(d) of the Securities Exchange''
...Line # 42 has text content 'b'Act of 1934 during the preceding 12 months (or for such shorter period that the registrant was required to file such reports), and (2) has''
...Line # 43 has text content 'b'been subject to such filing requirements for the past 90 days. Yes''
...Line # 44 has text content 'b'No''
...Line # 45 has text content 'b'Indicate by check mark whether the registrant has submitted electronically every Interactive Data File required to be submitted pursuant to Rule''
...Line # 46 has text content 'b'405 of Regulation S-T (\xc2\xa7232.405 of this chapter) during the preceding 12 months (or for such shorter period that the registrant was required to''
...Line # 47 has text content 'b'submit such files). Yes''
...Line # 48 has text content 'b'No''
...Line # 49 has text content 'b'Indicate by check mark whether the registrant is a large accelerated filer, an accelerated filer, a non-accelerated filer, a smaller reporting''
...Line # 50 has text content 'b'company, or an emerging growth company. See the definitions of "large accelerated filer," "accelerated filer," "smaller reporting company,"''
...Line # 51 has text content 'b'and "emerging growth company" in Rule 12b-2 of the Exchange Act.''
...Line # 52 has text content 'b'Large accelerated filer''
...Line # 53 has text content 'b'Accelerated filer''
...Line # 54 has text content 'b'Non-accelerated filer''
...Line # 55 has text content 'b'Smaller reporting company''
...Line # 56 has text content 'b'Emerging growth company''
...Line # 57 has text content 'b'If an emerging growth company, indicate by check mark if the registrant has elected not to use the extended transition period for complying''
...Line # 58 has text content 'b'with any new or revised financial accounting standards provided pursuant to Section 13(a) of the Exchange Act.''
...Line # 59 has text content 'b'Indicate by check mark whether the registrant is a shell company (as defined in Rule 12b-2 of the Exchange Act).''
...Line # 60 has text content 'b'Yes''
...Line # 61 has text content 'b'No''
...Line # 62 has text content 'b"Indicate the number of shares outstanding of each of the issuer's classes of common stock, as of the latest practicable date."'
...Line # 63 has text content 'b'Class''
...Line # 64 has text content 'b'Outstanding as of April 24, 2020''
...Line # 65 has text content 'b'Common Stock, $0.00000625 par value per share''
...Line # 66 has text content 'b'7,583,440,247 shares''
...Selection mark is 'selected' and has a confidence of 0.98
...Selection mark is 'unselected' and has a confidence of 0.98
...Selection mark is 'selected' and has a confidence of 0.99
...Selection mark is 'unselected' and has a confidence of 0.995
...Selection mark is 'selected' and has a confidence of 0.99
...Selection mark is 'unselected' and has a confidence of 0.987
...Selection mark is 'selected' and has a confidence of 0.983
...Selection mark is 'unselected' and has a confidence of 0.982
...Selection mark is 'unselected' and has a confidence of 0.99
...Selection mark is 'unselected' and has a confidence of 0.995
...Selection mark is 'unselected' and has a confidence of 0.987
...Selection mark is 'unselected' and has a confidence of 0.99
...Selection mark is 'unselected' and has a confidence of 0.997
...Selection mark is 'selected' and has a confidence of 0.99
Table # 0 has 5 rows and 3 columns
...Cell[0][0] has content 'b'Title of each class''
...Cell[0][1] has content 'b'Trading Symbol''
...Cell[0][2] has content 'b'Name of exchange on which registered''
...Cell[1][0] has content 'b'Common stock, $0.00000625 par value per share''
...Cell[1][1] has content 'b'MSFT''
...Cell[1][2] has content 'b'NASDAQ''
...Cell[2][0] has content 'b'2.125% Notes due 2021''
...Cell[2][1] has content 'b'MSFT''
...Cell[2][2] has content 'b'NASDAQ''
...Cell[3][0] has content 'b'3.125% Notes due 2028''
...Cell[3][1] has content 'b'MSFT''
...Cell[3][2] has content 'b'NASDAQ''
...Cell[4][0] has content 'b'2.625% Notes due 2033''
...Cell[4][1] has content 'b'MSFT''
...Cell[4][2] has content 'b'NASDAQ''
Table # 1 has 2 rows and 2 columns
...Cell[0][0] has content 'b'Class''
...Cell[0][1] has content 'b'Outstanding as of April 24, 2020''
...Cell[1][0] has content 'b'Common Stock, $0.00000625 par value per share''
...Cell[1][1] has content 'b'7,583,440,247 shares''
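A note on reading this output: every value is wrapped in b'...' because the sample calls .encode("utf-8") before printing, so non-ASCII characters show up as byte escapes. Lines 4 and 9 are worth decoding, since they seem to be the checkboxes themselves appearing inside the text lines. A small sketch that decodes a few of the byte strings copied from the output above:

```python
# Byte strings copied from lines 4 and 9 of the output above.
checked = b"\xe2\x98\x92"
unchecked = b"\xe2\x98\x90"

print(checked.decode("utf-8"))    # U+2612 BALLOT BOX WITH X
print(unchecked.decode("utf-8"))  # U+2610 BALLOT BOX

# The section sign in line 46 decodes the same way.
section = b"\xc2\xa7232.405".decode("utf-8")
print(section)
```

Dropping .encode("utf-8") from the print calls in the sample would print the text directly instead of the byte representation.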
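Explanation
- Text lines
  The lines of text extracted from the document are listed.
- Selection marks
  The state of each checkbox ("selected" or "unselected") is shown together with its confidence.
- Table contents
  The document contains two tables, which hold the following information:
  - Table 1 (Table # 0 in the output): the list of security classes, trading symbols, and the exchanges on which they are registered.
  - Table 2 (Table # 1 in the output): the class of stock and the number of shares outstanding as of April 24, 2020.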
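The cells come back as a flat list with row and column indexes, so rebuilding a table as rows takes one small step. Below is a minimal sketch; cells_to_grid is a hypothetical helper of my own, and the cells are typed in by hand from Table # 1 in the output rather than taken from a live result. It ignores merged cells.

```python
def cells_to_grid(row_count, column_count, cells):
    """Arrange (row_index, column_index, content) triples into a 2-D list."""
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for row_index, column_index, content in cells:
        grid[row_index][column_index] = content
    return grid


# Table # 1 from the output, written out by hand as plain tuples.
cells = [
    (0, 0, "Class"),
    (0, 1, "Outstanding as of April 24, 2020"),
    (1, 0, "Common Stock, $0.00000625 par value per share"),
    (1, 1, "7,583,440,247 shares"),
]

grid = cells_to_grid(2, 2, cells)
for row in grid:
    print(" | ".join(row))
```

With a real result, the triples could be built from the attributes the sample already prints: `[(c.row_index, c.column_index, c.content) for c in table.cells]`, together with table.row_count and table.column_count.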
Analyzing a local PDF
local.py
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

endpoint = "YOUR_FORM_RECOGNIZER_ENDPOINT"
key = "YOUR_FORM_RECOGNIZER_KEY"

document_analysis_client = DocumentAnalysisClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)

# Path to the local PDF file
document_path = "PATH_TO_YOUR_PDF"

# Open the file in binary mode and send it to the prebuilt Layout model
with open(document_path, "rb") as document_file:
    poller = document_analysis_client.begin_analyze_document(
        "prebuilt-layout", document=document_file
    )
result = poller.result()

# Report whether handwritten content was found
for idx, style in enumerate(result.styles):
    print(
        "Document contains {} content".format(
            "handwritten" if style.is_handwritten else "no handwritten"
        )
    )

# Print the text lines and selection marks (checkboxes) of each page
for page in result.pages:
    for line_idx, line in enumerate(page.lines):
        print(
            "...Line # {} has text content '{}'".format(
                line_idx, line.content.encode("utf-8")
            )
        )

    for selection_mark in page.selection_marks:
        print(
            "...Selection mark is '{}' and has a confidence of {}".format(
                selection_mark.state, selection_mark.confidence
            )
        )

# Print the size and cells of each table
for table_idx, table in enumerate(result.tables):
    print(
        "Table # {} has {} rows and {} columns".format(
            table_idx, table.row_count, table.column_count
        )
    )
    for cell in table.cells:
        print(
            "...Cell[{}][{}] has content '{}'".format(
                cell.row_index,
                cell.column_index,
                cell.content.encode("utf-8"),
            )
        )

print("----------------------------------------")
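Summary
Looking at the output, checkboxes can be detected, but the sticking point is that it is hard to tell which item (key) each checkbox belongs to. That said, being able to detect tables is really nice. I guess I'll have to use a custom model after all.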
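One rough workaround that might be worth trying before moving to a custom model is to pair each checkbox with the nearest text line by position. The sketch below is only a heuristic under stated assumptions: the helper names are hypothetical, the coordinates are made up for illustration, and it works on plain (x, y) tuples rather than SDK objects. It skips lines that contain only a checkbox glyph, since the output shows such lines (lines 4 and 9). It will likely break on dense or multi-column forms.

```python
import math

# Lines that are just a checkbox glyph (☒ or ☐) are not useful as labels.
CHECKBOX_GLYPHS = {"\u2612", "\u2610"}


def bounding_box(polygon):
    """Return (left, top, right, bottom) for a list of (x, y) points."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def distance_to_box(px, py, box):
    """Distance from a point to the nearest edge of a box (0 if inside)."""
    left, top, right, bottom = box
    dx = max(left - px, 0.0, px - right)
    dy = max(top - py, 0.0, py - bottom)
    return math.hypot(dx, dy)


def nearest_line(mark_polygon, lines):
    """Return the text of the line closest to the mark's center, or None.

    `lines` is a list of (content, polygon) pairs.
    """
    left, top, right, bottom = bounding_box(mark_polygon)
    cx, cy = (left + right) / 2, (top + bottom) / 2
    candidates = [
        (distance_to_box(cx, cy, bounding_box(polygon)), content)
        for content, polygon in lines
        if content.strip() not in CHECKBOX_GLYPHS
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda pair: pair[0])[1]


# Made-up coordinates: one checkbox with three labels around it.
mark = [(1.0, 1.0), (1.2, 1.0), (1.2, 1.2), (1.0, 1.2)]
lines = [
    ("Large accelerated filer", [(1.3, 1.0), (3.0, 1.0), (3.0, 1.2), (1.3, 1.2)]),
    ("Accelerated filer", [(4.0, 1.0), (5.5, 1.0), (5.5, 1.2), (4.0, 1.2)]),
    ("Non-accelerated filer", [(1.3, 1.5), (3.0, 1.5), (3.0, 1.7), (1.3, 1.7)]),
]
print(nearest_line(mark, lines))
```

To feed it real results, I am assuming that selection_mark.polygon and line.polygon are lists of points with x and y attributes in recent versions of azure-ai-formrecognizer; if that holds, `[(p.x, p.y) for p in line.polygon]` would produce the tuples used here. Please check this against the SDK version you have installed.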