Trying Out Snowflake x dbt: Documentation Edition
The simplest way to document
You can add a description to models and columns in schema.yml, and it will be displayed in the generated documentation.
models/schema.yml
```yaml
version: 2

models:
  - name: dim_listings_cleansed
    description: Cleansed table which contains Airbnb listings.
    columns:
      - name: listing_id
        description: Primary key for the listing
        tests:
          - unique
          - not_null
```
Generate the documentation with the dbt docs generate command.
```
ubuntu@dbt:~/dbtlearn$ dbt docs generate
03:56:48 Running with dbt=1.3.1
03:56:49 Found 8 models, 8 tests, 1 snapshot, 0 analyses, 492 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
03:56:49
03:56:50 Concurrency: 4 threads (target='dev')
03:56:50
03:56:52 Done.
03:56:52 Building catalog
03:56:56 Catalog written to /home/ubuntu/dbtlearn/target/catalog.json
ubuntu@dbt:~/dbtlearn$
```
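Besides catalog.json, `dbt docs generate` also writes `target/manifest.json`, which holds the descriptions taken from schema.yml. As a quick check that descriptions were picked up, here is a minimal Python sketch that lists model columns declared in YAML but left without a description. The helper name `undocumented_columns` is hypothetical, and the sketch assumes the dbt 1.3 manifest layout (a top-level `nodes` map whose model entries carry `resource_type`, `name`, and a `columns` map). Columns that exist only in the warehouse and are never declared in YAML do not appear in the manifest, so they are not reported.

```python
import json
from pathlib import Path


def undocumented_columns(manifest_path):
    """Return sorted (model, column) pairs whose YAML description is empty.

    Hypothetical helper; assumes the dbt 1.3 manifest.json layout.
    """
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    missing = []
    for node in manifest.get("nodes", {}).values():
        # nodes also contains tests, seeds, snapshots, ...; keep models only
        if node.get("resource_type") != "model":
            continue
        for col_name, col in node.get("columns", {}).items():
            # treat whitespace-only descriptions as missing too
            if not (col.get("description") or "").strip():
                missing.append((node["name"], col_name))
    return sorted(missing)
```

Run it from the project root after `dbt docs generate`, for example `undocumented_columns("target/manifest.json")`.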
Next, start the documentation server with the dbt docs serve command.
```
ubuntu@dbt:~/dbtlearn$ dbt docs serve
04:25:03 Running with dbt=1.3.1
04:25:03 Serving docs at 0.0.0.0:8080
04:25:03 To access from your browser, navigate to: http://localhost:8080
04:25:03
04:25:03
04:25:03 Press Ctrl+C to exit.
127.0.0.1 - - [11/Dec/2022 04:25:04] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 04:25:12] "GET /manifest.json?cb=1670732712042 HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 04:25:14] "GET /catalog.json?cb=1670732712042 HTTP/1.1" 200 -
```
A browser window opens and displays the documentation.
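As the log shows, the docs site is a set of static files: the browser simply issues GET requests for `/`, `/manifest.json`, and `/catalog.json` under `target/`. The following Python sketch reproduces that idea with the standard library's `http.server`. It is an illustration of the concept, not dbt's own implementation, and the function name `make_docs_server` is hypothetical.

```python
import functools
import http.server


def make_docs_server(target_dir="target", host="0.0.0.0", port=8080):
    """Build an HTTP server that serves the static docs files in target_dir.

    Hypothetical helper: a rough stand-in for `dbt docs serve`, not dbt's code.
    """
    # SimpleHTTPRequestHandler serves files from `directory` (Python 3.7+).
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=target_dir
    )
    # A threading server can answer the manifest and catalog requests in parallel.
    return http.server.ThreadingHTTPServer((host, port), handler)
```

Calling `make_docs_server().serve_forever()` from the project root would serve `target/` on port 8080, the same address shown in the log.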
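Advanced documentation
1. Markdown documentation for models and columns
With the simple approach above, long descriptions make schema.yml hard to read.
Instead, you can move a description out of schema.yml into a separate file.
Here, we move the description of the minimum_nights column out.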
models/schema.yml
```yaml
- name: minimum_nights
  description: '{{ doc("dim_listing_cleansed__minimum_nights") }}'
  tests:
    - positive_value
```
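Create a Markdown file with the following content under models. dbt only reads docs blocks from .md files in the project's resource paths, so the file must be named docs.md rather than docs.yml.
models/docs.md
```markdown
{% docs dim_listing_cleansed__minimum_nights %}
Minimum number of nights required to rent this property.

Keep in mind that old listings might have `minimum_nights` set
to 0 in the source tables. Our cleansing algorithm updates this to `1`.
{% enddocs %}
```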
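Conceptually, `doc("...")` is a lookup by name: dbt collects the `{% docs %}` blocks in the project's Markdown files and uses the body of the matching block as the description. The block name is an independent identifier, which is why `dim_listing_cleansed__minimum_nights` works even though the model is called `dim_listings_cleansed`; it only has to match exactly between the two files. The Python sketch below imitates that lookup with regular expressions so a reference pointing at a missing block can be spotted directly. It is an illustration, not dbt's parser (which uses Jinja), and `parse_docs_blocks` and `unresolved_doc_refs` are hypothetical names.

```python
import re

# {% docs name %} ... {% enddocs %}, allowing Jinja whitespace-control dashes.
DOCS_BLOCK = re.compile(
    r"\{%-?\s*docs\s+(\w+)\s*-?%\}(.*?)\{%-?\s*enddocs\s*-?%\}", re.DOTALL
)
# doc("name") or doc('name') as used inside a YAML description.
DOC_REF = re.compile(r"""doc\(\s*["'](\w+)["']\s*\)""")


def parse_docs_blocks(markdown_text):
    """Map each docs block name to its body text (surrounding whitespace stripped)."""
    return {name: body.strip() for name, body in DOCS_BLOCK.findall(markdown_text)}


def unresolved_doc_refs(yaml_text, blocks):
    """Return names passed to doc() that have no matching docs block."""
    return [name for name in DOC_REF.findall(yaml_text) if name not in blocks]
```

Feeding it the contents of models/docs.md and models/schema.yml should return an empty list when every `doc()` reference resolves.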
Now generate the documentation.
```
ubuntu@dbt:~/dbtlearn$ dbt docs generate
04:39:04 Running with dbt=1.3.1
04:39:04 Found 8 models, 8 tests, 1 snapshot, 0 analyses, 492 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
04:39:04
04:39:06 Concurrency: 4 threads (target='dev')
04:39:06
04:39:08 Done.
04:39:08 Building catalog
04:39:11 Catalog written to /home/ubuntu/dbtlearn/target/catalog.json
ubuntu@dbt:~/dbtlearn$
```
Start the documentation server.
```
ubuntu@dbt:~/dbtlearn$ dbt docs serve
04:39:41 Running with dbt=1.3.1
04:39:41 Serving docs at 0.0.0.0:8080
04:39:41 To access from your browser, navigate to: http://localhost:8080
04:39:41
04:39:41
04:39:41 Press Ctrl+C to exit.
127.0.0.1 - - [11/Dec/2022 04:39:42] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 04:39:43] "GET /manifest.json?cb=1670733583024 HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 04:39:43] "GET /catalog.json?cb=1670733583024 HTTP/1.1" 200 -
```
The minimum_nights description from docs.md now appears in the documentation as expected.
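2. Editing the landing page
Create a Markdown file named overview.md under models with the following content. dbt uses a docs block named `__overview__` as the content of the documentation site's landing page.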
models/overview.md
```markdown
{% docs __overview__ %}
# Airbnb pipeline

Hey, welcome to our Airbnb pipeline documentation!

Here is the schema of our input data:
![input schema](https://dbtlearn.s3.us-east-2.amazonaws.com/input_schema.png)
{% enddocs %}
```
Generate the documentation and start the server.
```
ubuntu@dbt:~/dbtlearn$ dbt docs generate
04:46:52 Running with dbt=1.3.1
04:46:52 Found 8 models, 8 tests, 1 snapshot, 0 analyses, 492 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
04:46:52
04:46:53 Concurrency: 4 threads (target='dev')
04:46:53
04:46:55 Done.
04:46:55 Building catalog
04:46:57 Catalog written to /home/ubuntu/dbtlearn/target/catalog.json
ubuntu@dbt:~/dbtlearn$ dbt docs serve
04:48:41 Running with dbt=1.3.1
04:48:41 Serving docs at 0.0.0.0:8080
04:48:41 To access from your browser, navigate to: http://localhost:8080
04:48:41
04:48:41
04:48:41 Press Ctrl+C to exit.
127.0.0.1 - - [11/Dec/2022 04:48:42] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 04:48:43] "GET /manifest.json?cb=1670734123757 HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 04:48:44] "GET /catalog.json?cb=1670734123757 HTTP/1.1" 200 -
```
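Instead of pointing the image at S3, you can also bundle the image with the dbt project itself.
Create a directory named assets in the dbt project root.
```shell
mkdir assets
```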
Editing dbt_project.yml
Add the last line (asset-paths).
dbt_project.yml
```yaml
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
asset-paths: ["assets"]
```
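With `asset-paths` set, `dbt docs generate` copies each listed directory into `target/`, so `assets/input_schema.png` ends up at `target/assets/input_schema.png` and is served as `/assets/input_schema.png` (the request that appears in the final serve log). Below is a minimal Python sketch of that copy step, assuming dbt mirrors each directory into `target/` under the same name and replaces any previous copy. It approximates the behavior and is not dbt's source; `copy_asset_paths` is a hypothetical name.

```python
import shutil
from pathlib import Path


def copy_asset_paths(project_root, asset_paths, target_path="target"):
    """Mirror each asset directory into the target directory.

    Hypothetical helper approximating the asset copy done by `dbt docs generate`.
    """
    root = Path(project_root)
    for asset_path in asset_paths:
        src = root / asset_path
        dst = root / target_path / asset_path
        if dst.exists():
            shutil.rmtree(dst)  # drop stale files left by a previous run
        if src.is_dir():
            shutil.copytree(src, dst)  # also creates missing parent directories
```

Replacing the whole copy on each run means an image deleted from `assets/` also disappears from the served site.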
Download the image and move it into the assets directory.
```
ubuntu@dbt:~/dbtlearn$ wget https://dbtlearn.s3.us-east-2.amazonaws.com/input_schema.png
ubuntu@dbt:~/dbtlearn$ mv input_schema.png assets/
```
Changing the image reference in overview.md
models/overview.md
```markdown
{% docs __overview__ %}
# Airbnb pipeline

Hey, welcome to our Airbnb pipeline documentation!

Here is the schema of our input data:
![input schema](assets/input_schema.png)
{% enddocs %}
```
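Because the relative image path is resolved against the served `target/` directory, a misspelled directory name (for example `asset/` instead of `assets/`) only shows up as a broken image in the browser. A small Python check can catch this before running `dbt docs generate`. This sketch assumes local image paths are written relative to the project root, as `assets/input_schema.png` is here; `missing_local_images` is a hypothetical helper, and remote URLs such as the S3 link are skipped.

```python
import re
from pathlib import Path

# Markdown image syntax ![alt text](path); link titles are not supported.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def missing_local_images(markdown_path, project_root):
    """List local image paths in a Markdown file that do not exist under project_root."""
    text = Path(markdown_path).read_text(encoding="utf-8")
    missing = []
    for target in IMAGE_LINK.findall(text):
        if target.startswith(("http://", "https://")):
            continue  # remote images (e.g. on S3) are not checked
        if not (Path(project_root) / target).is_file():
            missing.append(target)
    return missing
```

For this project, `missing_local_images("models/overview.md", ".")` should return an empty list once the image is in `assets/`.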
Generate the documentation and start the server again.
```
ubuntu@dbt:~/dbtlearn$ dbt docs generate
05:04:40 Running with dbt=1.3.1
05:04:40 Unable to do partial parsing because a project config has changed
05:04:44 Found 8 models, 8 tests, 1 snapshot, 0 analyses, 492 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
05:04:44
05:04:45 Concurrency: 4 threads (target='dev')
05:04:45
05:04:47 Done.
05:04:47 Building catalog
05:04:50 Catalog written to /home/ubuntu/dbtlearn/target/catalog.json
ubuntu@dbt:~/dbtlearn$ dbt docs serve
05:05:00 Running with dbt=1.3.1
05:05:00 Serving docs at 0.0.0.0:8080
05:05:00 To access from your browser, navigate to: http://localhost:8080
05:05:00
05:05:00
05:05:00 Press Ctrl+C to exit.
127.0.0.1 - - [11/Dec/2022 05:05:00] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 05:05:02] "GET /manifest.json?cb=1670735102088 HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 05:05:02] "GET /catalog.json?cb=1670735102088 HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 05:05:03] "GET /assets/input_schema.png HTTP/1.1" 200 -
```