Trying Out Snowflake x dbt: Documentation Edition

Published on 2022/12/14, about 6,300 characters

The simplest way to document your models

Adding a description to a table (model) or column in schema.yml makes it show up in the generated documentation.

models/schema.yml
version: 2

models:
  - name: dim_listings_cleansed
    description: Cleansed table which contains Airbnb listings.
    columns:

     - name: listing_id
       description: Primary key for the listing
       tests:
         - unique
         - not_null

Generate the documentation with the dbt docs generate command.

ubuntu@dbt:~/dbtlearn$ dbt docs generate
03:56:48  Running with dbt=1.3.1
03:56:49  Found 8 models, 8 tests, 1 snapshot, 0 analyses, 492 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
03:56:49  
03:56:50  Concurrency: 4 threads (target='dev')
03:56:50  
03:56:52  Done.
03:56:52  Building catalog
03:56:56  Catalog written to /home/ubuntu/dbtlearn/target/catalog.json
ubuntu@dbt:~/dbtlearn$
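
As a small aside, the generated artifacts can also be used to spot gaps in the documentation. The sketch below is not part of dbt. It assumes that target/manifest.json (the file the docs site requests next to catalog.json) has a top-level `nodes` map whose entries carry `resource_type`, `name`, `description`, and `columns`; the function name `find_undocumented` is my own. Columns that are not declared in schema.yml do not appear in the manifest, so they are not reported.

```python
import json
from pathlib import Path


def find_undocumented(manifest: dict) -> list[str]:
    """Return names of models and declared columns whose description is empty."""
    missing = []
    for node in manifest.get("nodes", {}).values():
        # Skip tests, seeds, snapshots and so on; only models are checked here.
        if node.get("resource_type") != "model":
            continue
        if not node.get("description"):
            missing.append(node["name"])
        for col_name, col in node.get("columns", {}).items():
            if not col.get("description"):
                missing.append(f"{node['name']}.{col_name}")
    return sorted(missing)


if __name__ == "__main__":
    manifest_path = Path("target/manifest.json")  # assumed default location
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        for name in find_undocumented(manifest):
            print("missing description:", name)
```

Run from the project directory after dbt docs generate, this prints one line per model or column that still needs a description.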

Next, start the documentation server with the dbt docs serve command.

ubuntu@dbt:~/dbtlearn$ dbt docs serve
04:25:03  Running with dbt=1.3.1
04:25:03  Serving docs at 0.0.0.0:8080
04:25:03  To access from your browser, navigate to:  http://localhost:8080
04:25:03  
04:25:03  
04:25:03  Press Ctrl+C to exit.
127.0.0.1 - - [11/Dec/2022 04:25:04] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 04:25:12] "GET /manifest.json?cb=1670732712042 HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 04:25:14] "GET /catalog.json?cb=1670732712042 HTTP/1.1" 200 -

A browser window opens and displays the documentation.
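
dbt docs generate also places an index.html next to the JSON artifacts in target/, so the docs site is a set of static files. As a rough sketch (not dbt's own implementation), the same directory could be served with Python's standard library. The function name `make_docs_server` and its defaults are assumptions for illustration.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_docs_server(directory: str = "target", port: int = 8080) -> ThreadingHTTPServer:
    """Build an HTTP server that serves static files from `directory` on localhost."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword (Python 3.7+).
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_docs_server().serve_forever()` from the project directory should serve the same pages at http://localhost:8080 until interrupted with Ctrl+C.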

Advanced documentation features

1. Writing Markdown documentation for a table or column

With the simple approach above, long descriptions make schema.yml hard to read.
To avoid that, move the description into a separate file.
Here we move the description of the minimum_nights column.

models/schema.yml
     - name: minimum_nights
       description: '{{ doc("dim_listing_cleansed__minimum_nights") }}'
       tests:
         - positive_value

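Place a Markdown file with the following content in the dbt project, for example as models/docs.md. Docs blocks must live in a Markdown (.md) file inside one of the project's resource paths such as models.

models/docs.md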
{% docs dim_listing_cleansed__minimum_nights %}
Minimum number of nights required to rent this property. 

Keep in mind that old listings might have `minimum_nights` set 
to 0 in the source tables. Our cleansing algorithm updates this to `1`.

{% enddocs %}
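
A typo in a docs block name only shows up when dbt parses the project, so a quick check of the references can be handy. The following is a minimal sketch, assuming the docs file and schema.yml are read as plain text. The regular expressions and the function names `parse_docs_blocks` and `unresolved_refs` are my own, not part of dbt, and the two-argument form of `doc()` is not handled.

```python
import re

# Matches {% docs name %} ... {% enddocs %} blocks in a Markdown file.
DOCS_BLOCK = re.compile(
    r"\{%-?\s*docs\s+(\w+)\s*-?%\}(.*?)\{%-?\s*enddocs\s*-?%\}",
    re.DOTALL,
)
# Matches single-argument references such as {{ doc("name") }} in YAML.
DOC_REF = re.compile(r"""\bdoc\(\s*["'](\w+)["']\s*\)""")


def parse_docs_blocks(markdown: str) -> dict[str, str]:
    """Map each docs block name to its body with surrounding whitespace removed."""
    return {name: body.strip() for name, body in DOCS_BLOCK.findall(markdown)}


def unresolved_refs(schema_yaml: str, markdown: str) -> list[str]:
    """Return doc() names used in the YAML text that have no matching docs block."""
    defined = parse_docs_blocks(markdown)
    return sorted({ref for ref in DOC_REF.findall(schema_yaml) if ref not in defined})
```

An empty list from `unresolved_refs` means every `doc()` reference in the YAML text has a block with the same name.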

Now generate the documentation.

ubuntu@dbt:~/dbtlearn$ dbt docs generate
04:39:04  Running with dbt=1.3.1
04:39:04  Found 8 models, 8 tests, 1 snapshot, 0 analyses, 492 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
04:39:04  
04:39:06  Concurrency: 4 threads (target='dev')
04:39:06  
04:39:08  Done.
04:39:08  Building catalog
04:39:11  Catalog written to /home/ubuntu/dbtlearn/target/catalog.json
ubuntu@dbt:~/dbtlearn$ 

Start the documentation server.

ubuntu@dbt:~/dbtlearn$ dbt docs serve
04:39:41  Running with dbt=1.3.1
04:39:41  Serving docs at 0.0.0.0:8080
04:39:41  To access from your browser, navigate to:  http://localhost:8080
04:39:41  
04:39:41  
04:39:41  Press Ctrl+C to exit.
127.0.0.1 - - [11/Dec/2022 04:39:42] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 04:39:43] "GET /manifest.json?cb=1670733583024 HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 04:39:43] "GET /catalog.json?cb=1670733583024 HTTP/1.1" 200 -

The description now appears in the documentation as expected.

2. Editing the top page

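Create a Markdown file named overview.md under models with the following content. A docs block named __overview__ replaces the default overview page of the docs site.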

models/overview.md
{% docs __overview__ %}
# Airbnb pipeline

Hey, welcome to our Airbnb pipeline documentation!

Here is the schema of our input data:
![input schema](https://dbtlearn.s3.us-east-2.amazonaws.com/input_schema.png)

{% enddocs %}

Generate the documentation and start the server.

ubuntu@dbt:~/dbtlearn$ dbt docs generate
04:46:52  Running with dbt=1.3.1
04:46:52  Found 8 models, 8 tests, 1 snapshot, 0 analyses, 492 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
04:46:52  
04:46:53  Concurrency: 4 threads (target='dev')
04:46:53  
04:46:55  Done.
04:46:55  Building catalog
04:46:57  Catalog written to /home/ubuntu/dbtlearn/target/catalog.json
ubuntu@dbt:~/dbtlearn$ dbt docs serve
04:48:41  Running with dbt=1.3.1
04:48:41  Serving docs at 0.0.0.0:8080
04:48:41  To access from your browser, navigate to:  http://localhost:8080
04:48:41  
04:48:41  
04:48:41  Press Ctrl+C to exit.
127.0.0.1 - - [11/Dec/2022 04:48:42] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 04:48:43] "GET /manifest.json?cb=1670734123757 HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 04:48:44] "GET /catalog.json?cb=1670734123757 HTTP/1.1" 200 -

Instead of referencing the image on S3, you can also store it inside the dbt project itself.

Create a directory named assets under the dbt project.
mkdir assets
Then edit dbt_project.yml and add the last line shown below.

dbt_project.yml
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
asset-paths: ["assets"]

Store the image in the assets directory.

ubuntu@dbt:~/dbtlearn$ wget https://dbtlearn.s3.us-east-2.amazonaws.com/input_schema.png
ubuntu@dbt:~/dbtlearn$ mv input_schema.png assets/

Change the image reference in overview.md to point at the local file.

models/overview.md
{% docs __overview__ %}
# Airbnb pipeline

Hey, welcome to our Airbnb pipeline documentation!

Here is the schema of our input data:
![input schema](assets/input_schema.png)

{% enddocs %}
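
Once images live in the project, a broken path only becomes visible as a missing picture in the browser. The sketch below is a hypothetical helper (the name `missing_local_images` and the regular expression are my own). It assumes image paths in the Markdown are written relative to the project directory, as with assets/input_schema.png here, and it skips remote URLs such as the S3 link.

```python
import re
from pathlib import Path

# Matches Markdown image syntax ![alt](target) and captures the target.
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme such as https://
URL_SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def missing_local_images(markdown: str, project_dir: Path) -> list[str]:
    """Return local image paths referenced in the Markdown that do not exist on disk."""
    missing = []
    for target in IMAGE_REF.findall(markdown):
        if URL_SCHEME.match(target):
            continue  # remote image; nothing to check locally
        if not (project_dir / target).is_file():
            missing.append(target)
    return missing
```

For example, `missing_local_images(Path("models/overview.md").read_text(), Path("."))` run from the project directory lists any referenced files that are not in place yet.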

Generate the documentation and start the server.

ubuntu@dbt:~/dbtlearn$ dbt docs generate
05:04:40  Running with dbt=1.3.1
05:04:40  Unable to do partial parsing because a project config has changed
05:04:44  Found 8 models, 8 tests, 1 snapshot, 0 analyses, 492 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
05:04:44  
05:04:45  Concurrency: 4 threads (target='dev')
05:04:45  
05:04:47  Done.
05:04:47  Building catalog
05:04:50  Catalog written to /home/ubuntu/dbtlearn/target/catalog.json
ubuntu@dbt:~/dbtlearn$ dbt docs serve
05:05:00  Running with dbt=1.3.1
05:05:00  Serving docs at 0.0.0.0:8080
05:05:00  To access from your browser, navigate to:  http://localhost:8080
05:05:00  
05:05:00  
05:05:00  Press Ctrl+C to exit.
127.0.0.1 - - [11/Dec/2022 05:05:00] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 05:05:02] "GET /manifest.json?cb=1670735102088 HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 05:05:02] "GET /catalog.json?cb=1670735102088 HTTP/1.1" 200 -
127.0.0.1 - - [11/Dec/2022 05:05:03] "GET /assets/input_schema.png HTTP/1.1" 200 -
