🐡

dbt macro tips advent calendar 2022 day 10 - on-run-end Context

2022/12/10に公開

dbt

tech

便利なデータ変換ツールである dbt の中のmacroに関するtipsを書いていく dbt macro tips Advent Calendar 2022 10日目です。

on-run-end Context

dbtにはいくつかのhookがあります。

on-run-start: dbtの各動作モード開始時に呼ばれるhook
pre-hook: 各モデルの開始前の呼ばれるhook
post-hook: 各モデルの実行後に呼ばれるhook
on-run-end: dbtの各動作モードの終了時に呼ばれるhook

それぞれのhookについては別の機会にしれっと書くとして、今回はon-run-end hookについて詳しく解説します。
on-run-endには、このhookのときのみに使用可能なコンテキスト変数がいくつか存在します。

schemas と database_schemas

dbtのいい感じの挙動の一つに、schema(BigQueryの場合はdataset)が存在しなかったら、作成するというものがあります。
この挙動は便利なものなのですが、schemaに対して権限を付与したいケースというのはよくあることでしょう。
そういうケースではon-run-end hookの database_schemas コンテキスト変数を使うと良いでしょう。
この変数には実行されるモデルにて使用されるschemaの一覧が入っています。

ドキュメントにあるサンプルのmacroを借りると以下のように書くことが可能です。

macros/grants.sql

{% macro grant_usage_to_schemas(schemas, user) %}
  {% for schema in schemas %}
    grant usage on schema {{ schema }} to {{ user }};
  {% endfor %}
{% endmacro %}

dbt_project.yml

on-run-end:
 - "{{ grant_usage_to_schemas(schemas, 'postgres') }}"

database_schemasはschemasの databaseがついている版というだけです。

さて、このmacroの動作について少し補足したいと思います。
このon-run-end hookを記述した状態でdbtを実行すると、以下の文字列が描画されます。


    grant usage on schema dev to postgres;

on-run-end hookでは空の文字列以外が描画された場合は、その描画された文字列をクエリと解釈して実行します。

先日の記事では、run_queryというものの例がありました。
上記のmacroは次のようにも書き換えられると思います。

macros/grants.sql

{%- macro grant_usage_to_schemas(schemas, user) %}
  {%- if execute %}
    {%- set sql %}
      {% for schema in schemas %}
      grant usage on schema {{ schema }} to {{ user }};
      {% endfor %}
    {%- endset %}
    {%- do run_query(sql) %}
  {%- endif %}
{%- endmacro %}

一体何が違うのでしょうか？　実はこの２つほぼ等価ですが、実行ログを見ると少し違います。

run_query不使用

07:48:49  Running 1 on-run-end hook
07:48:49  1 of 1 START hook: macro_tips_advcal.on-run-end.0 .............................. [RUN]
07:48:49  1 of 1 OK hook: macro_tips_advcal.on-run-end.0 ................................. [GRANT in 0.01s]
07:48:49  
07:48:49  
07:48:49  Finished running 1 table model, 2 tests, 1 hook in 0 hours 0 minutes and 0.54 seconds (0.54s).
07:48:49

run_query使用

07:57:42  Running 1 on-run-end hook
07:57:42  1 of 1 START hook: macro_tips_advcal.on-run-end.0 .............................. [RUN]
07:57:42  1 of 1 OK hook: macro_tips_advcal.on-run-end.0 ................................. [OK in 0.00s]
07:57:42  
07:57:42  
07:57:42  Finished running 1 table model, 2 tests, 1 hook in 0 hours 0 minutes and 0.59 seconds (0.59s).
07:57:42

ログ上では、SQLの実行ステータスがGRANTとOKとなり違っています。
実は実質的な違いはコレくらいしかありません。
例えば、より実用的なmacroとして、存在しているdb_user全員にUsage権限を与える！ というようなケースでは run_queryを使ってdb_user一覧を取得することになるので、後者の書き方のほうが書きやすいかもしれませんね。このへんはお好みとなります。

Results

もう一つ利用可能なコンテキスト変数に results というものがあります。
この変数には各モデルの実行結果が入っています。

こちらもサンプルを持ってくると

{% macro log_results(results) %}

  {% if execute %}
  {{ log("========== Begin Summary ==========", info=True) }}
  {% for res in results -%}
    {% set line -%}
        node: {{ res.node.unique_id }}; status: {{ res.status }} (message: {{ res.message }})
    {%- endset %}

    {{ log(line, info=True) }}
  {% endfor %}
  {{ log("========== End Summary ==========", info=True) }}
  {% endif %}

{% endmacro %}

dbt_project.yml

on-run-end:
 - "{{ log_results(results) }}"

$ dbt build --select my_first_dbt_model
08:14:00  Running with dbt=1.3.1
08:14:00  Unable to do partial parsing because a project config has changed
08:14:01  Found 2 models, 4 tests, 0 snapshots, 0 analyses, 293 macros, 1 operation, 0 seed files, 0 sources, 0 exposures, 0 metrics
08:14:01  
08:14:01  Concurrency: 4 threads (target='dev')
08:14:01  
08:14:01  1 of 3 START sql table model dev.my_first_dbt_model ............................ [RUN]
08:14:01  1 of 3 OK created sql table model dev.my_first_dbt_model ....................... [SELECT 6 in 0.17s]
08:14:01  2 of 3 START test not_null_my_first_dbt_model_id ............................... [RUN]
08:14:01  3 of 3 START test unique_my_first_dbt_model_id ................................. [RUN]
08:14:01  3 of 3 PASS unique_my_first_dbt_model_id ....................................... [PASS in 0.08s]
08:14:01  2 of 3 PASS not_null_my_first_dbt_model_id ..................................... [PASS in 0.08s]
08:14:01  
08:14:01  Running 1 on-run-end hook
08:14:01  ========== Begin Summary ==========
08:14:01  node: model.macro_tips_advcal.my_first_dbt_model; status: success (message: SELECT 6)
08:14:01  node: test.macro_tips_advcal.unique_my_first_dbt_model_id.16e066b321; status: pass (message: None)
08:14:01  node: test.macro_tips_advcal.not_null_my_first_dbt_model_id.5fb22c2710; status: pass (message: None)
08:14:01  ========== End Summary ==========
08:14:01  1 of 1 START hook: macro_tips_advcal.on-run-end.0 .............................. [RUN]
08:14:01  1 of 1 OK hook: macro_tips_advcal.on-run-end.0 ................................. [OK in 0.00s]
08:14:01  
08:14:01  
08:14:01  Finished running 1 table model, 2 tests, 1 hook in 0 hours 0 minutes and 0.53 seconds (0.53s).
08:14:01  
08:14:01  Completed successfully
08:14:01  
08:14:01  Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3

このようになります。
results に入っている情報は一体どのようなデータなのでしょう。
それは run_results.json に記述されているものと同様になります。

target/run_results.json

{
  "metadata": {
    "dbt_schema_version": "https://schemas.getdbt.com/dbt/run-results/v4.json",
    "dbt_version": "1.3.1",
    "generated_at": "2022-12-05T08:14:01.958969Z",
    "invocation_id": "3ac902a3-2796-4d62-a4dd-fe8f0213c66e",
    "env": {}
  },
  "results": [
    {
      "status": "success",
      "timing": [
        {
          "name": "compile",
          "started_at": "2022-12-05T08:14:01.650592Z",
          "completed_at": "2022-12-05T08:14:01.657809Z"
        },
        {
          "name": "execute",
          "started_at": "2022-12-05T08:14:01.658680Z",
          "completed_at": "2022-12-05T08:14:01.818619Z"
        }
      ],
      "thread_id": "Thread-1 (worker)",
      "execution_time": 0.1734468936920166,
      "adapter_response": {
        "_message": "SELECT 6",
        "code": "SELECT",
        "rows_affected": 6
      },
      "message": "SELECT 6",
      "failures": null,
      "unique_id": "model.macro_tips_advcal.my_first_dbt_model"
    },
    {
      "status": "pass",
      "timing": [
        {
          "name": "compile",
          "started_at": "2022-12-05T08:14:01.832279Z",
          "completed_at": "2022-12-05T08:14:01.858556Z"
        },
        {
          "name": "execute",
          "started_at": "2022-12-05T08:14:01.878198Z",
          "completed_at": "2022-12-05T08:14:01.907014Z"
        }
      ],
      "thread_id": "Thread-4 (worker)",
      "execution_time": 0.08139705657958984,
      "adapter_response": {},
      "message": null,
      "failures": 0,
      "unique_id": "test.macro_tips_advcal.unique_my_first_dbt_model_id.16e066b321"
    },
    {
      "status": "pass",
      "timing": [
        {
          "name": "compile",
          "started_at": "2022-12-05T08:14:01.832046Z",
          "completed_at": "2022-12-05T08:14:01.858363Z"
        },
        {
          "name": "execute",
          "started_at": "2022-12-05T08:14:01.859244Z",
          "completed_at": "2022-12-05T08:14:01.909388Z"
        }
      ],
      "thread_id": "Thread-3 (worker)",
      "execution_time": 0.08289408683776855,
      "adapter_response": {},
      "message": null,
      "failures": 0,
      "unique_id": "test.macro_tips_advcal.not_null_my_first_dbt_model_id.5fb22c2710"
    }
  ],
  "elapsed_time": 0.5323131084442139,
  "args": {
    "write_json": true,
    "use_colors": true,
    "printer_width": 80,
    "version_check": true,
    "partial_parse": true,
    "static_parser": true,
    "profiles_dir": "*********:秘密だよ！！！！***************",
    "send_anonymous_usage_stats": true,
    "event_buffer_size": 100000,
    "quiet": false,
    "no_print": false,
    "indirect_selection": "eager",
    "resource_types": [],
    "select": [
      "my_first_dbt_model"
    ],
    "which": "build",
    "rpc_method": "build"
  }
}

run_queryと組み合わせれば、実行結果に関する情報をDWHへon-run-end hook段階で保存することも可能になります。実行監視目的等でよく使うことでしょう

11日目は、箸休め的に失敗したテストの結果を表示するdump_fail_test macroの紹介をします。

on-run-end Context

schemas と database_schemas

Results

Discussion