😸

dbt macro tips advent calendar 2022 day 13 - pre-hookとpost-hook

2022/12/13に公開約7,600字

便利なデータ変換ツールである dbt の中のmacroに関するtipsを書いていく dbt macro tips Advent Calendar 2022 13日目です。

9日目にしれっと書いていたのですが、本日はpre-hookとpost-hookという各モデルの実行前とあとに処理を挟める機能について話します。

pre-hookとpost-hook

https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook

まず基本的な使い方です。

models/numbers.sql
{{
    config(
        materialized='table',
        pre_hook=[
	        'CREATE TEMP TABLE __temp_one AS SELECT 1 as number',
	    ],
        post_hook=[
	        'INSERT INTO {{ this }} SELECT number + 2 FROM {{ this }}',
	    ],
    )
}}

SELECT number FROM __temp_one
UNION ALL
SELECT number+1 FROM __temp_one

さて、コレを実行した結果はどうなるでしょう?

$ dbt build --select numbers
10:22:41  Running with dbt=1.3.1
10:22:41  Found 3 models, 4 tests, 0 snapshots, 0 analyses, 295 macros, 1 operation, 0 seed files, 0 sources, 0 exposures, 0 metrics
10:22:41  
10:22:41  Concurrency: 4 threads (target='dev')
10:22:42  
10:22:42  1 of 1 START sql table model dev.numbers ....................................... [RUN]
10:22:42  1 of 1 OK created sql table model dev.numbers .................................. [SELECT 2 in 0.24s]
10:22:42  
10:22:42  Running 1 on-run-end hook
10:22:42  ========== Begin Failure Tests  ==========
10:22:42  ========== End Failure Tests ==========
10:22:42  1 of 1 START hook: macro_tips_advcal.on-run-end.0 .............................. [RUN]
10:22:42  1 of 1 OK hook: macro_tips_advcal.on-run-end.0 ................................. [OK in 0.00s]
10:22:42  
10:22:42  
10:22:42  Finished running 1 table model, 1 hook in 0 hours 0 minutes and 0.55 seconds (0.55s).
10:22:42  
10:22:42  Completed successfully
10:22:42  
10:22:42  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
$ psql -h 127.0.0.1 -U postgres -p 5432 -d postgres
Password for user postgres: 
psql (14.2, server 14.5 (Debian 14.5-2.pgdg110+2))
Type "help" for help.

postgres=# select * from dev.numbers;
 number 
--------
      1
      2
      3
      4
(4 rows)

postgres=# 

pre-hookとpost-hookの基本的な使い方はこのようにSQLを1行ずつ書く形になります。
このSQLはJinjaテンプレート構文が使えますので、当然自作のmacroも書けます。

また、このpre-hookとpost-hookには before_beginおよび after_commit というトランザクション制御のためのヘルパーmacroがあります。
この辺の実行タイミングを確認するためにlogを出すだけのマクロを書いてみましょう。

macros/noop.sql
{%- macro noop(timing) %}
    {%- if execute %}
    {%- do log('noop for '~this~' in execute phase '~timing~' hook', info=True) %}
    {%- else %}
    {%- do log('noop for '~this~' in parse phase '~timing~' hook', info=True) %}
    {%- endif %}
{%- endmacro %}
models/numbers.sql
{{
    config(
        materialized='table',
        pre_hook=[
            'CREATE TEMP TABLE __temp_one AS SELECT 1 as number',
            "{{ noop('pre') }}",
            before_begin("{{ noop('before_begin pre') }}"),

        ],
        post_hook=[
            'INSERT INTO {{ this }} SELECT number + 2 FROM {{ this }}',
            after_commit("{{ noop('after_commit post') }}"),
            "{{ noop('post') }}",
        ],
    )
}}

{%- do log('model sql render', info=True) %}
SELECT number FROM __temp_one
UNION ALL
SELECT number+1 FROM __temp_one
$ dbt build --select numbers
10:36:18  Running with dbt=1.3.1
10:36:19  model sql render
10:36:19  noop for "postgres"."dev"."numbers" in parse phase pre hook
10:36:19  noop for "postgres"."dev"."numbers" in parse phase before_begin pre hook
10:36:19  noop for "postgres"."dev"."numbers" in parse phase after_commit post hook
10:36:19  noop for "postgres"."dev"."numbers" in parse phase post hook
10:36:19  Found 3 models, 4 tests, 0 snapshots, 0 analyses, 296 macros, 1 operation, 0 seed files, 0 sources, 0 exposures, 0 metrics
10:36:19  
10:36:19  Concurrency: 4 threads (target='dev')
10:36:19  
10:36:20  1 of 1 START sql table model dev.numbers ....................................... [RUN]
10:36:20  model sql render
10:36:20  noop for "postgres"."dev"."numbers" in execute phase before_begin pre hook
10:36:20  noop for "postgres"."dev"."numbers" in execute phase pre hook
10:36:20  noop for "postgres"."dev"."numbers" in execute phase post hook
10:36:20  noop for "postgres"."dev"."numbers" in execute phase after_commit post hook
10:36:20  1 of 1 OK created sql table model dev.numbers .................................. [SELECT 2 in 0.41s]
10:36:20  
10:36:20  Running 1 on-run-end hook
10:36:20  ========== Begin Failure Tests  ==========
10:36:20  ========== End Failure Tests ==========
10:36:20  1 of 1 START hook: macro_tips_advcal.on-run-end.0 .............................. [RUN]
10:36:20  1 of 1 OK hook: macro_tips_advcal.on-run-end.0 ................................. [OK in 0.00s]
10:36:20  
10:36:20  
10:36:20  Finished running 1 table model, 1 hook in 0 hours 0 minutes and 0.65 seconds (0.65s).
10:36:20  
10:36:20  Completed successfully
10:36:20  
10:36:20  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

この実行を見ると面白いことがいくつかわかります。

  • pre-hookとpost-hookはparse phaseにも実行されている。
  • parse phaseで実行する際は書かれている順に実行される
  • execute phaseで実行される際は before_beginが先に実行される
  • execute phaseで実行される際は after_beginが先に実行される

ここで、parse phaseexecute phaseで異なるSQLを描画したら、どっちが実行されるの?
気になりますよね。試してみましょう。

macros/insert_row.sql
{%- macro insert_row() %}
    INSERT INTO {{ this }} SELECT
    {%- if execute %}
        'execute'
    {%- else %}
        'parse'
    {%- endif %}
{%- endmacro %}
models/text.sql
{{
    config(
        materialized='table',
        post_hook=[
            "{{ insert_row() }}",
        ],
    )
}}

SELECT ''
$ dbt build --select text   
10:43:50  Running with dbt=1.3.1
10:43:51  Found 4 models, 4 tests, 0 snapshots, 0 analyses, 297 macros, 1 operation, 0 seed files, 0 sources, 0 exposures, 0 metrics
10:43:51  
10:43:51  Concurrency: 4 threads (target='dev')
10:43:51  
10:43:51  1 of 1 START sql table model dev.text .......................................... [RUN]
10:43:52  1 of 1 OK created sql table model dev.text ..................................... [SELECT 1 in 0.18s]
10:43:52  
10:43:52  Running 1 on-run-end hook
10:43:52  ========== Begin Failure Tests  ==========
10:43:52  ========== End Failure Tests ==========
10:43:52  1 of 1 START hook: macro_tips_advcal.on-run-end.0 .............................. [RUN]
10:43:52  1 of 1 OK hook: macro_tips_advcal.on-run-end.0 ................................. [OK in 0.00s]
10:43:52  
10:43:52  
10:43:52  Finished running 1 table model, 1 hook in 0 hours 0 minutes and 0.46 seconds (0.46s).
10:43:52  
10:43:52  Completed successfully
10:43:52  
10:43:52  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
$ psql -h 127.0.0.1 -U postgres -p 5432 -d postgres
Password for user postgres: 
psql (14.2, server 14.5 (Debian 14.5-2.pgdg110+2))
Type "help" for help.

postgres=# select * from dev.text;
 ?column? 
----------
 
 execute
(2 rows)

postgres=# 

execute phaseのものが実行採用されてますね。(まぁ当然ですが。)

別の書き方

同じhookを全部のmodelに設定したいとき、一つ一つ書く必要があるのか?
ご心配ご無用。 次のように書くことができます

dbt_project.yml
models:
  +pre-hook:
    sql: "<sql-statement>"
    transaction: false
  +post-hook:
    sql: "<sql-statement>"
    transaction: false
  project_name:
    +pre-hook:
      sql: "<sql-statement>"
      transaction: false
    +post-hook:
      sql: "<sql-statement>"
      transaction: false
    sub_dir:
      +pre-hook:
        sql: "<sql-statement>"
        transaction: false
      +post-hook:
         sql: "<sql-statement>"
         transaction: false

プロジェクトごとごとに纏めて設定することもできますし、階層分けした一部だけにも適用することが可能となっています。


14日目は道具が揃ってきたので、grantに関する便利なマクロ例を提示します。

Discussion

ログインするとコメントできます