😊

dbt macro tips advent calendar 2022 day 20 - カスタムテスト

2022/12/20に公開

便利なデータ変換ツールである dbt の中のmacroに関するtipsを書いていく dbt macro tips Advent Calendar 2022 20日目です。

カスタムテスト

dbtの強力な機能の一つにtestsがあります。

version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null

  - name: my_second_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null

このように書けば、uniqueやnot_nullのテストをしてくれます。
他にもrelationshipsやaccepted_valuesというテストも用意されており、基本的にはこの辺を使えば良いと思います。

また、足りなければ

https://github.com/dbt-labs/dbt-utils

https://github.com/calogica/dbt-expectations

などもあります。

ですが、やはり事業独特のドメイン知識に基づいたテストというのは欲しくなりますよね。
そこで、今回は自分たちのドメイン知識を反映したカスタムなテストの作り方について解説します。

ドキュメントはこちら。

https://docs.getdbt.com/guides/best-practices/writing-custom-generic-tests

やはり、こういう場合は同しようもないしょうもないテストを書くところからでしょう。

tests/generic/test_column_name_is_not_name.sql
{% test column_name_is_not_name(model, column_name) %}

with validation as (
    select '{{ column_name }}' as column_name
),
validation_errors as (
    select
        column_name
    from validation
    where column_name = 'name'
)

select *
from validation_errors

{% endtest %}

カラム名がnameなら落ちるテストです。

version: 2

models:
  - name: person
    description: |
      コレは人間です
    meta:
      owner: admin
      sensitive: true
      logical_name: 人間テーブル
      logical_path: /
    columns:
      - name: name
        description: 人間さんの名前です。
        meta:
          logical_name: 名前
          logical_path: /人間
        tests:
          - column_name_is_not_name
      - name: age
        description: |
          人間さんの年齢です
          10の位で切り捨ててます!!!!
        meta:
          logical_name: 年齢
          logical_path: /人間
        tests:
          - column_name_is_not_name
$ dbt build --select person
09:02:28  Running with dbt=1.3.1
09:02:28  Change detected to override macro used during parsing. Starting full parse.
09:02:29  model sql render
09:02:29  noop for "postgres"."dev"."numbers" in parse phase pre hook
09:02:29  noop for "postgres"."dev"."numbers" in parse phase before_begin pre hook
09:02:29  noop for "postgres"."dev"."numbers" in parse phase after_commit post hook
09:02:29  noop for "postgres"."dev"."numbers" in parse phase post hook
09:02:29  Found 5 models, 6 tests, 0 snapshots, 0 analyses, 299 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
09:02:29  
09:02:30  Concurrency: 4 threads (target='dev')
09:02:30  
09:02:30  1 of 3 START sql table model dev.person ........................................ [RUN]
09:02:30  1 of 3 OK created sql table model dev.person ................................... [SELECT 1 in 0.26s]
09:02:30  2 of 3 START test column_name_is_not_name_person_age ........................... [RUN]
09:02:30  3 of 3 START test column_name_is_not_name_person_name .......................... [RUN]
09:02:30  2 of 3 PASS column_name_is_not_name_person_age ................................. [PASS in 0.10s]
09:02:30  3 of 3 FAIL 1 column_name_is_not_name_person_name .............................. [FAIL 1 in 0.09s]
09:02:30  
09:02:30  Finished running 1 table model, 2 tests in 0 hours 0 minutes and 0.62 seconds (0.62s).
09:02:30  
09:02:30  Completed with 1 error and 0 warnings:
09:02:30  
09:02:30  Failure in test column_name_is_not_name_person_name (models/schema.yml)
09:02:30    Got 1 result, configured to fail if != 0
09:02:30  
09:02:30    compiled Code at target/compiled/macro_tips_advcal/models/schema.yml/column_name_is_not_name_person_name.sql
09:02:30  
09:02:30  Done. PASS=2 WARN=0 ERROR=1 SKIP=0 TOTAL=3

基本的にtestではSQLを書きます。そしてその結果が1行以上あれば、testが落ちます。

ちなみにですが、model自体のテストをする場合は、column_nameの引数はいりません。

tests/generic/test_model_name_is_not_person.sql
{% test model_name_is_not_person(model) %}
with validation as (
    select '{{ model }}' as model_name
),
validation_errors as (
    select
        model_name
    from validation
    where model_name = '{{ ref("person") }}'
)

select *
from validation_errors

{% endtest %}

さて、引数を取る場合はどうするのでしょう?

sql
{% test model_name_is_not(model, fail_name) %}
with validation as (
    select '{{ model }}' as model_name
),
validation_errors as (
    select
        model_name
    from validation
    where model_name = '{{ fail_name }}'
)

select *
from validation_errors

{% endtest %}
tests/generic/test_column_name_is_not.sql
{% test column_name_is_not(model, column_name, fail_name) %}

with validation as (
    select '{{ column_name }}' as column_name
),
validation_errors as (
    select
        column_name
    from validation
    where column_name = '{{ fail_name }}'
)

select *
from validation_errors

{% endtest %}
version: 2

models:
  - name: person
    description: |
      コレは人間です
    meta:
      owner: admin
      sensitive: true
      logical_name: 人間テーブル
      logical_path: /
    tests:
      - model_name_is_not:
          fail_name: ref('person')
    columns:
      - name: name
        description: 人間さんの名前です。
        meta:
          logical_name: 名前
          logical_path: /人間
        tests:
          - column_name_is_not:
              fail_name: age
      - name: age
        description: |
          人間さんの年齢です
          10の位で切り捨ててます!!!!
        meta:
          logical_name: 年齢
          logical_path: /人間
        tests:
          - column_name_is_not:
              fail_name: age
$ dbt build --select person
09:19:00  Running with dbt=1.3.1
09:19:01  Found 5 models, 7 tests, 0 snapshots, 0 analyses, 300 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
09:19:01  
09:19:01  Concurrency: 4 threads (target='dev')
09:19:01  
09:19:01  1 of 4 START sql table model dev.person ........................................ [RUN]
09:19:01  1 of 4 OK created sql table model dev.person ................................... [SELECT 1 in 0.22s]
09:19:01  2 of 4 START test column_name_is_not_person_age__age ........................... [RUN]
09:19:01  3 of 4 START test column_name_is_not_person_name__age .......................... [RUN]
09:19:01  4 of 4 START test model_name_is_not_person_ref_person_ ......................... [RUN]
09:19:01  3 of 4 PASS column_name_is_not_person_name__age ................................ [PASS in 0.11s]
09:19:01  4 of 4 FAIL 1 model_name_is_not_person_ref_person_ ............................. [FAIL 1 in 0.12s]
09:19:01  2 of 4 FAIL 1 column_name_is_not_person_age__age ............................... [FAIL 1 in 0.14s]
09:19:01  
09:19:01  Finished running 1 table model, 3 tests in 0 hours 0 minutes and 0.66 seconds (0.66s).
09:19:01  
09:19:01  Completed with 2 errors and 0 warnings:
09:19:01  
09:19:01  Failure in test model_name_is_not_person_ref_person_ (models/schema.yml)
09:19:01    Got 1 result, configured to fail if != 0
09:19:01  
09:19:01    compiled Code at target/compiled/macro_tips_advcal/models/schema.yml/model_name_is_not_person_ref_person_.sql
09:19:01  
09:19:01  Failure in test column_name_is_not_person_age__age (models/schema.yml)
09:19:01    Got 1 result, configured to fail if != 0
09:19:01  
09:19:01    compiled Code at target/compiled/macro_tips_advcal/models/schema.yml/column_name_is_not_person_age__age.sql
09:19:01  
09:19:01  Done. PASS=2 WARN=0 ERROR=2 SKIP=0 TOTAL=4

実際は、{{ model}} を使ってSELECTしますが、こんな感じの例で、どうやってテストを書くのか雰囲気がわかったでしょうか?


21日目からは、実際にもうちょっと複雑なmacroを作り込んでいく例をあげていこうと思います。

Discussion