【dbt Docs】Building a dbt Project - Tests

Published 2022/03/14

Tests

Test command
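dbt test runs the tests defined on dbt resources such as models, sources, snapshots, and seeds. It basically assumes that those resources (models, snapshots, and so on) have already been built.
Use the --select flag to run only the tests that match a given condition.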

# run tests for one_specific_model
$ dbt test --select one_specific_model

# run tests for all models in package
$ dbt test --select some_package.*

# run only tests defined singularly
$ dbt test --select test_type:singular

# run only tests defined generically
$ dbt test --select test_type:generic

# run singular tests limited to one_specific_model
$ dbt test --select one_specific_model,test_type:singular

# run generic tests limited to one_specific_model
$ dbt test --select one_specific_model,test_type:generic
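
The comma in the last two commands combines criteria, so only tests that satisfy both are run. As a rough illustration of that behavior, here is a minimal Python sketch. The test inventory and the `select_tests` helper are hypothetical and only imitate the idea; they are not how dbt resolves selectors internally.

```python
# Hypothetical test inventory: (test name, test type, model it is attached to)
TESTS = [
    ("unique_orders_order_id", "generic", "orders"),
    ("not_null_orders_order_id", "generic", "orders"),
    ("assert_total_payment_amount_is_positive", "singular", "fct_payments"),
    ("assert_orders_have_items", "singular", "orders"),
]

def matches(test, criterion):
    """Check one criterion: either 'test_type:<type>' or a model name."""
    _name, test_type, model = test
    if criterion.startswith("test_type:"):
        return test_type == criterion.split(":", 1)[1]
    return model == criterion

def select_tests(selector):
    """Keep the tests that satisfy every comma-separated criterion."""
    criteria = selector.split(",")
    return [t[0] for t in TESTS if all(matches(t, c) for c in criteria)]

print(select_tests("orders,test_type:singular"))
print(select_tests("test_type:generic"))
```

Only `assert_orders_have_items` is both attached to `orders` and singular, so the first call returns just that one test.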

Test properties

version: 2

models:
  - name: <model_name>
    tests:
      - <test_name>:
          <argument_name>: <argument_value>
          config:
            <test_config>: <config-value>

    columns:
      - name: <column_name>
        tests:
          - <test_name>
          - <test_name>:
              <argument_name>: <argument_value>
              config:
                <test_config>: <config-value>

not_null
unique
accepted_values
relationships

schema.yml

version: 2

models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id

Test configurations
Test selection examples

Getting started

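Tests are assertions you make about the models and other resources (sources, seeds, snapshots, and so on) in your dbt project. When you run dbt test, dbt reports whether each test in the project passed or failed.
Note: an assertion is a mechanism that states a condition that must hold when some code runs, and checks that condition at run time.

A test is a SQL query written to select the records that violate the assertion; if the query returns no rows, nothing is wrong. dbt also provides mechanisms for checking things like uniqueness and not-null constraints within a model, so the data itself gets verified.
Note: in many DWH products, constraints such as NOT NULL and PRIMARY KEY are accepted syntactically but are not actually enforced. Adding data checks in dbt as a complement lets you verify whether a column is, in effect, behaving as a primary key.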
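
To make the "a test is a query that selects failing rows" idea concrete, here is a minimal Python sketch using the standard-library sqlite3 module instead of a real warehouse. The `orders` table, its rows, and the `failing_rows` helper are made up for illustration; dbt itself is not involved.

```python
import sqlite3

def failing_rows(conn, test_sql):
    """Run a test query and return the rows it selects (the failures)."""
    return conn.execute(test_sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, status text)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, "placed"), (2, "shipped"), (None, "placed")],
)

# Assertion: order_id is never null.
# The query selects exactly the rows that violate the assertion.
failures = failing_rows(conn, "select * from orders where order_id is null")

# Zero rows means the assertion holds; any row means the test fails.
test_passed = len(failures) == 0
print("PASS" if test_passed else f"FAIL ({len(failures)} failing row(s))")
```

Here the table accepts the null `order_id` without complaint, and it is the query that surfaces the problem, which is the complementary role described above.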

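There are two ways to define tests in dbt:

singular test: a select statement that returns failing rows, placed in a .sql file in the tests directory. It runs with the dbt test command.
generic test: checks such as null checks and primary-key checks. They are defined in a test block inside a .sql file and applied in a .yml file. Generic tests can be applied to models, columns, sources, snapshots, and seeds.

Defining tests is a great way to confirm that your code works correctly, and it helps prevent regressions when the code changes. Because generic tests can be reused over and over to make similar assertions with slight variations, they tend to be far more common than singular tests, and they should make up the bulk of your dbt test suite.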

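Singular tests

The simplest way to define a test is to write the exact SQL that returns failing records. These are called "singular" tests because they are one-off assertions usable for a single purpose.

These tests are .sql files under the tests directory (the location can also be set with the test-paths config). You write them the same way as models, and Jinja (the ref and source functions, for example) is available in the test definition. Each .sql file contains one select statement and defines one test.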

tests/assert_total_payment_amount_is_positive.sql

-- Refunds have a negative amount, so the total amount should always be >= 0.
-- Therefore return records where this isn't true to make the test fail
select
    order_id,
    sum(amount) as total_amount
from {{ ref('fct_payments' )}}
group by 1
having not(total_amount >= 0)

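Note: this query can be reused just by changing the column names and so on, so in that sense it is not really "singular". Despite the name, it can be adapted to other cases.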
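
The singular test above can be tried outside dbt with a small Python sketch. It assumes an in-memory SQLite table named `fct_payments` with made-up rows, and it replaces `{{ ref('fct_payments') }}` with the plain table name by hand, since dbt would normally resolve that at compile time. The having clause repeats `sum(amount)` rather than the alias to stay portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table fct_payments (order_id integer, amount real)")
conn.executemany(
    "insert into fct_payments values (?, ?)",
    [
        (1, 100.0), (1, -20.0),  # order 1: total 80, satisfies the assertion
        (2, 50.0), (2, -70.0),   # order 2: total -20, violates the assertion
    ],
)

# Same shape as the singular test, with the ref() already resolved.
singular_test_sql = """
select
    order_id,
    sum(amount) as total_amount
from fct_payments
group by 1
having not (sum(amount) >= 0)
"""

# Each returned row is one failing record.
failures = conn.execute(singular_test_sql).fetchall()
print(failures)
```

Only order 2 is returned, so in dbt terms this test would fail with one failing record.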

Generic tests

This is the body of the not_null test. It takes two arguments, model and column_name, and is templated as follows.

not_null test

{% test not_null(model, column_name) %}

    select *
    from {{ model }}
    where {{ column_name }} is null

{% endtest %}
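
To show what "templated" means here, the following Python sketch fills in the two arguments with an ordinary function. dbt uses Jinja for this; the `not_null_test_sql` helper is a hypothetical stand-in that only imitates the substitution.

```python
def not_null_test_sql(model, column_name):
    """Mimic what the not_null test block renders for one model/column pair."""
    return (
        "select *\n"
        f"from {model}\n"
        f"where {column_name} is null"
    )

# One template, many concrete tests: only the arguments change.
print(not_null_test_sql("analytics.orders", "order_id"))
print()
print(not_null_test_sql("analytics.customers", "id"))
```

This is why a single generic test definition can cover many columns: each use just supplies a different model and column name.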

The following tests are available by default:

unique
not_null
accepted_values
relationships

version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id

order_id must be unique and not_null
status may only take the values placed, shipped, completed, or returned
customer_id must match a value of customers.id (referential integrity)
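
The accepted_values and relationships checks can be approximated with hand-written queries. The sketch below uses Python's sqlite3 with made-up `orders` and `customers` rows; the SQL is my own approximation of what each test asserts, not the SQL dbt actually compiles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customers (id integer);
create table orders (order_id integer, status text, customer_id integer);
insert into customers values (1), (2);
insert into orders values
    (10, 'placed', 1),
    (11, 'lost', 2),      -- status outside the accepted list
    (12, 'shipped', 99);  -- customer_id with no matching customers.id
""")

# accepted_values: rows whose status is not in the allowed list
bad_status = conn.execute("""
    select order_id, status
    from orders
    where status not in ('placed', 'shipped', 'completed', 'returned')
""").fetchall()

# relationships: non-null customer_id values that are missing from customers.id
orphans = conn.execute("""
    select o.order_id, o.customer_id
    from orders o
    left join customers c on o.customer_id = c.id
    where o.customer_id is not null and c.id is null
""").fetchall()

print(bad_status)
print(orphans)
```

Each query returns one offending row, so both assertions would be reported as failures for this data.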

More generic tests

Packages and macros are also publicly available, so using them is an option (dbt-utils, dbt-expectations, and so on).

dbt-utils
- equal_rowcount: the row count matches that of the model being compared (looks usable for tests around JOINs, for example)
- fewer_rows_than
- equality
- expression_is_true
- recency
- at_least_one
- not_constant
- cardinality_equality
- unique_where
- not_null_where
- not_null_proportion: verifies that the proportion of non-null values in a column falls within a specified range.
- relationships_where
- mutually_exclusive_ranges: for a given lower_bound_column and upper_bound_column, checks that the range between the lower and upper bound of a row does not overlap the range of any other row.
- unique_combination_of_columns
- accepted_range
dbt-expectations
- Table Shape
- Missing values, unique values, and types
- Sets and ranges
- String matching
  - expect_column_value_lengths_to_equal
  - expect_column_values_to_match_regex
- Aggregated functions
  - expect_column_distinct_count_to_be_greater_than
  - expect_column_max_to_be_between
- Multi-column
  - expect_column_pair_values_A_to_be_greater_than_B
- Distributional functions
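
As an illustration of one item from the lists above, here is a rough Python sketch of the idea behind not_null_proportion: compute the share of non-null values and compare it to a range. The helper names, the `at_least` and `at_most` parameters as used here, and the `customers` data are assumptions for this sketch, and the package's real implementation may differ. The sketch also assumes a non-empty table.

```python
import sqlite3

def not_null_share(conn, table, column):
    """Fraction of rows in `table` where `column` is not null."""
    # count(column) skips nulls; count(*) counts every row.
    non_null, total = conn.execute(
        f"select count({column}), count(*) from {table}"
    ).fetchone()
    return non_null / total

def proportion_ok(conn, table, column, at_least, at_most=1.0):
    """True when the non-null share lies inside [at_least, at_most]."""
    return at_least <= not_null_share(conn, table, column) <= at_most

conn = sqlite3.connect(":memory:")
conn.execute("create table customers (id integer, email text)")
conn.executemany(
    "insert into customers values (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, "d@example.com")],
)

print(not_null_share(conn, "customers", "email"))
print(proportion_ok(conn, "customers", "email", at_least=0.5))
print(proportion_ok(conn, "customers", "email", at_least=0.9))
```

Three of four emails are present, so a 50% floor passes and a 90% floor does not.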

Example

To add a test to your project:

Create a .yml file in the models directory, for example models/schema.yml

models/schema.yml

version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null

Run the dbt test command

$ dbt test

Found 3 models, 2 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 0 seed files, 0 sources

17:31:05 | Concurrency: 1 threads (target='learn')
17:31:05 |
17:31:05 | 1 of 2 START test not_null_order_order_id.....................    [RUN]
17:31:06 | 1 of 2 PASS not_null_order_order_id...........................    [PASS in 0.99s]
17:31:06 | 2 of 2 START test unique_order_order_id.......................    [RUN]
17:31:07 | 2 of 2 PASS unique_order_order_id.............................    [PASS in 0.79s]
17:31:07 |
17:31:07 | Finished running 2 tests in 7.17s.

Completed successfully

Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

Check the test results
- dbt Cloud: check the "Details" tab
- dbt CLI: check the target/compiled directory

Unique test

Compiled SQL

select *
from (

    select
        order_id

    from analytics.orders
    where order_id is not null
    group by order_id
    having count(*) > 1

) validation_errors

Templated SQL

select *
from (

    select
        {{ column_name }}

    from {{ model }}
    where {{ column_name }} is not null
    group by {{ column_name }}
    having count(*) > 1

) validation_errors
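
Running the compiled query against a few rows shows how the unique test behaves. This Python sketch uses an in-memory SQLite table named `orders` with made-up values in place of `analytics.orders`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer)")
conn.executemany(
    "insert into orders values (?)",
    [(1,), (2,), (2,), (None,), (None,)],
)

# Same structure as the compiled SQL of the unique test.
unique_test_sql = """
select *
from (
    select order_id
    from orders
    where order_id is not null
    group by order_id
    having count(*) > 1
) validation_errors
"""

# Each returned row is a value that appears more than once.
duplicates = conn.execute(unique_test_sql).fetchall()
print(duplicates)
```

The duplicated value 2 is reported, while the two nulls are filtered out by the where clause. That is why unique is usually paired with not_null when checking a primary key.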

Not null test

Compiled SQL

select *
from analytics.orders
where order_id is null

Templated SQL

select *
from {{ model }}
where {{ column_name }} is null

Storing test failures

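Using the --store-failures option, or setting store_failures in the config, saves the results of the test query to a table in the DWH. This is useful for debugging.

By default, test results are saved in a schema whose name ends in dbt_test__audit, in a table named after the test. The schema config lets you save them to a schema of your choice.
Test results are always overwritten (not appended).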
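
The overwrite behavior can be pictured with a small Python sketch: on every run, the failure table is dropped and rebuilt from the test query. The `store_failures` helper and the table name are hypothetical and only imitate the idea in SQLite; they are not dbt's implementation.

```python
import sqlite3

def store_failures(conn, audit_table, test_sql):
    """Replace `audit_table` with the rows currently selected by `test_sql`."""
    conn.execute(f"drop table if exists {audit_table}")
    conn.execute(f"create table {audit_table} as {test_sql}")
    return conn.execute(f"select count(*) from {audit_table}").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, status text)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(None, "placed"), (None, "shipped"), (3, "placed")],
)

test_sql = "select * from orders where order_id is null"

# First run: two rows have a null order_id.
first_run = store_failures(conn, "not_null_orders_order_id", test_sql)

# Fix one of the bad rows, then run the test again.
conn.execute(
    "update orders set order_id = 1 where status = 'placed' and order_id is null"
)
second_run = store_failures(conn, "not_null_orders_order_id", test_sql)

print(first_run, second_run)
```

After the second run the table holds only the one remaining failure; the rows from the first run are gone rather than accumulated.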

FAQs

How do I test one model at a time?
Use the --select flag:
```
dbt test --select customers
```
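One of my tests failed. How can I debug it?

In dbt Cloud, check the [Details] tab.
With the dbt CLI, the compiled SQL is written to target/compiled/schema_tests; run that SQL to see what failed.
What tests should I add to my project?

Testing the primary key of every model is recommended (the unique and not_null tests).
Beyond that, using sources to test the integrity of your source data is also recommended.
When should I run my tests?

You should run tests when writing new code (to confirm that changing the SQL has not broken existing models) and when running transformations in production (to confirm that your assumptions about the source data still hold).
Can I store my tests in a directory other than the `tests` directory in my project?

By default, singular test files are expected in the tests directory (this is configurable).
Generic test definitions are expected in tests/generic or in macros.

For example, if generic tests live in my_cool_test/generic/, singular tests go in my_cool_test.
How do I run tests on just my sources?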
```
$ dbt test --select source:*
```
Can I set thresholds for test failures?

Since v0.20.0, this can be configured with error_if and warn_if.
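
As a rough picture of how such thresholds might be evaluated, here is a Python sketch that compares a failure count against condition strings such as ">10". The `condition_met` and `resolve_status` helpers are hypothetical. The "!=0" defaults and the order of evaluation (error condition first, then warning) are my assumptions about the behavior with the default severity, so check the dbt documentation before relying on them.

```python
import operator

_OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "=": operator.eq, "!=": operator.ne,
}

def condition_met(failures, condition):
    """Evaluate a condition string such as '>10' or '!= 0' against a count."""
    condition = condition.strip()
    # Check two-character operators before one-character ones.
    for symbol in (">=", "<=", "!=", ">", "<", "="):
        if condition.startswith(symbol):
            threshold = int(condition[len(symbol):])
            return _OPS[symbol](failures, threshold)
    raise ValueError(f"unsupported condition: {condition!r}")

def resolve_status(failures, error_if="!=0", warn_if="!=0"):
    """Map a failure count to 'error', 'warn', or 'pass' (assumed ordering)."""
    if condition_met(failures, error_if):
        return "error"
    if condition_met(failures, warn_if):
        return "warn"
    return "pass"

for count in (5, 20, 60):
    print(count, resolve_status(count, error_if=">50", warn_if=">10"))
```

With these thresholds, 5 failing rows pass, 20 produce a warning, and 60 produce an error.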

Can I test the uniqueness of two columns?

There are several ways.

1. Create a unique key in the model and test it

models/orders.sql

select
  country_code || '-' || order_id as surrogate_key,
  ...

models/orders.yml

version: 2

models:
  - name: orders
    columns:
      - name: surrogate_key
        tests:
          - unique

2. Test an expression

models/orders.yml

version: 2

models:
  - name: orders
    tests:
      - unique:
          column_name: "(country_code || '-' || order_id)"

Note: I seem to recall reading that this approach is not really recommended, something about some DWHs not supporting it...
3. Use dbt_utils.unique_combination_of_columns
This performs well, so it is effective for large datasets.

models/orders.yml

version: 2

models:
  - name: orders
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - country_code
            - order_id
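
To compare the approaches, the sketch below checks the same made-up `orders` rows in two ways with Python's sqlite3: grouping on both columns, and grouping on a concatenated surrogate key. Both queries are hand-written approximations of what the tests assert, not the SQL that dbt or dbt_utils generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (country_code text, order_id integer)")
conn.executemany(
    "insert into orders values (?, ?)",
    [("JP", 1), ("US", 1), ("JP", 2), ("JP", 2)],
)

# Grouping on the column pair: no string concatenation is needed.
by_columns = conn.execute("""
    select country_code, order_id
    from orders
    group by country_code, order_id
    having count(*) > 1
""").fetchall()

# Grouping on a concatenated surrogate key.
by_surrogate_key = conn.execute("""
    select country_code || '-' || order_id as surrogate_key
    from orders
    group by 1
    having count(*) > 1
""").fetchall()

print(by_columns)
print(by_surrogate_key)
```

Both queries flag the repeated (JP, 2) pair. order_id 1 appears twice but under different country codes, so it is not reported.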
