
Implementing model logic tests with dbt-unit-testing

Published 2023/02/24

Test

TDD

dbt

tech

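dbt's built-in tests are aimed mainly at checking the quality of the data itself, and they are not well suited to testing the logic of a model.
While looking for a decent way to do logic tests, I came across a package called dbt-unit-testing that looked promising, so in this post I try it out.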

TL;DR

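dbt-unit-testing is a package for testing model logic
You can inject mock data into the models that a model depends on
Mock data can be defined as a select statement or in a tabular format
Test results are printed as a table and are easy to read
It also supports integration tests across models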

Trying it out

I implement unit tests on top of dbt's sample project.

https://github.com/dbt-labs/jaffle_shop

Initial setup

Add the following to packages.yml and run dbt deps.

packages:
  - git: "https://github.com/EqualExperts/dbt-unit-testing"
    revision: v0.2.6

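So that mock data can be loaded when tests run, override the built-in ref and source macros with the package's own versions. Create macros/macros.sql as follows.
From reading the package source, these macros appear to add logic that calls the mock when the test carries a specific tag.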

{% macro ref(model_name) %}
    {{ return(dbt_unit_testing.ref(model_name)) }}
{% endmacro %}

{% macro source(source, model_name) %}
    {{ return(dbt_unit_testing.source(source, model_name)) }}
{% endmacro %}

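If a table with the same name as the model under test (or a model it depends on) exists in a different schema or database, referencing it can cause an error. To avoid this, add a setting to dbt_project.yml so that sources are referenced by their fully qualified names.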

vars:
  unit_tests_config:
    use_qualified_sources: true

Unit tests

Let's write an actual unit test (admittedly, this is the example from the official docs, copied as-is).
It tests the customers model of the sample project.

{{ config(tags=['unit-test']) }}

{% call dbt_unit_testing.test('customers', 'should sum order values to calculate customer_lifetime_value') %}

  {% call dbt_unit_testing.mock_ref ('stg_customers') %}
    select 1 as customer_id, '' as first_name, '' as last_name
  {% endcall %}

  {% call dbt_unit_testing.mock_ref ('stg_orders') %}
    select 1001 as order_id, 1 as customer_id, null as order_date
    UNION ALL
    select 1002 as order_id, 1 as customer_id, null as order_date
  {% endcall %}

  {% call dbt_unit_testing.mock_ref ('stg_payments') %}
    select 1001 as order_id, 10 as amount
    UNION ALL
    select 1002 as order_id, 10 as amount
  {% endcall %}

  {% call dbt_unit_testing.expect() %}
    select 1 as customer_id, 20 as customer_lifetime_value
  {% endcall %}
{% endcall %}

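The macros used above are introduced one by one below.

First, a test case is defined with the dbt_unit_testing.test macro. The first argument is the name of the model under test and the second is a description of the test.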

{% call dbt_unit_testing.test('customers', 'should sum order values to calculate customer_lifetime_value') %}

Inside test, the mock data (mock_ref or mock_source) and the expected output for that input (expect) are defined as select statements.

-- Mock data for stg_customers, a model that customers depends on
  {% call dbt_unit_testing.mock_ref ('stg_customers') %}
    select 1 as customer_id, '' as first_name, '' as last_name
  {% endcall %}

-- Expected output of the model
  {% call dbt_unit_testing.expect() %}
    select 1 as customer_id, 20 as customer_lifetime_value
  {% endcall %}
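
The sample above only mocks refs, so here is a minimal sketch of the mock_source side. It assumes a hypothetical variant of stg_customers that selects from source('jaffle_shop', 'raw_customers') and renames id to customer_id; the real jaffle_shop project reads from seeds through ref, so the source name, table name, and sample values are all illustrative assumptions.

```sql
{{ config(tags=['unit-test']) }}

{% call dbt_unit_testing.test('stg_customers', 'should rename id to customer_id') %}

  {# Assumption: stg_customers selects from source('jaffle_shop', 'raw_customers') #}
  {% call dbt_unit_testing.mock_source('jaffle_shop', 'raw_customers') %}
    select 1 as id, 'Alice' as first_name, 'A.' as last_name
  {% endcall %}

  {# Expected output of stg_customers for the mocked source row #}
  {% call dbt_unit_testing.expect() %}
    select 1 as customer_id, 'Alice' as first_name, 'A.' as last_name
  {% endcall %}
{% endcall %}
```

mock_source takes the source name and the table name, in the same order as dbt's own source macro, so the mock lines up with the call inside the model.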

Since this is dbt, you can also use a for loop to generate a large amount of mock data.

  {% call dbt_unit_testing.mock_ref ('stg_payments') %}
    {% for i in range(1, 101) %}
      select 1000 + {{i}} as order_id, 10 as amount
      {% if not loop.last %} union all {% endif %}
    {% endfor %}
  {% endcall %}

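Mock data can also be supplied in a tabular format such as csv instead of a select statement.
In that case, pass "input_format": "csv" as an option in the second argument of mock_ref.
You can also change the row and column separator strings if you like (see the docs).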

{% call dbt_unit_testing.mock_ref ('stg_customers', {"input_format": "csv"} ) %}
    customer_id,first_name,last_name
    1,'',''
{% endcall %}
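
As a rough sketch of how the whole customers test might look in csv form, the version below writes every mock as csv and uses a pipe-separated table for the expectation. The column_separator option name and passing options to expect reflect my reading of the package README, so treat both as assumptions and check them against the version you install.

```sql
{{ config(tags=['unit-test']) }}

{% call dbt_unit_testing.test('customers', 'should sum order values (csv mocks)') %}

  {% call dbt_unit_testing.mock_ref('stg_customers', {"input_format": "csv"}) %}
    customer_id,first_name,last_name
    1,'',''
  {% endcall %}

  {% call dbt_unit_testing.mock_ref('stg_orders', {"input_format": "csv"}) %}
    order_id,customer_id,order_date
    1001,1,null
    1002,1,null
  {% endcall %}

  {% call dbt_unit_testing.mock_ref('stg_payments', {"input_format": "csv"}) %}
    order_id,amount
    1001,10
    1002,10
  {% endcall %}

  {# Assumption: column_separator switches the delimiter for this block only #}
  {% call dbt_unit_testing.expect({"input_format": "csv", "column_separator": "|"}) %}
    customer_id | customer_lifetime_value
    1           | 20
  {% endcall %}
{% endcall %}
```

The cell values are written the way they would appear in a select list, which is why the empty strings are quoted and null is left bare.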

To add test cases, join them with union all. It is nice that you do not need a separate file for each case.

{% call dbt_unit_testing.test('customers', 'test1') %}
    ...
{% endcall %}

union all

{% call dbt_unit_testing.test('customers', 'test2') %}
    ...
{% endcall %}
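
To make the pattern concrete, here is a sketch of one file holding two complete cases for the customers model. The second case and its values are made up for illustration; it checks that the total is calculated per customer rather than across all customers.

```sql
{{ config(tags=['unit-test']) }}

{% call dbt_unit_testing.test('customers', 'should sum order values to calculate customer_lifetime_value') %}
  {% call dbt_unit_testing.mock_ref('stg_customers') %}
    select 1 as customer_id, '' as first_name, '' as last_name
  {% endcall %}
  {% call dbt_unit_testing.mock_ref('stg_orders') %}
    select 1001 as order_id, 1 as customer_id, null as order_date
    union all
    select 1002 as order_id, 1 as customer_id, null as order_date
  {% endcall %}
  {% call dbt_unit_testing.mock_ref('stg_payments') %}
    select 1001 as order_id, 10 as amount
    union all
    select 1002 as order_id, 10 as amount
  {% endcall %}
  {% call dbt_unit_testing.expect() %}
    select 1 as customer_id, 20 as customer_lifetime_value
  {% endcall %}
{% endcall %}

union all

{# Illustrative second case: two customers, one paid order each #}
{% call dbt_unit_testing.test('customers', 'should calculate customer_lifetime_value per customer') %}
  {% call dbt_unit_testing.mock_ref('stg_customers') %}
    select 1 as customer_id, '' as first_name, '' as last_name
    union all
    select 2 as customer_id, '' as first_name, '' as last_name
  {% endcall %}
  {% call dbt_unit_testing.mock_ref('stg_orders') %}
    select 1001 as order_id, 1 as customer_id, null as order_date
    union all
    select 1002 as order_id, 2 as customer_id, null as order_date
  {% endcall %}
  {% call dbt_unit_testing.mock_ref('stg_payments') %}
    select 1001 as order_id, 10 as amount
    union all
    select 1002 as order_id, 30 as amount
  {% endcall %}
  {% call dbt_unit_testing.expect() %}
    select 1 as customer_id, 10 as customer_lifetime_value
    union all
    select 2 as customer_id, 30 as customer_lifetime_value
  {% endcall %}
{% endcall %}
```

Each case carries its own mocks, so the two cases do not share any data even though they live in the same file.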

Running the tests is just the usual dbt test.
To run only the unit tests, filter by tag with the -m option.

dbt test -m tag:unit-test
14:49:16  Running with dbt=1.4.1
14:49:17  Found 6 models, 21 tests, 0 snapshots, 0 analyses, 474 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics
14:49:17
14:49:17  Concurrency: 4 threads (target='dev')
14:49:17
14:49:17  1 of 1 START test unit_tests ................................................... [RUN]
14:49:19  1 of 1 PASS unit_tests ......................................................... [PASS in 1.84s]
14:49:19
14:49:19  Finished running 1 test in 0 hours 0 minutes and 2.67 seconds (2.67s).
14:49:19
14:49:19  Completed successfully
14:49:19
14:49:19  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

When a test fails, the diff between the expected and actual values for the failing case is printed as a table.

15:52:50  1 of 1 START test unit_tests ................................................... [RUN]
15:52:53  MODEL: model_name
15:52:53  TEST:  should sum order values to calculate customer_lifetime_value
15:52:53  ERROR: Rows mismatch:
15:52:53  | DIFF | COUNT | CUSTOMER_ID | CUSTOMER_LIFETIME_VALUE |
15:52:53  | ---- | ----- | ----------- | ----------------------- |
15:52:53  | +    |     1 |           1 |               10.000000 |
15:52:53  | -    |     1 |           1 |               20.000000 |

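Each unit test run issues a query compiled specifically for the test, so in principle you do not need to build the model under test with dbt run beforehand.
There is an exception, described next, but it is nice to get quick feedback from the tests.

Mocking only some columns of a model

The package also handles the case where the model to mock has many columns and you only want to replace a few of them.
Set include_missing_columns to true in the options of the mock_ref macro.
In this case, however, the mock is generated by inferring the omitted columns from the deployed model (or source), so the data must already exist in the database via a prior dbt run.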

{% call dbt_unit_testing.mock_ref ('stg_customers', {"include_missing_columns": "true"} ) %}
    select 1 as customer_id
{% endcall %}

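Integration tests

dbt-unit-testing is also useful when model dependencies are stacked several layers deep.
For example, with a three-layer structure such as staging -> intermediate -> mart, a test for the mart layer can mock only the staging layer that it depends on indirectly. In other words, you can write an integration test that also verifies the logic of the intermediate layer.
The test keeps working even if you refactor the intermediate layer and models are added or removed, which is convenient.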

[Figure: integration-test]
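
A sketch of such an integration test might look like the following. The lineage is entirely hypothetical: int_order_totals sums stg_payments per order and joins stg_orders, and mart_customer_revenue sums those totals per customer into a revenue column. None of these intermediate or mart models exist in jaffle_shop.

```sql
{{ config(tags=['unit-test']) }}

{# Hypothetical lineage: stg_orders, stg_payments -> int_order_totals -> mart_customer_revenue #}
{% call dbt_unit_testing.test('mart_customer_revenue', 'should aggregate revenue through the intermediate layer') %}

  {% call dbt_unit_testing.mock_ref('stg_orders') %}
    select 1001 as order_id, 1 as customer_id
    union all
    select 1002 as order_id, 1 as customer_id
  {% endcall %}

  {% call dbt_unit_testing.mock_ref('stg_payments') %}
    select 1001 as order_id, 10 as amount
    union all
    select 1002 as order_id, 5 as amount
  {% endcall %}

  {# int_order_totals is intentionally not mocked, so its real SQL takes part in the test #}
  {% call dbt_unit_testing.expect() %}
    select 1 as customer_id, 15 as revenue
  {% endcall %}
{% endcall %}
```

Because the test names only the staging models and the mart model, the intermediate layer can be split or merged without touching the test, as long as the mart output stays the same.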

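Conclusion

This post introduced dbt-unit-testing, a package that looks useful for testing dbt model logic.
Its appeal is that macros let you keep multiple test cases in a single file and that the test results are shown in an easy-to-read table.
Since running the tests is just a matter of calling dbt test, it should be easy to add to CI and looks useful for team development too!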