😽

Snowflake x dbtやってみた〜テスト編〜

2022/12/11に公開

dbtの機能の一つでもある「テスト」に触れてみたいと思います。
その名の通りdbtのリソース(sources, seeds, snapshots )に対して、テスト(品質チェック)を行う機能です。

テスト対象のテーブル確認

今回はDIM_LISTINGS_CLEANSEDテーブルをテスト対象とします。

xxxx#COMPUTE_WH@(no database).(no schema)>SELECT * from AIRBNB.DEV.DIM_LISTINGS_CLEANSED LIMIT 10;
+------------+--------------------------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+
| LISTING_ID | LISTING_NAME                                     | ROOM_TYPE       | MINIMUM_NIGHTS | HOST_ID |  PRICE | CREATED_AT              | UPDATED_AT              |
|------------+--------------------------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------|
|       3176 | Fabulous Flat in great Location                  | Entire home/apt |             30 |    3718 |  90.00 | 2009-06-05 21:34:42.000 | 2022-12-03 03:43:43.206 |
|       7071 | BrightRoom with sunny greenview!                 | Private room    |              1 |   17391 |  33.00 | 2009-08-12 12:30:30.000 | 2009-08-12 12:30:30.000 |
|       9991 | Geourgeous flat - outstanding views              | Entire home/apt |              1 |   33852 | 180.00 | 2015-07-30 05:08:52.000 | 2015-07-30 05:08:52.000 |
|      14325 | Apartment in Prenzlauer Berg                     | Entire home/apt |             95 |   55531 |  70.00 | 2010-06-15 19:56:01.000 | 2010-06-15 19:56:01.000 |
|      16644 | In the Heart of Berlin - Kreuzberg               | Entire home/apt |             60 |   64696 |  90.00 | 2010-05-30 12:11:33.000 | 2010-05-30 12:11:33.000 |
|      17904 | Beautiful Kreuzberg studio - 3 months minimum    | Entire home/apt |             92 |   68997 |  47.00 | 2010-02-08 17:23:48.000 | 2010-02-08 17:23:48.000 |
|      20858 | Designer Loft in Berlin Mitte                    | Entire home/apt |              3 |   71331 | 169.00 | 2012-09-24 21:03:01.000 | 2012-09-24 21:03:01.000 |
|      21869 | Studio in the Heart of Kreuzberg                 | Entire home/apt |             60 |   64696 |  70.00 | 2010-09-08 11:45:56.000 | 2010-09-08 11:45:56.000 |
|      22438 | WOHNUNG IN BERLIN ★ MITTE                        | Entire home/apt |             90 |   86159 |  65.00 | 2011-03-31 17:17:58.000 | 2011-03-31 17:17:58.000 |
|      22677 | Prenzel garden with leafy terrace (quiet Guests) | Entire home/apt |              2 |   87357 | 120.00 | 2010-10-08 08:59:12.000 | 2010-10-08 08:59:12.000 |
+------------+--------------------------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+
10 Row(s) produced. Time Elapsed: 1.341s
xxxx#COMPUTE_WH@(no database).(no schema)>

テストを実施する対象のカラム

  • listing_id
  • host_id
  • room_type

テスト内容

カラム名 テスト項目 テスト内容
listing_id unique
not_null
一意制約
not null制約
host_id not_null
relationships
not null制約
dim_hosts_cleansedテーブルのhost_idカラムとの関係テスト
room_type accepted_values valuesに記載した値以外が存在しないテスト

テストクエリの記述

models/schema.yml
version: 2

models:
  - name: dim_listings_cleansed
    columns:

     - name: listing_id
       tests:
         - unique
         - not_null

     - name: host_id
       tests:
         - not_null
         - relationships:
             to: ref('dim_hosts_cleansed')
             field: host_id

     - name: room_type
       tests:
         - accepted_values:
             values: ['Entire home/apt',
                      'Private room',
                      'Shared room',
                      'Hotel room']

テストの実行

テスト自体は問題なく行えました。
特段エラーも無く。

ubuntu@dbt:~/dbtlearn/models$ dbt test
03:22:50  Running with dbt=1.3.1
03:22:50  Found 8 models, 5 tests, 1 snapshot, 0 analyses, 303 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
03:22:50  
03:22:52  Concurrency: 4 threads (target='dev')
03:22:52  
03:22:52  1 of 5 START test accepted_values_dim_listings_cleansed_room_type__Entire_home_apt__Private_room__Shared_room__Hotel_room  [RUN]
03:22:52  2 of 5 START test not_null_dim_listings_cleansed_host_id ....................... [RUN]
03:22:52  3 of 5 START test not_null_dim_listings_cleansed_listing_id .................... [RUN]
03:22:52  4 of 5 START test relationships_dim_listings_cleansed_host_id__host_id__ref_dim_hosts_cleansed_  [RUN]
03:22:55  1 of 5 PASS accepted_values_dim_listings_cleansed_room_type__Entire_home_apt__Private_room__Shared_room__Hotel_room  [PASS in 3.40s]
03:22:55  5 of 5 START test unique_dim_listings_cleansed_listing_id ...................... [RUN]
03:22:55  3 of 5 PASS not_null_dim_listings_cleansed_listing_id .......................... [PASS in 3.74s]
03:22:55  4 of 5 PASS relationships_dim_listings_cleansed_host_id__host_id__ref_dim_hosts_cleansed_  [PASS in 3.80s]
03:22:56  5 of 5 PASS unique_dim_listings_cleansed_listing_id ............................ [PASS in 0.91s]
03:22:56  2 of 5 PASS not_null_dim_listings_cleansed_host_id ............................. [PASS in 4.74s]
03:22:56  
03:22:56  Finished running 5 tests in 0 hours 0 minutes and 6.13 seconds (6.13s).
03:22:56  
03:22:56  Completed successfully
03:22:56  
03:22:56  Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5

故意的にエラーを起こしてみる

以下のvaluesの値をEntire home/aptEntire home/aptsに変更し、testを実施してみる。
Entire home/aptsのような値は存在しないのでエラーになるはず。

     - name: room_type
       tests:
         - accepted_values:
             values: ['Entire home/apts',

dbt testの実行

accepted_values_dim_listings_cleansed_room_type__Entire_home_apts__Private_room__Shared_room__Hotel_room (models/schema.yml)
ちゃんと設定した箇所でエラーが起きていることを確認できました。

06:49:57  Running with dbt=1.3.1
06:49:57  Found 8 models, 5 tests, 1 snapshot, 0 analyses, 303 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
06:49:57  
06:49:59  Concurrency: 4 threads (target='dev')
06:49:59  
06:49:59  1 of 5 START test accepted_values_dim_listings_cleansed_room_type__Entire_home_apts__Private_room__Shared_room__Hotel_room  [RUN]
06:49:59  2 of 5 START test not_null_dim_listings_cleansed_host_id ....................... [RUN]
06:49:59  3 of 5 START test not_null_dim_listings_cleansed_listing_id .................... [RUN]
06:49:59  4 of 5 START test relationships_dim_listings_cleansed_host_id__host_id__ref_dim_hosts_cleansed_  [RUN]
06:50:01  3 of 5 PASS not_null_dim_listings_cleansed_listing_id .......................... [PASS in 2.68s]
06:50:01  5 of 5 START test unique_dim_listings_cleansed_listing_id ...................... [RUN]
06:50:02  4 of 5 PASS relationships_dim_listings_cleansed_host_id__host_id__ref_dim_hosts_cleansed_  [PASS in 2.70s]
06:50:02  2 of 5 PASS not_null_dim_listings_cleansed_host_id ............................. [PASS in 3.17s]
06:50:02  5 of 5 PASS unique_dim_listings_cleansed_listing_id ............................ [PASS in 1.01s]
06:50:03  1 of 5 FAIL 1 accepted_values_dim_listings_cleansed_room_type__Entire_home_apts__Private_room__Shared_room__Hotel_room  [FAIL 1 in 4.27s]
06:50:03  
06:50:03  Finished running 5 tests in 0 hours 0 minutes and 5.69 seconds (5.69s).
06:50:03  
06:50:03  Completed with 1 error and 0 warnings:
06:50:03  
06:50:03  Failure in test accepted_values_dim_listings_cleansed_room_type__Entire_home_apts__Private_room__Shared_room__Hotel_room (models/schema.yml)
06:50:03    Got 1 result, configured to fail if != 0
06:50:03  
06:50:03    compiled Code at target/compiled/dbtlearn/models/schema.yml/accepted_values_dim_listings_c_35759fb3d6170c8c9857c7075087b96d.sql
06:50:03  
06:50:03  Done. PASS=4 WARN=0 ERROR=1 SKIP=0 TOTAL=5

失敗したテスト内容も確認できます。

ubuntu@dbt:~/dbtlearn$ cat target/compiled/dbtlearn/models/schema.yml/accepted_values_dim_listings_c_35759fb3d6170c8c9857c7075087b96d.sql

with all_values as (

    select
        room_type as value_field,
        count(*) as n_records

    from AIRBNB.dev.dim_listings_cleansed
    group by room_type

)

select *
from all_values
where value_field not in (
    'Entire home/apts','Private room','Shared room','Hotel room'
)

Singular Tests

単一テストの実施も可能です。
自分でクエリを用意し、そのクエリでテストします。

クエリの用意

testsディレクトリ配下に以下のSQLファイルを設置します。
minimum_nightsが1より小さい値が存在するかテストするクエリです。

dim_listings_minimum_nights.sql
SELECT
    *
FROM
    {{ ref('dim_listings_cleansed') }}
WHERE minimum_nights < 1
LIMIT 10

dbt testの実行

ubuntu@dbt:~/dbtlearn/tests$ dbt test
07:06:04  Running with dbt=1.3.1
07:06:04  Found 8 models, 6 tests, 1 snapshot, 0 analyses, 303 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
07:06:04  
07:06:06  Concurrency: 4 threads (target='dev')
07:06:06  
07:06:06  1 of 6 START test accepted_values_dim_listings_cleansed_room_type__Entire_home_apt__Private_room__Shared_room__Hotel_room  [RUN]
07:06:06  2 of 6 START test dim_listings_minimum_nights .................................. [RUN]
07:06:06  3 of 6 START test not_null_dim_listings_cleansed_host_id ....................... [RUN]
07:06:06  4 of 6 START test not_null_dim_listings_cleansed_listing_id .................... [RUN]
07:06:08  1 of 6 PASS accepted_values_dim_listings_cleansed_room_type__Entire_home_apt__Private_room__Shared_room__Hotel_room  [PASS in 2.57s]
07:06:08  5 of 6 START test relationships_dim_listings_cleansed_host_id__host_id__ref_dim_hosts_cleansed_  [RUN]
07:06:09  2 of 6 PASS dim_listings_minimum_nights ........................................ [PASS in 2.88s]
07:06:09  6 of 6 START test unique_dim_listings_cleansed_listing_id ...................... [RUN]
07:06:09  3 of 6 PASS not_null_dim_listings_cleansed_host_id ............................. [PASS in 2.92s]
07:06:09  4 of 6 PASS not_null_dim_listings_cleansed_listing_id .......................... [PASS in 3.00s]
07:06:10  5 of 6 PASS relationships_dim_listings_cleansed_host_id__host_id__ref_dim_hosts_cleansed_  [PASS in 1.45s]
07:06:10  6 of 6 PASS unique_dim_listings_cleansed_listing_id ............................ [PASS in 1.22s]
07:06:10  
07:06:10  Finished running 6 tests in 0 hours 0 minutes and 5.41 seconds (5.41s).
07:06:10  
07:06:10  Completed successfully
07:06:10  
07:06:10  Done. PASS=6 WARN=0 ERROR=0 SKIP=0 TOTAL=6

カスタムテスト

独自のテストを作成することもできます。

独自テストの用意

{% test positive_value(model, column_name) %}
SELECT
    *
FROM
    {{ model }}
WHERE
    {{ column_name}} < 1
{% endtest %}

models/schema.ymlにパラメータ設定

最終3行を追記

          〜略〜
     - name: room_type
       tests:
         - accepted_values:
             values: ['Entire home/apt',
                      'Private room',
                      'Shared room',
                      'Hotel room']
                      
     - name: minimum_nights
       tests:
         - positive_value

生成されたクエリ

SELECT
    *
FROM
    AIRBNB.dev.dim_listings_cleansed
WHERE
    minimum_nights < 1;

このように独自のテストを作成し、テストすることが可能です。

Discussion