😽
Snowflake x dbtやってみた〜テスト編〜
dbtの機能の一つでもある「テスト」に触れてみたいと思います。
その名の通りdbtのリソース(sources, seeds, snapshots )に対して、テスト(品質チェック)を行う機能です。
テスト対象のテーブル確認
今回はDIM_LISTINGS_CLEANSEDテーブルをテスト対象とします。
xxxx#COMPUTE_WH@(no database).(no schema)>SELECT * from AIRBNB.DEV.DIM_LISTINGS_CLEANSED LIMIT 10;
+------------+--------------------------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+
| LISTING_ID | LISTING_NAME | ROOM_TYPE | MINIMUM_NIGHTS | HOST_ID | PRICE | CREATED_AT | UPDATED_AT |
|------------+--------------------------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------|
| 3176 | Fabulous Flat in great Location | Entire home/apt | 30 | 3718 | 90.00 | 2009-06-05 21:34:42.000 | 2022-12-03 03:43:43.206 |
| 7071 | BrightRoom with sunny greenview! | Private room | 1 | 17391 | 33.00 | 2009-08-12 12:30:30.000 | 2009-08-12 12:30:30.000 |
| 9991 | Geourgeous flat - outstanding views | Entire home/apt | 1 | 33852 | 180.00 | 2015-07-30 05:08:52.000 | 2015-07-30 05:08:52.000 |
| 14325 | Apartment in Prenzlauer Berg | Entire home/apt | 95 | 55531 | 70.00 | 2010-06-15 19:56:01.000 | 2010-06-15 19:56:01.000 |
| 16644 | In the Heart of Berlin - Kreuzberg | Entire home/apt | 60 | 64696 | 90.00 | 2010-05-30 12:11:33.000 | 2010-05-30 12:11:33.000 |
| 17904 | Beautiful Kreuzberg studio - 3 months minimum | Entire home/apt | 92 | 68997 | 47.00 | 2010-02-08 17:23:48.000 | 2010-02-08 17:23:48.000 |
| 20858 | Designer Loft in Berlin Mitte | Entire home/apt | 3 | 71331 | 169.00 | 2012-09-24 21:03:01.000 | 2012-09-24 21:03:01.000 |
| 21869 | Studio in the Heart of Kreuzberg | Entire home/apt | 60 | 64696 | 70.00 | 2010-09-08 11:45:56.000 | 2010-09-08 11:45:56.000 |
| 22438 | WOHNUNG IN BERLIN ★ MITTE | Entire home/apt | 90 | 86159 | 65.00 | 2011-03-31 17:17:58.000 | 2011-03-31 17:17:58.000 |
| 22677 | Prenzel garden with leafy terrace (quiet Guests) | Entire home/apt | 2 | 87357 | 120.00 | 2010-10-08 08:59:12.000 | 2010-10-08 08:59:12.000 |
+------------+--------------------------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+
10 Row(s) produced. Time Elapsed: 1.341s
xxxx#COMPUTE_WH@(no database).(no schema)>
テストを実施する対象のカラム
- listing_id
- host_id
- room_type
テスト内容
カラム名 | テスト項目 | テスト内容 |
---|---|---|
listing_id | unique not_null |
一意制約 not null制約 |
host_id | not_null relationships |
not null制約 dim_hosts_cleansedテーブルのhost_idカラムとの関係テスト |
room_type | accepted_values | valuesに記載した値以外が存在しないテスト |
テストクエリの記述
models/schema.yml
version: 2
models:
- name: dim_listings_cleansed
columns:
- name: listing_id
tests:
- unique
- not_null
- name: host_id
tests:
- not_null
- relationships:
to: ref('dim_hosts_cleansed')
field: host_id
- name: room_type
tests:
- accepted_values:
values: ['Entire home/apt',
'Private room',
'Shared room',
'Hotel room']
テストの実行
テスト自体は問題なく行えました。
特段エラーも無く。
ubuntu@dbt:~/dbtlearn/models$ dbt test
03:22:50 Running with dbt=1.3.1
03:22:50 Found 8 models, 5 tests, 1 snapshot, 0 analyses, 303 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
03:22:50
03:22:52 Concurrency: 4 threads (target='dev')
03:22:52
03:22:52 1 of 5 START test accepted_values_dim_listings_cleansed_room_type__Entire_home_apt__Private_room__Shared_room__Hotel_room [RUN]
03:22:52 2 of 5 START test not_null_dim_listings_cleansed_host_id ....................... [RUN]
03:22:52 3 of 5 START test not_null_dim_listings_cleansed_listing_id .................... [RUN]
03:22:52 4 of 5 START test relationships_dim_listings_cleansed_host_id__host_id__ref_dim_hosts_cleansed_ [RUN]
03:22:55 1 of 5 PASS accepted_values_dim_listings_cleansed_room_type__Entire_home_apt__Private_room__Shared_room__Hotel_room [PASS in 3.40s]
03:22:55 5 of 5 START test unique_dim_listings_cleansed_listing_id ...................... [RUN]
03:22:55 3 of 5 PASS not_null_dim_listings_cleansed_listing_id .......................... [PASS in 3.74s]
03:22:55 4 of 5 PASS relationships_dim_listings_cleansed_host_id__host_id__ref_dim_hosts_cleansed_ [PASS in 3.80s]
03:22:56 5 of 5 PASS unique_dim_listings_cleansed_listing_id ............................ [PASS in 0.91s]
03:22:56 2 of 5 PASS not_null_dim_listings_cleansed_host_id ............................. [PASS in 4.74s]
03:22:56
03:22:56 Finished running 5 tests in 0 hours 0 minutes and 6.13 seconds (6.13s).
03:22:56
03:22:56 Completed successfully
03:22:56
03:22:56 Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5
故意的にエラーを起こしてみる
以下のvaluesの値をEntire home/apt → Entire home/aptsに変更し、testを実施してみる。
Entire home/aptsのような値は存在しないのでエラーになるはず。
- name: room_type
tests:
- accepted_values:
values: ['Entire home/apts',
dbt testの実行
accepted_values_dim_listings_cleansed_room_type__Entire_home_apts__Private_room__Shared_room__Hotel_room (models/schema.yml)
ちゃんと設定した箇所でエラーが起きていることを確認できました。
06:49:57 Running with dbt=1.3.1
06:49:57 Found 8 models, 5 tests, 1 snapshot, 0 analyses, 303 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
06:49:57
06:49:59 Concurrency: 4 threads (target='dev')
06:49:59
06:49:59 1 of 5 START test accepted_values_dim_listings_cleansed_room_type__Entire_home_apts__Private_room__Shared_room__Hotel_room [RUN]
06:49:59 2 of 5 START test not_null_dim_listings_cleansed_host_id ....................... [RUN]
06:49:59 3 of 5 START test not_null_dim_listings_cleansed_listing_id .................... [RUN]
06:49:59 4 of 5 START test relationships_dim_listings_cleansed_host_id__host_id__ref_dim_hosts_cleansed_ [RUN]
06:50:01 3 of 5 PASS not_null_dim_listings_cleansed_listing_id .......................... [PASS in 2.68s]
06:50:01 5 of 5 START test unique_dim_listings_cleansed_listing_id ...................... [RUN]
06:50:02 4 of 5 PASS relationships_dim_listings_cleansed_host_id__host_id__ref_dim_hosts_cleansed_ [PASS in 2.70s]
06:50:02 2 of 5 PASS not_null_dim_listings_cleansed_host_id ............................. [PASS in 3.17s]
06:50:02 5 of 5 PASS unique_dim_listings_cleansed_listing_id ............................ [PASS in 1.01s]
06:50:03 1 of 5 FAIL 1 accepted_values_dim_listings_cleansed_room_type__Entire_home_apts__Private_room__Shared_room__Hotel_room [FAIL 1 in 4.27s]
06:50:03
06:50:03 Finished running 5 tests in 0 hours 0 minutes and 5.69 seconds (5.69s).
06:50:03
06:50:03 Completed with 1 error and 0 warnings:
06:50:03
06:50:03 Failure in test accepted_values_dim_listings_cleansed_room_type__Entire_home_apts__Private_room__Shared_room__Hotel_room (models/schema.yml)
06:50:03 Got 1 result, configured to fail if != 0
06:50:03
06:50:03 compiled Code at target/compiled/dbtlearn/models/schema.yml/accepted_values_dim_listings_c_35759fb3d6170c8c9857c7075087b96d.sql
06:50:03
06:50:03 Done. PASS=4 WARN=0 ERROR=1 SKIP=0 TOTAL=5
失敗したテスト内容も確認できます。
ubuntu@dbt:~/dbtlearn$ cat target/compiled/dbtlearn/models/schema.yml/accepted_values_dim_listings_c_35759fb3d6170c8c9857c7075087b96d.sql
with all_values as (
select
room_type as value_field,
count(*) as n_records
from AIRBNB.dev.dim_listings_cleansed
group by room_type
)
select *
from all_values
where value_field not in (
'Entire home/apts','Private room','Shared room','Hotel room'
)
Singular Tests
単一テストの実施も可能です。
自分でクエリを用意し、そのクエリでテストします。
クエリの用意
testsディレクトリ配下に以下のSQLファイルを設置します。
minimum_nightsが1より小さい値が存在するかテストするクエリです。
dim_listings_minimum_nights.sql
SELECT
*
FROM
{{ ref('dim_listings_cleansed') }}
WHERE minimum_nights < 1
LIMIT 10
dbt testの実行
ubuntu@dbt:~/dbtlearn/tests$ dbt test
07:06:04 Running with dbt=1.3.1
07:06:04 Found 8 models, 6 tests, 1 snapshot, 0 analyses, 303 macros, 0 operations, 1 seed file, 3 sources, 0 exposures, 0 metrics
07:06:04
07:06:06 Concurrency: 4 threads (target='dev')
07:06:06
07:06:06 1 of 6 START test accepted_values_dim_listings_cleansed_room_type__Entire_home_apt__Private_room__Shared_room__Hotel_room [RUN]
07:06:06 2 of 6 START test dim_listings_minimum_nights .................................. [RUN]
07:06:06 3 of 6 START test not_null_dim_listings_cleansed_host_id ....................... [RUN]
07:06:06 4 of 6 START test not_null_dim_listings_cleansed_listing_id .................... [RUN]
07:06:08 1 of 6 PASS accepted_values_dim_listings_cleansed_room_type__Entire_home_apt__Private_room__Shared_room__Hotel_room [PASS in 2.57s]
07:06:08 5 of 6 START test relationships_dim_listings_cleansed_host_id__host_id__ref_dim_hosts_cleansed_ [RUN]
07:06:09 2 of 6 PASS dim_listings_minimum_nights ........................................ [PASS in 2.88s]
07:06:09 6 of 6 START test unique_dim_listings_cleansed_listing_id ...................... [RUN]
07:06:09 3 of 6 PASS not_null_dim_listings_cleansed_host_id ............................. [PASS in 2.92s]
07:06:09 4 of 6 PASS not_null_dim_listings_cleansed_listing_id .......................... [PASS in 3.00s]
07:06:10 5 of 6 PASS relationships_dim_listings_cleansed_host_id__host_id__ref_dim_hosts_cleansed_ [PASS in 1.45s]
07:06:10 6 of 6 PASS unique_dim_listings_cleansed_listing_id ............................ [PASS in 1.22s]
07:06:10
07:06:10 Finished running 6 tests in 0 hours 0 minutes and 5.41 seconds (5.41s).
07:06:10
07:06:10 Completed successfully
07:06:10
07:06:10 Done. PASS=6 WARN=0 ERROR=0 SKIP=0 TOTAL=6
カスタムテスト
独自のテストを作成することもできます。
独自テストの用意
{% test positive_value(model, column_name) %}
SELECT
*
FROM
{{ model }}
WHERE
{{ column_name}} < 1
{% endtest %}
models/schema.ymlにパラメータ設定
最終3行を追記
〜略〜
- name: room_type
tests:
- accepted_values:
values: ['Entire home/apt',
'Private room',
'Shared room',
'Hotel room']
- name: minimum_nights
tests:
- positive_value
生成されたクエリ
SELECT
*
FROM
AIRBNB.dev.dim_listings_cleansed
WHERE
minimum_nights < 1;
このように独自のテストを作成し、テストすることが可能です。
Discussion