🕌
Snowflake x dbtやってみた〜スナップショット編〜
今回はdbtの機能の一つの「スナップショット」をみていきます。
文字からしてバックアップ機能な感じですね。
Snowflakeにタイムトラベルといったテーブルの状態を戻す復元機能がありましたが、
dbtにもそのような機能なのでしょうか。。。
スナップショット用のテーブルを作成
{% snapshot scd_raw_listings %}
{{
config(
target_schema='dev',
unique_key='id',
strategy='timestamp',
updated_at='updated_at',
invalidate_hard_deletes=True
)
}}
select * FROM {{ source('airbnb', 'listings') }}
{% endsnapshot %}
dbt snapshotコマンドの実行
dbt snapshot
テーブルの確認
sql自体はAIRBNB.RAW.RAW_LISTINGSテーブルをSELECTしているので同じテーブル構造になりそうですね。
まずは元々のテーブル構造を確認してみます
AIRBNB.RAW.RAW_LISTINGS
xxxx#COMPUTE_WH@(no database).(no schema)>SELECT * FROM airbnb.raw.raw_listings limit 5;
+-------+------------------------------------+-------------------------------------+-----------------+----------------+---------+---------+-------------------------+-------------------------+
| ID | LISTING_URL | NAME | ROOM_TYPE | MINIMUM_NIGHTS | HOST_ID | PRICE | CREATED_AT | UPDATED_AT |
|-------+------------------------------------+-------------------------------------+-----------------+----------------+---------+---------+-------------------------+-------------------------|
| 3176 | https://www.airbnb.com/rooms/3176 | Fabulous Flat in great Location | Entire home/apt | 62 | 3718 | $90.00 | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 |
| 7071 | https://www.airbnb.com/rooms/7071 | BrightRoom with sunny greenview! | Private room | 1 | 17391 | $33.00 | 2009-08-12 12:30:30.000 | 2009-08-12 12:30:30.000 |
| 9991 | https://www.airbnb.com/rooms/9991 | Geourgeous flat - outstanding views | Entire home/apt | 0 | 33852 | $180.00 | 2015-07-30 05:08:52.000 | 2015-07-30 05:08:52.000 |
| 14325 | https://www.airbnb.com/rooms/14325 | Apartment in Prenzlauer Berg | Entire home/apt | 95 | 55531 | $70.00 | 2010-06-15 19:56:01.000 | 2010-06-15 19:56:01.000 |
| 16644 | https://www.airbnb.com/rooms/16644 | In the Heart of Berlin - Kreuzberg | Entire home/apt | 60 | 64696 | $90.00 | 2010-05-30 12:11:33.000 | 2010-05-30 12:11:33.000 |
+-------+------------------------------------+-------------------------------------+-----------------+----------------+---------+---------+-------------------------+-------------------------+
5 Row(s) produced. Time Elapsed: 0.863s
xxxx#COMPUTE_WH@(no database).(no schema)>
次にscd_raw_listingsテーブルを確認してみます
SCD.RAW.LISTINGS
xxxx#COMPUTE_WH@(no database).(no schema)>SELECT * FROM airbnb.dev.scd_raw_listings limit 5;
+-------+------------------------------------+-------------------------------------+-----------------+----------------+---------+---------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+--------------+
| ID | LISTING_URL | NAME | ROOM_TYPE | MINIMUM_NIGHTS | HOST_ID | PRICE | CREATED_AT | UPDATED_AT | DBT_SCD_ID | DBT_UPDATED_AT | DBT_VALID_FROM | DBT_VALID_TO |
|-------+------------------------------------+-------------------------------------+-----------------+----------------+---------+---------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+--------------|
| 3176 | https://www.airbnb.com/rooms/3176 | Fabulous Flat in great Location | Entire home/apt | 62 | 3718 | $90.00 | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 | c9e3bc0b5eb3a808ee31530eccdfa503 | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 | NULL |
| 7071 | https://www.airbnb.com/rooms/7071 | BrightRoom with sunny greenview! | Private room | 1 | 17391 | $33.00 | 2009-08-12 12:30:30.000 | 2009-08-12 12:30:30.000 | c2a3175b0e715a7ed8762c1c46fb3e6c | 2009-08-12 12:30:30.000 | 2009-08-12 12:30:30.000 | NULL |
| 9991 | https://www.airbnb.com/rooms/9991 | Geourgeous flat - outstanding views | Entire home/apt | 0 | 33852 | $180.00 | 2015-07-30 05:08:52.000 | 2015-07-30 05:08:52.000 | f9b718057a795e92cbb1c19ebfa79ca1 | 2015-07-30 05:08:52.000 | 2015-07-30 05:08:52.000 | NULL |
| 14325 | https://www.airbnb.com/rooms/14325 | Apartment in Prenzlauer Berg | Entire home/apt | 95 | 55531 | $70.00 | 2010-06-15 19:56:01.000 | 2010-06-15 19:56:01.000 | fb48f80a251584a5a52026d12f4b7a22 | 2010-06-15 19:56:01.000 | 2010-06-15 19:56:01.000 | NULL |
| 16644 | https://www.airbnb.com/rooms/16644 | In the Heart of Berlin - Kreuzberg | Entire home/apt | 60 | 64696 | $90.00 | 2010-05-30 12:11:33.000 | 2010-05-30 12:11:33.000 | 71c6f379f3e401b0df63cd54e8f129b1 | 2010-05-30 12:11:33.000 | 2010-05-30 12:11:33.000 | NULL |
+-------+------------------------------------+-------------------------------------+-----------------+----------------+---------+---------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+--------------+
5 Row(s) produced. Time Elapsed: 1.762s
xxxx#COMPUTE_WH@(no database).(no schema)>
二つのテーブルの違い
SCD.RAW.LISTINGSテーブルは以下のカラムが追加されていることが分かります。
スナップショット機能に必要なカラムっぽいですね。。。
- DBT_SCD_ID
- DBT_UPDATED_AT
- DBT_VALID_FROM
- DBT_VALID_TO
それではテーブル(AIRBNB.RAW.RAW_LISTINGS)を更新してみます
xxxx#COMPUTE_WH@AIRBNB.PUBLIC>UPDATE AIRBNB.RAW.RAW_LISTINGS SET MINIMUM_NIGHTS=30 ,updated_at=CURRENT_TIMESTAMP() WHERE ID=3176;
+------------------------+-------------------------------------+
| number of rows updated | number of multi-joined rows updated |
|------------------------+-------------------------------------|
| 1 | 0 |
+------------------------+-------------------------------------+
1 Row(s) produced. Time Elapsed: 1.437s
xxxx#COMPUTE_WH@AIRBNB.PUBLIC>
スナップショットテーブルのデータを確認してみます
MINIMUM_NIGHTSの値が変わっていません。
xxxx#COMPUTE_WH@AIRBNB.PUBLIC>SELECT * FROM AIRBNB.DEV.SCD_RAW_LISTINGS WHERE ID=3176;
+------+-----------------------------------+---------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+--------------+
| ID | LISTING_URL | NAME | ROOM_TYPE | MINIMUM_NIGHTS | HOST_ID | PRICE | CREATED_AT | UPDATED_AT | DBT_SCD_ID | DBT_UPDATED_AT | DBT_VALID_FROM | DBT_VALID_TO |
|------+-----------------------------------+---------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+--------------|
| 3176 | https://www.airbnb.com/rooms/3176 | Fabulous Flat in great Location | Entire home/apt | 62 | 3718 | $90.00 | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 | c9e3bc0b5eb3a808ee31530eccdfa503 | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 | NULL |
+------+-----------------------------------+---------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+--------------+
1 Row(s) produced. Time Elapsed: 1.470s
xxxx#COMPUTE_WH@AIRBNB.PUBLIC>
dbt snapshotコマンドの実行
dbt snapshot
再度スナップショットテーブルを確認してみます。
2つのレコードを確認できました。
1行目は先ほどUPDATE句で追加した値で、2行目が元々の値です。
DBT_VALID_TOを確認すると、2行目に日付が入っています。
これは値が更新された日付です。
ちなみにDBT_VALID_TOの値がNULLが最新のレコードを指します。
snapshotでUPDATE句を実行すると、レコードの書き換えではなくINSERT句となりレコードが追記されます。
xxxx#COMPUTE_WH@AIRBNB.PUBLIC>select * from AIRBNB.DEV.SCD_RAW_LISTINGS WHERE ID=3176;
+------+-----------------------------------+---------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+-------------------------+
| ID | LISTING_URL | NAME | ROOM_TYPE | MINIMUM_NIGHTS | HOST_ID | PRICE | CREATED_AT | UPDATED_AT | DBT_SCD_ID | DBT_UPDATED_AT | DBT_VALID_FROM | DBT_VALID_TO |
|------+-----------------------------------+---------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+-------------------------|
| 3176 | https://www.airbnb.com/rooms/3176 | Fabulous Flat in great Location | Entire home/apt | 30 | 3718 | $90.00 | 2009-06-05 21:34:42.000 | 2022-12-03 03:43:43.206 | 2b3b5bc288236391b7dc918fe2cb1efa | 2022-12-03 03:43:43.206 | 2022-12-03 03:43:43.206 | NULL |
| 3176 | https://www.airbnb.com/rooms/3176 | Fabulous Flat in great Location | Entire home/apt | 62 | 3718 | $90.00 | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 | c9e3bc0b5eb3a808ee31530eccdfa503 | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 | 2022-12-03 03:43:43.206 |
+------+-----------------------------------+---------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+-------------------------+
2 Row(s) produced. Time Elapsed: 0.674s
xxxx#COMPUTE_WH@AIRBNB.PUBLIC>
Discussion