🕌

Snowflake x dbtやってみた〜スナップショット編〜

2022/12/10に公開

今回はdbtの機能の一つの「スナップショット」をみていきます。
文字からしてバックアップ機能な感じですね。
Snowflakeにタイムトラベルといったテーブルの状態を戻す復元機能がありましたが、
dbtにもそのような機能なのでしょうか。。。

スナップショット用のテーブルを作成

{% snapshot scd_raw_listings %}

{{
   config(
       target_schema='dev',
       unique_key='id',
       strategy='timestamp',
       updated_at='updated_at',
       invalidate_hard_deletes=True
   )
}}

select * FROM {{ source('airbnb', 'listings') }}

{% endsnapshot %}

dbt snapshotコマンドの実行

dbt snapshot

テーブルの確認

sql自体はAIRBNB.RAW.RAW_LISTINGSテーブルをSELECTしているので同じテーブル構造になりそうですね。

まずは元々のテーブル構造を確認してみます

AIRBNB.RAW.RAW_LISTINGS
xxxx#COMPUTE_WH@(no database).(no schema)>SELECT * FROM airbnb.raw.raw_listings limit 5;
+-------+------------------------------------+-------------------------------------+-----------------+----------------+---------+---------+-------------------------+-------------------------+
|    ID | LISTING_URL                        | NAME                                | ROOM_TYPE       | MINIMUM_NIGHTS | HOST_ID | PRICE   | CREATED_AT              | UPDATED_AT              |
|-------+------------------------------------+-------------------------------------+-----------------+----------------+---------+---------+-------------------------+-------------------------|
|  3176 | https://www.airbnb.com/rooms/3176  | Fabulous Flat in great Location     | Entire home/apt |             62 |    3718 | $90.00  | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 |
|  7071 | https://www.airbnb.com/rooms/7071  | BrightRoom with sunny greenview!    | Private room    |              1 |   17391 | $33.00  | 2009-08-12 12:30:30.000 | 2009-08-12 12:30:30.000 |
|  9991 | https://www.airbnb.com/rooms/9991  | Geourgeous flat - outstanding views | Entire home/apt |              0 |   33852 | $180.00 | 2015-07-30 05:08:52.000 | 2015-07-30 05:08:52.000 |
| 14325 | https://www.airbnb.com/rooms/14325 | Apartment in Prenzlauer Berg        | Entire home/apt |             95 |   55531 | $70.00  | 2010-06-15 19:56:01.000 | 2010-06-15 19:56:01.000 |
| 16644 | https://www.airbnb.com/rooms/16644 | In the Heart of Berlin - Kreuzberg  | Entire home/apt |             60 |   64696 | $90.00  | 2010-05-30 12:11:33.000 | 2010-05-30 12:11:33.000 |
+-------+------------------------------------+-------------------------------------+-----------------+----------------+---------+---------+-------------------------+-------------------------+
5 Row(s) produced. Time Elapsed: 0.863s
xxxx#COMPUTE_WH@(no database).(no schema)>

次にscd_raw_listingsテーブルを確認してみます

SCD.RAW.LISTINGS
xxxx#COMPUTE_WH@(no database).(no schema)>SELECT * FROM airbnb.dev.scd_raw_listings limit 5;
+-------+------------------------------------+-------------------------------------+-----------------+----------------+---------+---------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+--------------+
|    ID | LISTING_URL                        | NAME                                | ROOM_TYPE       | MINIMUM_NIGHTS | HOST_ID | PRICE   | CREATED_AT              | UPDATED_AT              | DBT_SCD_ID                       | DBT_UPDATED_AT          | DBT_VALID_FROM          | DBT_VALID_TO |
|-------+------------------------------------+-------------------------------------+-----------------+----------------+---------+---------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+--------------|
|  3176 | https://www.airbnb.com/rooms/3176  | Fabulous Flat in great Location     | Entire home/apt |             62 |    3718 | $90.00  | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 | c9e3bc0b5eb3a808ee31530eccdfa503 | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 | NULL         |
|  7071 | https://www.airbnb.com/rooms/7071  | BrightRoom with sunny greenview!    | Private room    |              1 |   17391 | $33.00  | 2009-08-12 12:30:30.000 | 2009-08-12 12:30:30.000 | c2a3175b0e715a7ed8762c1c46fb3e6c | 2009-08-12 12:30:30.000 | 2009-08-12 12:30:30.000 | NULL         |
|  9991 | https://www.airbnb.com/rooms/9991  | Geourgeous flat - outstanding views | Entire home/apt |              0 |   33852 | $180.00 | 2015-07-30 05:08:52.000 | 2015-07-30 05:08:52.000 | f9b718057a795e92cbb1c19ebfa79ca1 | 2015-07-30 05:08:52.000 | 2015-07-30 05:08:52.000 | NULL         |
| 14325 | https://www.airbnb.com/rooms/14325 | Apartment in Prenzlauer Berg        | Entire home/apt |             95 |   55531 | $70.00  | 2010-06-15 19:56:01.000 | 2010-06-15 19:56:01.000 | fb48f80a251584a5a52026d12f4b7a22 | 2010-06-15 19:56:01.000 | 2010-06-15 19:56:01.000 | NULL         |
| 16644 | https://www.airbnb.com/rooms/16644 | In the Heart of Berlin - Kreuzberg  | Entire home/apt |             60 |   64696 | $90.00  | 2010-05-30 12:11:33.000 | 2010-05-30 12:11:33.000 | 71c6f379f3e401b0df63cd54e8f129b1 | 2010-05-30 12:11:33.000 | 2010-05-30 12:11:33.000 | NULL         |
+-------+------------------------------------+-------------------------------------+-----------------+----------------+---------+---------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+--------------+
5 Row(s) produced. Time Elapsed: 1.762s
xxxx#COMPUTE_WH@(no database).(no schema)>

二つのテーブルの違い

SCD.RAW.LISTINGSテーブルは以下のカラムが追加されていることが分かります。
スナップショット機能に必要なカラムっぽいですね。。。

  • DBT_SCD_ID
  • DBT_UPDATED_AT
  • DBT_VALID_FROM
  • DBT_VALID_TO

それではテーブル(AIRBNB.RAW.RAW_LISTINGS)を更新してみます

xxxx#COMPUTE_WH@AIRBNB.PUBLIC>UPDATE AIRBNB.RAW.RAW_LISTINGS SET MINIMUM_NIGHTS=30 ,updated_at=CURRENT_TIMESTAMP() WHERE ID=3176;
+------------------------+-------------------------------------+                
| number of rows updated | number of multi-joined rows updated |
|------------------------+-------------------------------------|
|                      1 |                                   0 |
+------------------------+-------------------------------------+
1 Row(s) produced. Time Elapsed: 1.437s
xxxx#COMPUTE_WH@AIRBNB.PUBLIC>

スナップショットテーブルのデータを確認してみます

MINIMUM_NIGHTSの値が変わっていません。

xxxx#COMPUTE_WH@AIRBNB.PUBLIC>SELECT * FROM AIRBNB.DEV.SCD_RAW_LISTINGS WHERE ID=3176;
+------+-----------------------------------+---------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+--------------+
|   ID | LISTING_URL                       | NAME                            | ROOM_TYPE       | MINIMUM_NIGHTS | HOST_ID | PRICE  | CREATED_AT              | UPDATED_AT              | DBT_SCD_ID                       | DBT_UPDATED_AT          | DBT_VALID_FROM          | DBT_VALID_TO |
|------+-----------------------------------+---------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+--------------|
| 3176 | https://www.airbnb.com/rooms/3176 | Fabulous Flat in great Location | Entire home/apt |             62 |    3718 | $90.00 | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 | c9e3bc0b5eb3a808ee31530eccdfa503 | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 | NULL         |
+------+-----------------------------------+---------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+--------------+
1 Row(s) produced. Time Elapsed: 1.470s
xxxx#COMPUTE_WH@AIRBNB.PUBLIC>

dbt snapshotコマンドの実行

dbt snapshot

再度スナップショットテーブルを確認してみます。
2つのレコードを確認できました。
1行目は先ほどUPDATE句で追加した値で、2行目が元々の値です。
DBT_VALID_TOを確認すると、2行目に日付が入っています。
これは値が更新された日付です。
ちなみにDBT_VALID_TOの値がNULLが最新のレコードを指します。
snapshotでUPDATE句を実行すると、レコードの書き換えではなくINSERT句となりレコードが追記されます。

xxxx#COMPUTE_WH@AIRBNB.PUBLIC>select * from AIRBNB.DEV.SCD_RAW_LISTINGS WHERE ID=3176;
+------+-----------------------------------+---------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+-------------------------+
|   ID | LISTING_URL                       | NAME                            | ROOM_TYPE       | MINIMUM_NIGHTS | HOST_ID | PRICE  | CREATED_AT              | UPDATED_AT              | DBT_SCD_ID                       | DBT_UPDATED_AT          | DBT_VALID_FROM          | DBT_VALID_TO            |
|------+-----------------------------------+---------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+-------------------------|
| 3176 | https://www.airbnb.com/rooms/3176 | Fabulous Flat in great Location | Entire home/apt |             30 |    3718 | $90.00 | 2009-06-05 21:34:42.000 | 2022-12-03 03:43:43.206 | 2b3b5bc288236391b7dc918fe2cb1efa | 2022-12-03 03:43:43.206 | 2022-12-03 03:43:43.206 | NULL                    |
| 3176 | https://www.airbnb.com/rooms/3176 | Fabulous Flat in great Location | Entire home/apt |             62 |    3718 | $90.00 | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 | c9e3bc0b5eb3a808ee31530eccdfa503 | 2009-06-05 21:34:42.000 | 2009-06-05 21:34:42.000 | 2022-12-03 03:43:43.206 |
+------+-----------------------------------+---------------------------------+-----------------+----------------+---------+--------+-------------------------+-------------------------+----------------------------------+-------------------------+-------------------------+-------------------------+
2 Row(s) produced. Time Elapsed: 0.674s
xxxx#COMPUTE_WH@AIRBNB.PUBLIC>

Discussion