
[dbt x BigQuery] How to separate the GCP projects for development and production

Published on 2023/03/05

Overview

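When deploying models with dbt, I ran into a trade-off: I want one dataset per developer, as dbt recommends, but the number of datasets then grows until the project becomes hard to navigate.
For anyone in the same situation, this article describes a configuration that keeps the destination projects and datasets tidy.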

Relationship between environment, project name, and dataset name

The configuration is set up to produce the following layout.

Environment	database (BigQuery project)	schema (BigQuery dataset)
dev	<project name>-sandbox	[user name]__[custom schema name]
prod	<project name>	[custom schema name]

⚠️1: The user name is set automatically by dbt Cloud (explained below).

⚠️2: The custom schema name is set per directory in dbt_project.yml (explained below).

Project name: sample-dmp
User name: dbt_yyamasaki
For example, when the account with the user name dbt_yyamasaki runs dbt run on the model models/marts/marketing/customers.sql, the result is as shown in the table below.

Environment	database (BigQuery project)	schema (BigQuery dataset)
dev	sample-dmp-sandbox	dbt_yyamasaki__mart_marketing
prod	sample-dmp	mart_marketing

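There are two points to note.

1: In the dev environment, a schema (dataset) is created for each user.

Reason: when several people develop in parallel, one person's deployment does not affect anyone else's.
2: All dev models are created in a separate database (project), sample-dmp-sandbox.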
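
The naming rule in the table above can be written as a small Python function, which is handy for checking the expected destination before running dbt. This is only a sketch of the rule as described here, not dbt's own code. The function name resolve_relation and its parameters are made up for illustration, and the fallback to the target schema when no custom schema is given is an assumption for the case the table does not cover.

```python
from typing import Optional, Tuple


def resolve_relation(
    target_name: str,
    target_database: str,
    target_schema: str,
    custom_schema: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (BigQuery project, dataset) following the table above.

    Hypothetical helper for illustration; it only mirrors the naming rule.
    """
    custom = custom_schema.strip() if custom_schema else None

    if target_name == "prod":
        # prod: the real project; the dataset is the custom schema name as-is
        return target_database, custom or target_schema

    # any other target: the sandbox project, dataset prefixed with the user name
    dataset = f"{target_schema}__{custom}" if custom else target_schema
    return f"{target_database}-sandbox", dataset


print(resolve_relation("dev", "sample-dmp", "dbt_yyamasaki", "mart_marketing"))
# ('sample-dmp-sandbox', 'dbt_yyamasaki__mart_marketing')
print(resolve_relation("prod", "sample-dmp", "dbt_yyamasaki", "mart_marketing"))
# ('sample-dmp', 'mart_marketing')
```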

Setup

File configuration

Place the following files under the macros directory.

:::details
The files get_custom_database.sql and get_custom_schema.sql under the macros directory control this behavior. (official documentation)

<!-- get_custom_database.sql -->
{% macro generate_database_name(custom_database_name=none, node=none) -%}
    {%- if target.name == 'prod' -%}
        {{ target.database }}
    {%- else -%}
        {{ target.database }}-sandbox
    {%- endif -%}
{%- endmacro %}

<!-- get_custom_schema.sql -->
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if target.name == 'prod' -%}
        {%- if custom_schema_name is none -%}
            {{ target.schema }}
        {%- else -%}
            {{ custom_schema_name | trim }}
        {%- endif -%}
    {%- else -%}
        {%- if custom_schema_name is none -%}
            {{ target.schema }}
        {%- else -%}
            {{ target.schema }}__{{ custom_schema_name | trim }}
        {%- endif -%}
    {%- endif -%}
{%- endmacro %}

:::

Explanation of the variables

This section explains the variables used in the files above.

target.name, target.database, target.schema
custom_schema_name

`target.name`, `target.database`, `target.schema`

These take the values set in profiles.yml.

default: # this needs to match the profile in your dbt_project.yml file
  target: dev # becomes target.name; must match a key under outputs
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: <your_project> # target.database
      dataset: <your_dataset> # target.schema. In dev, for a user named Taro Yamada, use something like t_yamada.
      threads: 4
      timeout_seconds: 300
      location: asia-northeast1
      priority: interactive
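
How a profile turns into the target.* values can be pictured with a short Python sketch. It is a simplified model of the mapping described above, not dbt's actual implementation. The dict stands in for a parsed profiles.yml, its values are examples, and the function name target_vars is hypothetical.

```python
from typing import Any, Dict

# Stand-in for a parsed profiles.yml (example values, not real settings).
profile: Dict[str, Any] = {
    "target": "dev",
    "outputs": {
        "dev": {
            "type": "bigquery",
            "project": "sample-dmp",
            "dataset": "dbt_yyamasaki",
        },
    },
}


def target_vars(profile: Dict[str, Any]) -> Dict[str, str]:
    """Map a profile to the values the macros read as target.*."""
    name = profile["target"]
    if name not in profile["outputs"]:
        raise KeyError(f"no output named {name!r} in profile")
    output = profile["outputs"][name]
    return {
        "name": name,                   # target.name
        "database": output["project"],  # target.database (BigQuery project)
        "schema": output["dataset"],    # target.schema (BigQuery dataset)
    }


print(target_vars(profile))
# {'name': 'dev', 'database': 'sample-dmp', 'schema': 'dbt_yyamasaki'}
```

The selected target has to exist under outputs. In this sketch a missing entry raises KeyError.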

`custom_schema_name`

There are two ways to set it. (official documentation)

① Under models in dbt_project.yml, set the schema for the directory you want to configure

# dbt_project.yml
# models in `models/marketing/` will be rendered to the "*_marketing" schema
models:
  my_project:
    marketing:
      +schema: marketing

② Use the config function inside a model file

-- some sql file which should be in the marketing schema.
{{ config(schema='marketing') }}
select ...

With ②, you have to open each file to see its setting, so configuring it with ① is generally preferable.
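
The directory-based setting in ① can be pictured as a walk down the folder tree in which the deepest folder that sets +schema wins. The Python below is a simplified model of that lookup, not dbt's implementation. The function name custom_schema_for and the sample config dict are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, Optional

# Stand-in for the part of dbt_project.yml under models: -> my_project:
models_config: Dict[str, Any] = {
    "marketing": {"+schema": "marketing"},
}


def custom_schema_for(model_path: str, config: Dict[str, Any]) -> Optional[str]:
    """Return the +schema that applies to a model file, or None if unset."""
    # "models/marketing/customers.sql" -> ("marketing",)
    folders = PurePosixPath(model_path).parts[1:-1]

    schema = config.get("+schema")
    node: Any = config
    for folder in folders:
        node = node.get(folder)
        if not isinstance(node, dict):
            break  # no configuration for this folder or anything below it
        schema = node.get("+schema", schema)
    return schema


print(custom_schema_for("models/marketing/customers.sql", models_config))
# marketing
print(custom_schema_for("models/staging/orders.sql", models_config))
# None
```

The returned value corresponds to custom_schema_name. None means no custom schema is set, in which case the macro falls back to target.schema.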

Other notes

If you can edit profiles.yml, get_custom_database.sql may not be needed at all, since the project name can be changed per target within profiles.yml.