Describing a GCS-to-BigQuery load with the BigQuery Data Transfer Service in Terraform
Environment
- JSON logs live in Google Cloud Storage (hereafter GCS)
- gs://foobar-shase/yyyymmdd/foobar1.json.gz
- yyyymmdd is in UTC; each run should target the previous day by subtracting 24 hours
- I want to describe, in Terraform, a daily job that loads these logs into BigQuery using the BigQuery Data Transfer Service.
The actual tf file
data_transfer.tf
resource "google_service_account" "transfer_user" {
project = "private-shase"
account_id = "transfer-user"
}
resource "google_storage_bucket" "foobar_log" {
project = "private-shase"
name = "foobar-shase"
location = "US"
force_destroy = false
uniform_bucket_level_access = true
}
resource "google_project_iam_binding" "bigquery_admin" {
project = "private-shase"
role = "roles/bigquery.admin"
members = [
"serviceAccount:${google_service_account.transfer_user.email}",
]
}
resource "google_project_iam_binding" "storage_object_admin" {
project = "private-shase"
role = "roles/storage.objectAdmin"
members = [
"serviceAccount:${google_service_account.transfer_user.email}",
]
}
resource "google_project_iam_binding" "sa_token_creator" {
project = "private-shase"
role = "roles/iam.serviceAccountTokenCreator"
members = [
"serviceAccount:${google_service_account.transfer_user.email}",
]
}
resource "google_bigquery_data_transfer_config" "foobar_transfer_config" {
display_name = "foobar_transfer"
location = "US"
data_source_id = "google_cloud_storage"
destination_dataset_id = "foobar"
project = "private-shase"
schedule = "every day 01:00"
service_account_name = google_service_account.transfer_user.email
params = {
data_path_template = "gs://foobar-shase/{run_time-24h|\"%Y%m%d\"}/*/*.json.gz"
destination_table_name_template = "foobar_table$${run_time-24h|\"%Y%m%d\"}"
file_format = "JSON"
max_bad_records = 0
write_disposition = "MIRROR"
}
}
resource "google_bigquery_dataset" "foobar_dataset" {
dataset_id = "foobar_dataset"
friendly_name = "foobar"
description = "foobar"
location = "US"
}
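One thing the file above does not declare is the destination table itself. Since destination_table_name_template uses the $YYYYMMDD partition decorator, the target is a day-partitioned table; if you also want that table under Terraform management, a minimal sketch might look like this (the names and the dataset reference are assumptions, so align them with whatever destination_dataset_id actually points at, and add a schema as needed):

# A sketch (assumption, not part of the original setup): a day-partitioned
# destination table, which is what foobar_table$YYYYMMDD resolves to.
resource "google_bigquery_table" "foobar_table" {
  project    = "private-shase"
  dataset_id = google_bigquery_dataset.foobar_dataset.dataset_id
  table_id   = "foobar_table"

  time_partitioning {
    type = "DAY"
  }
}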
Gotchas
Escaping data_path_template and destination_table_name_template in google_bigquery_data_transfer_config
params = {
  data_path_template              = "gs://foobar-shase/{run_time-24h|\"%Y%m%d\"}/*/*.json.gz"
  destination_table_name_template = "foobar_table$${run_time-24h|\"%Y%m%d\"}"
  file_format                     = "JSON"
  max_bad_records                 = 0
  write_disposition               = "MIRROR"
}
Both data_path_template and destination_table_name_template accept the run_time parameter macro, and when I moved this config into Terraform the escaping is what tripped me up.
Double quotes (") have to be escaped with a backslash, and the dollar sign ($) used in the partitioned-table decorator has to be doubled to $$.
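For reference, this is what those two values look like once Terraform's own escaping is undone, i.e. the literal strings handed to the Data Transfer Service API (the concrete date below assumes a run at 01:00 UTC on 2024-01-15):

# Literal values after Terraform un-escapes them ($$ -> $, \" -> ")
data_path_template              = gs://foobar-shase/{run_time-24h|"%Y%m%d"}/*/*.json.gz
destination_table_name_template = foobar_table${run_time-24h|"%Y%m%d"}

# For a run at 2024-01-15 01:00 UTC, run_time-24h formats to 20240114, so the
# transfer reads gs://foobar-shase/20240114/*/*.json.gz and writes into the
# foobar_table$20240114 partition.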
References (on wildcard support)
- https://cloud.google.com/bigquery/docs/gcs-transfer-parameters?hl=ja
- https://cloud.google.com/bigquery/docs/cloud-storage-transfer-overview?hl=ja#wildcard-support
Other notes
I also got stuck on various other things, such as how to specify service_account_name and which roles are required...
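On the roles: keep in mind that google_project_iam_binding is authoritative for the whole role, so it will strip any other members that already hold that role in the project. If that matters, the same grants can be written non-authoritatively with google_project_iam_member; a minimal sketch:

# Non-authoritative alternative to the bindings above: adds the transfer
# service account to each role without managing the role's full member list.
resource "google_project_iam_member" "transfer_user_roles" {
  for_each = toset([
    "roles/bigquery.admin",
    "roles/storage.objectAdmin",
    "roles/iam.serviceAccountTokenCreator",
  ])

  project = "private-shase"
  role    = each.value
  member  = "serviceAccount:${google_service_account.transfer_user.email}"
}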