Snowflake と S3 連携を試す with Terraform
参考資料
前準備
トライアルアカウントを作った。
S3 バケットと Policy, Role の作成
ドキュメントのステップ2までをやったつもり。
terraform {
required_version = "~> 1.8.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = "ap-northeast-1"
}
resource "aws_iam_policy" "snowflake_s3_access" {
name = "snowflake-s3-access"
description = "Policy for Snowflake to access S3 data"
policy = templatefile(
"snowflake-s3-access-policy.json",
{
bucket_name = aws_s3_bucket.default.bucket,
}
)
}
resource "aws_iam_role" "snowflake_s3_access" {
name = "snowflake-s3-access"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
Action = "sts:AssumeRole"
}
]
})
}
resource "aws_iam_role_policy_attachment" "s3_access_attachment" {
role = aws_iam_role.snowflake_s3_access.name
policy_arn = aws_iam_policy.snowflake_s3_access.arn
}
resource "aws_s3_bucket" "default" {
bucket = "9sako6-demo-snowflake-data-lake"
lifecycle {
prevent_destroy = true
}
}
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:GetObjectVersion",
"s3:DeleteObject",
"s3:DeleteObjectVersion"
],
"Resource": "arn:aws:s3:::${bucket_name}/*"
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::${bucket_name}"
}
]
}
ストレージ統合を作る。
ストレージ統合は、S3クラウドストレージ用に生成されたIDおよびアクセス管理(IAM)ユーザーと、許可またはブロックされたストレージロケーション(バケットなど)のオプションセットを保存するSnowflakeオブジェクトです。
ここで snowsql で接続できない現象。ブラウザの UI からコピーしてきた XXXXXXX.XXXXXXX というアカウント表記だとダメで、XXXXXXX-XXXXXXX にするといけた。これは何。
snowsql -u <user> -a XXXXXXX-XXXXXXX
IAM ロールの ARN が必要になったので tf に追記して取得。
output "iam_role_arn" {
value = aws_iam_role.snowflake_s3_access.arn
description = "The ARN of the IAM role"
}
ドキュメントに習って CREATE STORAGE INTEGRATION
を実行した。
+--------------------------------------------------+
| status |
|--------------------------------------------------|
| Integration S3_INTEGRATION successfully created. |
+--------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.386s
次の手順にある DESC INTEGRATION
も問題なく実行できた。
表示された情報のうち、STORAGE_AWS_IAM_USER_ARN
, STORAGE_AWS_EXTERNAL_ID
を使って次の手順を進める。
先ほど IAM ロールの assume_role_policy を下記に変更して apply した。
resource "aws_iam_role" "snowflake_s3_access" {
name = "snowflake-s3-access"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"AWS": "<STORAGE_AWS_IAM_USER_ARN>"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>"
}
}
}
]
})
}
次は外部ステージを作る。DATABASE, SCHEMA が必要だったので作った。
CREATE DATABASE IF NOT EXISTS demo_db;
CREATE SCHEMA IF NOT EXISTS demo_schema;
ステージは、 my_csv_format という名前のファイル形式オブジェクトも参照します。
急にわからなくなった。ファイル形式オブジェクトとは。
これか。
作りました。
CREATE OR REPLACE FILE FORMAT mycsvformat
TYPE = 'CSV'
SKIP_HEADER = 1;
外部ステージを作成する。
USE SCHEMA demo_db.demo_schema;
CREATE STAGE my_s3_stage
STORAGE_INTEGRATION = s3_integration
URL = 's3://9sako6-demo-snowflake-data-lake/'
FILE_FORMAT = mycsvformat;
+----------------------------------+
| status |
|----------------------------------|
| Statement executed successfully. |
+----------------------------------+
1 Row(s) produced. Time Elapsed: 0.133s
+----------------------------------------------+
| status |
|----------------------------------------------|
| Stage area MY_S3_STAGE successfully created. |
+----------------------------------------------+
1 Row(s) produced. Time Elapsed: 1.033s
下記のサンプル CSV を S3 にアップロードしてみる。
Snowflake 上に反映されている!
LIST
コマンドでも確認できた。
9sako6demo#COMPUTE_WH@DEMO_DB.DEMO_SCHEMA>LIST @my_s3_stage;
+----------------------------------------------------------+------+----------------------------------+-------------------------------+
| name | size | md5 | last_modified |
|----------------------------------------------------------+------+----------------------------------+-------------------------------|
| s3://9sako6-demo-snowflake-data-lake/raw/employees01.csv | 370 | ee3c86ae89eabb22667d6fd7684fec17 | Sun, 26 May 2024 08:17:55 GMT |
| s3://9sako6-demo-snowflake-data-lake/raw/employees02.csv | 364 | 20447871857babe7b4a12363f5f754da | Sun, 26 May 2024 08:17:55 GMT |
| s3://9sako6-demo-snowflake-data-lake/raw/employees03.csv | 407 | 65d89581bbd5b07cf78bf847205d7cbd | Sun, 26 May 2024 08:17:55 GMT |
| s3://9sako6-demo-snowflake-data-lake/raw/employees04.csv | 375 | 53fe4c128aa9567e7deb28abce5f8d97 | Sun, 26 May 2024 08:17:56 GMT |
| s3://9sako6-demo-snowflake-data-lake/raw/employees05.csv | 404 | 87d47add6452f8ab2949f6b29d6b285a | Sun, 26 May 2024 08:17:56 GMT |
+----------------------------------------------------------+------+----------------------------------+-------------------------------+
5 Row(s) produced. Time Elapsed: 0.170s
この CSV をテーブルに取り込んでみる。まずはテーブルの作成。
CREATE OR REPLACE TABLE emp_basic (
first_name STRING ,
last_name STRING ,
email STRING ,
streetaddress STRING ,
city STRING ,
start_date DATE
);
ウェアハウスも必要そうなので作成。
CREATE OR REPLACE WAREHOUSE wh_demo_xs WITH
WAREHOUSE_SIZE='X-SMALL'
AUTO_SUSPEND = 180
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED=TRUE;
9sako6demo#WH_DEMO_XS@DEMO_DB.DEMO_SCHEMA>SELECT CURRENT_WAREHOUSE();
+---------------------+
| CURRENT_WAREHOUSE() |
|---------------------|
| WH_DEMO_XS |
+---------------------+
1 Row(s) produced. Time Elapsed: 0.337s
ステージングされたデータを取り込むには COPY INTO
を使う模様。
9sako6demo#WH_DEMO_XS@DEMO_DB.DEMO_SCHEMA>COPY INTO emp_basic
FROM @my_s3_stage
FILE_FORMAT = (type = csv field_optionally_enclosed_by='"')
PATTERN = '.*employees0[1-5].csv'
ON_ERROR = 'skip_file';
+----------------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|----------------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| s3://9sako6-demo-snowflake-data-lake/raw/employees05.csv | LOADED | 5 | 5 | 1 | 0 | NULL | NULL | NULL | NULL |
| s3://9sako6-demo-snowflake-data-lake/raw/employees01.csv | LOADED | 5 | 5 | 1 | 0 | NULL | NULL | NULL | NULL |
| s3://9sako6-demo-snowflake-data-lake/raw/employees04.csv | LOADED | 5 | 5 | 1 | 0 | NULL | NULL | NULL | NULL |
| s3://9sako6-demo-snowflake-data-lake/raw/employees02.csv | LOADED | 5 | 5 | 1 | 0 | NULL | NULL | NULL | NULL |
| s3://9sako6-demo-snowflake-data-lake/raw/employees03.csv | LOADED | 5 | 5 | 1 | 0 | NULL | NULL | NULL | NULL |
+----------------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
5 Row(s) produced. Time Elapsed: 1.377s
取り込めた!
9sako6demo#WH_DEMO_XS@DEMO_DB.DEMO_SCHEMA>SELECT * FROM emp_basic;
+------------+--------------+---------------------------+-----------------------------+--------------------+------------+
| FIRST_NAME | LAST_NAME | EMAIL | STREETADDRESS | CITY | START_DATE |
|------------+--------------+---------------------------+-----------------------------+--------------------+------------|
| Arlene | Davidovits | adavidovitsk@sf_tuts.com | 7571 New Castle Circle | Meniko | 2017-05-03 |
| Violette | Shermore | vshermorel@sf_tuts.com | 899 Merchant Center | Troitsk | 2017-01-19 |
| Ron | Mattys | rmattysm@sf_tuts.com | 423 Lien Pass | Bayaguana | 2017-11-15 |
| Shurlocke | Oluwatoyin | soluwatoyinn@sf_tuts.com | 40637 Portage Avenue | Semënovskoye | 2017-09-12 |
| Granger | Bassford | gbassfordo@sf_tuts.co.uk | 6 American Ash Circle | Kardítsa | 2016-12-30 |
| Lem | Boissier | lboissier@sf_tuts.com | 3002 Ruskin Trail | Shikārpur | 2017-08-25 |
| Iain | Hanks | ihanks1@sf_tuts.com | 2 Pankratz Hill | Monte-Carlo | 2017-12-10 |
| Avo | Laudham | alaudham2@sf_tuts.com | 6948 Debs Park | Prażmów | 2017-10-18 |
| Emili | Cornner | ecornner3@sf_tuts.com | 177 Magdeline Avenue | Norrköping | 2017-08-13 |
| Harrietta | Goolding | hgoolding4@sf_tuts.com | 450 Heath Trail | Osielsko | 2017-11-27 |
| Wallis | Sizey | wsizeyf@sf_tuts.com | 36761 American Lane | Taibao | 2016-12-30 |
| Di | McGowran | dmcgowrang@sf_tuts.com | 1856 Maple Lane | Banjar Bengkelgede | 2017-04-22 |
| Carson | Bedder | cbedderh@sf_tuts.co.au | 71 Clyde Gallagher Place | Leninskoye | 2017-03-29 |
| Dana | Avory | davoryi@sf_tuts.com | 2 Holy Cross Pass | Wenlin | 2017-05-11 |
| Ronny | Talmadge | rtalmadgej@sf_tuts.co.uk | 588 Chinook Street | Yawata | 2017-06-02 |
| Nyssa | Dorgan | ndorgan5@sf_tuts.com | 7 Tomscot Way | Pampas Chico | 2017-04-13 |
| Catherin | Devereu | cdevereu6@sf_tuts.co.au | 535 Basil Terrace | Magapit | 2016-12-17 |
| Grazia | Glaserman | gglaserman7@sf_tuts.com | 162 Debra Lane | Shiquanhe | 2017-06-06 |
| Ivett | Casemore | icasemore8@sf_tuts.com | 84 Holmberg Pass | Campina Grande | 2017-03-29 |
| Cesar | Hovie | chovie9@sf_tuts.com | 5 7th Pass | Miami | 2016-12-21 |
| Althea | Featherstone | afeatherstona@sf_tuts.com | 8172 Browning Street, Apt B | Calatrava | 2017-07-12 |
| Hollis | Anneslie | hanneslieb@sf_tuts.com | 3248 Roth Park | Aleysk | 2017-11-16 |
| Betti | Cicco | bciccoc@sf_tuts.com | 121 Victoria Junction | Sinegor'ye | 2017-06-22 |
| Brendon | Durnall | bdurnalld@sf_tuts.com | 26814 Weeping Birch Place | Sabadell | 2017-11-14 |
| Kylila | MacConnal | kmacconnale@sf_tuts.com | 04 Valley Edge Court | Qingshu | 2017-06-22 |
+------------+--------------+---------------------------+-----------------------------+--------------------+------------+
25 Row(s) produced. Time Elapsed: 0.157s