
Transferring Data from BigQuery to S3 with Amazon AppFlow

Published 2024/01/25

I tried using Amazon AppFlow to transfer BigQuery data to S3.

Issuing an OAuth Client ID

Issue an OAuth client ID from APIs & Services > Credentials > Create credentials > OAuth client ID.

I created it with roughly the following settings. Scopes are set at creation time, and I specified the BigQuery-related scopes.
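For reference, the BigQuery-related OAuth scopes are typically one of the following; which one you actually need depends on your setup, so treat this as an assumption rather than a strict requirement of AppFlow.

https://www.googleapis.com/auth/bigquery
https://www.googleapis.com/auth/bigquery.readonly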

Data to Transfer

As the data to transfer, I created the following sample table in BigQuery.

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}
{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}

AppFlow

I set the source to BigQuery and the destination to S3.

For the field mapping, I simply selected all fields for now.
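I configured and ran the flow from the console, but for reference, here is a minimal sketch of triggering an existing flow with boto3 and checking its run status; the flow name is an assumed placeholder.

# Minimal sketch: start an existing AppFlow flow and check recent runs.
# The flow name "bigquery-to-s3" is a placeholder for the flow created above.
import boto3

appflow = boto3.client("appflow")
appflow.start_flow(flowName="bigquery-to-s3")

# List recent executions to confirm the run finished successfully
records = appflow.describe_flow_execution_records(flowName="bigquery-to-s3")
for run in records["flowExecutions"]:
    print(run["executionId"], run["executionStatus"])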

When I ran the flow, the following JSON file had been placed in S3. The nested fields were transferred correctly as well.

{"addresses":[{"numberOfYears":"1","state":"WA","city":"Seattle","zip":"11111","address":"123 First Avenue","status":"current"},{"numberOfYears":"5","state":"OR","city":"Portland","zip":"22222","address":"456 Main Street","status":"previous"}],"first_name":"John","dob":"1968-01-22","last_name":"Doe","id":1}
{"addresses":[{"numberOfYears":"2","state":"NY","city":"New York","zip":"33333","address":"789 Any Avenue","status":"current"},{"numberOfYears":"3","state":"NJ","city":"Hoboken","zip":"44444","address":"321 Main Street","status":"previous"}],"first_name":"Jane","dob":"1980-10-16","last_name":"Doe","id":2}

Data Transformation

AppFlow can also apply simple transformations while transferring data, so let's try a few.

Concatenation

Let's concatenate first_name and last_name into a new full_name field.
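Conceptually, this mapping amounts to the following per record (a plain-Python illustration of the result, not AppFlow's actual API; the space separator matches the output observed below):

# Illustration only: what the concatenation mapping produces for one record.
record = {"first_name": "John", "last_name": "Doe"}
record["full_name"] = f'{record["first_name"]} {record["last_name"]}'
# -> full_name == "John Doe"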

The data transferred to S3 is shown below. The full_name field has been created.

{"addresses":[{"numberOfYears":"1","state":"WA","city":"Seattle","zip":"11111","address":"123 First Avenue","status":"current"},{"numberOfYears":"5","state":"OR","city":"Portland","zip":"22222","address":"456 Main Street","status":"previous"}],"first_name":"John","dob":"1968-01-22","last_name":"Doe","id":1,"full_name":"John Doe"}
{"addresses":[{"numberOfYears":"2","state":"NY","city":"New York","zip":"33333","address":"789 Any Avenue","status":"current"},{"numberOfYears":"3","state":"NJ","city":"Hoboken","zip":"44444","address":"321 Main Street","status":"previous"}],"first_name":"Jane","dob":"1980-10-16","last_name":"Doe","id":2,"full_name":"Jane Doe"}

Masking / Truncation

Next, let's try masking and truncating values.

Masking
Mask every character of first_name.

Truncation
Truncate last_name from the second character onward.
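Conceptually, these two operations amount to the following per record (a plain-Python illustration, not AppFlow's actual API):

# Illustration only: what masking and truncation produce for one record.
record = {"first_name": "John", "last_name": "Doe"}
record["first_name"] = "*" * len(record["first_name"])  # mask every character -> "****"
record["last_name"] = record["last_name"][:1]           # keep only the first character -> "D"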

The data transferred to S3 is shown below. first_name is masked with *, and last_name has been truncated from the second character onward. Concatenation appears to take precedence over masking/truncation: full_name is built from the pre-mask, pre-truncation values.

{"addresses":[{"numberOfYears":"1","state":"WA","city":"Seattle","zip":"11111","address":"123 First Avenue","status":"current"},{"numberOfYears":"5","state":"OR","city":"Portland","zip":"22222","address":"456 Main Street","status":"previous"}],"first_name":"****","dob":"1968-01-22","last_name":"D","id":1,"full_name":"John Doe"}
{"addresses":[{"numberOfYears":"2","state":"NY","city":"New York","zip":"33333","address":"789 Any Avenue","status":"current"},{"numberOfYears":"3","state":"NJ","city":"Hoboken","zip":"44444","address":"321 Main Street","status":"previous"}],"first_name":"****","dob":"1980-10-16","last_name":"D","id":2,"full_name":"Jane Doe"}

Filter

Finally, let's try a filter.
You set a condition on a particular field, and only records matching the condition are transferred.

I configured the flow to transfer only records whose dob is later than 1970/1/1.
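This corresponds to keeping only records whose dob is after 1970-01-01, roughly as follows (a plain-Python illustration, not AppFlow's actual API):

# Illustration only: keep records whose dob is later than 1970-01-01.
from datetime import date

records = [{"id": "1", "dob": "1968-01-22"}, {"id": "2", "dob": "1980-10-16"}]
kept = [r for r in records if date.fromisoformat(r["dob"]) > date(1970, 1, 1)]
# -> only the id=2 record remains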

The data transferred to S3 is shown below. Only the single record matching the condition was transferred.

{"addresses":[{"numberOfYears":"2","state":"NY","city":"New York","zip":"33333","address":"789 Any Avenue","status":"current"},{"numberOfYears":"3","state":"NJ","city":"Hoboken","zip":"44444","address":"321 Main Street","status":"previous"}],"first_name":"****","dob":"1980-10-16","last_name":"D","id":2,"full_name":"Jane Doe"}

Summary

I tried transferring data from BigQuery to S3 with Amazon AppFlow. I hope this is useful as a reference.
