Transferring data from BigQuery to S3 with Amazon AppFlow
I tried using Amazon AppFlow to transfer BigQuery data to S3.
Issuing an OAuth client ID
In the Google Cloud console, issue an OAuth client ID from APIs & Services > Credentials > Create credentials > OAuth client ID.
I created it as shown below. Scopes are set at creation time, and I specified a BigQuery-related scope (the BigQuery scope https://www.googleapis.com/auth/bigquery, for example).
Data to transfer
As the data to transfer, I created the following sample table in BigQuery.
{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}
{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}
AppFlow
I set the source to BigQuery and the destination to S3.
For the field mapping, I simply selected all fields for now.
When I ran the flow, the following JSON file had been placed in S3. The nested fields were transferred properly as well.
{"addresses":[{"numberOfYears":"1","state":"WA","city":"Seattle","zip":"11111","address":"123 First Avenue","status":"current"},{"numberOfYears":"5","state":"OR","city":"Portland","zip":"22222","address":"456 Main Street","status":"previous"}],"first_name":"John","dob":"1968-01-22","last_name":"Doe","id":1}
{"addresses":[{"numberOfYears":"2","state":"NY","city":"New York","zip":"33333","address":"789 Any Avenue","status":"current"},{"numberOfYears":"3","state":"NJ","city":"Hoboken","zip":"44444","address":"321 Main Street","status":"previous"}],"first_name":"Jane","dob":"1980-10-16","last_name":"Doe","id":2}
Data transformations
AppFlow can apparently also apply simple transformations while transferring data, so let's try them out.
Concatenation
Let's concatenate first_name and last_name into a new field called full_name.
Below is the data transferred to S3. The full_name field has been created.
{"addresses":[{"numberOfYears":"1","state":"WA","city":"Seattle","zip":"11111","address":"123 First Avenue","status":"current"},{"numberOfYears":"5","state":"OR","city":"Portland","zip":"22222","address":"456 Main Street","status":"previous"}],"first_name":"John","dob":"1968-01-22","last_name":"Doe","id":1,"full_name":"John Doe"}
{"addresses":[{"numberOfYears":"2","state":"NY","city":"New York","zip":"33333","address":"789 Any Avenue","status":"current"},{"numberOfYears":"3","state":"NJ","city":"Hoboken","zip":"44444","address":"321 Main Street","status":"previous"}],"first_name":"Jane","dob":"1980-10-16","last_name":"Doe","id":2,"full_name":"Jane Doe"}
Mask / truncate
Next, let's try masking and truncating values.
Mask
Mask all characters of first_name.
Truncate
Truncate last_name from the second character onward (keeping only the first character).
Below is the data transferred to S3. first_name has been masked with *, and last_name has been truncated from the second character onward. The concatenation appears to run before the mask/truncate steps, so full_name still uses the pre-mask, pre-truncation values.
{"addresses":[{"numberOfYears":"1","state":"WA","city":"Seattle","zip":"11111","address":"123 First Avenue","status":"current"},{"numberOfYears":"5","state":"OR","city":"Portland","zip":"22222","address":"456 Main Street","status":"previous"}],"first_name":"****","dob":"1968-01-22","last_name":"D","id":1,"full_name":"John Doe"}
{"addresses":[{"numberOfYears":"2","state":"NY","city":"New York","zip":"33333","address":"789 Any Avenue","status":"current"},{"numberOfYears":"3","state":"NJ","city":"Hoboken","zip":"44444","address":"321 Main Street","status":"previous"}],"first_name":"****","dob":"1980-10-16","last_name":"D","id":2,"full_name":"Jane Doe"}
Filter
Finally, let's try a filter.
You set a condition on a particular field, and apparently only records matching the condition are transferred.
I set the filter so that only records with a dob later than 1970-01-01 are transferred.
Below is the data transferred to S3. Only the one record matching the condition was transferred.
{"addresses":[{"numberOfYears":"2","state":"NY","city":"New York","zip":"33333","address":"789 Any Avenue","status":"current"},{"numberOfYears":"3","state":"NJ","city":"Hoboken","zip":"44444","address":"321 Main Street","status":"previous"}],"first_name":"****","dob":"1980-10-16","last_name":"D","id":2,"full_name":"Jane Doe"}
Summary
I tried out BigQuery-to-S3 data transfer with Amazon AppFlow. I hope this is useful to someone.