🐕

Snowflake JSONデータのロードについて その①

2022/10/29に公開約4,000字

SnowflakeでJSON形式のファイルをロードする手法を見ていきたいと思います。

JSONデータファイルの用意

今回利用するJSONデータです。このデータをSnowflakeにロードしていきます。
※一部のみ表示ひています。

{"id":1,"first_name":"Portia","last_name":"Gioani","gender":"Male","city":"Bakersfield","job":{"title":"Financial Analyst","salary":32000},"spoken_languages":[{"language":"Kazakh","level":"Advanced"},{"language":"Lao","level":"Basic"}],"prev_company":[]},
{"id":2,"first_name":"Dag","last_name":"Croney","gender":"Male","city":"Louny","job":{"title":"Clinical Specialist","salary":43000},"spoken_languages":[{"language":"Assamese","level":"Basic"},{"language":"Papiamento","level":"Expert"},{"language":"Telugu","level":"Basic"}],"prev_company":["MacGyver, Kessler and Corwin","Gerlach, Russel and Moen"]},
{"id":3,"first_name":"Heath","last_name":"Lackmann","gender":"Female","city":"Wates","job":{"title":"Research Assistant I","salary":40800},"spoken_languages":[{"language":"Swati","level":"Expert"}],"prev_company":["Schaden LLC","Reynolds LLC"]},
{"id":4,"first_name":"Dita","last_name":"Deering","gender":"Female","city":"Umeテ・","job":{"title":"Assistant Media Planner","salary":14400},"spoken_languages":[{"language":"Chinese","level":"Advanced"},{"language":"Mongolian","level":"Basic"}],"prev_company":[]},
{"id":5,"first_name":"Nikki","last_name":"McCosh","gender":"Female","city":"Shangxian","job":{"title":"Administrative Officer","salary":34600},"spoken_languages":[{"language":"Filipino","level":"Basic"},{"language":"Kazakh","level":"Basic"}],"prev_company":[]},
{"id":6,"first_name":"Austina","last_name":"Laux","gender":"Female","city":"Saurimo","job":{"title":"Compensation Analyst","salary":25800},"spoken_languages":[{"language":"Northern Sotho","level":"Basic"},{"language":"Tswana","level":"Expert"},{"language":"Georgian","level":"Advanced"}],"prev_company":["Rath Inc"]},
{"id":7,"first_name":"Tessa","last_name":"Nairns","gender":"Female","city":"Kangar","job":{"title":"Data Coordiator","salary":30600},"spoken_languages":[{"language":"Korean","level":"Basic"}],"prev_company":[]},
{"id":8,"first_name":"Remington","last_name":"Lilleman","gender":"Female","city":"Muurla","job":{"title":"Pharmacist","salary":22700},"spoken_languages":[{"language":"Kazakh","level":"Expert"}],"prev_company":["Kuhlman LLC","Lockman, Kunze and Bartoletti"]},
{"id":9,"first_name":"Genovera","last_name":"Semered","gender":"Male","city":"Espinosa","job":{"title":"Compensation Analyst","salary":33200},"spoken_languages":[{"language":"Haitian Creole","level":"Advanced"},{"language":"Quechua","level":"Basic"},{"language":"Thai","level":"Expert"}],"prev_company":[]},
{"id":10,"first_name":"Louella","last_name":"Tranckle","gender":"Male","city":"Marsada","job":{"title":"Chief Design Engineer","salary":20800},"spoken_languages":[{"language":"Oriya","level":"Basic"},{"language":"Luxembourgish","level":"Basic"},{"language":"Dzongkha","level":"Expert"}],"prev_company":[]},

外部ステージ作成

CREATE OR REPLACE stage MANAGE_DB.EXTERNAL_STAGES.JSONSTAGE
     url='s3://bucketsnowflake-jsondemo';

JSONフォーマット作成

CREATE OR REPLACE file format MANAGE_DB.FILE_FORMATS.JSONFORMAT
    TYPE = JSON;

JSONの生データを受け取る専用のテーブルを作成

CREATE OR REPLACE table OUR_FIRST_DB.PUBLIC.JSON_RAW (
    raw_file variant);

このテーブルには下記JSONファイルの行を入れるためにカラムは'raw_file'で型がVARIANTを指定しています。

{"id":10,"first_name":"Louella","last_name":"Tranckle","gender":"Male","city":"Marsada","job":{"title":"Chief Design Engineer","salary":20800},"spoken_languages":[{"language":"Oriya","level":"Basic"},{"language":"Luxembourgish","level":"Basic"},{"language":"Dzongkha","level":"Expert"}],"prev_company":[]},

JSON_RAW専用テーブルにロードする

COPY INTO OUR_FIRST_DB.PUBLIC.JSON_RAW
    FROM @MANAGE_DB.EXTERNAL_STAGES.JSONSTAGE
    file_format= MANAGE_DB.FILE_FORMATS.JSONFORMAT
    files = ('HR_data.json');
SELECT * FROM OUR_FIRST_DB.PUBLIC.JSON_RAW;

RAW_FILEカラムに1行づつロードされています。

次回ロードした1行の中のキー:バリューという構造から任意の値を取得したいと思います。

Discussion

ログインするとコメントできます