🕌

Snowflake JSONデータのロードについて その③

2022/10/31に公開約6,000字

前回記事からの続きです。
配列内の要素の取り出しを見ていきます。

JSONデータの確認

※忘れているかもしれないので、念のため確認しておきましょう。

{"id":1,"first_name":"Portia","last_name":"Gioani","gender":"Male","city":"Bakersfield","job":{"title":"Financial Analyst","salary":32000},"spoken_languages":[{"language":"Kazakh","level":"Advanced"},{"language":"Lao","level":"Basic"}],"prev_company":[]},
{"id":2,"first_name":"Dag","last_name":"Croney","gender":"Male","city":"Louny","job":{"title":"Clinical Specialist","salary":43000},"spoken_languages":[{"language":"Assamese","level":"Basic"},{"language":"Papiamento","level":"Expert"},{"language":"Telugu","level":"Basic"}],"prev_company":["MacGyver, Kessler and Corwin","Gerlach, Russel and Moen"]},
{"id":3,"first_name":"Heath","last_name":"Lackmann","gender":"Female","city":"Wates","job":{"title":"Research Assistant I","salary":40800},"spoken_languages":[{"language":"Swati","level":"Expert"}],"prev_company":["Schaden LLC","Reynolds LLC"]},
{"id":4,"first_name":"Dita","last_name":"Deering","gender":"Female","city":"Umeテ・","job":{"title":"Assistant Media Planner","salary":14400},"spoken_languages":[{"language":"Chinese","level":"Advanced"},{"language":"Mongolian","level":"Basic"}],"prev_company":[]},
{"id":5,"first_name":"Nikki","last_name":"McCosh","gender":"Female","city":"Shangxian","job":{"title":"Administrative Officer","salary":34600},"spoken_languages":[{"language":"Filipino","level":"Basic"},{"language":"Kazakh","level":"Basic"}],"prev_company":[]},
{"id":6,"first_name":"Austina","last_name":"Laux","gender":"Female","city":"Saurimo","job":{"title":"Compensation Analyst","salary":25800},"spoken_languages":[{"language":"Northern Sotho","level":"Basic"},{"language":"Tswana","level":"Expert"},{"language":"Georgian","level":"Advanced"}],"prev_company":["Rath Inc"]},
{"id":7,"first_name":"Tessa","last_name":"Nairns","gender":"Female","city":"Kangar","job":{"title":"Data Coordiator","salary":30600},"spoken_languages":[{"language":"Korean","level":"Basic"}],"prev_company":[]},
{"id":8,"first_name":"Remington","last_name":"Lilleman","gender":"Female","city":"Muurla","job":{"title":"Pharmacist","salary":22700},"spoken_languages":[{"language":"Kazakh","level":"Expert"}],"prev_company":["Kuhlman LLC","Lockman, Kunze and Bartoletti"]},
{"id":9,"first_name":"Genovera","last_name":"Semered","gender":"Male","city":"Espinosa","job":{"title":"Compensation Analyst","salary":33200},"spoken_languages":[{"language":"Haitian Creole","level":"Advanced"},{"language":"Quechua","level":"Basic"},{"language":"Thai","level":"Expert"}],"prev_company":[]},
{"id":10,"first_name":"Louella","last_name":"Tranckle","gender":"Male","city":"Marsada","job":{"title":"Chief Design Engineer","salary":20800},"spoken_languages":[{"language":"Oriya","level":"Basic"},{"language":"Luxembourgish","level":"Basic"},{"language":"Dzongkha","level":"Expert"}],"prev_company":[]},

配列の確認

'prev_company'のバリューは配列になっているため普通にやっても取り出せません。

SELECT RAW_FILE:prev_company as prev_company FROM OUR_FIRST_DB.PUBLIC.JSON_RAW;
PREV_COMPANY
[]
[ "MacGyver, Kessler and Corwin", "Gerlach, Russel and Moen"]
[ "Schaden LLC", "Reynolds LLC"]

配列データの操作方法

カラム名:キー名[インデックス番号]で取得する

※配列の一番目の要素番号は0

SELECT RAW_FILE:prev_company[0]::STRING as prev_company
FROM OUR_FIRST_DB.PUBLIC.JSON_RAW;
PREV_COMPANY
NULL
MacGyver, Kessler and Corwin
Reynolds LLC

TIPS

今回使用したJSONファイルには下記のようなデータが含まれている。
これは話すことができる言語とそのレベルが記載してあります。
複数言語を扱える人は要素が複数あります。

SELECT RAW_FILE:spoken_languages as spoken_languages
FROM OUR_FIRST_DB.PUBLIC.JSON_RAW;


これらの値を良い感じに取得したいと思います。

悪い例

先に悪い例をお見せする。
SQLコマンドを見てみるとspoken_languagesの要素がMAX3なので、3つのSELECT句を用意し、それぞれをUNIONでつなげている。
この場合、spoken_languagesが1つないし2つの人の場合NULL値で埋められている。
これだとスマートではない。

SELECT 
    RAW_FILE:id::int as id,
    RAW_FILE:first_name::STRING as First_name,
    RAW_FILE:spoken_languages[0].language::STRING as First_language,
    RAW_FILE:spoken_languages[0].level::STRING as Level_spoken
FROM OUR_FIRST_DB.PUBLIC.JSON_RAW
UNION ALL 
SELECT 
    RAW_FILE:id::int as id,
    RAW_FILE:first_name::STRING as First_name,
    RAW_FILE:spoken_languages[1].language::STRING as First_language,
    RAW_FILE:spoken_languages[1].level::STRING as Level_spoken
FROM OUR_FIRST_DB.PUBLIC.JSON_RAW
UNION ALL 
SELECT 
    RAW_FILE:id::int as id,
    RAW_FILE:first_name::STRING as First_name,
    RAW_FILE:spoken_languages[2].language::STRING as First_language,
    RAW_FILE:spoken_languages[2].level::STRING as Level_spoken
FROM OUR_FIRST_DB.PUBLIC.JSON_RAW
ORDER BY ID

良い例

FLATTEN関数を利用することで、複雑な階層を持つデータを平坦化してくれます。
それではSQLコマンドを見てみましょう。
一番のポイントはFROM句の後のtable(flatten(RAW_FILE:spoken_languages)) fです。
flatten({平坦化したい親属性}) fとすることで平坦化した情報をfと言うエイリアスに渡しています。
そしてそのエイリアスを使って子属性を指定することで要素内の値を取得できます。

select
    RAW_FILE:first_name::STRING as First_name,
    f.value:language::STRING as First_language,
   f.value:level::STRING as Level_spoken
from OUR_FIRST_DB.PUBLIC.JSON_RAW, table(flatten(RAW_FILE:spoken_languages)) f;

解析したJSONデータをテーブルとして作成

これまで解析した手法を使ってテーブルを作成します。

CREATE OR REPLACE TABLE Languages AS
select
      RAW_FILE:first_name::STRING as First_name,
    f.value:language::STRING as First_language,
   f.value:level::STRING as Level_spoken
from OUR_FIRST_DB.PUBLIC.JSON_RAW, table(flatten(RAW_FILE:spoken_languages)) f;

SELECT * FROM Languages;

もしくはINSERT句でも作成できます。
一度テーブルの中身を削除します。

truncate table languages;

INSERT句で作成(挿入)します。

INSERT INTO Languages
select
      RAW_FILE:first_name::STRING as First_name,
    f.value:language::STRING as First_language,
   f.value:level::STRING as Level_spoken
from OUR_FIRST_DB.PUBLIC.JSON_RAW, table(flatten(RAW_FILE:spoken_languages)) f;

Discussion

ログインするとコメントできます