🐙
DuckDB x dbt-duckdb使ってみるよ
前回記事のつづきです。
dbt-duckdbがあるそうなので、データの取り込みをやってみます。
前の記事でも書きましたが、やっぱり無料で使えるOLAPのDWHがあると個人開発が捗るし、勉強にもなるのですごく助かります。
dbt-duckdbのインストール
root@28b9c609ac51:/duckdb_work/# sudo apt install python3
root@28b9c609ac51:/duckdb_work/# sudo apt install python3-pip
root@28b9c609ac51:/duckdb_work/# pip3 install dbt-duckdb
dbtプロジェクトの作成
今回はYouTubeのオープンデータを取り込むので、プロジェクト名をそれっぽいものにします。
root@28b9c609ac51:/duckdb_work/# dbt init youtube_vocabulary
YouTubeデータセット
DuckdbでYouTube用のデータベースを作成
root@28b9c609ac51:/duckdb_work/# ./duckdb youtube_vocabulary.duckdb
profiles.ymlの作成
profiles.yml
youtube_vocabulary:
outputs:
dev:
type: duckdb
path: /duckdb_work/youtube_vocabulary.duckdb
target: dev
dbt debugの実行
debugの実行
dbt debugの実行
06:19:34 Running with dbt=1.5.2
06:19:34 dbt version: 1.5.2
06:19:34 python version: 3.8.10
06:19:34 python path: /usr/bin/python3
06:19:34 os info: Linux-5.15.49-linuxkit-x86_64-with-glibc2.29
06:19:34 Using profiles.yml file at /duckdb_work/youtube_vocabulary/profiles.yml
06:19:34 Using dbt_project.yml file at /duckdb_work/youtube_vocabulary/dbt_project.yml
06:19:34 Configuration:
06:19:34 profiles.yml file [OK found and valid]
06:19:34 dbt_project.yml file [OK found and valid]
06:19:34 Required dependencies:
06:19:34 - git [OK found]
06:19:34 Connection:
06:19:34 database: youtube_vocabulary
06:19:34 schema: main
06:19:34 path: /duckdb_work/youtube_vocabulary.duckdb
06:19:34 Registered adapter: duckdb=1.5.2
06:19:34 Connection test: [OK connection ok]
06:19:34 All checks passed!
YouTubeデータをSeedで取り込む
データの取り込み
root@28b9c609ac51:/duckdb_work/youtube_vocabulary/seeds# ls
vocabulary.csv
root@28b9c609ac51:/duckdb_work/youtube_vocabulary# dbt seed
06:20:37 Running with dbt=1.5.2
06:20:37 Registered adapter: duckdb=1.5.2
06:20:37 Unable to do partial parsing because saved manifest not found. Starting full parse.
06:20:39 Found 2 models, 4 tests, 0 snapshots, 0 analyses, 316 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups
06:20:39
06:20:39 Concurrency: 1 threads (target='dev')
06:20:39
06:20:39 1 of 1 START seed file main.vocabulary ......................................... [RUN]
06:20:49 1 of 1 OK loaded seed file main.vocabulary ..................................... [INSERT 3862 in 10.54s]
06:20:49
06:20:49 Finished running 1 seed in 0 hours 0 minutes and 10.69 seconds (10.69s).
06:20:49
06:20:49 Completed successfully
06:20:49
06:20:49 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
DuckDBのデータベースで取り込んだデータを確認する
root@28b9c609ac51:/duckdb_work# ./duckdb youtube_vocabulary.duckdb
v0.8.1 6536a77232
Enter ".help" for usage hints.
D SELECT* FROM vocabulary LIMIT 10;
┌───────┬─────────────────┬──────────────────┬─────────────────┬──────────────────────┬──────────────────────┬───────────┬───────────┬───────────────────────────────────┐
│ Index │ TrainVideoCount │ KnowledgeGraphId │ Name │ WikiUrl │ Vertical1 │ Vertical2 │ Vertical3 │ WikiDescription │
│ int32 │ int32 │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├───────┼─────────────────┼──────────────────┼─────────────────┼──────────────────────┼──────────────────────┼───────────┼───────────┼───────────────────────────────────┤
│ 0 │ 788288 │ /m/03bt1gh │ Game │ https://en.wikiped… │ Games │ │ │ A game is structured form of pl… │
│ 1 │ 539945 │ /m/01mw1 │ Video game │ https://en.wikiped… │ Games │ │ │ A video game is an electronic g… │
│ 2 │ 415890 │ /m/07yv9 │ Vehicle │ https://en.wikiped… │ Autos & Vehicles │ │ │ A vehicle is a mobile machine t… │
│ 3 │ 378135 │ /m/01jddz │ Concert │ https://en.wikiped… │ Arts & Entertainment │ │ │ A concert is a live music perfo… │
│ 4 │ 286532 │ /m/09jwl │ Musician │ https://en.wikiped… │ Arts & Entertainment │ │ │ A musician is a person who play… │
│ 5 │ 236948 │ /m/0215n │ Cartoon │ https://en.wikiped… │ Arts & Entertainment │ │ │ A cartoon is a type of two-dime… │
│ 6 │ 203343 │ /m/01350r │ Performance art │ https://en.wikiped… │ Arts & Entertainment │ │ │ Performance art is a performanc… │
│ 7 │ 200813 │ /m/0k4j │ Car │ https://en.wikiped… │ Autos & Vehicles │ │ │ A car is a wheeled, self-powere… │
│ 8 │ 181579 │ /m/026bk │ Dance │ https://en.wikiped… │ Arts & Entertainment │ │ │ Dance is a performance art form… │
│ 9 │ 156226 │ /m/0342h │ Guitar │ https://en.wikiped… │ Arts & Entertainment │ │ │ The guitar is a musical instrum… │
├───────┴─────────────────┴──────────────────┴─────────────────┴──────────────────────┴──────────────────────┴───────────┴───────────┴───────────────────────────────────┤
│ 10 rows 9 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
D
Discussion