🐙

DuckDB x dbt-duckdb使ってみるよ

2023/07/23に公開

前回記事のつづきです。
dbt-duckdbがあるそうなので、データの取り込みをやってみます。
前の記事でも書きましたが、やっぱり無料で使えるOLAPのDWHがあると個人開発が捗るし、勉強にもなるのですごく助かります。

dbt-duckdbのインストール

root@28b9c609ac51:/duckdb_work/# sudo apt install python3
root@28b9c609ac51:/duckdb_work/# sudo apt install python3-pip
root@28b9c609ac51:/duckdb_work/# pip3 install dbt-duckdb

dbtプロジェクトの作成

今回はYouTubeのオープンデータを取り込むので、プロジェクト名をそれっぽいものにします。

root@28b9c609ac51:/duckdb_work/# dbt init youtube_vocabulary

YouTubeデータセット

https://research.google.com/youtube8m/index.html

DuckdbでYouTube用のデータベースを作成

root@28b9c609ac51:/duckdb_work/# ./duckdb youtube_vocabulary.duckdb

profiles.ymlの作成

profiles.yml
youtube_vocabulary:
  outputs:
   dev:
     type: duckdb
     path: /duckdb_work/youtube_vocabulary.duckdb
  target: dev

dbt debugの実行

debugの実行
dbt debugの実行
06:19:34  Running with dbt=1.5.2
06:19:34  dbt version: 1.5.2
06:19:34  python version: 3.8.10
06:19:34  python path: /usr/bin/python3
06:19:34  os info: Linux-5.15.49-linuxkit-x86_64-with-glibc2.29
06:19:34  Using profiles.yml file at /duckdb_work/youtube_vocabulary/profiles.yml
06:19:34  Using dbt_project.yml file at /duckdb_work/youtube_vocabulary/dbt_project.yml
06:19:34  Configuration:
06:19:34    profiles.yml file [OK found and valid]
06:19:34    dbt_project.yml file [OK found and valid]
06:19:34  Required dependencies:
06:19:34   - git [OK found]

06:19:34  Connection:
06:19:34    database: youtube_vocabulary
06:19:34    schema: main
06:19:34    path: /duckdb_work/youtube_vocabulary.duckdb
06:19:34  Registered adapter: duckdb=1.5.2
06:19:34    Connection test: [OK connection ok]

06:19:34  All checks passed!

YouTubeデータをSeedで取り込む

データの取り込み
root@28b9c609ac51:/duckdb_work/youtube_vocabulary/seeds# ls
vocabulary.csv
root@28b9c609ac51:/duckdb_work/youtube_vocabulary# dbt seed
06:20:37  Running with dbt=1.5.2
06:20:37  Registered adapter: duckdb=1.5.2
06:20:37  Unable to do partial parsing because saved manifest not found. Starting full parse.
06:20:39  Found 2 models, 4 tests, 0 snapshots, 0 analyses, 316 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups
06:20:39
06:20:39  Concurrency: 1 threads (target='dev')
06:20:39
06:20:39  1 of 1 START seed file main.vocabulary ......................................... [RUN]
06:20:49  1 of 1 OK loaded seed file main.vocabulary ..................................... [INSERT 3862 in 10.54s]
06:20:49
06:20:49  Finished running 1 seed in 0 hours 0 minutes and 10.69 seconds (10.69s).
06:20:49
06:20:49  Completed successfully
06:20:49
06:20:49  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

DuckDBのデータベースで取り込んだデータを確認する

root@28b9c609ac51:/duckdb_work# ./duckdb youtube_vocabulary.duckdb
v0.8.1 6536a77232
Enter ".help" for usage hints.
D SELECT* FROM vocabulary LIMIT  10;
┌───────┬─────────────────┬──────────────────┬─────────────────┬──────────────────────┬──────────────────────┬───────────┬───────────┬───────────────────────────────────┐
│ Index │ TrainVideoCount │ KnowledgeGraphId │      Name       │       WikiUrl        │      Vertical1       │ Vertical2 │ Vertical3 │          WikiDescription          │
│ int32 │      int32      │     varchar      │     varchar     │       varchar        │       varchar        │  varchar  │  varchar  │              varchar              │
├───────┼─────────────────┼──────────────────┼─────────────────┼──────────────────────┼──────────────────────┼───────────┼───────────┼───────────────────────────────────┤
│     0 │          788288 │ /m/03bt1gh       │ Game            │ https://en.wikiped…  │ Games                │           │           │ A game is structured form of pl…  │
│     1 │          539945 │ /m/01mw1         │ Video game      │ https://en.wikiped…  │ Games                │           │           │ A video game is an electronic g…  │
│     2 │          415890 │ /m/07yv9         │ Vehicle         │ https://en.wikiped…  │ Autos & Vehicles     │           │           │ A vehicle is a mobile machine t…  │
│     3 │          378135 │ /m/01jddz        │ Concert         │ https://en.wikiped…  │ Arts & Entertainment │           │           │ A concert is a live music perfo…  │
│     4 │          286532 │ /m/09jwl         │ Musician        │ https://en.wikiped…  │ Arts & Entertainment │           │           │ A musician is a person who play…  │
│     5 │          236948 │ /m/0215n         │ Cartoon         │ https://en.wikiped…  │ Arts & Entertainment │           │           │ A cartoon is a type of two-dime…  │
│     6 │          203343 │ /m/01350r        │ Performance art │ https://en.wikiped…  │ Arts & Entertainment │           │           │ Performance art is a performanc…  │
│     7 │          200813 │ /m/0k4j          │ Car             │ https://en.wikiped…  │ Autos & Vehicles     │           │           │ A car is a wheeled, self-powere…  │
│     8 │          181579 │ /m/026bk         │ Dance           │ https://en.wikiped…  │ Arts & Entertainment │           │           │ Dance is a performance art form…  │
│     9 │          156226 │ /m/0342h         │ Guitar          │ https://en.wikiped…  │ Arts & Entertainment │           │           │ The guitar is a musical instrum…  │
├───────┴─────────────────┴──────────────────┴─────────────────┴──────────────────────┴──────────────────────┴───────────┴───────────┴───────────────────────────────────┤
│ 10 rows                                                                                                                                                      9 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
D

Discussion