
Trying out Hydra and Citus

こばこば

Spin up a VM on Google Cloud and give it a try

Install Docker first.
https://matsuand.github.io/docs.docker.jp.onthefly/engine/install/debian/

$ docker --version
Docker version 27.0.3, build 7d4bcd8

Install pgxman.
https://docs.pgxman.com/installing_pgxman

$ curl -sfL https://install.pgx.sh | sh -
(snip)
Selecting previously unselected package pgxman.
(Reading database ... 70210 files and directories currently installed.)
Preparing to unpack .../pgxman_1.3.1_amd64.deb ...
Unpacking pgxman (1.3.1) ...
Setting up pgxman (1.3.1) ...
@@@@@@@@    @@@@@@@@ @@@      @@@  @@@@@@ @@@@@   @@@@@@    @@@@@@
@@@   @@@  @@@    @@@  @@@@  @@@@   @@  @@@@  @@@ @@@  @@@  @@@  @@@
@@      @@ @@@     @@    @@@@@@     @@   @@    @@ @@@@@@@@  @@    @@
@@@    @@@ @@@     @@   @@@  @@@    @@   @@    @@ @@    @@  @@    @@
@@@@@@@@@   @@@@@@@@@ @@@      @@@  @@   @@    @@ @@    @@  @@    @@
@@                 @@
@@          @@@@@@@@@

👏🎉 pgxman successfully installed.
If this is your first time using pgxman, check out our docs at https://docs.pgxman.com/
Doctor summary:
[x] PostgreSQL is not installed
    To install a PostgreSQL extension, you'll need to install PostgreSQL.
    Visit https://docs.pgxman.com/installing_postgres for more info.
Recommendations:
[✓] Docker is installed
[✓] Docker daemon is running

Doctor found 1 issues.

Docker is installed, but it complains that PostgreSQL isn't.
Install PG 16 following the steps here.
https://docs.pgxman.com/installing_postgres

Check the connection.

$ sudo -u postgres psql
psql (16.3 (Debian 16.3-1.pgdg120+1))
Type "help" for help.

postgres=# select version();
                                                       version                                                       
---------------------------------------------------------------------------------------------------------------------
 PostgreSQL 16.3 (Debian 16.3-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 row)

Setting up password authentication is also a good idea.
https://computingforgeeks.com/how-to-install-postgresql-16-on-debian/
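A minimal sketch of what that amounts to (the password here is just a placeholder, not taken from the guide):

postgres=# ALTER USER postgres WITH PASSWORD 'postgres';
-- then change the relevant pg_hba.conf entries to scram-sha-256 and reload PostgreSQL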

Install Hydra.
The docs say only Ubuntu 22.04 is supported, though.
https://columnar.docs.hydra.so/#install-on-linux

$ sudo pgxman install hydra_columnar
The following Debian packages will be installed:
  postgresql-16-pgxman-hydra-columnar=1.1.2
The following Apt repositories will be added or updated:
  pgxman-core
Do you want to continue? [Y/n] 
Installing extensions for PostgreSQL 16...
[✓] hydra_columnar 1.1.2: https://pgx.sh/hydra_columnar
こばこば

Getting to the point of trying columnar

First, CREATE EXTENSION.

postgres=# create extension columnar;

Create a separate database and check the table access methods (TAM).
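Presumably something along these lines for the step not shown here (the DB name is taken from the prompt below; the extension may need creating in the new DB as well):

postgres=# CREATE DATABASE hydra_test;
postgres=# \c hydra_test
hydra_test=# CREATE EXTENSION columnar;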

hydra_test=# \dA
List of access methods
   Name   | Type  
----------+-------
 brin     | Index
 btree    | Index
 columnar | Table
 gin      | Index
 gist     | Index
 hash     | Index
 heap     | Table
 spgist   | Index
(8 rows)

I assumed the default TAM would become columnar, but apparently not?
any database you create manually with CREATE DATABASE, the default table type is heap.
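If you do want columnar by default, default_table_access_method is an ordinary PostgreSQL GUC, so something like this should do it (a sketch, not taken from the Hydra docs):

hydra_test=# SET default_table_access_method = 'columnar';
-- or persist it for the whole database:
hydra_test=# ALTER DATABASE hydra_test SET default_table_access_method = 'columnar';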

So create the table with USING columnar.

hydra_test=# CREATE TABLE my_columnar_table
(
        id INT,
        i1 INT,
        i2 INT8,
        n NUMERIC,
        t TEXT
) USING columnar;
CREATE TABLE
hydra_test=# \dt+
                                         List of relations
 Schema |       Name        | Type  |  Owner   | Persistence | Access method | Size  | Description 
--------+-------------------+-------+----------+-------------+---------------+-------+-------------
 public | my_columnar_table | table | postgres | permanent   | columnar      | 16 kB | 
(1 row)

Insert 100 rows of data to start with.

hydra_test=# insert into my_columnar_table (id)
hydra_test-# select generate_series(1,100);
INSERT 0 100
hydra_test=# select count(*) from my_columnar_table ;
 count 
-------
   100
(1 row)
こばこば

Updates and deletes in Hydra

Updates and deletes work while staying in columnar format. It doesn't actually update rows in place; apparently it marks the old rows as deleted and appends the new versions.

Hydra Columnar tables support updates and deletes, yet remains an append-only datastore. In order to achieve this, Hydra maintains metadata about which rows in the table have been deleted or modified. Modified data is re-written to the end of the table; you can think of an UPDATE as a DELETE followed by an INSERT.

When querying, Hydra will automatically return only the latest version of your data.

https://columnar.docs.hydra.so/concepts/updates-and-deletes

Trying it out, it just works.

hydra_test=# select * from my_columnar_table where id < 10;
 id | i1 | i2 | n | t 
----+----+----+---+---
  1 |    |    |   | 
  2 |    |    |   | 
  3 |    |    |   | 
  4 |    |    |   | 
  5 |    |    |   | 
  6 |    |    |   | 
  7 |    |    |   | 
  8 |    |    |   | 
  9 |    |    |   | 
(9 rows)

hydra_test=# update my_columnar_table set t = 'updated.' where id < 6;
UPDATE 5
hydra_test=# select * from my_columnar_table where id < 10;
 id | i1 | i2 | n |    t     
----+----+----+---+----------
  6 |    |    |   | 
  7 |    |    |   | 
  8 |    |    |   | 
  9 |    |    |   | 
  1 |    |    |   | updated.
  2 |    |    |   | updated.
  3 |    |    |   | updated.
  4 |    |    |   | updated.
  5 |    |    |   | updated.
(9 rows)

However, updates and deletes apparently take a table lock. UPDATE is slow, DELETE is comparatively fast.

Each updates or deletes query will lock the table, meaning multiple UPDATE or DELETE queries on the same table will be executed serially (one at a time). UPDATE queries rewrite any rows that are modified, and thus are relatively slow. DELETE queries only modify metadata and thus complete quite quickly.

Checking pg_locks, the table lock is a RowExclusiveLock (there is also an advisory ExclusiveLock, which is presumably what serializes concurrent writers, since RowExclusiveLock doesn't conflict with itself).
https://www.postgresql.jp/document/15/html/explicit-locking.html#LOCKING-TABLES
oid 16760 is my_columnar_table

hydra_test=*# update my_columnar_table set t = 'updated.' where id < 6;
UPDATE 5
hydra_test=*# select * from pg_locks;
   locktype    | database | relation | page | tuple | virtualxid | transactionid | classid |   objid    | objsubid | virtualtransaction | pid  |       mode       | granted | fastpath | waitstart 
---------------+----------+----------+------+-------+------------+---------------+---------+------------+----------+--------------------+------+------------------+---------+----------+-----------
 relation      |    16571 |    12073 |      |       |            |               |         |            |          | 4/61               | 7916 | AccessShareLock  | t       | t        | 
 relation      |    16571 |    16760 |      |       |            |               |         |            |          | 4/61               | 7916 | AccessShareLock  | t       | t        | 
 relation      |    16571 |    16760 |      |       |            |               |         |            |          | 4/61               | 7916 | RowExclusiveLock | t       | t        | 
 virtualxid    |          |          |      |       | 4/61       |               |         |            |          | 4/61               | 7916 | ExclusiveLock    | t       | t        | 
 advisory      |    16571 |          |      |       |            |               |       2 | 1410065408 |        1 | 4/61               | 7916 | ExclusiveLock    | t       | f        | 
 transactionid |          |          |      |       |            |           753 |         |            |          | 4/61               | 7916 | ExclusiveLock    | t       | f        | 
(6 rows)
こばこば

Updates and deletes in Citus

Even in v12 (the current latest), columnar is still insert-only.
https://docs.citusdata.com/en/v12.1/admin_guide/table_management.html#limitations

Both Citus and Hydra can use columnar at the partition level, so when you throw a DELETE at a partitioned table that mixes row-format and columnar partitions (see the sketch after this list), the result will probably be:

  • Citus: an update/delete whose range touches columnar data errors out
  • Hydra: an update/delete touching columnar data still runs, but UPDATE response time is poor
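A sketch of the mixed layout in question (table and partition names are made up; the partition-level USING columnar pattern itself is what both projects document):

CREATE TABLE events (id int, at timestamptz, payload text) PARTITION BY RANGE (at);
-- older, read-mostly data as a columnar partition
CREATE TABLE events_2024_06 PARTITION OF events
    FOR VALUES FROM ('2024-06-01') TO ('2024-07-01') USING columnar;
-- current data stays on heap
CREATE TABLE events_2024_07 PARTITION OF events
    FOR VALUES FROM ('2024-07-01') TO ('2024-08-01');
-- a delete whose range reaches into the columnar partition is where the two behave differently
DELETE FROM events WHERE at < '2024-06-15';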

こばこば

For Citus, install it with the multi-node instructions here.
https://docs.citusdata.com/en/v12.1/installation/multi_node_debian.html#steps-to-be-executed-on-all-nodes

$ sudo pg_conftool 16 main set shared_preload_libraries citus
What's this pg_conftool, by the way? A Citus tool? (It seems to actually come from Debian's postgresql-common package.)
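After restarting PostgreSQL, a quick sanity check from psql that the setting took effect (plain PostgreSQL, nothing Citus-specific):

postgres=# SHOW shared_preload_libraries;
-- should now include citus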

Set up password authentication the same way as for Hydra.
Also create the Citus extension.

$ psql -h localhost -U postgres
postgres=# create extension citus;
CREATE EXTENSION
こばこば

Install AlloyDB Omni as well

https://zenn.dev/cloud_ace/articles/67d7a66eaba2f0

The memory requirement can't be met on an e2-micro, so I scaled the VM up before installing.

$ sudo alloydb database-server install --data-dir=/home/$USER/alloydb-data
WARNING: Existing data plane configuration file "/var/alloydb/config/dataplane.conf" will be updated with flag values.
To avoid unexpected configuration from prior installations, delete the existing data plane configuration before reinstalling.
Proceed with reinstalling AlloyDB Omni? [N/y] y
Initializing AlloyDB Omni instance...
Creating postgres user...
Copying installation files...
Creating AlloyDB Omni directories...
Installing systemd services...
Updating data plane configuration file...
Starting AlloyDB Omni...
Starting system check for AlloyDB Omni on local machine...

Checking for minimum of 2 vCPUs...SUCCESS
└─Detected 2 vCPUs.
Checking for minimum of 2.00GB of RAM...SUCCESS
└─Detected total RAM: 3.83GB.
Checking for Linux kernel...SUCCESS
└─Linux kernel detected.
Checking Linux kernel version is 4.18+...SUCCESS
└─Linux kernel version: 6.1.0-22-cloud-amd64
Checking Linux distribution is Debian-based or RHEL...SUCCESS
└─Compatible Debian-based distribution detected.
Checking cgroups v2 is enabled...SUCCESS
└─cgroups v2 is enabled.
Checking Docker is installed...SUCCESS
└─Docker installation found.
Checking Docker daemon is running...SUCCESS
└─Docker service is currently active.
Checking Docker server version is 20.10+...SUCCESS
└─Compatible Docker server version: 27.1.1
Checking for conflicting pre-existing users...SUCCESS
└─User postgres exists with expected ID 2345.

System check passed. Starting AlloyDB Omni...
Configuring memory kernel parameters...
Configuring data directory...
Cleaning old pgtmp-loop and swap files...
Setting up core-dump directory...
Setting up swap file...
Starting memory cleanup...
Removing old data plane containers...
Failed to automatically start AlloyDB Omni after installation.
After remediation, use "alloydb database-server start".
ERROR: Error response from daemon: cannot remove container "/pg-service": container is running: stop the container before removing or force remove
Error response from daemon: cannot remove container "/memory-agent": container is running: stop the container before removing or force remove

Installation is done, so check the connection (despite the error above, docker ps shows both containers running).

$ sudo docker ps
CONTAINER ID   IMAGE                                     COMMAND                  CREATED         STATUS         PORTS     NAMES
2a09a3cb906f   gcr.io/alloydb-omni/memory-agent:15.5.2   "/cgmon --logtostder…"   2 minutes ago   Up 2 minutes             memory-agent
409392c2b26e   gcr.io/alloydb-omni/pg-service:15.5.2     "/bin/bash /smurf-sc…"   2 minutes ago   Up 2 minutes             pg-service

$ sudo docker exec -it pg-service psql -h localhost -U postgres
psql (15.5)
Type "help" for help.

No entry for terminal type "xterm";
using dumb terminal settings.
postgres=# 

Various AlloyDB-specific roles and databases have been created.

postgres=# select version();
                                         version                                         
-----------------------------------------------------------------------------------------
 PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit
(1 row)

postgres=# \du
                                      List of roles
    Role name    |                         Attributes                         | Member of 
-----------------+------------------------------------------------------------+-----------
 alloydbadmin    | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 alloydbmetadata |                                                            | {}
 alloydbreplica  | Replication                                                | {}
 postgres        | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

postgres=# \l 
                                                      List of databases
      Name       |    Owner     | Encoding | Collate | Ctype | ICU Locale | Locale Provider |       Access privileges        
-----------------+--------------+----------+---------+-------+------------+-----------------+--------------------------------
 alloydbadmin    | alloydbadmin | UTF8     | C       | C     | und-x-icu  | icu             | 
 alloydbmetadata | alloydbadmin | UTF8     | C       | C     | und-x-icu  | icu             | alloydbadmin=CTc/alloydbadmin +
                 |              |          |         |       |            |                 | alloydbmetadata=c/alloydbadmin
 postgres        | alloydbadmin | UTF8     | C       | C     | und-x-icu  | icu             | 
 template0       | alloydbadmin | UTF8     | C       | C     | und-x-icu  | icu             | =c/alloydbadmin               +
                 |              |          |         |       |            |                 | alloydbadmin=CTc/alloydbadmin
 template1       | alloydbadmin | UTF8     | C       | C     | und-x-icu  | icu             | =c/alloydbadmin               +
                 |              |          |         |       |            |                 | alloydbadmin=CTc/alloydbadmin
(5 rows)

What's left is how to use columnar.

こばこば

Trying the columnar engine in AlloyDB Omni

The setting needs to be turned on in postgresql.conf:
google_columnar_engine.enabled=on
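Instead of editing postgresql.conf inside the container by hand, ALTER SYSTEM should work too (a sketch; the parameter still needs a server restart to take effect):

postgres=# ALTER SYSTEM SET google_columnar_engine.enabled = on;
-- then restart AlloyDB Omni and confirm with SHOW as below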

postgres=# show google_columnar_engine.enabled;
 google_columnar_engine.enabled 
--------------------------------
 on
(1 row)

https://zenn.dev/google_cloud_jp/articles/2a26af12a0ecc9#カラムナーエンジンによる高速分析クエリを試してみる

Install the PostgreSQL client locally, and HammerDB as well.

mkdir hammerdb
cd hammerdb
curl -OL https://github.com/TPC-Council/HammerDB/releases/download/v4.9/HammerDB-4.9-Linux.tar.gz
tar zxvf HammerDB-4.9-Linux.tar.gz
cd HammerDB-4.9

Load the initial data too.

$ psql -h localhost -U postgres -c "CREATE DATABASE mytpch;"

$ cd HammerDB-4.9
$ ./hammerdbcli << EOF
dbset db pg
dbset bm tpc-h
diset connection pg_host localhost
diset connection pg_sslmode disable
diset tpch pg_scale_fact 1
diset tpch pg_tpch_superuserpass postgres
diset tpch pg_tpch_defaultdbase mytpch
diset tpch pg_degree_of_parallel 1
diset tpch pg_num_tpch_threads 4
diset tpch pg_tpch_user postgres
diset tpch pg_tpch_dbase mytpch
buildschema
quit
EOF

The data size is about 2 GB.

postgres=# SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database;

     datname     | pg_size_pretty 
-----------------+----------------
 postgres        | 11 MB
 alloydbadmin    | 14 MB
 template1       | 11 MB
 template0       | 9217 kB
 alloydbmetadata | 11 MB
 mytpch          | 2078 MB
(6 rows)
こばこば

Querying the loaded data

Hydra

Execution plan without columnar. Some JIT stuff shows up; what was that again? (It kicks in because the plan's estimated total cost exceeds jit_above_cost, 100000 by default.)

mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=167868.97..167868.98 rows=1 width=40)
   ->  Gather  (cost=167868.74..167868.96 rows=2 width=40)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=166868.74..166868.76 rows=1 width=40)
               ->  Parallel Seq Scan on lineitem  (cost=0.00..154370.16 rows=2499716 width=14)
 JIT:
   Functions: 5
   Options: Inlining false, Optimization false, Expressions true, Deforming true
(8 rows)

mytpch=# explain select count(o_orderkey),avg(o_totalprice) from orders;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=38184.23..38184.24 rows=1 width=40)
   ->  Gather  (cost=38184.00..38184.21 rows=2 width=40)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=37184.00..37184.01 rows=1 width=40)
               ->  Parallel Seq Scan on orders  (cost=0.00..34059.00 rows=625000 width=14)
(5 rows)

Execution plan when using columnar (CREATE EXTENSION has to be run per database).

mytpch=# create extension columnar;
CREATE EXTENSION
mytpch=# \dx
                  List of installed extensions
   Name   | Version |   Schema   |         Description          
----------+---------+------------+------------------------------
 columnar | 11.1-11 | public     | Hydra Columnar extension
 plpgsql  | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

mytpch=# create table lineitem_col using columnar as (select * from lineitem);
SELECT 5999605
mytpch=# analyze lineitem_col;

mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem_col;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=5758.94..5758.95 rows=1 width=40)
   ->  Gather  (cost=5758.17..5758.88 rows=7 width=40)
         Workers Planned: 7
         ->  Partial Aggregate  (cost=4758.17..4758.18 rows=1 width=40)
               ->  Parallel Custom Scan (ColumnarScan) on lineitem_col  (cost=0.00..472.74 rows=857086 width=14)
                     Columnar Projected Columns: l_orderkey, l_extendedprice
(6 rows)

mytpch=# explain select count(o_orderkey),avg(o_totalprice) from orders_col;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=2252.65..2252.66 rows=1 width=40)
   ->  Gather  (cost=2251.89..2252.60 rows=7 width=40)
         Workers Planned: 7
         ->  Partial Aggregate  (cost=1251.89..1251.90 rows=1 width=40)
               ->  Parallel Custom Scan (ColumnarScan) on orders_col  (cost=0.00..180.46 rows=214286 width=14)
                     Columnar Projected Columns: o_orderkey, o_totalprice
(6 rows)

Size and other info. The initial data load may have abended, by the way; the TPC-H spec puts lineitem at roughly 6,001,215 rows for SF=1, and the count here comes up a bit short.

mytpch=# select count(*) from lineitem;
  count  
---------
 5999605
(1 row)

mytpch=# select count(*) from orders;
  count  
---------
 1500000
(1 row)

mytpch=# select pg_size_pretty(pg_total_relation_size('lineitem'));
 pg_size_pretty 
----------------
 1501 MB
(1 row)

mytpch=# select pg_size_pretty(pg_total_relation_size('orders'));
 pg_size_pretty 
----------------
 282 MB
(1 row)

mytpch=# select pg_size_pretty(pg_total_relation_size('lineitem_col'));
 pg_size_pretty 
----------------
 208 MB
(1 row)

mytpch=# create table orders_col using columnar as (select * from orders);
SELECT 1500000
mytpch=# analyze orders_col ;
ANALYZE
mytpch=# select pg_size_pretty(pg_total_relation_size('orders_col'));
 pg_size_pretty 
----------------
 45 MB
(1 row)
こばこば

Let's look at the Citus side too

Check the initial data row counts.

mytpch=# select count(*) from lineitem ;
  count  
---------
 6000664
(1 row)

mytpch=# select count(*) from orders;
  count  
---------
 1500000
(1 row)

Create the columnar-store tables and run ANALYZE.

mytpch=# create extension citus;
CREATE EXTENSION
mytpch=# create table orders_col using columnar as (select * from orders);
SELECT 1500000
mytpch=# create table lineitem_col using columnar as (select * from lineitem);
SELECT 6000664
mytpch=# analyze;
ANALYZE

Check execution plans for both the row store and the columnar store.

-- regular row store
mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=167885.09..167885.10 rows=1 width=40)
   ->  Gather  (cost=167884.86..167885.07 rows=2 width=40)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=166884.86..166884.87 rows=1 width=40)
               ->  Parallel Seq Scan on lineitem  (cost=0.00..154383.24 rows=2500324 width=14)
 JIT:
   Functions: 5
   Options: Inlining false, Optimization false, Expressions true, Deforming true
(8 rows)

mytpch=# explain select count(o_orderkey),avg(o_totalprice) from orders;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=38179.23..38179.24 rows=1 width=40)
   ->  Gather  (cost=38179.00..38179.21 rows=2 width=40)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=37179.00..37179.01 rows=1 width=40)
               ->  Parallel Seq Scan on orders  (cost=0.00..34054.00 rows=625000 width=14)
(5 rows)

-- columnar store
mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem_col;
                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 Aggregate  (cost=33312.81..33312.82 rows=1 width=40)
   ->  Custom Scan (ColumnarScan) on lineitem_col  (cost=0.00..3309.49 rows=6000664 width=14)
         Columnar Projected Columns: l_orderkey, l_extendedprice
(3 rows)

mytpch=# explain select count(o_orderkey),avg(o_totalprice) from orders_col;
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Aggregate  (cost=8763.87..8763.88 rows=1 width=40)
   ->  Custom Scan (ColumnarScan) on orders_col  (cost=0.00..1263.87 rows=1500000 width=14)
         Columnar Projected Columns: o_orderkey, o_totalprice
(3 rows)

Check the sizes and so on. Exactly the same as Hydra. Are they using the same compression algorithm?

mytpch=# select pg_size_pretty(pg_total_relation_size('lineitem'));
 pg_size_pretty 
----------------
 1501 MB
(1 row)

mytpch=# select pg_size_pretty(pg_total_relation_size('orders'));
 pg_size_pretty 
----------------
 282 MB
(1 row)

mytpch=# select pg_size_pretty(pg_total_relation_size('lineitem_col'));
 pg_size_pretty 
----------------
 208 MB
(1 row)

mytpch=# select pg_size_pretty(pg_total_relation_size('orders_col'));
 pg_size_pretty 
----------------
 45 MB
(1 row)
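Probably yes: as far as I know Hydra's columnar access method started as a fork of Citus's, and both should expose the same columnar.compression setting (zstd by default, if I remember correctly). A quick check on either side, treating the GUC name as an assumption:

mytpch=# SHOW columnar.compression;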
こばこば

Let's look at AlloyDB Omni too

First, record counts and sizes. No CREATE EXTENSION is needed here.

mytpch=# select count(*) from lineitem ;
  count  
---------
 5999900
(1 row)

mytpch=# select count(*) from orders ;
  count  
---------
 1500000
(1 row)

mytpch=# select pg_size_pretty(pg_total_relation_size('lineitem'));
 pg_size_pretty 
----------------
 1505 MB
(1 row)

mytpch=# select pg_size_pretty(pg_total_relation_size('orders'));
 pg_size_pretty 
----------------
 285 MB
(1 row)

Looking at the execution plans; since this is a columnar cache, they're unchanged at this point (nothing has been cached yet).

mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=168281.62..168281.63 rows=1 width=40)
   ->  Gather  (cost=168281.39..168281.60 rows=2 width=40)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=167281.39..167281.40 rows=1 width=40)
               ->  Parallel Seq Scan on lineitem  (cost=0.00..154778.92 rows=2500492 width=14)
(5 rows)

mytpch=# explain select count(o_orderkey),avg(o_totalprice) from orders;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=38518.23..38518.24 rows=1 width=40)
   ->  Gather  (cost=38518.00..38518.21 rows=2 width=40)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=37518.00..37518.01 rows=1 width=40)
               ->  Parallel Seq Scan on orders  (cost=0.00..34393.00 rows=625000 width=14)
(5 rows)

Compare EXPLAIN ANALYZE before and after manually populating the columnar cache.
It does get noticeably faster, and a columnar scan shows up in the plan.

-- explain analyze with the columnar cache left to automatic mode (probably nothing is cached yet?)
mytpch=# explain analyze SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=168281.62..168281.63 rows=1 width=40) (actual time=1490.046..1490.632 rows=1 loops=1)
   ->  Gather  (cost=168281.39..168281.60 rows=2 width=40) (actual time=1489.672..1490.611 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=167281.39..167281.40 rows=1 width=40) (actual time=1463.247..1463.249 rows=1 loops=3)
               ->  Parallel Seq Scan on lineitem  (cost=0.00..154778.92 rows=2500492 width=14) (actual time=0.006..490.032 rows=1999967 loops=3)
 Planning Time: 0.184 ms
 Execution Time: 1490.706 ms
(8 rows)

-- try populating the columnar cache manually
mytpch=# SELECT google_columnar_engine_add(
    relation => 'lineitem',
    columns => 'l_orderkey,l_extendedprice');
 google_columnar_engine_add 
----------------------------
                         92
(1 row)

mytpch=# explain analyze SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
                                                                                QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=74529.47..74529.48 rows=1 width=40) (actual time=725.176..725.277 rows=1 loops=1)
   ->  Gather  (cost=74529.24..74529.45 rows=2 width=40) (actual time=623.643..725.253 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=73529.24..73529.25 rows=1 width=40) (actual time=202.669..202.672 rows=1 loops=3)
               ->  Parallel Append  (cost=20.00..58528.37 rows=3000173 width=14) (actual time=4.981..202.648 rows=1999967 loops=3)
                     ->  Parallel Custom Scan (columnar scan) on lineitem  (cost=20.00..58519.03 rows=2999950 width=14) (actual time=4.979..202.643 rows=1999967 loops=3)
                           Rows Removed by Columnar Filter: 0
                           Rows Aggregated by Columnar Scan: 1999967
                           Columnar cache search mode: native
                     ->  Parallel Seq Scan on lineitem  (cost=0.00..9.34 rows=223 width=14) (never executed)
 Planning Time: 290.928 ms
 Execution Time: 752.699 ms
(13 rows)

mytpch=# SELECT google_columnar_engine_add(
    relation => 'orders',
    columns => 'o_orderkey,o_totalprice');
 google_columnar_engine_add 
----------------------------
                         35
(1 row)

mytpch=# explain analyze select count(o_orderkey),avg(o_totalprice) from orders;
                                                                              QUERY PLAN                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=19399.24..19399.25 rows=1 width=40) (actual time=186.943..187.069 rows=1 loops=1)
   ->  Gather  (cost=19399.01..19399.22 rows=2 width=40) (actual time=166.497..187.036 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=18399.01..18399.02 rows=1 width=40) (actual time=152.389..152.394 rows=1 loops=3)
               ->  Parallel Append  (cost=20.00..14649.01 rows=750001 width=14) (actual time=0.500..152.355 rows=500000 loops=3)
                     ->  Parallel Custom Scan (columnar scan) on orders  (cost=20.00..14645.00 rows=750000 width=14) (actual time=0.498..152.345 rows=500000 loops=3)
                           Rows Removed by Columnar Filter: 0
                           Rows Aggregated by Columnar Scan: 352914
                           Columnar cache search mode: native
                     ->  Parallel Seq Scan on orders  (cost=0.00..4.00 rows=1 width=14) (never executed)
 Planning Time: 0.340 ms
 Execution Time: 187.181 ms
(13 rows)

After building the columnar cache with google_columnar_engine_add(), a plain EXPLAIN is enough to see that the plan has changed.

mytpch=# explain select count(o_orderkey),avg(o_totalprice) from orders;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=19399.24..19399.25 rows=1 width=40)
   ->  Gather  (cost=19399.01..19399.22 rows=2 width=40)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=18399.01..18399.02 rows=1 width=40)
               ->  Parallel Append  (cost=20.00..14649.01 rows=750001 width=14)
                     ->  Parallel Custom Scan (columnar scan) on orders  (cost=20.00..14645.00 rows=750000 width=14)
                           Columnar cache search mode: native
                     ->  Parallel Seq Scan on orders  (cost=0.00..4.00 rows=1 width=14)
(8 rows)

mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=74529.47..74529.48 rows=1 width=40)
   ->  Gather  (cost=74529.24..74529.45 rows=2 width=40)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=73529.24..73529.25 rows=1 width=40)
               ->  Parallel Append  (cost=20.00..58528.37 rows=3000173 width=14)
                     ->  Parallel Custom Scan (columnar scan) on lineitem  (cost=20.00..58519.03 rows=2999950 width=14)
                           Columnar cache search mode: native
                     ->  Parallel Seq Scan on lineitem  (cost=0.00..9.34 rows=223 width=14)
(8 rows)
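To see what is actually resident in the cache, AlloyDB should expose monitoring views such as g_columnar_columns (view name recalled from the AlloyDB docs, so treat it as an assumption):

mytpch=# SELECT * FROM g_columnar_columns;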

Since it's a columnar cache, restarting the node means the setup has to be redone.
There may also be a setting to load it at startup.

An example of redoing it after a restart.

mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=168281.62..168281.63 rows=1 width=40)
   ->  Gather  (cost=168281.39..168281.60 rows=2 width=40)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=167281.39..167281.40 rows=1 width=40)
               ->  Parallel Seq Scan on lineitem  (cost=0.00..154778.92 rows=2500492 width=14)
(5 rows)

mytpch=# SELECT google_columnar_engine_add(
    relation => 'orders',
    columns => 'o_orderkey,o_totalprice');
 google_columnar_engine_add 
----------------------------
                         35
(1 row)

mytpch=# SELECT google_columnar_engine_add(
    relation => 'lineitem',
    columns => 'l_orderkey,l_extendedprice');
 google_columnar_engine_add 
----------------------------
                         92
(1 row)

mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=74529.47..74529.48 rows=1 width=40)
   ->  Gather  (cost=74529.24..74529.45 rows=2 width=40)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=73529.24..73529.25 rows=1 width=40)
               ->  Parallel Append  (cost=20.00..58528.37 rows=3000173 width=14)
                     ->  Parallel Custom Scan (columnar scan) on lineitem  (cost=20.00..58519.03 rows=2999950 width=14)
                           Columnar cache search mode: native
                     ->  Parallel Seq Scan on lineitem  (cost=0.00..9.34 rows=223 width=14)
(8 rows)
こばこば

Trying to update the Citus columnar store

It can't be done, so it errors out.
DELETE doesn't work either, so the only option is to delete everything and reload.

mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem where l_linenumber=1 and l_orderkey=1;
 l_linenumber | l_orderkey |               l_comment               
--------------+------------+---------------------------------------
            1 |          1 | even accounts are. furiously regular 
(1 row)

-- the regular row store (heap) can of course be updated
mytpch=# update lineitem set l_comment = 'updated.'  where l_linenumber=1 and l_orderkey=1;
UPDATE 1
mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem where l_linenumber=1 and l_orderkey=1;
 l_linenumber | l_orderkey | l_comment 
--------------+------------+-----------
            1 |          1 | updated.
(1 row)

mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem_col where l_linenumber=1 and l_orderkey=1;
 l_linenumber | l_orderkey |               l_comment               
--------------+------------+---------------------------------------
            1 |          1 | even accounts are. furiously regular 
(1 row)

mytpch=# explain select l_linenumber,l_orderkey,l_comment from lineitem_col where l_linenumber=1 and l_orderkey=1;
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Custom Scan (ColumnarScan) on lineitem_col  (cost=0.00..4964.24 rows=1 width=39)
   Filter: ((l_linenumber = '1'::numeric) AND (l_orderkey = '1'::numeric))
   Columnar Projected Columns: l_orderkey, l_linenumber, l_comment
(3 rows)

-- updating the columnar store errors out
mytpch=# update lineitem_col set l_comment = 'updated.'  where l_linenumber=1 and l_orderkey=1;
ERROR:  UPDATE and CTID scans not supported for ColumnarScan

-- delete doesn't work either
mytpch=# delete from lineitem_col where l_linenumber=1 and l_orderkey=1;
ERROR:  UPDATE and CTID scans not supported for ColumnarScan
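So the practical way to change data in a Citus columnar table is a bulk rewrite; a sketch of that pattern (wrapped in a transaction so the swap happens atomically):

mytpch=# BEGIN;
mytpch=*# TRUNCATE lineitem_col;
mytpch=*# INSERT INTO lineitem_col SELECT * FROM lineitem;
mytpch=*# COMMIT;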
こばこば

Trying to update the Hydra columnar store

This one works, but note that it's slow and takes a table lock.

mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem where l_linenumber=1 and l_orderkey=1;
 l_linenumber | l_orderkey |                l_comment                
--------------+------------+-----------------------------------------
            1 |          1 | quickly bold dependencies snooze across
(1 row)

mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem_col where l_linenumber=1 and l_orderkey=1;
 l_linenumber | l_orderkey |                l_comment                
--------------+------------+-----------------------------------------
            1 |          1 | quickly bold dependencies snooze across
(1 row)

mytpch=# explain select l_linenumber,l_orderkey,l_comment from lineitem_col where l_linenumber=1 and l_orderkey=1;
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..1709.21 rows=1 width=40)
   Workers Planned: 7
   ->  Parallel Custom Scan (ColumnarScan) on lineitem_col  (cost=0.00..709.11 rows=857086 width=40)
         Filter: ((l_linenumber = '1'::numeric) AND (l_orderkey = '1'::numeric))
         Columnar Projected Columns: l_orderkey, l_linenumber, l_comment
(5 rows)

-- update works
mytpch=# update lineitem_col set l_comment = 'updated.'  where l_linenumber=1 and l_orderkey=1;
UPDATE 1
mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem_col where l_linenumber=1 and l_orderkey=1;
 l_linenumber | l_orderkey | l_comment 
--------------+------------+-----------
            1 |          1 | updated.
(1 row)

-- delete works too
mytpch=# delete from lineitem_col where l_linenumber=1 and l_orderkey=1;
DELETE 1
mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem_col where l_linenumber=1 and l_orderkey=1;
 l_linenumber | l_orderkey | l_comment 
--------------+------------+-----------
(0 rows)
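Since updates rewrite rows to the end of the table, the stale versions presumably stick around until the table is vacuumed; VACUUM does work on columnar tables (as far as I can tell this comes from the Citus columnar lineage), so after a heavy batch of updates/deletes something like this is probably worth running:

mytpch=# VACUUM lineitem_col;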