Trying out Hydra and Citus

Hydra
Installation uses pgxman.
First, try whether pgxman can be used via Docker.
Either the image name is different, or maybe no public image is published?
$ docker pull owenthereal/pgxman-runner-postgres-16
The image is linux/arm64, so it didn't run on WSL.

Trying it on a VM on Google Cloud
Install Docker first.
$ docker --version
Docker version 27.0.3, build 7d4bcd8
Install pgxman.
$ curl -sfL https://install.pgx.sh | sh -
(snip)
Selecting previously unselected package pgxman.
(Reading database ... 70210 files and directories currently installed.)
Preparing to unpack .../pgxman_1.3.1_amd64.deb ...
Unpacking pgxman (1.3.1) ...
Setting up pgxman (1.3.1) ...
👏🎉 pgxman successfully installed.
If this is your first time using pgxman, check out our docs at https://docs.pgxman.com/
Doctor summary:
[x] PostgreSQL is not installed
To install a PostgreSQL extension, you'll need to install PostgreSQL.
Visit https://docs.pgxman.com/installing_postgres for more info.
Recommendations:
[✓] Docker is installed
[✓] Docker daemon is running
Doctor found 1 issues.
Docker is installed, but it says PostgreSQL is not.
Install PostgreSQL 16 following the steps linked there.
Check the connection.
$ sudo -u postgres psql
psql (16.3 (Debian 16.3-1.pgdg120+1))
Type "help" for help.
postgres=# select version();
version
---------------------------------------------------------------------------------------------------------------------
PostgreSQL 16.3 (Debian 16.3-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 row)
It's also worth setting up password authentication.
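A minimal sketch of what that means, with a placeholder password (pg_hba.conf also needs a scram-sha-256 or md5 line for the connections you care about):
-- set a password for the postgres role (placeholder value)
ALTER USER postgres WITH PASSWORD 'change-me';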
Install Hydra.
The docs say only Ubuntu 22.04 is supported, though.
$ sudo pgxman install hydra_columnar
The following Debian packages will be installed:
postgresql-16-pgxman-hydra-columnar=1.1.2
The following Apt repositories will be added or updated:
pgxman-core
Do you want to continue? [Y/n]
Installing extensions for PostgreSQL 16...
[✓] hydra_columnar 1.1.2: https://pgx.sh/hydra_columnar

Getting ready to try columnar
First, CREATE EXTENSION.
postgres=# create extension columnar;
Create a separate database and check the table access methods (TAM).
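A sketch of the implied steps, assuming the hydra_test database name that appears below:
CREATE DATABASE hydra_test;
-- then connect to hydra_test (\c hydra_test) and install the extension there as well
CREATE EXTENSION columnar;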
hydra_test=# \dA
List of access methods
Name | Type
----------+-------
brin | Index
btree | Index
columnar | Table
gin | Index
gist | Index
hash | Index
heap | Table
spgist | Index
(8 rows)
I expected the default TAM to become columnar, but apparently that's not the case?
any database you create manually with CREATE DATABASE, the default table type is heap.
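If you do want columnar to be the default rather than heap, the standard default_table_access_method setting should do it (not verified here, shown per database as an example):
ALTER DATABASE hydra_test SET default_table_access_method = 'columnar';
-- new tables created without a USING clause will then use the columnar access method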
Here, just create the table with USING columnar.
hydra_test=# CREATE TABLE my_columnar_table
(
id INT,
i1 INT,
i2 INT8,
n NUMERIC,
t TEXT
) USING columnar;
CREATE TABLE
hydra_test=# \dt+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+-------------------+-------+----------+-------------+---------------+-------+-------------
public | my_columnar_table | table | postgres | permanent | columnar | 16 kB |
(1 row)
Insert 100 rows of data to start with.
hydra_test=# insert into my_columnar_table (id)
hydra_test-# select generate_series(1,100);
INSERT 0 100
hydra_test=# select count(*) from my_columnar_table ;
count
-------
100
(1 row)

Updates and deletes in Hydra
Updates and deletes work while keeping the columnar format. Rows aren't modified in place; apparently the old row is deleted and a new version is appended.
Hydra Columnar tables support updates and deletes, yet remains an append-only datastore. In order to achieve this, Hydra maintains metadata about which rows in the table have been deleted or modified. Modified data is re-written to the end of the table; you can think of an UPDATE as a DELETE followed by an INSERT.
When querying, Hydra will automatically return only the latest version of your data.
Trying it out, it works without issue.
hydra_test=# select * from my_columnar_table where id < 10;
id | i1 | i2 | n | t
----+----+----+---+---
1 | | | |
2 | | | |
3 | | | |
4 | | | |
5 | | | |
6 | | | |
7 | | | |
8 | | | |
9 | | | |
(9 rows)
hydra_test=# update my_columnar_table set t = 'updated.' where id < 6;
UPDATE 5
hydra_test=# select * from my_columnar_table where id < 10;
id | i1 | i2 | n | t
----+----+----+---+----------
6 | | | |
7 | | | |
8 | | | |
9 | | | |
1 | | | | updated.
2 | | | | updated.
3 | | | | updated.
4 | | | | updated.
5 | | | | updated.
(9 rows)
However, updates and deletes apparently take a lock on the table. UPDATE is slow, DELETE is comparatively fast.
Each updates or deletes query will lock the table, meaning multiple UPDATE or DELETE queries on the same table will be executed serially (one at a time). UPDATE queries rewrite any rows that are modified, and thus are relatively slow. DELETE queries only modify metadata and thus complete quite quickly.
Checking pg_locks shows a RowExclusiveLock; oid 16760 is my_columnar_table.
hydra_test=*# update my_columnar_table set t = 'updated.' where id < 6;
UPDATE 5
hydra_test=*# select * from pg_locks;
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart
---------------+----------+----------+------+-------+------------+---------------+---------+------------+----------+--------------------+------+------------------+---------+----------+-----------
relation | 16571 | 12073 | | | | | | | | 4/61 | 7916 | AccessShareLock | t | t |
relation | 16571 | 16760 | | | | | | | | 4/61 | 7916 | AccessShareLock | t | t |
relation | 16571 | 16760 | | | | | | | | 4/61 | 7916 | RowExclusiveLock | t | t |
virtualxid | | | | | 4/61 | | | | | 4/61 | 7916 | ExclusiveLock | t | t |
advisory | 16571 | | | | | | 2 | 1410065408 | 1 | 4/61 | 7916 | ExclusiveLock | t | f |
transactionid | | | | | | 753 | | | | 4/61 | 7916 | ExclusiveLock | t | f |
(6 rows)

Updates and deletes in Citus
Even in v12 (the latest at the moment), columnar still supports insert only.
Both Citus and Hydra can use columnar at the partition level, so for a partitioned table mixing row and columnar partitions, an update/delete whose range includes columnar data would presumably behave like this:
- Citus: errors out when the update/delete range includes columnar data
- Hydra: executes even then, but UPDATE response time is poor
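A sketch of the kind of mixed partitioned table this is about (table and column names are made up for illustration, not taken from the data above):
CREATE TABLE events (id bigint, ts timestamptz, payload text) PARTITION BY RANGE (ts);
-- older, read-mostly partition stored as columnar
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01') USING columnar;
-- current partition stays heap
CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- a DELETE whose range spans both partitions is where the two should behave differently
DELETE FROM events WHERE ts < '2024-02-15';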

Citus is installed this way, with multi-node use in mind.
$ sudo pg_conftool 16 main set shared_preload_libraries citus
What is this pg_conftool, a Citus tool? (It actually comes from Debian/Ubuntu's postgresql-common packaging, not from Citus.)
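For reference, the same change can be made from SQL with ALTER SYSTEM; either way a restart is required, because shared_preload_libraries only takes effect at server start:
ALTER SYSTEM SET shared_preload_libraries = 'citus';
-- restart PostgreSQL afterwards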
Set up password authentication the same way as for Hydra.
Install the Citus extension as well.
$ psql -h localhost -U postgres
postgres=# create extension citus;
CREATE EXTENSION

Install AlloyDB Omni as well
e2-micro doesn't satisfy the memory requirements, so resize the machine and then install.
$ sudo alloydb database-server install --data-dir=/home/$USER/alloydb-data
WARNING: Existing data plane configuration file "/var/alloydb/config/dataplane.conf" will be updated with flag values.
To avoid unexpected configuration from prior installations, delete the existing data plane configuration before reinstalling.
Proceed with reinstalling AlloyDB Omni? [N/y] y
Initializing AlloyDB Omni instance...
Creating postgres user...
Copying installation files...
Creating AlloyDB Omni directories...
Installing systemd services...
Updating data plane configuration file...
Starting AlloyDB Omni...
Starting system check for AlloyDB Omni on local machine...
Checking for minimum of 2 vCPUs...SUCCESS
└─Detected 2 vCPUs.
Checking for minimum of 2.00GB of RAM...SUCCESS
└─Detected total RAM: 3.83GB.
Checking for Linux kernel...SUCCESS
└─Linux kernel detected.
Checking Linux kernel version is 4.18+...SUCCESS
└─Linux kernel version: 6.1.0-22-cloud-amd64
Checking Linux distribution is Debian-based or RHEL...SUCCESS
└─Compatible Debian-based distribution detected.
Checking cgroups v2 is enabled...SUCCESS
└─cgroups v2 is enabled.
Checking Docker is installed...SUCCESS
└─Docker installation found.
Checking Docker daemon is running...SUCCESS
└─Docker service is currently active.
Checking Docker server version is 20.10+...SUCCESS
└─Compatible Docker server version: 27.1.1
Checking for conflicting pre-existing users...SUCCESS
└─User postgres exists with expected ID 2345.
System check passed. Starting AlloyDB Omni...
Configuring memory kernel parameters...
Configuring data directory...
Cleaning old pgtmp-loop and swap files...
Setting up core-dump directory...
Setting up swap file...
Starting memory cleanup...
Removing old data plane containers...
Failed to automatically start AlloyDB Omni after installation.
After remediation, use "alloydb database-server start".
ERROR: Error response from daemon: cannot remove container "/pg-service": container is running: stop the container before removing or force remove
Error response from daemon: cannot remove container "/memory-agent": container is running: stop the container before removing or force remove
Once installation is done, check the connection.
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2a09a3cb906f gcr.io/alloydb-omni/memory-agent:15.5.2 "/cgmon --logtostder…" 2 minutes ago Up 2 minutes memory-agent
409392c2b26e gcr.io/alloydb-omni/pg-service:15.5.2 "/bin/bash /smurf-sc…" 2 minutes ago Up 2 minutes pg-service
$ sudo docker exec -it pg-service psql -h localhost -U postgres
psql (15.5)
Type "help" for help.
No entry for terminal type "xterm";
using dumb terminal settings.
postgres=#
Various AlloyDB-specific users and databases have been created.
postgres=# select version();
version
-----------------------------------------------------------------------------------------
PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit
(1 row)
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------------+------------------------------------------------------------+-----------
alloydbadmin | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
alloydbmetadata | | {}
alloydbreplica | Replication | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | ICU Locale | Locale Provider | Access privileges
-----------------+--------------+----------+---------+-------+------------+-----------------+--------------------------------
alloydbadmin | alloydbadmin | UTF8 | C | C | und-x-icu | icu |
alloydbmetadata | alloydbadmin | UTF8 | C | C | und-x-icu | icu | alloydbadmin=CTc/alloydbadmin +
| | | | | | | alloydbmetadata=c/alloydbadmin
postgres | alloydbadmin | UTF8 | C | C | und-x-icu | icu |
template0 | alloydbadmin | UTF8 | C | C | und-x-icu | icu | =c/alloydbadmin +
| | | | | | | alloydbadmin=CTc/alloydbadmin
template1 | alloydbadmin | UTF8 | C | C | und-x-icu | icu | =c/alloydbadmin +
| | | | | | | alloydbadmin=CTc/alloydbadmin
(5 rows)
Next, how to use columnar.

Trying the columnar engine in AlloyDB Omni
The setting needs to be enabled in postgresql.conf:
google_columnar_engine.enabled=on
postgres=# show google_columnar_engine.enabled;
google_columnar_engine.enabled
--------------------------------
on
(1 row)
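Instead of editing postgresql.conf directly, ALTER SYSTEM should also work, assuming the flag behaves like a normal GUC (a restart is likely still needed for the engine to initialize):
ALTER SYSTEM SET google_columnar_engine.enabled = 'on';
-- restart the AlloyDB Omni service afterwards, then re-check with SHOW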
Install the PostgreSQL client locally, and HammerDB as well.
mkdir hammerdb
cd hammerdb
curl -OL https://github.com/TPC-Council/HammerDB/releases/download/v4.9/HammerDB-4.9-Linux.tar.gz
tar zxvf HammerDB-4.9-Linux.tar.gz
cd HammerDB-4.9
Load the initial data as well.
$ psql -h localhost -U postgres -c "CREATE DATABASE mytpch;"
$ cd HammerDB-4.9
$ ./hammerdbcli << EOF
dbset db pg
dbset bm tpc-h
diset connection pg_host localhost
diset connection pg_sslmode disable
diset tpch pg_scale_fact 1
diset tpch pg_tpch_superuserpass postgres
diset tpch pg_tpch_defaultdbase mytpch
diset tpch pg_degree_of_parallel 1
diset tpch pg_num_tpch_threads 4
diset tpch pg_tpch_user postgres
diset tpch pg_tpch_dbase mytpch
buildschema
quit
EOF
The data comes to about 2 GB.
postgres=# SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database;
datname | pg_size_pretty
-----------------+----------------
postgres | 11 MB
alloydbadmin | 14 MB
template1 | 11 MB
template0 | 9217 kB
alloydbmetadata | 11 MB
mytpch | 2078 MB
(6 rows)

Querying the loaded data
Hydra
The plan without columnar. Some JIT lines show up; those appear when the estimated plan cost is high enough to trigger JIT compilation (see the note after the plan).
mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Finalize Aggregate (cost=167868.97..167868.98 rows=1 width=40)
-> Gather (cost=167868.74..167868.96 rows=2 width=40)
Workers Planned: 2
-> Partial Aggregate (cost=166868.74..166868.76 rows=1 width=40)
-> Parallel Seq Scan on lineitem (cost=0.00..154370.16 rows=2499716 width=14)
JIT:
Functions: 5
Options: Inlining false, Optimization false, Expressions true, Deforming true
(8 rows)
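Those JIT lines appear because the plan's estimated total cost exceeds jit_above_cost while jit is enabled; the relevant settings are standard PostgreSQL GUCs and can be checked or toggled for comparison:
SHOW jit;
SHOW jit_above_cost;  -- default 100000
SET jit = off;        -- per session, to compare plans without the JIT block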
mytpch=# explain select count(o_orderkey),avg(o_totalprice) from orders;
QUERY PLAN
-------------------------------------------------------------------------------------------
Finalize Aggregate (cost=38184.23..38184.24 rows=1 width=40)
-> Gather (cost=38184.00..38184.21 rows=2 width=40)
Workers Planned: 2
-> Partial Aggregate (cost=37184.00..37184.01 rows=1 width=40)
-> Parallel Seq Scan on orders (cost=0.00..34059.00 rows=625000 width=14)
(5 rows)
The plan when using columnar (CREATE EXTENSION has to be run per database).
mytpch=# create extension columnar;
CREATE EXTENSION
mytpch=# \dx
List of installed extensions
Name | Version | Schema | Description
----------+---------+------------+------------------------------
columnar | 11.1-11 | public | Hydra Columnar extension
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(2 rows)
mytpch=# create table lineitem_col using columnar as (select * from lineitem);
SELECT 5999605
mytpch=# analyze lineitem_col;
mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem_col;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=5758.94..5758.95 rows=1 width=40)
-> Gather (cost=5758.17..5758.88 rows=7 width=40)
Workers Planned: 7
-> Partial Aggregate (cost=4758.17..4758.18 rows=1 width=40)
-> Parallel Custom Scan (ColumnarScan) on lineitem_col (cost=0.00..472.74 rows=857086 width=14)
Columnar Projected Columns: l_orderkey, l_extendedprice
(6 rows)
mytpch=# explain select count(o_orderkey),avg(o_totalprice) from orders_col;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2252.65..2252.66 rows=1 width=40)
-> Gather (cost=2251.89..2252.60 rows=7 width=40)
Workers Planned: 7
-> Partial Aggregate (cost=1251.89..1251.90 rows=1 width=40)
-> Parallel Custom Scan (ColumnarScan) on orders_col (cost=0.00..180.46 rows=214286 width=14)
Columnar Projected Columns: o_orderkey, o_totalprice
(6 rows)
Row counts and sizes. Judging from the counts, the initial data load may have terminated abnormally (TPC-H SF1 lineitem should be about 6,001,215 rows, while this run has 5,999,605).
mytpch=# select count(*) from lineitem;
count
---------
5999605
(1 row)
mytpch=# select count(*) from orders;
count
---------
1500000
(1 row)
mytpch=# select pg_size_pretty(pg_total_relation_size('lineitem'));
pg_size_pretty
----------------
1501 MB
(1 row)
mytpch=# select pg_size_pretty(pg_total_relation_size('orders'));
pg_size_pretty
----------------
282 MB
(1 row)
mytpch=# select pg_size_pretty(pg_total_relation_size('lineitem_col'));
pg_size_pretty
----------------
208 MB
(1 row)
mytpch=# create table orders_col using columnar as (select * from orders);
SELECT 1500000
mytpch=# analyze orders_col ;
ANALYZE
mytpch=# select pg_size_pretty(pg_total_relation_size('orders_col'));
pg_size_pretty
----------------
45 MB
(1 row)
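A more compact way to pull the same size numbers in one query (just a convenience, same information as above):
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('lineitem', 'orders', 'lineitem_col', 'orders_col')
ORDER BY relname;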

Checking the Citus side as well
Confirm the row counts of the initial data.
mytpch=# select count(*) from lineitem ;
count
---------
6000664
(1 row)
mytpch=# select count(*) from orders;
count
---------
1500000
(1 row)
Create the columnar tables and run ANALYZE.
mytpch=# create extension citus;
CREATE EXTENSION
mytpch=# create table orders_col using columnar as (select * from orders);
SELECT 1500000
mytpch=# create table lineitem_col using columnar as (select * from lineitem);
SELECT 6000664
mytpch=# analyze;
ANALYZE
Check the plans on both the row store and the column store.
-- normal row store (heap)
mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Finalize Aggregate (cost=167885.09..167885.10 rows=1 width=40)
-> Gather (cost=167884.86..167885.07 rows=2 width=40)
Workers Planned: 2
-> Partial Aggregate (cost=166884.86..166884.87 rows=1 width=40)
-> Parallel Seq Scan on lineitem (cost=0.00..154383.24 rows=2500324 width=14)
JIT:
Functions: 5
Options: Inlining false, Optimization false, Expressions true, Deforming true
(8 rows)
mytpch=# explain select count(o_orderkey),avg(o_totalprice) from orders;
QUERY PLAN
-------------------------------------------------------------------------------------------
Finalize Aggregate (cost=38179.23..38179.24 rows=1 width=40)
-> Gather (cost=38179.00..38179.21 rows=2 width=40)
Workers Planned: 2
-> Partial Aggregate (cost=37179.00..37179.01 rows=1 width=40)
-> Parallel Seq Scan on orders (cost=0.00..34054.00 rows=625000 width=14)
(5 rows)
-- column store
mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem_col;
QUERY PLAN
----------------------------------------------------------------------------------------------
Aggregate (cost=33312.81..33312.82 rows=1 width=40)
-> Custom Scan (ColumnarScan) on lineitem_col (cost=0.00..3309.49 rows=6000664 width=14)
Columnar Projected Columns: l_orderkey, l_extendedprice
(3 rows)
mytpch=# explain select count(o_orderkey),avg(o_totalprice) from orders_col;
QUERY PLAN
--------------------------------------------------------------------------------------------
Aggregate (cost=8763.87..8763.88 rows=1 width=40)
-> Custom Scan (ColumnarScan) on orders_col (cost=0.00..1263.87 rows=1500000 width=14)
Columnar Projected Columns: o_orderkey, o_totalprice
(3 rows)
Check the sizes. They are exactly the same as Hydra's. Are they using the same compression algorithm? (A way to check the compression settings is sketched after the size output below.)
mytpch=# select pg_size_pretty(pg_total_relation_size('lineitem'));
pg_size_pretty
----------------
1501 MB
(1 row)
mytpch=# select pg_size_pretty(pg_total_relation_size('orders'));
pg_size_pretty
----------------
282 MB
(1 row)
mytpch=# select pg_size_pretty(pg_total_relation_size('lineitem_col'));
pg_size_pretty
----------------
208 MB
(1 row)
mytpch=# select pg_size_pretty(pg_total_relation_size('orders_col'));
pg_size_pretty
----------------
45 MB
(1 row)
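One way to check whether the two really share settings is to compare the compression configuration on each side. This assumes the Citus-style columnar GUC and options view are present in both (Hydra Columnar is a fork of Citus's columnar access method, but object names can differ by version):
SHOW columnar.compression;       -- zstd by default in recent Citus releases
SELECT * FROM columnar.options;  -- per-table stripe/chunk/compression settings, if this view exists in your version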

Checking AlloyDB Omni as well
First, row counts and sizes. No CREATE EXTENSION is needed here.
mytpch=# select count(*) from lineitem ;
count
---------
5999900
(1 row)
mytpch=# select count(*) from orders ;
count
---------
1500000
(1 row)
mytpch=# select pg_size_pretty(pg_total_relation_size('lineitem'));
pg_size_pretty
----------------
1505 MB
(1 row)
mytpch=# select pg_size_pretty(pg_total_relation_size('orders'));
pg_size_pretty
----------------
285 MB
(1 row)
Looking at the plans: since the columnar engine is a cache and nothing is cached yet, they are unchanged.
mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Finalize Aggregate (cost=168281.62..168281.63 rows=1 width=40)
-> Gather (cost=168281.39..168281.60 rows=2 width=40)
Workers Planned: 2
-> Partial Aggregate (cost=167281.39..167281.40 rows=1 width=40)
-> Parallel Seq Scan on lineitem (cost=0.00..154778.92 rows=2500492 width=14)
(5 rows)
mytpch=# explain select count(o_orderkey),avg(o_totalprice) from orders;
QUERY PLAN
-------------------------------------------------------------------------------------------
Finalize Aggregate (cost=38518.23..38518.24 rows=1 width=40)
-> Gather (cost=38518.00..38518.21 rows=2 width=40)
Workers Planned: 2
-> Partial Aggregate (cost=37518.00..37518.01 rows=1 width=40)
-> Parallel Seq Scan on orders (cost=0.00..34393.00 rows=625000 width=14)
(5 rows)
Compare EXPLAIN ANALYZE before and after manually adding columns to the columnar cache.
It does get noticeably faster, and a columnar scan shows up in the plan.
-- explain analyze with the columnar cache left to automatic (probably nothing cached yet?)
mytpch=# explain analyze SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=168281.62..168281.63 rows=1 width=40) (actual time=1490.046..1490.632 rows=1 loops=1)
-> Gather (cost=168281.39..168281.60 rows=2 width=40) (actual time=1489.672..1490.611 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=167281.39..167281.40 rows=1 width=40) (actual time=1463.247..1463.249 rows=1 loops=3)
-> Parallel Seq Scan on lineitem (cost=0.00..154778.92 rows=2500492 width=14) (actual time=0.006..490.032 rows=1999967 loops=3)
Planning Time: 0.184 ms
Execution Time: 1490.706 ms
(8 rows)
-- manually add columns to the columnar cache
mytpch=# SELECT google_columnar_engine_add(
relation => 'lineitem',
columns => 'l_orderkey,l_extendedprice');
google_columnar_engine_add
----------------------------
92
(1 row)
mytpch=# explain analyze SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=74529.47..74529.48 rows=1 width=40) (actual time=725.176..725.277 rows=1 loops=1)
-> Gather (cost=74529.24..74529.45 rows=2 width=40) (actual time=623.643..725.253 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=73529.24..73529.25 rows=1 width=40) (actual time=202.669..202.672 rows=1 loops=3)
-> Parallel Append (cost=20.00..58528.37 rows=3000173 width=14) (actual time=4.981..202.648 rows=1999967 loops=3)
-> Parallel Custom Scan (columnar scan) on lineitem (cost=20.00..58519.03 rows=2999950 width=14) (actual time=4.979..202.643 rows=1999967 loops=3)
Rows Removed by Columnar Filter: 0
Rows Aggregated by Columnar Scan: 1999967
Columnar cache search mode: native
-> Parallel Seq Scan on lineitem (cost=0.00..9.34 rows=223 width=14) (never executed)
Planning Time: 290.928 ms
Execution Time: 752.699 ms
(13 rows)
mytpch=# SELECT google_columnar_engine_add(
relation => 'orders',
columns => 'o_orderkey,o_totalprice');
google_columnar_engine_add
----------------------------
35
(1 row)
mytpch=# explain analyze select count(o_orderkey),avg(o_totalprice) from orders;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=19399.24..19399.25 rows=1 width=40) (actual time=186.943..187.069 rows=1 loops=1)
-> Gather (cost=19399.01..19399.22 rows=2 width=40) (actual time=166.497..187.036 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=18399.01..18399.02 rows=1 width=40) (actual time=152.389..152.394 rows=1 loops=3)
-> Parallel Append (cost=20.00..14649.01 rows=750001 width=14) (actual time=0.500..152.355 rows=500000 loops=3)
-> Parallel Custom Scan (columnar scan) on orders (cost=20.00..14645.00 rows=750000 width=14) (actual time=0.498..152.345 rows=500000 loops=3)
Rows Removed by Columnar Filter: 0
Rows Aggregated by Columnar Scan: 352914
Columnar cache search mode: native
-> Parallel Seq Scan on orders (cost=0.00..4.00 rows=1 width=14) (never executed)
Planning Time: 0.340 ms
Execution Time: 187.181 ms
(13 rows)
After adding the columns to the columnar cache with google_columnar_engine_add(), the change in the plan is visible even with plain EXPLAIN.
mytpch=# explain select count(o_orderkey),avg(o_totalprice) from orders;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=19399.24..19399.25 rows=1 width=40)
-> Gather (cost=19399.01..19399.22 rows=2 width=40)
Workers Planned: 2
-> Partial Aggregate (cost=18399.01..18399.02 rows=1 width=40)
-> Parallel Append (cost=20.00..14649.01 rows=750001 width=14)
-> Parallel Custom Scan (columnar scan) on orders (cost=20.00..14645.00 rows=750000 width=14)
Columnar cache search mode: native
-> Parallel Seq Scan on orders (cost=0.00..4.00 rows=1 width=14)
(8 rows)
mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=74529.47..74529.48 rows=1 width=40)
-> Gather (cost=74529.24..74529.45 rows=2 width=40)
Workers Planned: 2
-> Partial Aggregate (cost=73529.24..73529.25 rows=1 width=40)
-> Parallel Append (cost=20.00..58528.37 rows=3000173 width=14)
-> Parallel Custom Scan (columnar scan) on lineitem (cost=20.00..58519.03 rows=2999950 width=14)
Columnar cache search mode: native
-> Parallel Seq Scan on lineitem (cost=0.00..9.34 rows=223 width=14)
(8 rows)
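To see what is actually sitting in the cache, AlloyDB Omni exposes monitoring views; the view name below is from memory of the docs and should be verified there, so treat it as an assumption:
SELECT * FROM g_columnar_columns;  -- assumed view name, listing columns currently held by the columnar engine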
Because this is a columnar cache, restarting the node means the setup has to be redone.
There may also be a way to load it at startup (see the note after the example below).
An example of redoing it after a restart:
mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Finalize Aggregate (cost=168281.62..168281.63 rows=1 width=40)
-> Gather (cost=168281.39..168281.60 rows=2 width=40)
Workers Planned: 2
-> Partial Aggregate (cost=167281.39..167281.40 rows=1 width=40)
-> Parallel Seq Scan on lineitem (cost=0.00..154778.92 rows=2500492 width=14)
(5 rows)
mytpch=# SELECT google_columnar_engine_add(
relation => 'orders',
columns => 'o_orderkey,o_totalprice');
google_columnar_engine_add
----------------------------
35
(1 row)
mytpch=# SELECT google_columnar_engine_add(
relation => 'lineitem',
columns => 'l_orderkey,l_extendedprice');
google_columnar_engine_add
----------------------------
92
(1 row)
mytpch=# explain SELECT COUNT(l_orderkey), AVG(l_extendedprice) FROM lineitem;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=74529.47..74529.48 rows=1 width=40)
-> Gather (cost=74529.24..74529.45 rows=2 width=40)
Workers Planned: 2
-> Partial Aggregate (cost=73529.24..73529.25 rows=1 width=40)
-> Parallel Append (cost=20.00..58528.37 rows=3000173 width=14)
-> Parallel Custom Scan (columnar scan) on lineitem (cost=20.00..58519.03 rows=2999950 width=14)
Columnar cache search mode: native
-> Parallel Seq Scan on lineitem (cost=0.00..9.34 rows=223 width=14)
(8 rows)
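On loading at startup: the AlloyDB docs describe a google_columnar_engine.relations flag that pre-populates the columnar engine when the server starts. I haven't tried it here, so the exact name and format should be checked against the docs; something like the following in postgresql.conf (illustrative only):
google_columnar_engine.relations='mytpch.public.lineitem,mytpch.public.orders'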

Updating the Citus columnar table
It isn't supported, so it errors out.
DELETE isn't supported either, so the only option is to wipe everything and reload (a rewrite sketch follows the error demo below).
mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem where l_linenumber=1 and l_orderkey=1;
l_linenumber | l_orderkey | l_comment
--------------+------------+---------------------------------------
1 | 1 | even accounts are. furiously regular
(1 row)
-- the normal row store (heap) can of course be updated
mytpch=# update lineitem set l_comment = 'updated.' where l_linenumber=1 and l_orderkey=1;
UPDATE 1
mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem where l_linenumber=1 and l_orderkey=1;
l_linenumber | l_orderkey | l_comment
--------------+------------+-----------
1 | 1 | updated.
(1 row)
mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem_col where l_linenumber=1 and l_orderkey=1;
l_linenumber | l_orderkey | l_comment
--------------+------------+---------------------------------------
1 | 1 | even accounts are. furiously regular
(1 row)
mytpch=# explain select l_linenumber,l_orderkey,l_comment from lineitem_col where l_linenumber=1 and l_orderkey=1;
QUERY PLAN
----------------------------------------------------------------------------------
Custom Scan (ColumnarScan) on lineitem_col (cost=0.00..4964.24 rows=1 width=39)
Filter: ((l_linenumber = '1'::numeric) AND (l_orderkey = '1'::numeric))
Columnar Projected Columns: l_orderkey, l_linenumber, l_comment
(3 rows)
-- updating the columnar table errors out
mytpch=# update lineitem_col set l_comment = 'updated.' where l_linenumber=1 and l_orderkey=1;
ERROR: UPDATE and CTID scans not supported for ColumnarScan
-- delete doesn't work either
mytpch=# delete from lineitem_col where l_linenumber=1 and l_orderkey=1;
ERROR: UPDATE and CTID scans not supported for ColumnarScan
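So the only way to remove or change rows in a Citus columnar table is a full rewrite. A sketch of one way to do it, reusing the table names above (not actually run here):
BEGIN;
CREATE TABLE lineitem_col_new USING columnar AS
    SELECT * FROM lineitem_col
    WHERE NOT (l_linenumber = 1 AND l_orderkey = 1);
DROP TABLE lineitem_col;
ALTER TABLE lineitem_col_new RENAME TO lineitem_col;
COMMIT;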

Updating the Hydra columnar table
This one works, but beware: it's slow and takes a table lock.
mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem where l_linenumber=1 and l_orderkey=1;
l_linenumber | l_orderkey | l_comment
--------------+------------+-----------------------------------------
1 | 1 | quickly bold dependencies snooze across
(1 row)
mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem_col where l_linenumber=1 and l_orderkey=1;
l_linenumber | l_orderkey | l_comment
--------------+------------+-----------------------------------------
1 | 1 | quickly bold dependencies snooze across
(1 row)
mytpch=# explain select l_linenumber,l_orderkey,l_comment from lineitem_col where l_linenumber=1 and l_orderkey=1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Gather (cost=1000.00..1709.21 rows=1 width=40)
Workers Planned: 7
-> Parallel Custom Scan (ColumnarScan) on lineitem_col (cost=0.00..709.11 rows=857086 width=40)
Filter: ((l_linenumber = '1'::numeric) AND (l_orderkey = '1'::numeric))
Columnar Projected Columns: l_orderkey, l_linenumber, l_comment
(5 rows)
-- update works
mytpch=# update lineitem_col set l_comment = 'updated.' where l_linenumber=1 and l_orderkey=1;
UPDATE 1
mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem_col where l_linenumber=1 and l_orderkey=1;
l_linenumber | l_orderkey | l_comment
--------------+------------+-----------
1 | 1 | updated.
(1 row)
-- delete works too
mytpch=# delete from lineitem_col where l_linenumber=1 and l_orderkey=1;
DELETE 1
mytpch=# select l_linenumber,l_orderkey,l_comment from lineitem_col where l_linenumber=1 and l_orderkey=1;
l_linenumber | l_orderkey | l_comment
--------------+------------+-----------
(0 rows)