🐘

PostgreSQLで簡単なif文関数を作るなら

FORCIA Tech Blog 運営チーム

2022/06/21に公開

PostgreSQL

SQL

tech

タイトル : PostgreSQLで簡単なif文関数を作るなら
キーワード: エンジニア、テクノロジー、PostgreSQL

こんにちは、エンジニアの羽間です。

フォルシアではDBにPostgreSQLを利用しており、業務でSQLを書く機会がよくあります。

SQLを書く上では

ビジネスロジックはできるだけ単体テストを書く
SQLの見通しを良くする
同じ処理は共通化する

といったことを心がけ、ユーザー定義関数を積極的に作成しています。

商品の検索といった速度を重視する処理においては、処理速度に優れるC言語関数を作成することが多いのですが、
C言語はちょっとした処理を書きたいときに不便です。メモリの扱いを間違えるとPostgreSQLサーバーがクラッシュすることもあり、危険でもあります。

そこでコーディングのし易さ、保守性を重視して、C言語以外で関数を書くこともあるのですが、選択肢がいくつかあります。
代表的なもので、SQL、PL/pgSQL、PL/v8、PL/Pythonなどです。

どれを選択すべきかは関数で定義する処理内容にも依るのですが、簡単なif文のケースであればどの選択肢を選ぶとよいか、
処理速度観点で、簡単に検証してみました。

測定準備

測定に用いたPostgreSQLのバージョンは14.3で、Docker HubにあるPostgreSQL公式イメージを利用しています。
plv8のバージョンは2.3.15を利用しています。※執筆時点最新の3.1.2がPostgreSQL 14.3でbuildできなかったため。

参考までに、使用したDockerfile、docker-compose.ymlを載せておきます。

FROM postgres:14.3

# plv8のインストール
ENV PLV8_VERSION=2.3.15 \
    PLV8_SHASUM="8a05f9d609bb79e47b91ebc03ea63b3f7826fa421a0ee8221ee21581d68cb5ba"

RUN plv8BuildDependencies="build-essential \
    ca-certificates \
    curl \
    git-core \
    python \
    gpp \
    cpp \
    pkg-config \
    apt-transport-https \
    cmake \
    libc++-dev \
    libc++abi-dev \
    postgresql-server-dev-$PG_MAJOR \
    wget \
    ninja-build" \
  && plv8RuntimeDependencies="libc++1 \
    libtinfo5 \
    libc++abi1" \
  && apt-get update \
  && apt-get install -y --no-install-recommends ${plv8BuildDependencies} ${plv8RuntimeDependencies} \
  && mkdir -p /tmp/build \
  && curl -o /tmp/build/v$PLV8_VERSION.tar.gz -SL "https://github.com/plv8/plv8/archive/v${PLV8_VERSION}.tar.gz" \
  && cd /tmp/build \
  && echo $PLV8_SHASUM v$PLV8_VERSION.tar.gz | sha256sum -c \
  && tar -xzf /tmp/build/v$PLV8_VERSION.tar.gz -C /tmp/build/ \
  && cd /tmp/build/plv8-$PLV8_VERSION \
  && make \
  && make install \
  && strip /usr/lib/postgresql/${PG_MAJOR}/lib/plv8-${PLV8_VERSION}.so \
  && rm -rf /root/.vpython_cipd_cache /root/.vpython-root \
  && apt-get clean \
  && apt-get remove -y ${plv8BuildDependencies} \
  && apt-get autoremove -y \
  && rm -rf /tmp/build /var/lib/apt/lists/*

# plpython3uのインストール
RUN plpythonRuntimeDependencies="python3 \
    postgresql-plpython3-$PG_MAJOR" \
  && apt-get update \
  && apt-get install -y --no-install-recommends ${plpythonRuntimeDependencies} \
  && apt-get clean \
  && apt-get autoremove -y \
  && rm -rf /tmp/build /var/lib/apt/lists/*

# hyperfineのインストール
RUN apt-get update \
  && apt-get install -y --no-install-recommends wget \
  && apt-get clean \
  && wget https://github.com/sharkdp/hyperfine/releases/download/v1.13.0/hyperfine_1.13.0_amd64.deb \
  && dpkg -i hyperfine_1.13.0_amd64.deb \
  && apt-get autoremove -y wget \
  && rm -rf /tmp/build /var/lib/apt/lists/*

version: '3'

services:
  postgres:
    container_name: test_postgres
    restart: always
    build:
      context: ./
      dockerfile: Dockerfile
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      PGPASSWORD: postgres
      POSTGRES_DB: postgres
      POSTGRES_INITDB_ARGS: --encoding=UTF-8
      TZ: "Asia/Tokyo"
    ports:
      - 65433:5432
    volumes:
      - postgres:/var/lib/postgresql/data

volumes:
  postgres:

testdbという名前のDBを作成し、testdbに対して以下のようなSQLを実行します。
100万件のテストデータを作成し、各関数を定義します。

createdb testdb

-- plv8, plpythonのinstall
create extension plv8;
create extension plpython3u;

-- 100万件のデータを用意
create table test_data as
select 
	seq
	,'ja' as language_code 
from
	(select generate_series(1,1000000)::int4 as seq)s
;
analyze test_data;

---関数の定義
-- SQL
create or replace function _normalize_display_station_sqlfunc(language_code text) returns text as 
$$ 
	select case when  language_code = 'ja'  then '駅' else ' station' end; 
$$ language sql immutable parallel safe;

--PL/pgsql
create or replace function _normalize_display_station_plpgsql(language_code text) returns text as
$$
	begin
		if
			language_code = 'ja' then return '駅';
			-- 外国語はスペースをいれる
			else return ' Station';
		end if;
	end;
$$ language plpgsql immutable parallel safe;

--PL/v8
create or replace function _normalize_display_station_plv8(language_code text) returns text as
$$
	if (language_code === 'ja') {
		return '駅';
	} else {
		return ' Station';	
	}
$$ language plv8 immutable parallel safe;

--PL/python3u
create or replace function _normalize_display_station_plpython3u(language_code text) returns text as
$$
	if language_code == 'ja':
		return '駅'
	else:
		return ' Station';	
$$ language plpython3u immutable parallel safe;

測定

hyperfineというベンチマーク用のツールを用いて計測します。
100万回関数をコールするようなSQLを、10回実行し、平均を取ります。

hyperfine --warmup 5 --runs 10 'psql -U postgres -d testdb -c "explain analyze select _normalize_display_station_sqlfunc(language_code) from test_data"'
hyperfine --warmup 5 --runs 10 'psql -U postgres -d testdb -c "explain analyze select _normalize_display_station_plpgsql(language_code) from test_data"'
hyperfine --warmup 5 --runs 10 'psql -U postgres -d testdb -c "explain analyze select _normalize_display_station_plv8(language_code) from test_data"'
hyperfine --warmup 5 --runs 10 'psql -U postgres -d testdb -c "explain analyze select _normalize_display_station_plpython3u(language_code) from test_data"'

結果

Benchmark 1: psql -U postgres -d testdb -c "explain analyze select _normalize_display_station_sqlfunc(language_code) from test_data"
  Time (mean ± σ):     144.5 ms ±  14.1 ms    [User: 16.5 ms, System: 5.7 ms]
  Range (min … max):   130.4 ms … 170.2 ms    10 runs

Benchmark 1: psql -U postgres -d testdb -c "explain analyze select _normalize_display_station_plpgsql(language_code) from test_data"
  Time (mean ± σ):     285.9 ms ±  13.9 ms    [User: 17.1 ms, System: 3.3 ms]
  Range (min … max):   273.6 ms … 319.3 ms    10 runs

Benchmark 1: psql -U postgres -d testdb -c "explain analyze select _normalize_display_station_plv8(language_code) from test_data"
  Time (mean ± σ):     541.5 ms ±  10.9 ms    [User: 15.1 ms, System: 6.1 ms]
  Range (min … max):   522.7 ms … 550.6 ms    10 runs

Benchmark 1: psql -U postgres -d testdb -c "explain analyze select _normalize_display_station_plpython3u(language_code) from test_data"
  Time (mean ± σ):     450.4 ms ±  10.1 ms    [User: 17.4 ms, System: 3.9 ms]
  Range (min … max):   442.4 ms … 477.7 ms    10 runs

	SQL	PL/pgSQL	PL/v8	PL/python3u
処理時間平均(ms)	144 .5	285.9	549.0	466.2

結果は上記のようにSQLで定義した関数が最も処理速度が優れているという結果になりました。

続いて、各クエリのプランを見ていきましょう。

testdb=# explain analyze select _normalize_display_station_sqlfunc(language_code) from test_data;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on test_data  (cost=0.00..16925.00 rows=1000000 width=32) (actual time=0.058..96.508 rows=1000000 loops=1)
 Planning Time: 0.924 ms
 Execution Time: 124.541 ms
(3 rows)

testdb=# explain analyze select _normalize_display_station_plpgsql(language_code) from test_data;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..213758.42 rows=1000000 width=32) (actual time=2.815..208.587 rows=1000000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test_data  (cost=0.00..112758.42 rows=416667 width=32) (actual time=2.577..133.657 rows=333333 loops=3)
 Planning Time: 0.034 ms
 JIT:
   Functions: 6
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.695 ms, Inlining 0.000 ms, Optimization 0.562 ms, Emission 6.220 ms, Total 7.477 ms
 Execution Time: 267.052 ms
(10 rows)

test=# explain analyze select _normalize_display_station_plv8(language_code) from test_data;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..213758.42 rows=1000000 width=32) (actual time=4.678..475.722 rows=1000000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test_data  (cost=0.00..112758.42 rows=416667 width=32) (actual time=2.963..388.634 rows=333333 loops=3)
 Planning Time: 0.083 ms
 JIT:
   Functions: 6
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 1.533 ms, Inlining 0.000 ms, Optimization 1.031 ms, Emission 6.527 ms, Total 9.091 ms
 Execution Time: 506.516 ms
(10 rows)

testdb=# explain analyze select _normalize_display_station_plpython3u(language_code) from test_data;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..213758.42 rows=1000000 width=32) (actual time=2.532..386.103 rows=1000000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test_data  (cost=0.00..112758.42 rows=416667 width=32) (actual time=8.470..305.765 rows=333333 loops=3)
 Planning Time: 0.026 ms
 JIT:
   Functions: 6
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.758 ms, Inlining 0.000 ms, Optimization 0.479 ms, Emission 6.017 ms, Total 7.254 ms
 Execution Time: 419.853 ms
(10 rows)

これらのクエリプランを確認すると、SQL関数は1CPUを使用して処理、他の関数は2CPUを使用して処理していることが分かります。
加えて、1CPUしか使用していないSQL関数の処理時間が最も短いので、SQL関数が処理コスト的にも優れていると分かります。

まとめ

SQLで簡単なif文のような関数を定義する場合においては、SQL、PL/pgSQL、PL/v8、PL/python3uの4つの選択肢の中では、SQLが処理速度・処理コスト面で優れていると分かりました。
このような場合では、SQL関数を積極的に使うとよさそうです。

PL/v8が速度が出なかったのが個人的には意外でした。for文や配列を含むような関数となるとまた話が変わってくるかもしれません。
今後そのあたりも確認できたらと思います。

この記事を書いた人

羽間大晃
2012年新卒入社
1歳になった長男から仕事部屋を死守する毎日を過ごしています。

FORCIA Tech BlogPublication

――「つなげれば、見える。」――膨大・複雑なデータから必要な情報を的確に探し出す検索テクノロジーを基にしたシステム開発・サービス提供並びに、コンサルティングで企業のDXを推進するフォルシア株式会社テックブログです。エンジニア積極募集中！！

測定準備

測定

結果

まとめ

この記事を書いた人

Discussion