❄️

SQL だけで再帰的に順列/組み合わせを列挙する

2021/01/19に公開

SQL

Snowflake

tech

皆様もテーブルに列に格納されている値から任意の順列 $_n \mathrm{P} _r$ や組み合わせ $_n \mathrm{C} _r$ を列挙したくなった経験があるかと思います。

この記事では、SQL だけで順列/組み合わせを列挙できないか Snowflake を例に検討していきます。

古典的な方法: Self Join

実はこれには SQL っぽい古典的な方法があります。それは Self Join, つまり同じテーブルを 2 つ結合する方法です。

順列の場合はその行と異なる行を、組み合わせの場合は、重複しないようにその行より小さい行を結合すればいいだけです。

まずは、サンプルデータとして 0 から 9 までの数字が格納されたテーブル nums を作成します。

create or replace table nums (i int) as
select seq4() from table(generator(rowcount => 10));
/*
0
1
2
...
7
8
9
*/

まずは、順列の例として $_{10} \mathrm{P} _3$ を考えます。

同じ 10 個の値が格納された列 3 つから、それぞれ 1 つずつ異なる値を取ってくればいいので、クエリは下記のようになります。

select n1.i, n2.i, n3.i
from nums n1
join nums n2 on n1.i != n2.i
join nums n3 on n1.i != n3.i and n2.i != n3.i;
-- 720 rows

正しく 720 行の結果が返ってきていることがわかります。

組み合わせ $_{10} \mathrm{C} _3$ も同様に、同じ 10 個の値が格納された列 3 つから、それぞれ 1 つずつ左列よりも小さい値を取ってくればいいので、クエリは下記のようになります。

select n1.i, n2.i, n3.i
from nums n1
join nums n2 on n1.i > n2.i
join nums n3 on n2.i > n3.i;
-- 120 rows

Self Join による実装の問題点

上記の実装の問題点は、拡張性に乏しいことです。

例えば、これが $_{100} \mathrm{P} _{99}$ とかになると、JOIN 句を 99 個並べることになるし、結合条件も AND が 99 個並ぶことになります。

また、例えばこれを $n$ と $r$ を引数に取るテーブル関数にしたかったりする場合、動的に JOIN 句を増やしたりカラム数を増やしたりするのはつらいので、もう少し柔軟性のある実装である必要があります。

そこでもう少し柔軟な実装方法について検討してみます。

より柔軟な方法: Recursive CTE による DFS

順列/組み合わせの列挙は深さ優先探索 (DFS) で解ける典型的な問題です。

SQL で DFS なんてできるの、という話ですが、SQL にはループはありません^[1]が、再帰はあります。Recursive CTE です。

Recursive CTE はだいたいこんな感じの記述になります。

with cte as (
    select <初期状態>
    from <テーブル>   
    union all
    select <更新処理>
    from <テーブル>, cte
    where <更新条件> and <終了条件>
)
select ... from cte

CTE (WITH 句) の cte が内部で cte 自身を参照しているため、再帰的に処理されて、終了条件にひっかかるまで行 (処理結果) を UNION ALL し続けます。

まずは $_{10} \mathrm{C} _3$ を例に、それぞれ考えていきます。

前述の通り、カラム数を可変にするのは難しいので、配列 (ARRAY 型) を出力として使っていきます。

初期状態

今回の目的は、要素数 3 の組み合わせを表す配列を生成することなので、初期状態は nums の各行を格納した要素数 1 の配列でいいでしょう。結果の配列は combination というカラム名で保持します。

また、今何要素入ってるかを逐一 ARRAY_SIZE() で確認するよりも、変数として持っておいたほうが楽なので、level というカラムで現在の階層を保持します。

select array_construct(i) combination, 1 level
from nums

更新処理/更新条件

更新処理は ARRAY_APPEND() で、追加条件にマッチした i を追加していけばいいでしょう。また、level は単純に 1 大きい値にすれば OK です。

更新条件は、Self Join での実装と同じように、左列よりも小さい値にします。ここでの「左列」は combination 列の配列の最右要素なので、0-index であることを考慮して level-1 で値を取ります。

select array_append(cte.combination, nums.i), cte.level+1
from nums, cte
where combination[cte.level-1] > nums.i

終了条件

終了条件は、更新後の配列長が $r$ 、今回は 3 に到達した場合ですが、1 や 2 でも処理継続されないといけないので、更新後の配列長 level+1 が 3 以下のときのみ、処理されるようにすれば OK です。

cte.level+1 <= 3

クエリを組み立てる

では、すべてを当てはめて組み立てましょう。

with cte as (
    select array_construct(i) combination, 1 level
    from nums    
    union all
    select array_append(cte.combination, i), cte.level+1
    from nums, cte
    where combination[cte.level-1] > nums.i and cte.level+1 <= 3
)

これで Recursive CTE の部分は完成です。

あとは、この CTE から結果を取得するだけですが、今回ほしいのは $_{10} \mathrm{C} _3$ の組み合わせ、つまり要素数 3 の配列のみなので、level = 3 でフィルタしましょう。

with cte as (
    select array_construct(i) combination, 1 level
    from nums    
    union all
    select array_append(cte.combination, nums.i), cte.level+1
    from nums, cte
    where combination[cte.level-1] > nums.i and cte.level+1 <= 3
)
select combination from cte where level = 3;

これで完成です。これを実行した結果は下記のようになり、想定通り 120 行返ってきます。

[ 2, 1, 0 ]
[ 3, 1, 0 ]
[ 3, 2, 0 ]
...
[ 9, 8, 5 ]
[ 9, 8, 6 ]
[ 9, 8, 7 ]

ちなみに、更新条件を ARRAY_CONTAINS() を使って「配列に含まれていない値」に変えれば、順列になります。

with cte as (
    select array_construct(i) permutation, 1 level
    from nums    
    union all
    select array_append(cte.permutation, nums.i), cte.level+1
    from nums, cte
    where array_contains(nums.i::variant, permutation) = false and cte.level+1 <= 3
)
select permutation from cte where level = 3;

[ 0, 1, 2 ]
[ 0, 1, 3 ]
[ 0, 1, 4 ]
...
[ 9, 8, 5 ]
[ 9, 8, 6 ]
[ 9, 8, 7 ]

応用: テーブル関数にしてみる

Recursive CTE による DFS で再実装することによって、 $n$ と $r$ を変数として扱うことができるようになったので、晴れてテーブル関数にすることができるようになりました。

まずは $n$ と $r$ を引数に取って 0 〜 n-1 の数値で $_n \mathrm{C} _r$ と $_n \mathrm{P} _r$ を配列で列挙するデータ生成関数を実装してみましょう。

create or replace function combinations(n int, r int)
returns table(combination array)
language sql
as
$$
with
nums as (
    select seq4() i from table(generator(rowcount => n))
),
cte as (
    select array_construct(i) combination, 1 level
    from nums    
    union all
    select array_append(cte.combination, nums.i), cte.level+1
    from nums, cte
    where combination[cte.level-1] > nums.i and cte.level+1 <= r
)
select combination from cte where level = r
$$
;

元データが生成行数として引数 $n$ を取る GENERATOR() になっていることと、探索段数(停止条件)が引数 $r$ になっている以外は前述のクエリと同じです。

テーブル関数にすることで、任意の $n$ , $r$ について列挙を返す関数として再利用できるようになります。

select * from table(combinations(3, 2));
-- 3 rows
/*
[ 1, 0 ]
[ 2, 0 ]
[ 2, 1 ]
*/

select * from table(combinations(100, 3));
-- 161,700 rows
/*
[ 2, 1, 0 ]
[ 3, 1, 0 ]
[ 3, 2, 0 ]
...
[ 99, 98, 95 ]
[ 99, 98, 96 ]
[ 99, 98, 97 ]
*/

順列も同じようにテーブル関数にできます。

create or replace function permutations(n int, r int)
returns table(permutation array)
language sql
as
$$
with
nums as (
    select seq4() i from table(generator(rowcount => n))
),
cte as (
    select array_construct(i) permutation, 1 level
    from nums    
    union all
    select array_append(cte.permutation, nums.i), cte.level+1
    from nums, cte
    where array_contains(nums.i::variant, permutation) = false and cte.level+1 <= r
)
select permutation from cte where level = r
$$
;

ちゃんと正しく $_{100} \mathrm{P} _3$ が 970,200 行 (100 * 99 * 98) 返ってきていることがわかります。

select * from table(permutations(3, 2));
-- 6 rows
/*
[ 0, 1 ]
[ 0, 2 ]
[ 1, 0 ]
[ 1, 2 ]
[ 2, 0 ]
[ 2, 1 ]
*/

select * from table(permutations(100, 3));
-- 970,200 rows
/*
[ 0, 1, 2 ]
[ 0, 1, 3 ]
[ 0, 1, 4 ]
...

[ 99, 98, 95 ]
[ 99, 98, 96 ]
[ 99, 98, 97 ]
*/

もちろんダミーデータではなく、実際のテーブルの全行から組み合わせを列挙させるような関数にすることもできます。LIMIT を変数にして行数を絞ってもいいですし、フィルタ条件を変数にしてもいいですし、好きなように応用できます。

create or replace table friends (name varchar) as
select * from values ('Mell'), ('Chico'), ('Aro'), ('Marin'), ('Poco'), ('Lou'), ('Nina');

create or replace function friends_combinations(r int)
returns table(combination array)
language sql
as
$$
with
cte as (
    select array_construct(name) combination, 1 level
    from friends    
    union all
    select array_append(cte.combination, friends.name), cte.level+1
    from friends, cte
    where combination[cte.level-1] > friends.name and cte.level+1 <= r
)
select combination from cte where level = r
$$
;

select * from table(friends_combinations(6));
-- 7 rows
/*
[ "Poco", "Mell", "Marin", "Lou", "Chico", "Aro" ]
[ "Poco", "Nina", "Mell", "Marin", "Chico", "Aro" ]
[ "Poco", "Nina", "Mell", "Marin", "Lou", "Chico" ]
[ "Poco", "Nina", "Mell", "Marin", "Lou", "Aro" ]
[ "Poco", "Nina", "Mell", "Lou", "Chico", "Aro" ]
[ "Poco", "Nina", "Marin", "Lou", "Chico", "Aro" ]
[ "Nina", "Mell", "Marin", "Lou", "Chico", "Aro" ]
*/

まとめ

今回は順列/組み合わせの列挙を SQL だけでやってみましたが、上記の通り Recursive CTE は SQL に手続き的な処理を持ち込める強力な構文です。

少しわかりにくい文法ではありますが、この記事のように部分ごとに分解して組み立てていけば、意外とシンプルに記述することができます。

また、更新条件/終了条件を工夫することで、順列/組み合わせだけでなく、データベースの行に対して自由に DFS を行うことができます。

なかなかこれが必要になるケースは少ないかもしれませんが、再帰が必要になったけど書き方がイマイチわからないときの参考になればと思います。

脚注

PL/SQL や Transact-SQL は SQL ではない ↩︎

Discussion

ログインするとコメントできます