😃

MySQLのJSON配列とMulti Valued Indexes

2024/12/21に公開

はじめに

MySQLにはJSON型があり、JSONを保存することができます。このJSON型では、JSON_CONTAINSや、JSON_OVERLAPSといった複合条件の検索に便利な関数を使うことができます。MySQLには配列型はありませんが、配列型と同じように使うこともできます。

本記事ではこのようなJSON型の利用と、それを高速化するMulti Value Indexについて説明します。

配列検索の例: ブログ記事をタグで検索する

次のようなブログ記事を保存するテーブルを考えます。

CREATE TABLE blog_articles (
    id INT AUTO_INCREMENT PRIMARY KEY,
    author VARCHAR(255) NOT NULL,
    content TEXT NOT NULL,
    tags JSON NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO blog_articles (author, content, tags) VALUES
('Jack Black', 'Combining Python, JavaScript, and Ruby.', '{"lang":["python","javascript","ruby"],"db":["mysql"]}'),
('Karen White', 'Using MySQL, PostgreSQL, and SQLite together.', '{"lang":["python"],"db":["mysql","postgresql","sqlite"]}'),
('Charlie Davis', 'Introduction to Rust programming.', '{"lang":["rust"],"db":["postgresql"]}'),
('Eve White', 'Using Python with SQLite.', '{"lang":["python"],"db":["sqlite"]}'),
('John Doe', 'This is the first blog post.', '{"lang":["ruby"],"db":["mysql"]}'),
('Jane Smith', 'Exploring the new features in MySQL 8.0.', '{"lang":[],"db":["mysql"]}'),
('Frank Green', 'JavaScript and MongoDB for beginners.', '{"lang":["javascript"],"db":["mongodb"]}'),
('Grace Lee', 'Building web apps with Ruby on Rails.', '{"lang":["ruby"],"db":["postgresql"]}'),
('Hank Miller', 'Django and PostgreSQL integration.', '{"lang":["python"],"db":["postgresql"]}'),
('Ivy Wilson', 'Node.js and MySQL: A perfect match.', '{"lang":["javascript"],"db":["mysql"]}');
mysql> select * from blog_articles;
+----+---------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
| id | author        | content                                       | tags                                                          | created_at          |
+----+---------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
|  1 | Jack Black    | Combining Python, JavaScript, and Ruby.       | {"db": ["mysql"], "lang": ["python", "javascript", "ruby"]}   | 2024-12-21 00:37:33 |
|  2 | Karen White   | Using MySQL, PostgreSQL, and SQLite together. | {"db": ["mysql", "postgresql", "sqlite"], "lang": ["python"]} | 2024-12-21 00:37:33 |
|  3 | Charlie Davis | Introduction to Rust programming.             | {"db": ["postgresql"], "lang": ["rust"]}                      | 2024-12-21 00:37:33 |
|  4 | Eve White     | Using Python with SQLite.                     | {"db": ["sqlite"], "lang": ["python"]}                        | 2024-12-21 00:37:33 |
|  5 | John Doe      | This is the first blog post.                  | {"db": ["mysql"], "lang": ["ruby"]}                           | 2024-12-21 00:37:33 |
|  6 | Jane Smith    | Exploring the new features in MySQL 8.0.      | {"db": ["mysql"], "lang": []}                                 | 2024-12-21 00:37:33 |
|  7 | Frank Green   | JavaScript and MongoDB for beginners.         | {"db": ["mongodb"], "lang": ["javascript"]}                   | 2024-12-21 00:37:33 |
|  8 | Grace Lee     | Building web apps with Ruby on Rails.         | {"db": ["postgresql"], "lang": ["ruby"]}                      | 2024-12-21 00:37:33 |
|  9 | Hank Miller   | Django and PostgreSQL integration.            | {"db": ["postgresql"], "lang": ["python"]}                    | 2024-12-21 00:37:33 |
| 10 | Ivy Wilson    | Node.js and MySQL: A perfect match.           | {"db": ["mysql"], "lang": ["javascript"]}                     | 2024-12-21 00:37:33 |
+----+---------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
10 rows in set (0.01 sec)

このとき、タグで絞り込みをかけたくなりますね。これをやってみましょう。

一つのタグで絞り込み

例えば、pythonを含むブログエントリを探すクエリは次のように書けます。

mysql> SELECT * FROM blog_articles WHERE JSON_CONTAINS(tags, '["python"]', '$.lang');

+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
| id | author      | content                                       | tags                                                          | created_at          |
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
|  1 | Jack Black  | Combining Python, JavaScript, and Ruby.       | {"db": ["mysql"], "lang": ["python", "javascript", "ruby"]}   | 2024-12-21 00:37:33 |
|  2 | Karen White | Using MySQL, PostgreSQL, and SQLite together. | {"db": ["mysql", "postgresql", "sqlite"], "lang": ["python"]} | 2024-12-21 00:37:33 |
|  4 | Eve White   | Using Python with SQLite.                     | {"db": ["sqlite"], "lang": ["python"]}                        | 2024-12-21 00:37:33 |
|  9 | Hank Miller | Django and PostgreSQL integration.            | {"db": ["postgresql"], "lang": ["python"]}                    | 2024-12-21 00:37:33 |
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
4 rows in set (0.01 sec)

-- または
SELECT * FROM blog_articles WHERE 'python' MEMBER OF(tags->'$.lang');
SELECT * FROM blog_articles WHERE JSON_CONTAINS(tags, '{"lang":["python"]}');
-- 結果は同じなので省略

JSON_CONTAINS はJSONを指定して、そのJSONが含まれる時真を返す関数です。これの素晴らしいところは、配列要素に値が含まれていれば真を返すところです。JSONの要素は、JSON PATH形式で指定できます。
単一の値の検索でははMEMBER OFを使うこともできます。

複数のタグで絞り込み(AND)

では、pythonとmysqlに関するblogを探してみましょう。次の様に書けます

mysql> SELECT * FROM blog_articles WHERE JSON_CONTAINS(tags, '{"lang":["python"],"db":["mysql"]}');
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
| id | author      | content                                       | tags                                                          | created_at          |
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
|  1 | Jack Black  | Combining Python, JavaScript, and Ruby.       | {"db": ["mysql"], "lang": ["python", "javascript", "ruby"]}   | 2024-12-21 00:37:33 |
|  2 | Karen White | Using MySQL, PostgreSQL, and SQLite together. | {"db": ["mysql", "postgresql", "sqlite"], "lang": ["python"]} | 2024-12-21 00:37:33 |
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
2 rows in set (0.01 sec)

検索条件をかなり直感的に記述できるのが分かると思います。

複数のタグで絞り込み(OR)

では、"python または javascript"に関するblogを抽出するにはどうすれば良いでしょうか? ORで条件をつなぐこともできますが、JSON_OVERLAPSという便利な関数もあります。これは配列要素に対して、文字通り重複部分があるかどうかを判定します。

tags->'$.lang'JSON_EXTRACT(tags,'$.lang')の省略記法で、要素を抽出するものです。ここでは言語の配列を取得するのに利用しています。

mysql> SELECT * FROM blog_articles WHERE JSON_OVERLAPS(tags->'$.lang', '["python","javascript"]');
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
| id | author      | content                                       | tags                                                          | created_at          |
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
|  1 | Jack Black  | Combining Python, JavaScript, and Ruby.       | {"db": ["mysql"], "lang": ["python", "javascript", "ruby"]}   | 2024-12-21 00:37:33 |
|  2 | Karen White | Using MySQL, PostgreSQL, and SQLite together. | {"db": ["mysql", "postgresql", "sqlite"], "lang": ["python"]} | 2024-12-21 00:37:33 |
|  4 | Eve White   | Using Python with SQLite.                     | {"db": ["sqlite"], "lang": ["python"]}                        | 2024-12-21 00:37:33 |
|  7 | Frank Green | JavaScript and MongoDB for beginners.         | {"db": ["mongodb"], "lang": ["javascript"]}                   | 2024-12-21 00:37:33 |
|  9 | Hank Miller | Django and PostgreSQL integration.            | {"db": ["postgresql"], "lang": ["python"]}                    | 2024-12-21 00:37:33 |
| 10 | Ivy Wilson  | Node.js and MySQL: A perfect match.           | {"db": ["mysql"], "lang": ["javascript"]}                     | 2024-12-21 00:37:33 |
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
6 rows in set (0.01 sec)

JSON_OVERLAPSの注意として、JSONオブジェクトに対して適用した場合は、共通のkey,valueのペアがあるかを調べるということです。valueは完全に一致しないといけません。先程の例でJSONオブジェクトで指定してみます。

mysql> SELECT * FROM blog_articles WHERE JSON_OVERLAPS(tags, '{"lang":["javascript","python"]}');
Empty set (0.00 sec)

このようになります。

同様に、"mysql"もしくは"javascript"に言及したblogを探すのにOVERLAPSを使うと期待と異なることになります。

mysql> SELECT * FROM blog_articles WHERE JSON_OVERLAPS(tags, '{"lang":["javascript"],"db":["mongodb"]}');
+----+-------------+---------------------------------------+---------------------------------------------+---------------------+
| id | author      | content                               | tags                                        | created_at          |
+----+-------------+---------------------------------------+---------------------------------------------+---------------------+
|  7 | Frank Green | JavaScript and MongoDB for beginners. | {"db": ["mongodb"], "lang": ["javascript"]} | 2024-12-21 00:37:33 |
| 10 | Ivy Wilson  | Node.js and MySQL: A perfect match.   | {"db": ["mysql"], "lang": ["javascript"]}   | 2024-12-21 00:37:33 |
+----+-------------+---------------------------------------+---------------------------------------------+---------------------+
2 rows in set (0.00 sec)

一見よさそうですが、{"db": ["mysql"], "lang": ["python", "javascript", "ruby"]} のようにlangに複数要素があるものがヒットしません。

OVELAPSは配列に対して利用するのが良さそうですね。

Multi Value Indexes

さて、このような検索の効率はどうなのでしょうか。普通に考えるとFull Scanとなります。

mysql> explain SELECT * FROM blog_articles WHERE JSON_CONTAINS(tags->'$.lang','["python"]');
+----+-------------+---------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table         | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | blog_articles | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   10 |   100.00 | Using where |
+----+-------------+---------------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

予想通りです。

MySQLにはこのような配列に利用できる、Multi Value Indexesというインデックスがあります。

https://dev.mysql.com/doc/refman/8.0/ja/create-index.html#create-index-multi-valued

早速作ってみます。

mysql> CREATE INDEX idx_lang ON blog_articles((CAST(tags->'$.lang' AS char(20) ARRAY)));
Query OK, 0 rows affected (0.08 sec)
Records: 0  Duplicates: 0  Warnings: 0

先に出てきたいくつかの例を試してみましょう。

mysql> explain SELECT * FROM blog_articles WHERE 'python' MEMBER OF(tags->'$.lang');
+----+-------------+---------------+------------+------+---------------+----------+---------+-------+------+----------+-------------+
| id | select_type | table         | partitions | type | possible_keys | key      | key_len | ref   | rows | filtered | Extra       |
+----+-------------+---------------+------------+------+---------------+----------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | blog_articles | NULL       | ref  | idx_lang      | idx_lang | 83      | const |    4 |   100.00 | Using where |
+----+-------------+---------------+------------+------+---------------+----------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

-- 同じ
explain SELECT * FROM blog_articles WHERE JSON_CONTAINS(tags->'$.lang','["python"]');

-- OVERLAPSもインデックスを利用する
mysql> explain SELECT * FROM blog_articles WHERE JSON_OVERLAPS(tags->'$.lang','["python","javascript"]');
+----+-------------+---------------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
| id | select_type | table         | partitions | type  | possible_keys | key      | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | blog_articles | NULL       | range | idx_lang      | idx_lang | 83      | NULL |    7 |   100.00 | Using where |
+----+-------------+---------------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

-- オブジェクトに対しては利用されない。残念。
mysql> explain SELECT * FROM blog_articles WHERE JSON_CONTAINS(tags, '{"lang":["python"],"db":["mysql"]}');
+----+-------------+---------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table         | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | blog_articles | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   10 |   100.00 | Using where |
+----+-------------+---------------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.02 sec)

--

このように、JSON関数に対してindexが機能するのがわかります。特にJSON型に大きなオブジェクトを保管しているときなどは、Full Table Scanとの差は大きくなるでしょう。

おわりに

JSON型/JSON配列を使う例について紹介しました。配列のような可変長項目や、複数選択がある選択肢による抽出などに効果がありそうです。

Discussion