Some DDL statements are slow when combining a Replicated database with ReplicatedMergeTree in ClickHouse
Introduction
While using ClickHouse with the Replicated database engine combined with ReplicatedMergeTree tables, I ran into a problem where ALTER TABLE statements took abnormally long: operations that normally complete within seconds were badly delayed. This post is a record of my investigation. I have also filed an issue on GitHub ( https://github.com/ClickHouse/ClickHouse/issues/88324 ).
Update (2025-10-10 15:36)
Shortly after publishing this article, I realized that adding an IF NOT EXISTS clause avoids the problem: even if the statement is applied twice, it does not error out, and the query no longer takes a long time.
ALTER TABLE foo.test01 ADD COLUMN IF NOT EXISTS description String;
Environment
- ClickHouse version: 25.9.3.48
The problem
On a two-node cluster combining a Replicated database with a ReplicatedMergeTree table, statements such as ALTER TABLE ... ADD COLUMN take a little over 50 seconds. With this few nodes and no data in the table at all, that much latency seems unnatural.
cluster_1S_2R node 1 :) ALTER TABLE foo.test01 ADD COLUMN description String;
ALTER TABLE foo.test01
(ADD COLUMN `description` String)
Query id: 41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc
┌─shard─┬─replica─┬─status─┬─num_hosts_remaining─┬─num_hosts_active─┐
1. │ 01 │ 01 │ OK │ 1 │ 1 │
2. │ 02 │ 02 │ OK │ 0 │ 0 │
└───────┴─────────┴────────┴─────────────────────┴──────────────────┘
2 rows in set. Elapsed: 50.701 sec.
By contrast, the same statement against a plain MergeTree table in the same Replicated database completes in 0.033 seconds, which reinforces the suspicion that something is wrong.
cluster_1S_2R node 1 :) ALTER TABLE foo.test02 ADD COLUMN description String;
ALTER TABLE foo.test02
(ADD COLUMN `description` String)
Query id: c37b90f5-dfd8-4a3f-8163-ffab8fa1cfae
┌─shard─┬─replica─┬─status─┬─num_hosts_remaining─┬─num_hosts_active─┐
1. │ 01 │ 01 │ OK │ 1 │ 1 │
2. │ 02 │ 02 │ OK │ 0 │ 0 │
└───────┴─────────┴────────┴─────────────────────┴──────────────────┘
2 rows in set. Elapsed: 0.033 sec.
Setup and reproduction steps
- Prepare the test environment
git clone https://github.com/ClickHouse/examples.git
cd examples/docker-compose-recipes/recipes/cluster_1S_2R
- Start the cluster
docker compose up -d
- Reproduction steps
-- Connect with the ClickHouse client
docker exec -it clickhouse-01 clickhouse-client
-- Create a Replicated database
CREATE DATABASE foo ENGINE = Replicated('/clickhouse/databases/{shard}/foo', '{replica}', '{replica}');
-- Create a ReplicatedMergeTree table
CREATE TABLE foo.test01 ON CLUSTER cluster_1S_2R
(
id UInt64,
value String
) ENGINE = ReplicatedMergeTree
ORDER BY id;
-- Run the ALTER TABLE (this is where it takes ~50 seconds)
ALTER TABLE foo.test01 ADD COLUMN description String;
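As a quick sanity check after the ALTER finally returns, the following query can confirm that the column did land on every replica despite the delay. This is a sketch using the standard clusterAllReplicas table function with the cluster name from the recipe above:

```sql
-- Verify that `description` now exists on both replicas of cluster_1S_2R
SELECT hostName() AS host, name, type
FROM clusterAllReplicas('cluster_1S_2R', system.columns)
WHERE database = 'foo' AND table = 'test01' AND name = 'description';
```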
Investigation
The server log on clickhouse-02 shows that the replicated ALTER TABLE failed ten times in a row. With one retry roughly every 5 seconds, those ten failures account almost exactly for the ~50 seconds the query took.
clickhouse@clickhouse-02:/$ grep "ALTER TABLE foo.test01" /var/log/clickhouse-server/clickhouse-server.log | grep "<Error> DDLWorker"
2025.10.10 02:48:15.537666 [ 758 ] {239e8150-e51b-4922-9aa6-2faae33c0cde} <Error> DDLWorker(foo): Query /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String) wasn't finished successfully: Code: 15. DB::Exception: Cannot add column `description`: column with this name already exists. (DUPLICATE_COLUMN), Stack trace (when copying this message, always include the lines below):
2025.10.10 02:48:20.579989 [ 758 ] {27060ac4-65cd-4636-a63c-4cf174616c10} <Error> DDLWorker(foo): Query /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String) wasn't finished successfully: Code: 15. DB::Exception: Cannot add column `description`: column with this name already exists. (DUPLICATE_COLUMN), Stack trace (when copying this message, always include the lines below):
2025.10.10 02:48:25.616248 [ 758 ] {53940d0f-2f6f-429a-a6c2-8456d58bbf03} <Error> DDLWorker(foo): Query /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String) wasn't finished successfully: Code: 15. DB::Exception: Cannot add column `description`: column with this name already exists. (DUPLICATE_COLUMN), Stack trace (when copying this message, always include the lines below):
2025.10.10 02:48:30.651504 [ 758 ] {bc97f935-e3a2-4298-869a-d37eb9a377aa} <Error> DDLWorker(foo): Query /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String) wasn't finished successfully: Code: 15. DB::Exception: Cannot add column `description`: column with this name already exists. (DUPLICATE_COLUMN), Stack trace (when copying this message, always include the lines below):
2025.10.10 02:48:35.693764 [ 758 ] {02a6e168-02df-4905-aaf7-a869ff34be44} <Error> DDLWorker(foo): Query /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String) wasn't finished successfully: Code: 15. DB::Exception: Cannot add column `description`: column with this name already exists. (DUPLICATE_COLUMN), Stack trace (when copying this message, always include the lines below):
2025.10.10 02:48:40.746475 [ 758 ] {a111fd89-c02d-460a-ae8b-4a3298a3d174} <Error> DDLWorker(foo): Query /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String) wasn't finished successfully: Code: 15. DB::Exception: Cannot add column `description`: column with this name already exists. (DUPLICATE_COLUMN), Stack trace (when copying this message, always include the lines below):
2025.10.10 02:48:45.782421 [ 758 ] {d569c9a2-9437-42fd-aba6-91210f60f90c} <Error> DDLWorker(foo): Query /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String) wasn't finished successfully: Code: 15. DB::Exception: Cannot add column `description`: column with this name already exists. (DUPLICATE_COLUMN), Stack trace (when copying this message, always include the lines below):
2025.10.10 02:48:50.818987 [ 758 ] {49b021e8-3a89-4637-8f39-1628d0b35727} <Error> DDLWorker(foo): Query /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String) wasn't finished successfully: Code: 15. DB::Exception: Cannot add column `description`: column with this name already exists. (DUPLICATE_COLUMN), Stack trace (when copying this message, always include the lines below):
2025.10.10 02:48:55.865237 [ 758 ] {17a6886b-1513-4729-bf2c-97f70e3847a9} <Error> DDLWorker(foo): Query /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String) wasn't finished successfully: Code: 15. DB::Exception: Cannot add column `description`: column with this name already exists. (DUPLICATE_COLUMN), Stack trace (when copying this message, always include the lines below):
2025.10.10 02:49:00.908720 [ 758 ] {4f13bb3d-d83e-4878-ac18-451ce156f9e2} <Error> DDLWorker(foo): Query /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String) wasn't finished successfully: Code: 15. DB::Exception: Cannot add column `description`: column with this name already exists. (DUPLICATE_COLUMN), Stack trace (when copying this message, always include the lines below):
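The DDL entry being retried can also be inspected directly in (Zoo)Keeper via the system.zookeeper table. A sketch, assuming the ZooKeeper path from the CREATE DATABASE statement above with {shard} resolving to 01; the log subnode layout is an assumption based on how the Replicated database engine names its entries in the logs:

```sql
-- List the Replicated database's DDL log entries stored in Keeper
SELECT name, value
FROM system.zookeeper
WHERE path = '/clickhouse/databases/01/foo/log';
```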
Root cause
The root cause appears to be a conflict between two replication mechanisms:
- Replicated database engine: replicates DDL statements at the database level
- ReplicatedMergeTree: replicates data and metadata at the table level
Because both mechanisms try to apply the same ALTER TABLE, the second replica fails with a "column with this name already exists" error.
The clickhouse-01 log shows the sequence: the query is executed => the DDLWorker replicates it as a DDL entry (query-0000000004) => ReplicatedMergeTree creates its own log entry (log-0000000000).
2025.10.10 02:48:15.491998 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> executeQuery: (from 172.18.0.1:61884) (query 1, line 1) ALTER TABLE foo.test01 ADD COLUMN description String; (stage: Complete)
2025.10.10 02:48:15.492299 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> DatabaseReplicated (foo): Proposing query: ALTER TABLE test01 (ADD COLUMN `description` String)
2025.10.10 02:48:15.501036 [ 764 ] {} <Debug> DDLWorker(foo): Scheduling tasks
2025.10.10 02:48:15.501043 [ 765 ] {} <Debug> DDLWorker(foo): Cleaning queue
2025.10.10 02:48:15.501256 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> DDLWorker(foo): Waiting for worker thread to process all entries before query-0000000004 (timeout: 300s)
2025.10.10 02:48:15.501386 [ 764 ] {} <Debug> DDLWorker(foo): Will schedule 1 tasks starting from query-0000000004
2025.10.10 02:48:15.501428 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> DDLWorker(foo): Processing task query-0000000004 (query: ALTER TABLE foo.test01 (ADD COLUMN `description` String), backup restore: false)
2025.10.10 02:48:15.503460 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> DDLWorker(foo): Executing query: ALTER TABLE foo.test01 (ADD COLUMN `description` String)
2025.10.10 02:48:15.514114 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> executeQuery: (from 0.0.0.0:0, user: ) /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String) (stage: Complete)
2025.10.10 02:48:15.518190 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> foo.test01 (21d0823a-90c5-471f-9200-a719029b7e6d): Created log entry log-0000000000 to update table metadata to version 1
2025.10.10 02:48:15.518211 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> foo.test01 (21d0823a-90c5-471f-9200-a719029b7e6d): Updated shared metadata nodes in ZooKeeper. Waiting for replicas to apply changes.
2025.10.10 02:48:15.518217 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> foo.test01 (21d0823a-90c5-471f-9200-a719029b7e6d): Waiting for 01 to process log entry
2025.10.10 02:48:15.518227 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> foo.test01 (21d0823a-90c5-471f-9200-a719029b7e6d): Waiting for 01 to pull log-0000000000 to queue
2025.10.10 02:48:15.518376 [ 764 ] {} <Debug> DDLWorker(foo): Will not execute task query-0000000004: Entry query-0000000004 has been executed as initial query
2025.10.10 02:48:15.518395 [ 764 ] {} <Debug> DDLWorker(foo): Waiting for queue updates
2025.10.10 02:48:15.519123 [ 121 ] {BgSchPool::7edc3b72-c8b1-4fbb-addb-a53c2b691de9} <Debug> foo.test01 (ReplicatedMergeTreeQueue): Pulling 1 entries to queue: log-0000000000 - log-0000000000
2025.10.10 02:48:15.522785 [ 121 ] {BgSchPool::7edc3b72-c8b1-4fbb-addb-a53c2b691de9} <Debug> foo.test01 (ReplicatedMergeTreeQueue): Pulled 1 entries to queue.
2025.10.10 02:48:15.522996 [ 702 ] {} <Information> foo.test01 (21d0823a-90c5-471f-9200-a719029b7e6d): Metadata changed in ZooKeeper. Applying changes locally.
2025.10.10 02:48:15.524472 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> foo.test01 (21d0823a-90c5-471f-9200-a719029b7e6d): Looking for node corresponding to log-0000000000 in 01 queue
2025.10.10 02:48:15.526463 [ 702 ] {} <Information> foo.test01 (21d0823a-90c5-471f-9200-a719029b7e6d): Applied changes to the metadata of the table. Current metadata version: 1
2025.10.10 02:48:15.529635 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> foo.test01 (21d0823a-90c5-471f-9200-a719029b7e6d): No corresponding node found. Assuming it has been already processed. Found 1 nodes
2025.10.10 02:48:15.529840 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> DDLWorker(foo): Executed query: /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String)
2025.10.10 02:48:15.529924 [ 86 ] {41abf20b-f2d1-45d9-9e0a-88ca97eb2dcc} <Debug> DDLWorker(foo): Task query-0000000004 executed by current replica
The clickhouse-02 log shows the opposite ordering: the ReplicatedMergeTree log entry is applied first (at which point the column has already been added), then the DDLWorker executes the replicated query, which fails because the column already exists.
2025.10.10 02:48:15.501140 [ 758 ] {} <Debug> DDLWorker(foo): Scheduling tasks
2025.10.10 02:48:15.501191 [ 763 ] {} <Debug> DDLWorker(foo): Cleaning queue
2025.10.10 02:48:15.501452 [ 758 ] {} <Debug> DDLWorker(foo): Will schedule 1 tasks starting from query-0000000004
2025.10.10 02:48:15.518888 [ 758 ] {} <Debug> DDLWorker(foo): Processing task query-0000000004 (query: ALTER TABLE foo.test01 (ADD COLUMN `description` String), backup restore: false)
2025.10.10 02:48:15.520006 [ 758 ] {} <Debug> DDLWorker(foo): Executing query: ALTER TABLE foo.test01 (ADD COLUMN `description` String)
2025.10.10 02:48:15.520169 [ 147 ] {BgSchPool::32219ade-1ab5-4827-bc03-a7a996f28396} <Debug> foo.test01 (ReplicatedMergeTreeQueue): Pulling 1 entries to queue: log-0000000000 - log-0000000000
2025.10.10 02:48:15.526350 [ 147 ] {BgSchPool::32219ade-1ab5-4827-bc03-a7a996f28396} <Debug> foo.test01 (ReplicatedMergeTreeQueue): Pulled 1 entries to queue.
2025.10.10 02:48:15.526552 [ 698 ] {} <Information> foo.test01 (21d0823a-90c5-471f-9200-a719029b7e6d): Metadata changed in ZooKeeper. Applying changes locally.
2025.10.10 02:48:15.530900 [ 698 ] {} <Information> foo.test01 (21d0823a-90c5-471f-9200-a719029b7e6d): Applied changes to the metadata of the table. Current metadata version: 1
2025.10.10 02:48:15.536723 [ 758 ] {239e8150-e51b-4922-9aa6-2faae33c0cde} <Debug> executeQuery: (from 0.0.0.0:0, user: ) /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String) (stage: Complete)
2025.10.10 02:48:15.537449 [ 758 ] {239e8150-e51b-4922-9aa6-2faae33c0cde} <Error> executeQuery: Code: 15. DB::Exception: Cannot add column `description`: column with this name already exists. (DUPLICATE_COLUMN) (version 25.9.3.48 (official build)) (from 0.0.0.0:0) (in query: /* ddl_entry=query-0000000004 */ ALTER TABLE foo.test01 (ADD COLUMN `description` String)), Stack trace (when copying this message, always include the lines below):
Eventually an automatic recovery procedure kicks in and the task completes.
2025.10.10 02:49:05.913360 [ 758 ] {} <Warning> DDLWorker(foo): Database got stuck at processing task query-0000000004: it failed 10 times in a row with the same error. Will reset digest to mark our replica as lost, and trigger recovery from the most up-to-date metadata from ZooKeeper. See max_retries_before_automatic_recovery setting. The error: Code: 341. DB::Exception: Unexpected error: Code: 15. DB::Exception: Cannot add column `description`: column with this name already exists. (DUPLICATE_COLUMN) (version 25.9.3.48 (official build)). (UNFINISHED), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000011000530
1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000b968d98
2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000b968868
3. DB::Exception::Exception<String&>(int, FormatStringHelperImpl<std::type_identity<String&>::type>, String&) @ 0x000000000b982738
4. DB::DDLWorker::processTask(DB::DDLTaskBase&, std::shared_ptr<zkutil::ZooKeeper> const&, bool) @ 0x0000000015224900
5. DB::DDLWorker::scheduleTasks(bool) @ 0x0000000015220f64
6. DB::DDLWorker::runMainThread() @ 0x0000000015219638
7. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<void (DB::DDLWorker::*)(), DB::DDLWorker*>(void (DB::DDLWorker::*&&)(), DB::DDLWorker*&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x00000000152373e0
8. ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x000000001112684c
9. void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000001112d85c
10. ? @ 0x0000000000080398
11. ? @ 0x00000000000e9e9c
(version 25.9.3.48 (official build))
2025.10.10 02:49:05.914471 [ 758 ] {} <Warning> DDLWorker(foo): Resetting digest from 4653930583008424925 to 42
2025.10.10 02:49:05.935170 [ 758 ] {} <Warning> DDLWorker(foo): Replica seems to be lost: our_log_ptr=3, max_log_ptr=4, local_digest=4653930583008424925, zk_digest=42
2025.10.10 02:49:05.935231 [ 758 ] {} <Warning> DatabaseReplicated (foo): Will recover replica with staled log pointer 3 from log pointer 4
2025.10.10 02:49:05.935262 [ 758 ] {} <Debug> DatabaseReplicated (foo): Trying to get consistent metadata snapshot for log pointer 4
2025.10.10 02:49:05.935641 [ 758 ] {} <Debug> DatabaseReplicated (foo): Got consistent metadata snapshot for log pointer 4
2025.10.10 02:49:05.935829 [ 758 ] {} <Debug> DatabaseReplicated (foo): Starting first stage of renaming process. Will rename tables to intermediate names
2025.10.10 02:49:05.935843 [ 758 ] {} <Debug> DatabaseReplicated (foo): Starting second stage of renaming process. Will rename tables from intermediate to desired names
2025.10.10 02:49:05.935850 [ 758 ] {} <Debug> DatabaseReplicated (foo): Renames completed successfully
2025.10.10 02:49:05.935942 [ 758 ] {} <Information> DatabaseReplicated (foo): All tables are created successfully
2025.10.10 02:49:05.939154 [ 758 ] {} <Information> DatabaseReplicated (foo): Failed to marked query-0000000003 as finished (finished=Node exists, synced=Ok). Ignoring.
2025.10.10 02:49:05.942202 [ 758 ] {} <Information> DatabaseReplicated (foo): Marked recovered query-0000000004 as finished
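After recovery, the per-replica log pointers can be compared to confirm that no replica is still considered lost (the log above shows our_log_ptr=3 vs max_log_ptr=4 just before recovery). A sketch against the same Keeper path; the exact node names (max_log_ptr at the database root, log_ptr under each replica node) are assumptions inferred from the log messages:

```sql
-- Compare the database-wide max_log_ptr with each replica's log_ptr
SELECT path, name, value
FROM system.zookeeper
WHERE path = '/clickhouse/databases/01/foo'
  AND name = 'max_log_ptr';
```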
Conclusion
At first glance the behavior looks strange, but it turns out that the Replicated database engine and ReplicatedMergeTree each do their job and simply fail to mesh. Recovery eventually lets the query complete, but for real-world use I would want it to finish as quickly as it does with a plain MergeTree table. I have filed an issue on the GitHub repository and hope to learn whether this behavior is intended or a bug.