Snowflake ウェアハウス Generation 2によるTPCH性能検証
はじめ
2025年5月、SnowflakeはウェアハウスでGeneration2のインスタンス提供を開始しました。(リンク) ユーザー側で特別な設定変更やコード修正は不要で、既存のワークロードをそのまま移行できる点が最大の特長です。本記事では、皆さんの環境でもすぐに再現可能な形でベンチマーク手法を紹介し、実際に得られた性能向上および環境負荷低減の成果をご覧いただきます。
Generation2 ウェアハウスとは
Generation 2 Standard ウェアハウス(Gen2)は、Snowflakeの標準仮想ウェアハウスの次世代版で、より高速なハードウェアとDELETE/UPDATE/MERGEやテーブルスキャンのソフトウェア最適化を組み合わせることで、従来より多くのクエリを短時間で完了でき、高い同時実行性を実現します。
対応リージョン
2025年5月現在、対応リージョンはAWS(us-west-2、eu-central-1)およびAzure(East US 2、West Europe)で、GCPでは未サポートです。
価格
通常のウェアハウスと比較し、Gen2の値段はAWSで1.35倍、Azureで1.25倍となっている点に注意した方が良いです。詳細はこちらのTable 1(b)を参照すること。
作成方法
以下のコマンドのようにRESOURCE_CONSTRAINT
でGEN2を指定して頂くことで作成が可能です。また、ALTER WAREHOUSEのコマンドを利用し、既存ウェアハウスから変更することも可能です。(リンク)
CREATE OR REPLACE WAREHOUSE next_generation_default_size
RESOURCE_CONSTRAINT = STANDARD_GEN_2;
結果
前提条件
性能検証を実施する上での前提条件を下記に載せておきます。
- Gen1とGen2のウェアハウス(compute_wh_gen1, compute_wh_gen2)を作成しています。
- 両者のウェアハウスのサイズはSサイズとしております。
- 各クエリに対する試行回数5回実行した上での平均値を取っております。
- 使用したスキーマはtpch_sf1000です。
結果
以下に、各クエリとGen2適用と非適用時のパフォーマンス結果を図示したグラフを記載します。
また、上記のグラフが参照する結果を表形式で記載しておきます。Query番号21で18%と少し改善率が低いものの、その他のクエリではGen2の方がGen1と比較し、30-40% 程度の速度向上していることが観察されています。
TPCH Query | Gen1 (s) | Gen2 (s) | (Gen 2 / Gen1 - 1)(%) |
---|---|---|---|
1 | 59.30 | 38.36 | -35.3 |
2 | 12.59 | 8.58 | -31.9 |
3 | 47.08 | 28.12 | -40.3 |
4 | 31.90 | 17.70 | -44.5 |
5 | 72.39 | 50.19 | -30.7 |
6 | 3.95 | 2.77 | -29.9 |
7 | 39.68 | 16.52 | -58.4 |
8 | 51.19 | 27.04 | -47.2 |
9 | 121.38 | 81.87 | -32.6 |
10 | 78.53 | 47.37 | -39.7 |
11 | 5.59 | 3.76 | -32.7 |
12 | 15.09 | 9.80 | -35.1 |
13 | 96.92 | 77.11 | -20.4 |
14 | 5.56 | 3.78 | -32.0 |
15 | 10.47 | 7.91 | -24.5 |
16 | 23.05 | 11.79 | -51.5 |
17 | 43.72 | 21.22 | -51.5 |
18 | 207.34 | 140.00 | -32.5 |
19 | 36.06 | 16.46 | -54.4 |
20 | 18.27 | 11.99 | -34.4 |
21 | 107.43 | 88.81 | -17.3 |
22 | 13.19 | 8.05 | -39.0 |
クエリ
こちらには、性能検証に使用したSQLクエリとクエリを実行するためのPythonコードを載せておきます。
性能検証に使用したSQL群
-- TPCH_Query1
SELECT
l_returnflag,
l_linestatus,
SUM(l_quantity) AS sum_qty,
SUM(l_extendedprice) AS sum_base_price,
SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
AVG(l_quantity) AS avg_qty,
AVG(l_extendedprice) AS avg_price,
AVG(l_discount) AS avg_disc,
COUNT(*) AS count_order
FROM lineitem
WHERE l_shipdate <= DATEADD(day, 90, '1998-12-01')
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
-- TPCH_Query2
SELECT
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
FROM part,
supplier,
partsupp,
nation,
region
WHERE p_partkey = ps_partkey
AND s_suppkey = ps_suppkey
AND p_size = 15
AND p_type LIKE '%BRASS'
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE'
AND ps_supplycost = (
SELECT MIN(ps_supplycost)
FROM partsupp,
supplier,
nation,
region
WHERE p_partkey = ps_partkey
AND s_suppkey = ps_suppkey
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE'
)
ORDER BY s_acctbal DESC, n_name, s_name, p_partkey
LIMIT 100;
-- TPCH_Query3
SELECT
l_orderkey,
SUM(l_extendedprice * (1 - l_discount)) AS revenue,
o_orderdate,
o_shippriority
FROM customer,
orders,
lineitem
WHERE c_mktsegment = 'AUTOMOBILE'
AND c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate < DATE '1995-03-15'
AND l_shipdate > DATE '1995-03-15'
GROUP BY l_orderkey, o_orderdate, o_shippriority
ORDER BY revenue DESC, o_orderdate
LIMIT 10;
-- TPCH_Query4
SELECT
o_orderpriority,
COUNT(*) AS order_count
FROM orders
WHERE o_orderdate >= DATE '1993-07-01'
AND o_orderdate < DATE '1993-10-01'
AND EXISTS (
SELECT *
FROM lineitem
WHERE l_orderkey = o_orderkey
AND l_commitdate < l_receiptdate
)
GROUP BY o_orderpriority
ORDER BY o_orderpriority;
-- TPCH_Query5
SELECT
n_name,
SUM(l_extendedprice * (1 - l_discount)) AS revenue
FROM customer,
orders,
lineitem,
supplier,
nation,
region
WHERE c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND l_suppkey = s_suppkey
AND c_nationkey = s_nationkey
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'ASIA'
AND o_orderdate >= DATE '1994-01-01'
AND o_orderdate < DATE '1995-01-01'
GROUP BY n_name
ORDER BY revenue DESC;
-- TPCH_Query6
SELECT
SUM(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= DATE '1994-01-01'
AND l_shipdate < DATE '1995-01-01'
AND l_discount BETWEEN 0.06 AND 0.08
AND l_quantity < 24;
-- TPCH_Query7
SELECT
supp_nation,
cust_nation,
l_year,
SUM(volume) AS revenue
FROM (
SELECT
n1.n_name AS supp_nation,
n2.n_name AS cust_nation,
EXTRACT(year FROM l_shipdate) AS l_year,
l_extendedprice * (1 - l_discount) AS volume
FROM supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
WHERE s_suppkey = l_suppkey
AND o_orderkey = l_orderkey
AND c_custkey = o_custkey
AND s_nationkey = n1.n_nationkey
AND c_nationkey = n2.n_nationkey
AND (
(n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY') OR
(n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE')
)
AND l_shipdate BETWEEN DATE '1995-01-01' AND DATE '1996-12-31'
) AS shipping
GROUP BY supp_nation, cust_nation, l_year
ORDER BY supp_nation, cust_nation, l_year;
-- TPCH_Query8
SELECT
o_year,
SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
FROM (
SELECT
EXTRACT(year FROM o_orderdate) AS o_year,
l_extendedprice * (1 - l_discount) AS volume,
n2.n_name AS nation
FROM part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
WHERE p_partkey = l_partkey
AND s_suppkey = l_suppkey
AND l_orderkey = o_orderkey
AND o_custkey = c_custkey
AND c_nationkey = n1.n_nationkey
AND n1.n_regionkey = r_regionkey
AND r_name = 'AMERICA'
AND s_nationkey = n2.n_nationkey
AND o_orderdate BETWEEN DATE '1995-01-01' AND DATE '1996-12-31'
AND p_type = 'ECONOMY ANODIZED STEEL'
) AS all_nations
GROUP BY o_year
ORDER BY o_year;
-- TPCH_Query9
SELECT
nation,
o_year,
SUM(amount) AS sum_profit
FROM (
SELECT
n_name AS nation,
EXTRACT(year FROM o_orderdate) AS o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
FROM part,
supplier,
lineitem,
partsupp,
orders,
nation
WHERE s_suppkey = l_suppkey
AND ps_suppkey = l_suppkey
AND ps_partkey = l_partkey
AND p_partkey = l_partkey
AND o_orderkey = l_orderkey
AND s_nationkey = n_nationkey
AND p_name LIKE '%green%'
) AS profit
GROUP BY nation, o_year
ORDER BY nation, o_year DESC;
-- TPCH_Query10
SELECT
c_custkey,
c_name,
SUM(l_extendedprice * (1 - l_discount)) AS revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
FROM customer,
orders,
lineitem,
nation
WHERE c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate >= DATE '1993-10-01'
AND o_orderdate < DATE '1994-01-01'
AND l_returnflag = 'R'
AND c_nationkey = n_nationkey
GROUP BY c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
ORDER BY revenue DESC
LIMIT 20;
-- TPCH_Query11
SELECT
ps_partkey,
SUM(ps_supplycost * ps_availqty) AS value
FROM partsupp,
supplier,
nation
WHERE ps_suppkey = s_suppkey
AND s_nationkey = n_nationkey
AND n_name = 'GERMANY'
GROUP BY ps_partkey
HAVING SUM(ps_supplycost * ps_availqty) > (
SELECT SUM(ps_supplycost * ps_availqty) * 0.0001
FROM partsupp,
supplier,
nation
WHERE ps_suppkey = s_suppkey
AND s_nationkey = n_nationkey
AND n_name = 'GERMANY'
)
ORDER BY value DESC;
-- TPCH_Query12
SELECT
l_shipmode,
SUM(CASE WHEN o_orderpriority IN ('1-URGENT','2-HIGH') THEN 1 ELSE 0 END) AS high_line_count,
SUM(CASE WHEN o_orderpriority NOT IN ('1-URGENT','2-HIGH') THEN 1 ELSE 0 END) AS low_line_count
FROM orders,
lineitem
WHERE o_orderkey = l_orderkey
AND l_shipmode IN ('MAIL','SHIP')
AND l_commitdate < l_receiptdate
AND l_shipdate < l_commitdate
AND l_receiptdate >= DATE '1994-01-01'
AND l_receiptdate < DATE '1995-01-01'
GROUP BY l_shipmode
ORDER BY l_shipmode;
-- TPCH_Query13
SELECT
c_count,
COUNT(*) AS custdist
FROM (
SELECT
c_custkey,
COUNT(o_orderkey) AS c_count
FROM customer
LEFT JOIN orders
ON c_custkey = o_custkey
AND o_comment NOT LIKE '%pending%packages%'
GROUP BY c_custkey
) AS c_orders
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;
-- TPCH_Query14
SELECT
100.0 * SUM(CASE WHEN p_type LIKE 'PROMO%' THEN l_extendedprice * (1 - l_discount) ELSE 0 END)
/ SUM(l_extendedprice * (1 - l_discount)) AS promo_revenue
FROM lineitem,
part
WHERE l_partkey = p_partkey
AND l_shipdate >= DATE '1995-09-01'
AND l_shipdate < DATE '1995-10-01';
-- TPCH_Query15
WITH revenue AS (
SELECT
l_suppkey AS supplier_no,
SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM lineitem
WHERE l_shipdate >= DATE '1996-01-01'
AND l_shipdate < DATE '1996-04-01'
GROUP BY l_suppkey
)
SELECT
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
FROM supplier,
revenue
WHERE s_suppkey = supplier_no
AND total_revenue = (SELECT MAX(total_revenue) FROM revenue)
ORDER BY s_suppkey;
-- TPCH_Query16
SELECT
p_brand,
p_type,
p_size,
COUNT(DISTINCT ps_suppkey) AS supplier_cnt
FROM partsupp,
part
WHERE p_partkey = ps_partkey
AND p_brand <> 'Brand#45'
AND p_type NOT LIKE 'MEDIUM POLISHED%'
AND p_size IN (49,14,23,45,19,3,36,9)
AND ps_suppkey NOT IN (
SELECT s_suppkey
FROM supplier
WHERE s_comment LIKE '%Customer%Complaints%'
)
GROUP BY p_brand, p_type, p_size
ORDER BY supplier_cnt DESC, p_brand, p_type, p_size;
-- TPCH_Query17
SELECT
SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM lineitem,
part
WHERE p_partkey = l_partkey
AND p_brand = 'Brand#23'
AND p_container = 'MED BOX'
AND l_quantity < (
SELECT 0.2 * AVG(l_quantity)
FROM lineitem
WHERE l_partkey = p_partkey
);
-- TPCH_Query18
SELECT
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
SUM(l_quantity)
FROM customer,
orders,
lineitem
WHERE o_orderkey IN (
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey
HAVING SUM(l_quantity) > 300
)
AND c_custkey = o_custkey
AND o_orderkey = l_orderkey
GROUP BY c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
ORDER BY o_totalprice DESC, o_orderdate
LIMIT 100;
-- TPCH_Query19
SELECT
SUM(l_extendedprice * (1 - l_discount)) AS revenue
FROM lineitem,
part
WHERE (
p_partkey = l_partkey
AND p_brand = 'Brand#12'
AND p_container IN ('SM CASE','SM BOX','SM PACK','SM PKG')
AND l_quantity BETWEEN 1 AND 11
AND p_size BETWEEN 1 AND 5
AND l_shipmode IN ('AIR','AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON'
)
OR (
p_partkey = l_partkey
AND p_brand = 'Brand#23'
AND p_container IN ('MED BAG','MED BOX','MED PKG','MED PACK')
AND l_quantity BETWEEN 10 AND 20
AND p_size BETWEEN 1 AND 10
AND l_shipmode IN ('AIR','AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON'
)
OR (
p_partkey = l_partkey
AND p_brand = 'Brand#34'
AND p_container IN ('LG CASE','LG BOX','LG PACK','LG PKG')
AND l_quantity BETWEEN 20 AND 30
AND p_size BETWEEN 1 AND 15
AND l_shipmode IN ('AIR','AIR REG')
AND l_shipinstruct = 'DELIVER IN PERSON'
);
-- TPCH_Query20
SELECT
s_name,
s_address
FROM supplier,
nation
WHERE s_suppkey IN (
SELECT ps_suppkey
FROM partsupp
WHERE ps_partkey IN (
SELECT p_partkey
FROM part
WHERE p_name LIKE 'forest%'
)
AND ps_availqty > (
SELECT 0.5 * SUM(l_quantity)
FROM lineitem
WHERE l_partkey = ps_partkey
AND l_suppkey = ps_suppkey
AND l_shipdate >= DATE '1994-01-01'
AND l_shipdate < DATE '1995-01-01'
)
)
AND s_nationkey = n_nationkey
AND n_name = 'CANADA'
ORDER BY s_name;
-- TPCH_Query21
SELECT
s_name,
COUNT(*) AS numwait
FROM supplier,
lineitem l1,
orders,
nation
WHERE s_suppkey = l1.l_suppkey
AND o_orderkey = l1.l_orderkey
AND o_orderstatus = 'F'
AND l1.l_receiptdate > l1.l_commitdate
AND EXISTS (
SELECT *
FROM lineitem l2
WHERE l2.l_orderkey = l1.l_orderkey
AND l2.l_suppkey <> l1.l_suppkey
)
AND NOT EXISTS (
SELECT *
FROM lineitem l3
WHERE l3.l_orderkey = l1.l_orderkey
AND l3.l_suppkey <> l1.l_suppkey
AND l3.l_receiptdate > l3.l_commitdate
)
AND s_nationkey = n_nationkey
AND n_name = 'SAUDI ARABIA'
GROUP BY s_name
ORDER BY numwait DESC, s_name
LIMIT 100;
-- TPCH_Query22
SELECT
cntrycode,
COUNT(*) AS numcust,
SUM(c_acctbal) AS totacctbal
FROM (
SELECT
SUBSTRING(c_phone, 1, 2) AS cntrycode,
c_acctbal
FROM customer
WHERE SUBSTRING(c_phone, 1, 2) IN ('13','31','23','29','30','18','17')
AND c_acctbal > (
SELECT AVG(c_acctbal)
FROM customer
WHERE c_acctbal > 0.00
AND SUBSTRING(c_phone, 1, 2) IN ('13','31','23','29','30','18','17')
)
AND NOT EXISTS (
SELECT *
FROM orders
WHERE o_custkey = c_custkey
)
) AS custsale
GROUP BY cntrycode
ORDER BY cntrycode;
Pythonコード
import re
import time
import snowflake.connector as sf
conn = sf.connect(
account = "<your_account>",
user = "<your_username>",
role = "<your_role>",
database = "snowflake_sample_data",
schema = "tpch_sf1000",
warehouse = "compute_wh_gen2", ## "compute_wh_gen1"
private_key = "<your_private_key>"
)
print("Connection OK:", conn)
NUM_OF_SAMPLES = 5
LABEL_PATTERN = re.compile(r"\s*TPCH\s*Query\s*\d+", re.IGNORECASE)
def benchmark(is_gen2: bool):
print(f"Benchmarking {'with' if is_gen2 else 'without'} GEN2")
result_path = "results_with_gen2.txt" if is_gen2 else "results_without_gen2.txt"
with conn.cursor() as cs:
cs.execute("ALTER SESSION SET USE_CACHED_RESULT=FALSE")
with open(result_path, "w", encoding="utf-8") as result_file, \
open("tpch.sql", encoding="utf-8") as f:
all_queries = f.read()
results = "Test\tTime\n"
for _ in range(NUM_OF_SAMPLES):
# セミコロンで分割し、空要素はスキップ
for raw in filter(None, (q.strip() for q in all_queries.split(";"))):
# ラベルを安全に抽出(無ければスキップ)
match = LABEL_PATTERN.search(raw)
if not match:
continue
label = match.group(0) # ex) "TPCH_Query1"
# 実行対象 SQL 本体(コメントより後ろ)
sql = raw.split("\n", 1)[-1].strip()
start_ts = time.time()
with conn.cursor() as cs: # 自動クローズ
cs.execute(sql)
cs.fetchall()
delta = time.time() - start_ts
print(f"{label}: {delta:3.5f}s")
results += f"{label}\t{delta:3.5f}\n"
result_file.write(results)
if __name__ == '__main__':
# Running benchmarks without gen2
# benchmark(False)
# Running benchmarks with gen2
benchmark(True)
参考文献
snowflake-tpch-timingを参考に、コードを作成しました。
Discussion