🎉

Snowflake ウェアハウス Generation 2によるTPCH性能検証

に公開

はじめ

2025年5月、SnowflakeはウェアハウスでGeneration2のインスタンス提供を開始しました。(リンク) ユーザー側で特別な設定変更やコード修正は不要で、既存のワークロードをそのまま移行できる点が最大の特長です。本記事では、皆さんの環境でもすぐに再現可能な形でベンチマーク手法を紹介し、実際に得られた性能向上および環境負荷低減の成果をご覧いただきます。

Generation2 ウェアハウスとは

Generation 2 Standard ウェアハウス(Gen2)は、Snowflakeの標準仮想ウェアハウスの次世代版で、より高速なハードウェアとDELETE/UPDATE/MERGEやテーブルスキャンのソフトウェア最適化を組み合わせることで、従来より多くのクエリを短時間で完了でき、高い同時実行性を実現します。

対応リージョン

2025年5月現在、対応リージョンはAWS(us-west-2、eu-central-1)およびAzure(East US 2、West Europe)で、GCPでは未サポートです。

価格

通常のウェアハウスと比較し、Gen2の値段はAWSで1.35倍、Azureで1.25倍となっている点に注意した方が良いです。詳細はこちらのTable 1(b)を参照すること。

作成方法

以下のコマンドのようにRESOURCE_CONSTRAINTでGEN2を指定して頂くことで作成が可能です。また、ALTER WAREHOUSEのコマンドを利用し、既存ウェアハウスから変更することも可能です。(リンク)

CREATE OR REPLACE WAREHOUSE next_generation_default_size
  RESOURCE_CONSTRAINT = STANDARD_GEN_2;

結果

前提条件

性能検証を実施する上での前提条件を下記に載せておきます。

  1. Gen1とGen2のウェアハウス(compute_wh_gen1, compute_wh_gen2)を作成しています。
  2. 両者のウェアハウスのサイズはSサイズとしております。
  3. 各クエリに対する試行回数5回実行した上での平均値を取っております。
  4. 使用したスキーマはtpch_sf1000です。

結果

以下に、各クエリとGen2適用と非適用時のパフォーマンス結果を図示したグラフを記載します。
Graphs

また、上記のグラフが参照する結果を表形式で記載しておきます。Query番号21で18%と少し改善率が低いものの、その他のクエリではGen2の方がGen1と比較し、30-40% 程度の速度向上していることが観察されています。

TPCH Query Gen1 (s) Gen2 (s) (Gen 2 / Gen1 - 1)(%)
1 59.30 38.36 -35.3
2 12.59 8.58 -31.9
3 47.08 28.12 -40.3
4 31.90 17.70 -44.5
5 72.39 50.19 -30.7
6 3.95 2.77 -29.9
7 39.68 16.52 -58.4
8 51.19 27.04 -47.2
9 121.38 81.87 -32.6
10 78.53 47.37 -39.7
11 5.59 3.76 -32.7
12 15.09 9.80 -35.1
13 96.92 77.11 -20.4
14 5.56 3.78 -32.0
15 10.47 7.91 -24.5
16 23.05 11.79 -51.5
17 43.72 21.22 -51.5
18 207.34 140.00 -32.5
19 36.06 16.46 -54.4
20 18.27 11.99 -34.4
21 107.43 88.81 -17.3
22 13.19 8.05 -39.0

クエリ

こちらには、性能検証に使用したSQLクエリとクエリを実行するためのPythonコードを載せておきます。

性能検証に使用したSQL群
-- TPCH_Query1
SELECT
    l_returnflag,
    l_linestatus,
    SUM(l_quantity)                               AS sum_qty,
    SUM(l_extendedprice)                          AS sum_base_price,
    SUM(l_extendedprice * (1 - l_discount))       AS sum_disc_price,
    SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
    AVG(l_quantity)                               AS avg_qty,
    AVG(l_extendedprice)                          AS avg_price,
    AVG(l_discount)                               AS avg_disc,
    COUNT(*)                                      AS count_order
FROM lineitem
WHERE l_shipdate <= DATEADD(day, 90, '1998-12-01')
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;

-- TPCH_Query2
SELECT
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
FROM part,
     supplier,
     partsupp,
     nation,
     region
WHERE p_partkey = ps_partkey
  AND s_suppkey = ps_suppkey
  AND p_size = 15
  AND p_type LIKE '%BRASS'
  AND s_nationkey = n_nationkey
  AND n_regionkey = r_regionkey
  AND r_name = 'EUROPE'
  AND ps_supplycost = (
        SELECT MIN(ps_supplycost)
        FROM partsupp,
             supplier,
             nation,
             region
        WHERE p_partkey = ps_partkey
          AND s_suppkey = ps_suppkey
          AND s_nationkey = n_nationkey
          AND n_regionkey = r_regionkey
          AND r_name = 'EUROPE'
    )
ORDER BY s_acctbal DESC, n_name, s_name, p_partkey
LIMIT 100;

-- TPCH_Query3
SELECT
    l_orderkey,
    SUM(l_extendedprice * (1 - l_discount)) AS revenue,
    o_orderdate,
    o_shippriority
FROM customer,
     orders,
     lineitem
WHERE c_mktsegment = 'AUTOMOBILE'
  AND c_custkey    = o_custkey
  AND l_orderkey   = o_orderkey
  AND o_orderdate  < DATE '1995-03-15'
  AND l_shipdate   > DATE '1995-03-15'
GROUP BY l_orderkey, o_orderdate, o_shippriority
ORDER BY revenue DESC, o_orderdate
LIMIT 10;

-- TPCH_Query4
SELECT
    o_orderpriority,
    COUNT(*) AS order_count
FROM orders
WHERE o_orderdate >= DATE '1993-07-01'
  AND o_orderdate < DATE '1993-10-01'
  AND EXISTS (
        SELECT *
        FROM lineitem
        WHERE l_orderkey = o_orderkey
          AND l_commitdate < l_receiptdate
    )
GROUP BY o_orderpriority
ORDER BY o_orderpriority;

-- TPCH_Query5
SELECT
    n_name,
    SUM(l_extendedprice * (1 - l_discount)) AS revenue
FROM customer,
     orders,
     lineitem,
     supplier,
     nation,
     region
WHERE c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND l_suppkey  = s_suppkey
  AND c_nationkey = s_nationkey
  AND s_nationkey = n_nationkey
  AND n_regionkey = r_regionkey
  AND r_name = 'ASIA'
  AND o_orderdate >= DATE '1994-01-01'
  AND o_orderdate < DATE '1995-01-01'
GROUP BY n_name
ORDER BY revenue DESC;

-- TPCH_Query6
SELECT
    SUM(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= DATE '1994-01-01'
  AND l_shipdate <  DATE '1995-01-01'
  AND l_discount BETWEEN 0.06 AND 0.08
  AND l_quantity < 24;

-- TPCH_Query7
SELECT
    supp_nation,
    cust_nation,
    l_year,
    SUM(volume) AS revenue
FROM (
    SELECT
        n1.n_name                      AS supp_nation,
        n2.n_name                      AS cust_nation,
        EXTRACT(year FROM l_shipdate)  AS l_year,
        l_extendedprice * (1 - l_discount) AS volume
    FROM supplier,
         lineitem,
         orders,
         customer,
         nation n1,
         nation n2
    WHERE s_suppkey = l_suppkey
      AND o_orderkey = l_orderkey
      AND c_custkey  = o_custkey
      AND s_nationkey = n1.n_nationkey
      AND c_nationkey = n2.n_nationkey
      AND (
            (n1.n_name = 'FRANCE'  AND n2.n_name = 'GERMANY') OR
            (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE')
          )
      AND l_shipdate BETWEEN DATE '1995-01-01' AND DATE '1996-12-31'
) AS shipping
GROUP BY supp_nation, cust_nation, l_year
ORDER BY supp_nation, cust_nation, l_year;

-- TPCH_Query8
SELECT
    o_year,
    SUM(CASE WHEN nation = 'BRAZIL' THEN volume ELSE 0 END) / SUM(volume) AS mkt_share
FROM (
    SELECT
        EXTRACT(year FROM o_orderdate) AS o_year,
        l_extendedprice * (1 - l_discount) AS volume,
        n2.n_name AS nation
    FROM part,
         supplier,
         lineitem,
         orders,
         customer,
         nation n1,
         nation n2,
         region
    WHERE p_partkey = l_partkey
      AND s_suppkey = l_suppkey
      AND l_orderkey = o_orderkey
      AND o_custkey  = c_custkey
      AND c_nationkey = n1.n_nationkey
      AND n1.n_regionkey = r_regionkey
      AND r_name = 'AMERICA'
      AND s_nationkey = n2.n_nationkey
      AND o_orderdate BETWEEN DATE '1995-01-01' AND DATE '1996-12-31'
      AND p_type = 'ECONOMY ANODIZED STEEL'
) AS all_nations
GROUP BY o_year
ORDER BY o_year;

-- TPCH_Query9
SELECT
    nation,
    o_year,
    SUM(amount) AS sum_profit
FROM (
    SELECT
        n_name AS nation,
        EXTRACT(year FROM o_orderdate) AS o_year,
        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
    FROM part,
         supplier,
         lineitem,
         partsupp,
         orders,
         nation
    WHERE s_suppkey = l_suppkey
      AND ps_suppkey = l_suppkey
      AND ps_partkey = l_partkey
      AND p_partkey  = l_partkey
      AND o_orderkey = l_orderkey
      AND s_nationkey = n_nationkey
      AND p_name LIKE '%green%'
) AS profit
GROUP BY nation, o_year
ORDER BY nation, o_year DESC;

-- TPCH_Query10
SELECT
    c_custkey,
    c_name,
    SUM(l_extendedprice * (1 - l_discount)) AS revenue,
    c_acctbal,
    n_name,
    c_address,
    c_phone,
    c_comment
FROM customer,
     orders,
     lineitem,
     nation
WHERE c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND o_orderdate >= DATE '1993-10-01'
  AND o_orderdate <  DATE '1994-01-01'
  AND l_returnflag = 'R'
  AND c_nationkey = n_nationkey
GROUP BY c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
ORDER BY revenue DESC
LIMIT 20;

-- TPCH_Query11
SELECT
    ps_partkey,
    SUM(ps_supplycost * ps_availqty) AS value
FROM partsupp,
     supplier,
     nation
WHERE ps_suppkey = s_suppkey
  AND s_nationkey = n_nationkey
  AND n_name = 'GERMANY'
GROUP BY ps_partkey
HAVING SUM(ps_supplycost * ps_availqty) > (
    SELECT SUM(ps_supplycost * ps_availqty) * 0.0001
    FROM partsupp,
         supplier,
         nation
    WHERE ps_suppkey = s_suppkey
      AND s_nationkey = n_nationkey
      AND n_name = 'GERMANY'
)
ORDER BY value DESC;

-- TPCH_Query12
SELECT
    l_shipmode,
    SUM(CASE WHEN o_orderpriority IN ('1-URGENT','2-HIGH') THEN 1 ELSE 0 END) AS high_line_count,
    SUM(CASE WHEN o_orderpriority NOT IN ('1-URGENT','2-HIGH') THEN 1 ELSE 0 END) AS low_line_count
FROM orders,
     lineitem
WHERE o_orderkey = l_orderkey
  AND l_shipmode IN ('MAIL','SHIP')
  AND l_commitdate < l_receiptdate
  AND l_shipdate  < l_commitdate
  AND l_receiptdate >= DATE '1994-01-01'
  AND l_receiptdate <  DATE '1995-01-01'
GROUP BY l_shipmode
ORDER BY l_shipmode;

-- TPCH_Query13
SELECT
    c_count,
    COUNT(*) AS custdist
FROM (
    SELECT
        c_custkey,
        COUNT(o_orderkey) AS c_count
    FROM customer
    LEFT JOIN orders
      ON c_custkey = o_custkey
      AND o_comment NOT LIKE '%pending%packages%'
    GROUP BY c_custkey
) AS c_orders
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;

-- TPCH_Query14
SELECT
    100.0 * SUM(CASE WHEN p_type LIKE 'PROMO%' THEN l_extendedprice * (1 - l_discount) ELSE 0 END)
          / SUM(l_extendedprice * (1 - l_discount)) AS promo_revenue
FROM lineitem,
     part
WHERE l_partkey = p_partkey
  AND l_shipdate >= DATE '1995-09-01'
  AND l_shipdate <  DATE '1995-10-01';

-- TPCH_Query15
WITH revenue AS (
    SELECT
        l_suppkey AS supplier_no,
        SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
    FROM lineitem
    WHERE l_shipdate >= DATE '1996-01-01'
      AND l_shipdate <  DATE '1996-04-01'
    GROUP BY l_suppkey
)
SELECT
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
FROM supplier,
     revenue
WHERE s_suppkey = supplier_no
  AND total_revenue = (SELECT MAX(total_revenue) FROM revenue)
ORDER BY s_suppkey;

-- TPCH_Query16
SELECT
    p_brand,
    p_type,
    p_size,
    COUNT(DISTINCT ps_suppkey) AS supplier_cnt
FROM partsupp,
     part
WHERE p_partkey = ps_partkey
  AND p_brand <> 'Brand#45'
  AND p_type NOT LIKE 'MEDIUM POLISHED%'
  AND p_size IN (49,14,23,45,19,3,36,9)
  AND ps_suppkey NOT IN (
        SELECT s_suppkey
        FROM supplier
        WHERE s_comment LIKE '%Customer%Complaints%'
    )
GROUP BY p_brand, p_type, p_size
ORDER BY supplier_cnt DESC, p_brand, p_type, p_size;

-- TPCH_Query17
SELECT
    SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM lineitem,
     part
WHERE p_partkey = l_partkey
  AND p_brand = 'Brand#23'
  AND p_container = 'MED BOX'
  AND l_quantity < (
        SELECT 0.2 * AVG(l_quantity)
        FROM lineitem
        WHERE l_partkey = p_partkey
    );

-- TPCH_Query18
SELECT
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    SUM(l_quantity)
FROM customer,
     orders,
     lineitem
WHERE o_orderkey IN (
        SELECT l_orderkey
        FROM lineitem
        GROUP BY l_orderkey
        HAVING SUM(l_quantity) > 300
    )
  AND c_custkey = o_custkey
  AND o_orderkey = l_orderkey
GROUP BY c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
ORDER BY o_totalprice DESC, o_orderdate
LIMIT 100;

-- TPCH_Query19
SELECT
    SUM(l_extendedprice * (1 - l_discount)) AS revenue
FROM lineitem,
     part
WHERE (
        p_partkey = l_partkey
        AND p_brand = 'Brand#12'
        AND p_container IN ('SM CASE','SM BOX','SM PACK','SM PKG')
        AND l_quantity BETWEEN 1 AND 11
        AND p_size BETWEEN 1 AND 5
        AND l_shipmode IN ('AIR','AIR REG')
        AND l_shipinstruct = 'DELIVER IN PERSON'
    )
   OR (
        p_partkey = l_partkey
        AND p_brand = 'Brand#23'
        AND p_container IN ('MED BAG','MED BOX','MED PKG','MED PACK')
        AND l_quantity BETWEEN 10 AND 20
        AND p_size BETWEEN 1 AND 10
        AND l_shipmode IN ('AIR','AIR REG')
        AND l_shipinstruct = 'DELIVER IN PERSON'
    )
   OR (
        p_partkey = l_partkey
        AND p_brand = 'Brand#34'
        AND p_container IN ('LG CASE','LG BOX','LG PACK','LG PKG')
        AND l_quantity BETWEEN 20 AND 30
        AND p_size BETWEEN 1 AND 15
        AND l_shipmode IN ('AIR','AIR REG')
        AND l_shipinstruct = 'DELIVER IN PERSON'
    );

-- TPCH_Query20
SELECT
    s_name,
    s_address
FROM supplier,
     nation
WHERE s_suppkey IN (
        SELECT ps_suppkey
        FROM partsupp
        WHERE ps_partkey IN (
                SELECT p_partkey
                FROM part
                WHERE p_name LIKE 'forest%'
            )
          AND ps_availqty > (
                SELECT 0.5 * SUM(l_quantity)
                FROM lineitem
                WHERE l_partkey = ps_partkey
                  AND l_suppkey = ps_suppkey
                  AND l_shipdate >= DATE '1994-01-01'
                  AND l_shipdate <  DATE '1995-01-01'
            )
    )
  AND s_nationkey = n_nationkey
  AND n_name = 'CANADA'
ORDER BY s_name;

-- TPCH_Query21
SELECT
    s_name,
    COUNT(*) AS numwait
FROM supplier,
     lineitem l1,
     orders,
     nation
WHERE s_suppkey = l1.l_suppkey
  AND o_orderkey = l1.l_orderkey
  AND o_orderstatus = 'F'
  AND l1.l_receiptdate > l1.l_commitdate
  AND EXISTS (
        SELECT *
        FROM lineitem l2
        WHERE l2.l_orderkey = l1.l_orderkey
          AND l2.l_suppkey <> l1.l_suppkey
    )
  AND NOT EXISTS (
        SELECT *
        FROM lineitem l3
        WHERE l3.l_orderkey = l1.l_orderkey
          AND l3.l_suppkey <> l1.l_suppkey
          AND l3.l_receiptdate > l3.l_commitdate
    )
  AND s_nationkey = n_nationkey
  AND n_name = 'SAUDI ARABIA'
GROUP BY s_name
ORDER BY numwait DESC, s_name
LIMIT 100;

-- TPCH_Query22
SELECT
    cntrycode,
    COUNT(*) AS numcust,
    SUM(c_acctbal) AS totacctbal
FROM (
    SELECT
        SUBSTRING(c_phone, 1, 2) AS cntrycode,
        c_acctbal
    FROM customer
    WHERE SUBSTRING(c_phone, 1, 2) IN ('13','31','23','29','30','18','17')
      AND c_acctbal > (
            SELECT AVG(c_acctbal)
            FROM customer
            WHERE c_acctbal > 0.00
              AND SUBSTRING(c_phone, 1, 2) IN ('13','31','23','29','30','18','17')
        )
      AND NOT EXISTS (
            SELECT *
            FROM orders
            WHERE o_custkey = c_custkey
        )
) AS custsale
GROUP BY cntrycode
ORDER BY cntrycode;
Pythonコード
import re
import time
import snowflake.connector as sf

conn = sf.connect(
    account   = "<your_account>",
    user      = "<your_username>",
    role      = "<your_role>",
    database  = "snowflake_sample_data",
    schema    = "tpch_sf1000",
    warehouse = "compute_wh_gen2", ## "compute_wh_gen1"
    private_key = "<your_private_key>"
)

print("Connection OK:", conn)

NUM_OF_SAMPLES = 5
LABEL_PATTERN = re.compile(r"\s*TPCH\s*Query\s*\d+", re.IGNORECASE)

def benchmark(is_gen2: bool):
    print(f"Benchmarking {'with' if is_gen2 else 'without'} GEN2")

    result_path = "results_with_gen2.txt" if is_gen2 else "results_without_gen2.txt"
    with conn.cursor() as cs:
        cs.execute("ALTER SESSION SET USE_CACHED_RESULT=FALSE")

    with open(result_path, "w", encoding="utf-8") as result_file, \
         open("tpch.sql", encoding="utf-8") as f:
        
        all_queries = f.read()
        results = "Test\tTime\n"

        for _ in range(NUM_OF_SAMPLES):
            # セミコロンで分割し、空要素はスキップ
            for raw in filter(None, (q.strip() for q in all_queries.split(";"))):

                # ラベルを安全に抽出(無ければスキップ)
                match = LABEL_PATTERN.search(raw)
                if not match:
                    continue
                label = match.group(0)          # ex) "TPCH_Query1"

                # 実行対象 SQL 本体(コメントより後ろ)
                sql = raw.split("\n", 1)[-1].strip()

                start_ts = time.time()
                with conn.cursor() as cs:       # 自動クローズ
                    cs.execute(sql)
                    cs.fetchall()               
                delta = time.time() - start_ts

                print(f"{label}: {delta:3.5f}s")
                results += f"{label}\t{delta:3.5f}\n"

        result_file.write(results)

if __name__ == '__main__':
    # Running benchmarks without gen2
    # benchmark(False)

    # Running benchmarks with gen2
    benchmark(True)

参考文献

snowflake-tpch-timingを参考に、コードを作成しました。

Discussion