Closed6

Snowpark が Public Preview になったので触ってみる

kanata2kanata2

Snowpark is a new developer experience that provides an intuitive API for querying and handling data.
Snowpark simplifies the process of building complex data pipelines and allows you to interact with Snowflake directly without moving data to the system where your application code runs.

Snowpark の発表があったときはジョブのスケジューリングとか含めまるっと Snowflake のプラットフォーム上でやる何かなのかと思ってたけど違った。

kanata2kanata2

この辺参考にして

こんなコードで

import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._

object Main {
  def main(args: Array[String]): Unit = {
    val url = sys.env.get("SNOWFLAKE_URL").getOrElse("")
    val password = sys.env.get("SNOWFLAKE_PASSWORD").getOrElse("")
    val builder = Session.builder.configs(Map(
      "URL" -> url,
      "USER" -> "KANATA2",
      "PASSWORD" -> password,
      "WAREHOUSE" -> "COMPUTE_WH",
      "DB" -> "SANDBOX",
      "SCHEMA" -> "KANATA2_SANDBOX",
    ))
    val session = builder.create
    var dfTables = session.table("INFORMATION_SCHEMA.TABLES")
      .filter(col("TABLE_SCHEMA") === "KANATA2_SANDBOX")
    var tableCount = dfTables.count()
    var currentDB = session.getCurrentDatabase
    println(s"Number of tables in the $currentDB database: $tableCount")

    var dfPublicSchemaTables = session.table("INFORMATION_SCHEMA.TABLES")
      .filter(col("TABLE_SCHEMA") === "KANATA2_SANDBOX")
      .select(col("TABLE_NAME"))
    dfPublicSchemaTables.show()
  }
}

実行したらこんな結果に

$ SNOWFLAKE_URL=xxx SNOWFLAKE_PASSWORD=xxx sbt "runMain Main"
...
[info] running Main
[run-main-0]  INFO (Logging.scala:22) - Closing stderr and redirecting to stdout
[run-main-0]  INFO (Logging.scala:22) - Done closing stderr and redirecting to stdout
[run-main-0]  INFO (Logging.scala:22) - Actively querying parameter snowpark_lazy_analysis from server.
[run-main-0]  INFO (Logging.scala:22) - Actively querying parameter QUERY_TAG from server.
[run-main-0]  INFO (Logging.scala:22) - Execute query [queryID: 019d050c-0000-22a7-0000-13cd008efe7a] alter session set query_tag = 'com.snowflake.snowpark.DataFrame.count
Main$.main(Main.scala:19)'
[run-main-0]  INFO (Logging.scala:22) - Execute query [queryID: 019d050c-0000-22a7-0000-13cd008efe7a] alter session set query_tag = 'com.snowflake.snowpark.DataFrame.count
Main$.main(Main.scala:19)'
[run-main-0]  INFO (Logging.scala:22) - Execute query [queryID: 019d050c-0000-229c-0000-13cd0090113a]  SELECT count(1 :: int) AS "COUNT(1)" FROM ( SELECT  *  FROM ( SELECT  *  FROM (INFORMATION_SCHEMA.TABLES)) WHERE ("TABLE_SCHEMA" = 'KANATA2_SANDBOX')) LIMIT 1
[run-main-0]  INFO (Logging.scala:22) - Execute query [queryID: 019d050c-0000-22a7-0000-13cd008efe7e] alter session unset query_tag
[run-main-0]  INFO (Logging.scala:22) - Execute query [queryID: 019d050c-0000-229c-0000-13cd0090113e] alter session unset query_tag
Number of tables in the Some("SANDBOX") database: 4
[run-main-0]  INFO (Logging.scala:22) - Execute query [queryID: 019d050c-0000-22a7-0000-13cd008efe82]  SELECT  *  FROM ( SELECT "TABLE_NAME" FROM ( SELECT  *  FROM ( SELECT  *  FROM (INFORMATION_SCHEMA.TABLES)) WHERE ("TABLE_SCHEMA" = 'KANATA2_SANDBOX'))) LIMIT 10
--------------------------------------
|"TABLE_NAME"                        |
--------------------------------------
|TEST                                |
|TEST2                               |
|SNOWCHANGE_TEST                     |
|TEST_USER_ACTIVITY_LOG_AVRO         |
--------------------------------------

[success] Total time: 27 s, completed 2021/06/19 8:40:34
kanata2kanata2

実行ログや履歴を見て、クエリタグがきちんとついてるやつとそうでないやつがあるのが気になった。
count メソッドだと OK で select メソッドだとだめとか?(気になったら深堀りする。。。)

このスクラップは2021/09/09にクローズされました