Closed6
Snowpark が Public Preview になったので触ってみる
まだ名前しか知らないのであとで調べつつ試してみる
ドキュメント: https://docs.snowflake.com/en/developer-guide/snowpark/index.html
Snowpark is a new developer experience that provides an intuitive API for querying and handling data.
Snowpark simplifies the process of building complex data pipelines and allows you to interact with Snowflake directly without moving data to the system where your application code runs.
Snowpark の発表があったときはジョブのスケジューリングとか含めまるっと Snowflake のプラットフォーム上でやる何かなのかと思ってたけど違った。
この辺参考にして
- https://docs.snowflake.com/en/developer-guide/snowpark/setup.html
- https://docs.snowflake.com/en/developer-guide/snowpark/example.html
こんなコードで
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._
object Main {
def main(args: Array[String]): Unit = {
val url = sys.env.get("SNOWFLAKE_URL").getOrElse("")
val password = sys.env.get("SNOWFLAKE_PASSWORD").getOrElse("")
val builder = Session.builder.configs(Map(
"URL" -> url,
"USER" -> "KANATA2",
"PASSWORD" -> password,
"WAREHOUSE" -> "COMPUTE_WH",
"DB" -> "SANDBOX",
"SCHEMA" -> "KANATA2_SANDBOX",
))
val session = builder.create
var dfTables = session.table("INFORMATION_SCHEMA.TABLES")
.filter(col("TABLE_SCHEMA") === "KANATA2_SANDBOX")
var tableCount = dfTables.count()
var currentDB = session.getCurrentDatabase
println(s"Number of tables in the $currentDB database: $tableCount")
var dfPublicSchemaTables = session.table("INFORMATION_SCHEMA.TABLES")
.filter(col("TABLE_SCHEMA") === "KANATA2_SANDBOX")
.select(col("TABLE_NAME"))
dfPublicSchemaTables.show()
}
}
実行したらこんな結果に
$ SNOWFLAKE_URL=xxx SNOWFLAKE_PASSWORD=xxx sbt "runMain Main"
...
[info] running Main
[run-main-0] INFO (Logging.scala:22) - Closing stderr and redirecting to stdout
[run-main-0] INFO (Logging.scala:22) - Done closing stderr and redirecting to stdout
[run-main-0] INFO (Logging.scala:22) - Actively querying parameter snowpark_lazy_analysis from server.
[run-main-0] INFO (Logging.scala:22) - Actively querying parameter QUERY_TAG from server.
[run-main-0] INFO (Logging.scala:22) - Execute query [queryID: 019d050c-0000-22a7-0000-13cd008efe7a] alter session set query_tag = 'com.snowflake.snowpark.DataFrame.count
Main$.main(Main.scala:19)'
[run-main-0] INFO (Logging.scala:22) - Execute query [queryID: 019d050c-0000-22a7-0000-13cd008efe7a] alter session set query_tag = 'com.snowflake.snowpark.DataFrame.count
Main$.main(Main.scala:19)'
[run-main-0] INFO (Logging.scala:22) - Execute query [queryID: 019d050c-0000-229c-0000-13cd0090113a] SELECT count(1 :: int) AS "COUNT(1)" FROM ( SELECT * FROM ( SELECT * FROM (INFORMATION_SCHEMA.TABLES)) WHERE ("TABLE_SCHEMA" = 'KANATA2_SANDBOX')) LIMIT 1
[run-main-0] INFO (Logging.scala:22) - Execute query [queryID: 019d050c-0000-22a7-0000-13cd008efe7e] alter session unset query_tag
[run-main-0] INFO (Logging.scala:22) - Execute query [queryID: 019d050c-0000-229c-0000-13cd0090113e] alter session unset query_tag
Number of tables in the Some("SANDBOX") database: 4
[run-main-0] INFO (Logging.scala:22) - Execute query [queryID: 019d050c-0000-22a7-0000-13cd008efe82] SELECT * FROM ( SELECT "TABLE_NAME" FROM ( SELECT * FROM ( SELECT * FROM (INFORMATION_SCHEMA.TABLES)) WHERE ("TABLE_SCHEMA" = 'KANATA2_SANDBOX'))) LIMIT 10
--------------------------------------
|"TABLE_NAME" |
--------------------------------------
|TEST |
|TEST2 |
|SNOWCHANGE_TEST |
|TEST_USER_ACTIVITY_LOG_AVRO |
--------------------------------------
[success] Total time: 27 s, completed 2021/06/19 8:40:34
履歴には Snowpark 0.6.0
実行ログや履歴を見て、クエリタグがきちんとついてるやつとそうでないやつがあるのが気になった。
count
メソッドだと OK で select
メソッドだとだめとか?(気になったら深堀りする。。。)
使い方はなんとなくわかったのであとは適当に API ドキュメントを見つつ動かしてみようと思う(Scala まったく詳しくないので苦戦しそう)
このスクラップは2021/09/09にクローズされました