
Playing with ZetaSQL

MokerMoker

Compiling ZetaSQL and playing around with it
https://github.com/google/zetasql

ZetaSQL is the SQL dialect used inside Google, also referred to as Standard SQL or GoogleSQL.
On GCP it is used by BigQuery and Spanner.


According to the README, the default build environment is Ubuntu 20.04 / gcc / bazel.
My everyday environment is Windows, so I set up a devcontainer.

.devcontainer/devcontainer.json

{
    "name": "Ubuntu",
    "image": "mcr.microsoft.com/devcontainers/base:focal",
    "features": {
        "ghcr.io/devcontainers-community/features/bazel:1": {}
    }
}

First, try building the parser.

> bazel build zetasql/parser/parser
...
> ls -al bazel-bin/zetasql/parser/
...
-r-xr-xr-x  1 vscode vscode 4176118 Nov 10 05:00 libparser.a
-r-xr-xr-x  1 vscode vscode 2400992 Nov 10 05:00 libparser.so
...

Next I tried to build execute_query, but the build failed.

> ./docker_build.sh execute_query
...
[2,038 / 2,136] 16 actions running
    Compiling zetasql/parser/parse_tree_serializer.cc; 487s processwrapper-sandbox
    Foreign Cc - CMake: Building mstch; 477s processwrapper-sandbox
    Compiling zetasql/resolved_ast/resolved_ast_rewrite_visitor.cc; 364s processwrapper-sandbox
    Compiling zetasql/public/analyzer_options.cc; 343s processwrapper-sandbox
    Compiling zetasql/analyzer/rewriters/grouping_set_rewriter.cc; 327s processwrapper-sandbox
    Compiling zetasql/resolved_ast/rewrite_utils.cc; 220s processwrapper-sandbox
    Compiling zetasql/analyzer/input_argument_type_resolver_helper.cc; 190s processwrapper-sandbox
    Compiling zetasql/testdata/sample_annotation.cc; 185s processwrapper-sandbox ...

Server terminated abruptly (error code: 14, error message: 'Socket closed', log file: '/home/vscode/.cache/bazel/_bazel_vscode/34b0ce70e90eaf1fa7adae67aad47134/server/jvm.out')

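From what I could find, this looks like the build running out of memory (for reference, the machine has 32 GB of RAM), so I looked for workarounds.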
https://github.com/tensorflow/tensorflow/issues/41480#issuecomment-1060272920


I tried raising the WSL memory limit, limiting bazel's resources and job count, and so on, but none of it worked, so I changed course and rented an instance on GCP instead.
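For reference, the kind of job limiting mentioned above can be sketched as follows. The helper `suggest_jobs` and the per-job memory budget are hypothetical and chosen only for illustration; `--jobs` and `--local_ram_resources` are existing bazel 6.x options. Limiting did not get the build through in this case, so treat this as a starting point rather than a fix.

```shell
#!/bin/sh
# suggest_jobs MEM_MB CPUS PER_JOB_MB   (hypothetical helper)
# Prints a parallel job count: as many jobs as fit into MEM_MB at
# PER_JOB_MB each, clamped to the range 1..CPUS.
suggest_jobs() {
    sj_mem_mb=$1
    sj_cpus=$2
    sj_per_job_mb=$3
    # Reject values that would divide by zero or produce an empty range.
    if [ "$sj_per_job_mb" -le 0 ] || [ "$sj_cpus" -le 0 ]; then
        echo "suggest_jobs: CPUS and PER_JOB_MB must be positive" >&2
        return 1
    fi
    sj_jobs=$((sj_mem_mb / sj_per_job_mb))
    if [ "$sj_jobs" -gt "$sj_cpus" ]; then sj_jobs=$sj_cpus; fi
    if [ "$sj_jobs" -lt 1 ]; then sj_jobs=1; fi
    echo "$sj_jobs"
}
```

With 32 GB of RAM, 16 logical CPUs, and an assumed budget of 4 GB per compile job, `bazel build --jobs="$(suggest_jobs 32768 16 4096)" --local_ram_resources=24576 zetasql/tools/execute_query/execute_query` would cap the build at 8 parallel actions. The 4 GB figure is a guess, not something measured here.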

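On Compute Engine I rented an e2-custom instance with 24 vCPUs, 64 GB of memory, 50 GB of storage, and Ubuntu 20.04, and set up the environment there.
Below is an excerpt from the shell history.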

    git clone https://github.com/google/zetasql.git
    cd zetasql/
    sudo apt update
    sudo apt install default-jdk
    sudo apt install apt-transport-https curl gnupg -y
    curl -fsSL https://bazel.build/bazel-release.pub.gpg | gpg --dearmor >bazel-archive-keyring.gpg
    sudo mv bazel-archive-keyring.gpg /usr/share/keyrings
    echo "deb [arch=amd64 signed-by=/usr/share/keyrings/bazel-archive-keyring.gpg] https://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
    sudo apt update && sudo apt install bazel
    sudo apt update && sudo apt install bazel-6.5.0
    sudo apt install tzdata
    sudo apt install make
    bazel build zetasql/tools/execute_query/execute_query --verbose_failures --sandbox_debug
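
The history above installs both `bazel` and `bazel-6.5.0`. A plausible reason, stated here as an assumption, is that the repository pins a bazel version through a `.bazelversion` file, and the bazel apt repository offers versioned packages named `bazel-<version>`. The hypothetical helper `bazel_pkg_for` below derives that package name from a checkout directory, falling back to plain `bazel` when no version is pinned.

```shell
#!/bin/sh
# bazel_pkg_for DIR   (hypothetical helper)
# Prints the apt package name matching DIR/.bazelversion,
# or "bazel" if the file is missing or empty.
bazel_pkg_for() {
    bp_file="$1/.bazelversion"
    bp_version=""
    if [ -f "$bp_file" ]; then
        # Use only the first line and drop any whitespace around it.
        bp_version=$(head -n 1 "$bp_file" | tr -d '[:space:]')
    fi
    if [ -n "$bp_version" ]; then
        echo "bazel-$bp_version"
    else
        echo "bazel"
    fi
}
```

Run from the checkout, `sudo apt install "$(bazel_pkg_for .)"` would then install whichever version the repository asks for, assuming that version is published in the apt repository.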

The build completed successfully.

> ./bazel-bin/zetasql/tools/execute_query/execute_query
Usage: execute_query { "<sql>" | {--web [--port=<port>] } }
Pass --help for a full list of flags.

Disk and CPU usage were as follows.

> df -m
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/root          49434 13661     35758  28% /

CPU utilization
(This covers time spent on environment setup and so on, not just the build, so treat it as a rough reference.)

Honestly, if even this environment gets pinned at 100%, it makes sense that my aging PC (Ryzen 7 2700X) could not cope.


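I eventually want to build for WASM, so as a first step I tried switching the compiler to clang.
For some reason, though, the build fails with a link error.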

$ bazel clean
$ sudo apt update
$ sudo apt install clang
$ CC=clang bazel build zetasql/tools/execute_query/execute_query --sandbox_debug --verbose_failures
...
bazel-out/k8-fastbuild/bin/zetasql/reference_impl/_objs/evaluation/function.pic.o:function.cc:function differential_privacy::ApproxBounds<double>::Serialize() const: error: undefined reference to 'google::protobuf::RepeatedField<long>::~RepeatedField()'
bazel-out/k8-fastbuild/bin/zetasql/reference_impl/_objs/evaluation/function.pic.o:function.cc:function differential_privacy::ApproxBounds<double>::Serialize() const: error: undefined reference to 'google::protobuf::RepeatedField<long>::~RepeatedField()'
bazel-out/k8-fastbuild/bin/zetasql/reference_impl/_objs/evaluation/function.pic.o:function.cc:function differential_privacy::ApproxBounds<double>::Serialize() const: error: undefined reference to 'google::protobuf::RepeatedField<long>::~RepeatedField()'
bazel-out/k8-fastbuild/bin/zetasql/reference_impl/_objs/evaluation/function.pic.o:function.cc:function differential_privacy::ApproxBounds<double>::Serialize() const: error: undefined reference to 'google::protobuf::RepeatedField<long>::~RepeatedField()'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

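The errors come from reference_impl, so for now I tested with the parser target instead.
→ The build finishes normally and the files are generated, but for some reason the compiler strings reported by objdump do not match (?): libparser.so records both GCC and clang, while the objects in libparser.a record only clang.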

$ CC=clang bazel build --config=clang  zetasql/parser/parser
$ ls -al bazel-bin/zetasql/parser/
...
-r-xr-xr-x 1 ... 3356762 Nov 15 14:09 libparser.a
-r-xr-xr-x 1 ...        271 Nov 15 14:03 libparser.a-2.params
-r-xr-xr-x 1 ... 2146136 Nov 15 14:09 libparser.so
-r-xr-xr-x 1 ...        385 Nov 15 14:03 libparser.so-2.params
...
$ objdump --full-contents --section=.comment bazel-bin/zetasql/parser/libparser.so
bazel-bin/zetasql/parser/libparser.so:     file format elf64-x86-64
Contents of section .comment:
 0000 00474343 3a202855 62756e74 7520392e  .GCC: (Ubuntu 9.
 0010 342e302d 31756275 6e747531 7e32302e  4.0-1ubuntu1~20.
 0020 30342e32 2920392e 342e3000 636c616e  04.2) 9.4.0.clan
 0030 67207665 7273696f 6e203130 2e302e30  g version 10.0.0
 0040 2d347562 756e7475 312000             -4ubuntu1 .
$ objdump --full-contents --section=.comment bazel-bin/zetasql/parser/libparser.a
In archive bazel-bin/zetasql/parser/libparser.a:

bison_parser.pic.o:     file format elf64-x86-64
Contents of section .comment:
 0000 00636c61 6e672076 65727369 6f6e2031  .clang version 1
 0010 302e302e 302d3475 62756e74 75312000  0.0.0-4ubuntu1 .

parser.pic.o:     file format elf64-x86-64
Contents of section .comment:
 0000 00636c61 6e672076 65727369 6f6e2031  .clang version 1
 0010 302e302e 302d3475 62756e74 75312000  0.0.0-4ubuntu1 .

unparser.pic.o:     file format elf64-x86-64
Contents of section .comment:
 0000 00636c61 6e672076 65727369 6f6e2031  .clang version 1
 0010 302e302e 302d3475 62756e74 75312000  0.0.0-4ubuntu1 .
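
One possible explanation, not verified for this build: the linker merges the `.comment` strings of every input object, and the C runtime startup objects that clang links in on Ubuntu come from the distribution's GCC and glibc packages, so a GCC string can show up in the `.so` even when all project sources were compiled by clang. To compare files without reading hex dumps, the sketch below filters the output of `readelf -p .comment` down to the distinct strings. The helper name `comment_strings` is hypothetical, and the `[ offset]  text` line layout it parses is an assumption about readelf's string-dump format.

```shell
#!/bin/sh
# comment_strings   (hypothetical helper)
# Reads `readelf -p .comment FILE` output on stdin and prints each
# distinct string once, in first-seen order. Assumes entries look like
# "  [   2b]  clang version ..." and ignores every other line.
comment_strings() {
    sed -n -E 's/^[[:space:]]*\[[[:space:]]*[0-9a-fA-F]+\][[:space:]]*(.*[^[:space:]])[[:space:]]*$/\1/p' |
        awk '!seen[$0]++'
}

# Example (requires binutils and the build outputs):
#   readelf -p .comment bazel-bin/zetasql/parser/libparser.so | comment_strings
```

If the explanation holds, the `.so` would list one GCC entry and one clang entry, while each member of the `.a` would list only clang.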