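Playing with ZetaSQL
Compiling ZetaSQL and playing with it
ZetaSQL is the SQL dialect used inside Google, also referred to as Standard SQL or GoogleSQL.
On GCP it is used by BigQuery and Spanner.
According to the README, Ubuntu 20.04 / gcc / bazel is the default build environment.
My everyday environment is Windows, so I set up a devcontainer.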
.devcontainer/devcontainer.json
{
  "name": "Ubuntu",
  "image": "mcr.microsoft.com/devcontainers/base:focal",
  "features": {
    "ghcr.io/devcontainers-community/features/bazel:1": {}
  }
}
Try building the parser
> bazel build zetasql/parser/parser
...
> ls -al bazel-bin/zetasql/parser/
...
-r-xr-xr-x 1 vscode vscode 4176118 Nov 10 05:00 libparser.a
-r-xr-xr-x 1 vscode vscode 2400992 Nov 10 05:00 libparser.so
...
Next I tried to build execute_query, but it failed
> ./docker_build.sh execute_query
...
[2,038 / 2,136] 16 actions running
Compiling zetasql/parser/parse_tree_serializer.cc; 487s processwrapper-sandbox
Foreign Cc - CMake: Building mstch; 477s processwrapper-sandbox
Compiling zetasql/resolved_ast/resolved_ast_rewrite_visitor.cc; 364s processwrapper-sandbox
Compiling zetasql/public/analyzer_options.cc; 343s processwrapper-sandbox
Compiling zetasql/analyzer/rewriters/grouping_set_rewriter.cc; 327s processwrapper-sandbox
Compiling zetasql/resolved_ast/rewrite_utils.cc; 220s processwrapper-sandbox
Compiling zetasql/analyzer/input_argument_type_resolver_helper.cc; 190s processwrapper-sandbox
Compiling zetasql/testdata/sample_annotation.cc; 185s processwrapper-sandbox ...
Server terminated abruptly (error code: 14, error message: 'Socket closed', log file: '/home/vscode/.cache/bazel/_bazel_vscode/34b0ce70e90eaf1fa7adae67aad47134/server/jvm.out')
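From what I could find, this looks like an out-of-memory failure (for reference, the machine has 32 GB of RAM), so I looked for workarounds.
I tried raising the WSL memory limit, limiting bazel's resources and job count, and so on, but none of it worked, so I changed course and decided to rent an instance on GCP.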
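The notes do not record which limits were actually tried. As a rough sketch of the "limit bazel's resources and job count" approach, the helper below derives a `--jobs` value and a `--local_ram_resources` value (in MB) from the machine's RAM and prints the resulting command instead of running it. The 4 GB-per-job budget and the half-of-RAM cap are assumptions chosen for illustration, not values from the original attempt; `--jobs` and `--local_ram_resources` are flags available in Bazel 6.x.

```shell
#!/bin/sh
# Sketch: derive conservative Bazel limits from total RAM (in GB).
# ASSUMPTION: 4 GB budget per concurrent action; tune for your machine.
jobs_for_ram_gb() {
  # $1 = total RAM in GB, $2 = GB budget per job (default 4)
  _jobs=$(( $1 / ${2:-4} ))
  if [ "$_jobs" -lt 1 ]; then _jobs=1; fi
  echo "$_jobs"
}

# Print (not run) a bazel command that caps both the job count and RAM use.
# ASSUMPTION: let Bazel plan for at most half of the RAM, expressed in MB.
bazel_limited_cmd() {
  # $1 = bazel target, $2 = total RAM in GB
  echo "bazel build $1 --jobs=$(jobs_for_ram_gb "$2") --local_ram_resources=$(( $2 * 1024 / 2 ))"
}

bazel_limited_cmd zetasql/tools/execute_query/execute_query 32
```

On the WSL side, the corresponding knob is the `memory` key under `[wsl2]` in `.wslconfig`. As noted above, neither approach was enough in this case.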
On Compute Engine I rented an e2-custom instance with 24 vCPUs, 64 GB of memory, 50 GB of storage, and Ubuntu 20.04, and set up the environment there.
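The notes do not say how the instance was created (console or CLI). A minimal sketch of the equivalent `gcloud` invocation follows, printed rather than executed. The instance name and zone are hypothetical placeholders, and the `ubuntu-2004-lts` image family in `ubuntu-os-cloud` is an assumption that may no longer be offered. Custom machine type names follow the `<family>-custom-<vCPUs>-<memory in MB>` pattern.

```shell
#!/bin/sh
# Build the name of a custom machine type: <family>-custom-<vCPUs>-<memory in MB>.
custom_machine_type() {
  # $1 = family (e.g. e2), $2 = vCPUs, $3 = memory in GB
  echo "$1-custom-$2-$(( $3 * 1024 ))"
}

# Print (not run) a gcloud command for the instance described above.
# ASSUMPTION: instance name, zone, and image family/project are placeholders.
create_instance_cmd() {
  # $1 = instance name, $2 = zone
  echo "gcloud compute instances create $1 --zone=$2" \
       "--machine-type=$(custom_machine_type e2 24 64)" \
       "--image-family=ubuntu-2004-lts --image-project=ubuntu-os-cloud" \
       "--boot-disk-size=50GB"
}

create_instance_cmd zetasql-build us-central1-a
```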
Below is an excerpt from the shell history
1 git clone https://github.com/google/zetasql.git
2 cd zetasql/
5 sudo apt update
6 sudo apt install default-jdk
7 sudo apt install apt-transport-https curl gnupg -y
9 curl -fsSL https://bazel.build/bazel-release.pub.gpg | gpg --dearmor >bazel-archive-keyring.gpg
10 sudo mv bazel-archive-keyring.gpg /usr/share/keyrings
11 echo "deb [arch=amd64 signed-by=/usr/share/keyrings/bazel-archive-keyring.gpg] https://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
12 sudo apt update && sudo apt install bazel
14 sudo apt update && sudo apt install bazel-6.5.0
26 sudo apt install tzdata
27 sudo apt install make
28 bazel build zetasql/tools/execute_query/execute_query --verbose_failures --sandbox_debug
The build completed successfully
> ./bazel-bin/zetasql/tools/execute_query/execute_query
Usage: execute_query { "<sql>" | {--web [--port=<port>] } }
Pass --help for a full list of flags.
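Going by the usage line above, the binary accepts a SQL string as its argument. The small wrapper below is a sketch of that invocation with a check that the binary exists. The `run_query` name and the `EXECUTE_QUERY` override variable are hypothetical helpers, and the example query is an assumption; its output is not shown because it was not recorded in these notes.

```shell
#!/bin/sh
# Run a SQL string through the freshly built execute_query binary.
# ASSUMPTION: invoked from the repository root; EXECUTE_QUERY may override the path.
run_query() {
  _bin=${EXECUTE_QUERY:-./bazel-bin/zetasql/tools/execute_query/execute_query}
  if [ ! -x "$_bin" ]; then
    echo "execute_query not found at $_bin; build it first" >&2
    return 1
  fi
  "$_bin" "$1"
}

# Example (output omitted):
# run_query "SELECT 1 + 1 AS two"
```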
Disk and CPU usage were as follows
> df -m
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/root 49434 13661 35758 28% /
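(This covers not only the build itself but also the time spent on environment setup and so on, so treat it as a rough reference.)
Honestly, if even this environment gets pinned at 100%, it makes sense that my clunky old PC (Ryzen 7 2700X) struggled.
I want to build for WASM eventually, so as a first step I tried switching the compiler to clang.
But for some reason it fails with an error.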
$ bazel clean
$ sudo apt update
$ sudo apt install clang
$ CC=clang bazel build zetasql/tools/execute_query/execute_query --sandbox_debug --verbose_failures
...
bazel-out/k8-fastbuild/bin/zetasql/reference_impl/_objs/evaluation/function.pic.o:function.cc:function differential_privacy::ApproxBounds<double>::Serialize() const: error: undefined reference to 'google::protobuf::RepeatedField<long>::~RepeatedField()'
bazel-out/k8-fastbuild/bin/zetasql/reference_impl/_objs/evaluation/function.pic.o:function.cc:function differential_privacy::ApproxBounds<double>::Serialize() const: error: undefined reference to 'google::protobuf::RepeatedField<long>::~RepeatedField()'
bazel-out/k8-fastbuild/bin/zetasql/reference_impl/_objs/evaluation/function.pic.o:function.cc:function differential_privacy::ApproxBounds<double>::Serialize() const: error: undefined reference to 'google::protobuf::RepeatedField<long>::~RepeatedField()'
bazel-out/k8-fastbuild/bin/zetasql/reference_impl/_objs/evaluation/function.pic.o:function.cc:function differential_privacy::ApproxBounds<double>::Serialize() const: error: undefined reference to 'google::protobuf::RepeatedField<long>::~RepeatedField()'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
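The error comes from reference_impl, so for now I checked with the parser target instead.
→ It finishes normally and the files are generated, but for some reason the compiler recorded in the objdump output does not match (?)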
$ CC=clang bazel build --config=clang zetasql/parser/parser
$ ls -al bazel-bin/zetasql/parser/
...
-r-xr-xr-x 1 ... 3356762 Nov 15 14:09 libparser.a
-r-xr-xr-x 1 ... 271 Nov 15 14:03 libparser.a-2.params
-r-xr-xr-x 1 ... 2146136 Nov 15 14:09 libparser.so
-r-xr-xr-x 1 ... 385 Nov 15 14:03 libparser.so-2.params
...
$ objdump --full-contents --section=.comment bazel-bin/zetasql/parser/libparser.so
bazel-bin/zetasql/parser/libparser.so: file format elf64-x86-64
Contents of section .comment:
0000 00474343 3a202855 62756e74 7520392e .GCC: (Ubuntu 9.
0010 342e302d 31756275 6e747531 7e32302e 4.0-1ubuntu1~20.
0020 30342e32 2920392e 342e3000 636c616e 04.2) 9.4.0.clan
0030 67207665 7273696f 6e203130 2e302e30 g version 10.0.0
0040 2d347562 756e7475 312000 -4ubuntu1 .
$ objdump --full-contents --section=.comment bazel-bin/zetasql/parser/libparser.a
In archive bazel-bin/zetasql/parser/libparser.a:
bison_parser.pic.o: file format elf64-x86-64
Contents of section .comment:
0000 00636c61 6e672076 65727369 6f6e2031 .clang version 1
0010 302e302e 302d3475 62756e74 75312000 0.0.0-4ubuntu1 .
parser.pic.o: file format elf64-x86-64
Contents of section .comment:
0000 00636c61 6e672076 65727369 6f6e2031 .clang version 1
0010 302e302e 302d3475 62756e74 75312000 0.0.0-4ubuntu1 .
unparser.pic.o: file format elf64-x86-64
Contents of section .comment:
0000 00636c61 6e672076 65727369 6f6e2031 .clang version 1
0010 302e302e 302d3475 62756e74 75312000 0.0.0-4ubuntu1 .
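One plausible reading of the dumps above, offered as an assumption rather than something verified in these notes: the objects in `libparser.a` are all clang-built, while the extra GCC string in `libparser.so` may come from GCC-built startup objects (such as `crtbeginS.o`) that the linker pulls into shared libraries. To compare the recorded toolchain strings more comfortably than with a hex dump, a sketch like the following lists them one per line using `readelf -p`. The helper names are hypothetical.

```shell
#!/bin/sh
# Keep only the strings from `readelf -p` output, dropping the "[offset]" prefix
# and any trailing spaces. Header lines without an offset are skipped.
strip_offsets() {
  sed -n 's/^ *\[ *[0-9a-f]*\] *//p' | sed 's/ *$//'
}

# List the distinct toolchain strings recorded in a file's .comment section.
producers() {
  readelf -p .comment "$1" | strip_offsets | sort -u
}

# Example (requires binutils):
# producers bazel-bin/zetasql/parser/libparser.so
```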