https://github.com/oniku-2929/tf_agones_minikube_hyperv/issues/1の原因をつきとめる
issueだとexternalのせいでとまってるみたいな、恣意的なログの抜き出し方をしてしまってるが
どうやら勘違いのようで,どうもterraform-provider-helm
の中で処理が止まっているっぽい・・・?
2022-11-24T10:57:00.792+0900 [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/helm\"] (close)" is waiting for "helm_release.gameserver (expand)"
2022-11-24T10:57:00.793+0900 [TRACE] dag/walk: vertex "data.helm_template.gameserver (expand)" is waiting for "helm_release.agones (expand)"
2022-11-24T10:57:00.793+0900 [TRACE] dag/walk: vertex "helm_release.metallb_minikube (expand)" is waiting for "data.helm_template.metallb_minikube (expand)"
2022-11-24T10:57:00.793+0900 [TRACE] dag/walk: vertex "helm_release.agones (expand)" is waiting for "data.helm_template.agones (expand)"
2022-11-24T10:57:00.793+0900 [TRACE] dag/walk: vertex "output.gameserver_yaml_manifests (expand)" is waiting for "data.helm_template.gameserver (expand)"
2022-11-24T10:57:00.793+0900 [TRACE] dag/walk: vertex "helm_release.gameserver (expand)" is waiting for "data.helm_template.gameserver (expand)"
2022-11-24T10:57:00.814+0900 [TRACE] dag/walk: vertex "output.metallb_minikube_manifests (expand)" is waiting for "data.helm_template.metallb_minikube (expand)"
2022-11-24T10:57:00.814+0900 [TRACE] dag/walk: vertex "output.agones_yaml_manifests (expand)" is waiting for "data.helm_template.agones (expand)"
2022-11-24T10:57:00.814+0900 [TRACE] dag/walk: vertex "data.helm_template.metallb_minikube (expand)" is waiting for "helm_release.metallb (expand)"
2022-11-24T10:57:00.830+0900 [TRACE] dag/walk: vertex "root" is waiting for "output.metallb_minikube_manifests (expand)"
2022-11-24T10:57:00.935+0900 [WARN] running operation was forcefully canceled
2022-11-24T10:57:00.935+0900 [INFO] backend/local: plan operation was force-cancelled by interrupt
2022-11-24T10:57:00.935+0900 [TRACE] provider.terraform-provider-helm_v2.7.1_x5: plugin received interrupt signal, ignoring: count=2 timestamp=2022-11-24T10:57:00.935+0900
2022-11-24T10:57:00.935+0900 [TRACE] statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info
2022-11-24T10:57:00.936+0900 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock
2022-11-24T10:57:00.937+0900 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2022-11-24T10:57:00.938+0900 [ERROR] plugin.(*GRPCProvider).PlanResourceChange: error="rpc error: code = Unavailable desc = error reading from server: EOF"
2022-11-24T10:57:00.938+0900 [ERROR] vertex "helm_release.metallb" error: Plugin did not respond
2022-11-24T10:57:00.938+0900 [TRACE] vertex "helm_release.metallb": visit complete, with errors
2022-11-24T10:57:00.938+0900 [TRACE] vertex "helm_release.metallb (expand)": dynamic subgraph encountered errors: Plugin did not respond
2022-11-24T10:57:00.938+0900 [ERROR] vertex "helm_release.metallb (expand)" error: Plugin did not respond
2022-11-24T10:57:00.938+0900 [TRACE] vertex "helm_release.metallb (expand)": visit complete, with errors
2022-11-24T10:57:00.938+0900 [TRACE] dag/walk: upstream of "data.helm_template.metallb_minikube (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "helm_release.metallb_minikube (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "data.helm_template.agones (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "helm_release.agones (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "data.helm_template.gameserver (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "output.gameserver_yaml_manifests (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "output.metallb_minikube_manifests (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "output.agones_yaml_manifests (expand)" errored, so skipping
2022-11-24T10:57:00.940+0900 [TRACE] dag/walk: upstream of "helm_release.gameserver (expand)" errored, so skipping
2022-11-24T10:57:00.940+0900 [TRACE] dag/walk: upstream of "provider[\"registry.terraform.io/hashicorp/helm\"] (close)" errored, so skipping
2022-11-24T10:57:00.940+0900 [TRACE] dag/walk: upstream of "root" errored, so skipping
2022-11-24T10:57:00.940+0900 [WARN] terraform: stop complete
2022-11-24T10:57:00.941+0900 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/helm/2.7.1/linux_amd64/terraform-provider-helm_v2.7.1_x5 pid=20044
2022-11-24T10:57:00.941+0900 [DEBUG] provider: plugin exited
Git
helmのproviderをdebugしたい
Terraform CLI Development Overrides
https://developer.hashicorp.com/terraform/plugin/debugging#terraform-cli-development-overrides
を使うと,provider呼び出し時に使用されるバイナリを指定したものに上書きできる
->自分でビルドしたProviderのバイナリを指定してterraform CLIから実行させる事ができる
Note: Development overrides work only in Terraform v0.14 and later. Using a dev_overrides block in your CLI configuration will cause Terraform v0.13 to reject the configuration as invalid.
ログだけみてもどこでとまっているかよくわからないので
dlvアタッチしてデバッグする
1.ソースを引っ張ってくる
git clone https://github.com/hashicorp/terraform-provider-helm.git
cd ./terraform-provider-helm
2.Makeファイルいじって、最適化,インライン化OFFでビルドする
build: fmtcheck
go build -v -gcflags 'all=-N -l' .
make buildする
3.バイナリができる、デバッグの為に3つプロンプトを用意する「providerをデバッグ実行する用」「terraform CLI実行する用」「dlv操作用」
「providerをデバッグ実行する用」のプロンプトで以下作業を実施
にあるように以下のコマンドを使用して、dlv経由でProviderのバイナリをdebugモード起動する(ポートは9999番に固定してある)
dlv exec --accept-multiclient --listen=localhost:9999 --continue --headless ./terraform-provider-helm -- -debug
起動すると、以下のような出力が得られる
API server listening at: 127.0.0.1:9999
{"@level":"debug","@message":"plugin address","@timestamp":"2022-11-29T09:46:16.739246+09:00","address":"/tmp/plugin654492491","network":"unix"}Provider started. To attach Terraform CLI, set the TF_REATTACH_PROVIDERS environment variable with the following:
TF_REATTACH_PROVIDERS='{"registry.terraform.io/hashicorp/helm":{"Protocol":"grpc","ProtocolVersion":5,"Pid":347,"Test":true,"Addr":{"Network":"unix","String":"/tmp/plugin654492491"}}}'
「terraform CLI実行する用」のプロンプトで
以下を実行
export TF_REATTACH_PROVIDERS='{"registry.terraform.io/hashicorp/helm":{"Protocol":"grpc","ProtocolVersion":5,"Pid":347,"Test":true,"Addr":{"Network":"unix","String":"/tmp/plugin654492491"}}}'
「dlv操作用」のプロンプトにて以下コマンドで「providerをデバッグ実行する用」のプロンプトで起動したプロセスにつなぎにいく
dlv connect 127.0.0.1:9999
dlvのコマンド使って任意の調べたい場所にブレーク等貼っておく
ブレークはり終わったら、続行しておく
(dlv) b helm.resourceDiff
Breakpoint 1 set at 0x34e7892 for github.com/hashicorp/terraform-provider-helm/helm.resourceDiff() /home/****/terraform-provider-helm/helm/resource_release.go:751
(dlv) continue
この状態で「terraform CLI実行する用」のプロンプトで
たとえばterraform plan
を実行すると
以下のような感じで任意の箇所でとまっていることが確認できる
> github.com/hashicorp/terraform-provider-helm/helm.resourceDiff() /****/terraform-provider-helm/helm/resource_release.go:751 (hits goroutine(107):1 total:1) (PC: 0x34e7892)
746:
747: d.SetId("")
748: return nil
749: }
750:
=> 751: func resourceDiff(ctx context.Context, d *schema.ResourceDiff, meta interface{}) error {
752: logID := fmt.Sprintf("[resourceDiff: %s]", d.Get("name").(string))
753: debug("%s Start", logID)
754: m := meta.(*Meta)
755: name := d.Get("name").(string)
756: namespace := d.Get("namespace").(string)
あとはnext,step,watch,bあたりのgdbと同じようなコマンドでどこでスタックしているか調べる
どんどん中にもぐっていく
helm-providerの中というよりはhelm自体の中でとまってる
> helm.sh/helm/v3/pkg/action.(*ChartPathOptions).LocateChart() /***/terraform-provider-helm/vendor/helm.sh/helm/v3/pkg/action/install.go:719 (PC: 0x3462933)
714:
715: if c.Verify {
716: dl.Verify = downloader.VerifyAlways
717: }
718: if c.RepoURL != "" {
=> 719: chartURL, err := repo.FindChartInAuthAndTLSAndPassRepoURL(c.RepoURL, c.Username, c.Password, name, version,
720: c.CertFile, c.KeyFile, c.CaFile, c.InsecureSkipTLSverify, c.PassCredentialsAll, getter.All(settings))
721: if err != nil {
722: return "", err
723: }
724: name = chartURL
さらに深い階層にもぐる
> helm.sh/helm/v3/pkg/repo.FindChartInAuthAndTLSAndPassRepoURL() /***/terraform-provider-helm/vendor/helm.sh/helm/v3/pkg/repo/chartrepo.go:252 (PC: 0x3429d06)
247: }
248: r, err := NewChartRepository(&c, getters)
249: if err != nil {
250: return "", err
251: }
=> 252: idx, err := r.DownloadIndexFile()
253: if err != nil {
254: return "", errors.Wrapf(err, "looks like %q is not a valid
chart repository or cannot be reached", repoURL)
255: }
256:
257: // Read the index file for the repository to get chart information
and return chart URL
chartrepo.goのDownloadIndexFileの中でスタックしてるっぽい
(dlv)
> helm.sh/helm/v3/pkg/repo.(*ChartRepository).DownloadIndexFile() /***/terraform-provider-helm/vendor/helm.sh/helm/v3/pkg/repo/chartrepo.go:128 (PC: 0x3427e6d)
123: parsedURL.RawPath = path.Join(parsedURL.RawPath, "index.yaml")
124: parsedURL.Path = path.Join(parsedURL.Path, "index.yaml")
125:
126: indexURL := parsedURL.String()
127: // TODO add user-agent
=> 128: resp, err := r.Client.Get(indexURL,
129: getter.WithURL(r.Config.URL),
ここ
helmの中でとまるということは普通にhelm使いしてるだけで起こる可能性があるはず、
helmのissueになにかないかなとおもって
stuck
とかwsl
でしらべてみると
After digging a lot of pages over the internet I have decided to do some Wiresharing of WSL traffic. As result, I have found that seems like my ISP has a problem with the DNS nameserver. When I call helm repo add jetstack https://charts.jetstack.io I see two DNS queries requests - A and AAAA. You could google what's a difference, but in a short AAAA query is used to get IPv6 addresses. In my case AAAA response contains a lot of A records. This produces some confusion on OS or Application level (I'm not sure where exactly) and the helm repo index request is stuck in case if the system gets AAAA response before A response. So I came up with a solution to set nameserver to Google or Cloudflare directly in /etc/resolv.conf. Those nameservers return only IPv6 addresses for AAAA queries and IPv4 for A.
Here is the gist of how to change nameservers in WSL2 (or you could google how to do it) https://gist.github.com/coltenkrauter/608cfe02319ce60facd76373249b8ca6. Or if you don't want to make any changes in WSL you could change the DNS server in your network setup (router or host machine). It also should work.
❯ cat /etc/resolv.conf
Cloudflare
nameserver 1.1.1.1
nameserver 8.8.8.8
nameserver 8.8.4.4
Also, WSL has some problems when works with IPv6 microsoft/WSL#4518
can confirm that, for me, the issue seemed to be related to the built-in dns server that WSL uses (mentioned in the issue linked by @lchdev). i swapped it in /etc/resolv.conf from 172.26.192.1 (in my case) to my local router, and now I can helm dep up consistently. not sure what value the WSL dns server plays, but everything currently seems to be fine.
こ、これか!
wslの/etc/resolv.confをいじってみる
-/etc/wsl.confのgenerateResolvConf = false
を設定する
-wsl再起動例:wsl -t Ubuntu-20.04
-/etc/resolv.confを編集して8.8.8.8等を追加する
この状態でterraform planすると通った!
I see two DNS queries requests - A and AAAA. You could google what's a difference, but in a short AAAA query is used to get IPv6 addresses. In my case AAAA response contains a lot of A records. This produces some confusion on OS or Application level (I'm not sure where exactly) and the helm repo index request is stuck in case if the system gets AAAA response before A response.
と記載があるので、せっかくなのでresolve.confが自動設定状態とそうじゃない自分でGoogle Public DNS等を追加した場合でnslookupの値を見てみる
自分で設定した場合(generateResolvConf = false)
### Cloudflare
nameserver 1.1.1.1
### Google
nameserver 8.8.8.8
nameserver 8.8.4.4
nslookupで返ってくるレコード
Server: 1.1.1.1
Address: 1.1.1.1#53
Non-authoritative answer:
Name: metallb.github.io
Address: 185.199.108.153
Name: metallb.github.io
Address: 185.199.109.153
Name: metallb.github.io
Address: 185.199.110.153
Name: metallb.github.io
Address: 185.199.111.153
Name: metallb.github.io
Address: 2606:50c0:8002::153
Name: metallb.github.io
Address: 2606:50c0:8000::153
Name: metallb.github.io
Address: 2606:50c0:8003::153
Name: metallb.github.io
Address: 2606:50c0:8001::153
自動設定の場合(generateResolvConf = true もしくは初期状態)
nslookup metallb.github.io
Server: 172.29.0.1
Address: 172.29.0.1#53
Non-authoritative answer:
Name: metallb.github.io
Address: 185.199.110.153
Name: metallb.github.io
Address: 185.199.108.153
Name: metallb.github.io
Address: 185.199.109.153
Name: metallb.github.io
Address: 185.199.111.153
Name: dns1.p05.nsone.net
Address: 2620:4d:4000:6259:7:5:0:1
Name: dns2.p05.nsone.net
Address: 2a00:edc0:6259:7:5::2
Name: dns3.p05.nsone.net
Address: 2620:4d:4000:6259:7:5:0:3
Name: dns4.p05.nsone.net
Address: 2a00:edc0:6259:7:5::4
Name: ns-393.awsdns-49.com
Address: 2600:9000:5301:8900::1
Name: ns-692.awsdns-22.net
Address: 2600:9000:5302:b400::1
Name: metallb.github.io
Address: 2606:50c0:8000::153
Name: metallb.github.io
Address: 2606:50c0:8002::153
Name: metallb.github.io
Address: 2606:50c0:8001::153
Name: metallb.github.io
Address: 2606:50c0:8003::153
Name: dns1.p05.nsone.net
Address: 2620:4d:4000:6259:7:5:0:1
Name: dns2.p05.nsone.net
Address: 2a00:edc0:6259:7:5::2
Name: dns3.p05.nsone.net
Address: 2620:4d:4000:6259:7:5:0:3
Name: dns4.p05.nsone.net
Address: 2a00:edc0:6259:7:5::4
Name: ns-393.awsdns-49.com
Address: 2600:9000:5301:8900::1
Name: ns-692.awsdns-22.net
Address: 2600:9000:5302:b400::1
Name: ns-1339.awsdns-39.org
Address: 2600:9000:5305:3b00::1
Name: ns-1622.awsdns-10.co.uk
Address: 2600:9000:5306:5600::1
Name: dns1.p05.nsone.net
Address: 198.51.44.5
Name: dns2.p05.nsone.net
Address: 198.51.45.5
Name: dns3.p05.nsone.net
Address: 198.51.44.69
Name: dns4.p05.nsone.net
Address: 198.51.45.69
Name: ns-393.awsdns-49.com
Address: 205.251.193.137
Name: ns-692.awsdns-22.net
Address: 205.251.194.180
Name: ns-1339.awsdns-39.org
Address: 205.251.197.59
Name: ns-1622.awsdns-10.co.uk
Address: 205.251.198.86
自動設定の場合に似たようなAレコードとAAAAレコードのほかに、以下のレコードがある
dns1.p05.nsone.net とか似たようなレコード
netlifyというサービスのDNSレコード
https://docs.netlify.com/domains-https/netlify-dns/
ns-692.awsdns-22.net とか似たようなレコード
AWS Route53のレコード
https://docs.aws.amazon.com/ja_jp/Route53/latest/DeveloperGuide/ResourceRecordTypes.html
SOAのレコードがかえってきてる?
ちなみにWindows側でコマンドプロンプトでnslookupしてもそれっぽいレコードはかえってきてないっぽい
> metallb.github.io
サーバー: UnKnown
Address: ***********
権限のない回答:
名前: metallb.github.io
Addresses: 2606:50c0:8002::153
2606:50c0:8001::153
2606:50c0:8000::153
2606:50c0:8003::153
185.199.111.153
185.199.108.153
185.199.110.153
185.199.109.153
自動設定の場合(generateResolvConf = true もしくは初期状態)
だと通常のhelmコマンドでもスタックした
helm repo add metallb https://metallb.github.io/metallb
helm repo add agones https://agones.dev/chart/stable
さきにコマンド確認しとけばよかったなぁ
😢
こちらのほうがよりホット
A couple of comments for some of the issues being described here.
There's a known issue where the necessary Firewall Rules to all the DNS request to be proxied are incorrect, and thus block DNS requests from the WSL container. We have put a fix for this in this next WSL release: https://github.com/microsoft/WSL/releases/tag/0.70.5
Secondly, there's a known Firewall configuration which will always block proxied DNS requests from the WSL container: this is the "BlockAllInbound" setting on network profiles. You can see if this is set by opening "Windows Security" - clicking "Firewall & network protection" -- then clicking on one of the 3 Network Profiles which is being applied to your connected network adapters (Domain, Public, or Private). That page lists a setting, "Blocks all incoming connections, including those in the list of allowed apps." ---> if this is checked, then DNS requests will not be proxied from the WSL container.
Addresses an issue where DNS requests to resolve names from a WSL2 container would fail when using the default configured NAT DNS Server
まとめると
-自分で/etc/resolv.confを設定する
-https://github.com/microsoft/WSL/releases/tag/0.70.5以降のバージョンのwslを使う
-Windows Defenderの設定いじって,プロキシDNSクエリを通るようにする
のいずれかの対応を行うと、helmコマンドやterraformのhelm構成のデータ取得の際に固まる現象は回避できる