Closed14

https://github.com/oniku-2929/tf_agones_minikube_hyperv/issues/1の原因をつきとめる

oniku2929oniku2929

issueだとexternalのせいでとまってるみたいな、恣意的なログの抜き出し方をしてしまってるが
どうやら勘違いのようで,どうもterraform-provider-helmの中で処理が止まっているっぽい・・・?

2022-11-24T10:57:00.792+0900 [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/helm\"] (close)" is waiting for "helm_release.gameserver (expand)"
2022-11-24T10:57:00.793+0900 [TRACE] dag/walk: vertex "data.helm_template.gameserver (expand)" is waiting for "helm_release.agones (expand)"
2022-11-24T10:57:00.793+0900 [TRACE] dag/walk: vertex "helm_release.metallb_minikube (expand)" is waiting for "data.helm_template.metallb_minikube (expand)"
2022-11-24T10:57:00.793+0900 [TRACE] dag/walk: vertex "helm_release.agones (expand)" is waiting for "data.helm_template.agones (expand)"
2022-11-24T10:57:00.793+0900 [TRACE] dag/walk: vertex "output.gameserver_yaml_manifests (expand)" is waiting for "data.helm_template.gameserver (expand)"
2022-11-24T10:57:00.793+0900 [TRACE] dag/walk: vertex "helm_release.gameserver (expand)" is waiting for "data.helm_template.gameserver (expand)"
2022-11-24T10:57:00.814+0900 [TRACE] dag/walk: vertex "output.metallb_minikube_manifests (expand)" is waiting for "data.helm_template.metallb_minikube (expand)"
2022-11-24T10:57:00.814+0900 [TRACE] dag/walk: vertex "output.agones_yaml_manifests (expand)" is waiting for "data.helm_template.agones (expand)"
2022-11-24T10:57:00.814+0900 [TRACE] dag/walk: vertex "data.helm_template.metallb_minikube (expand)" is waiting for "helm_release.metallb (expand)"
2022-11-24T10:57:00.830+0900 [TRACE] dag/walk: vertex "root" is waiting for "output.metallb_minikube_manifests (expand)"
2022-11-24T10:57:00.935+0900 [WARN]  running operation was forcefully canceled
2022-11-24T10:57:00.935+0900 [INFO]  backend/local: plan operation was force-cancelled by interrupt
2022-11-24T10:57:00.935+0900 [TRACE] provider.terraform-provider-helm_v2.7.1_x5: plugin received interrupt signal, ignoring: count=2 timestamp=2022-11-24T10:57:00.935+0900
2022-11-24T10:57:00.935+0900 [TRACE] statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info
2022-11-24T10:57:00.936+0900 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock
2022-11-24T10:57:00.937+0900 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2022-11-24T10:57:00.938+0900 [ERROR] plugin.(*GRPCProvider).PlanResourceChange: error="rpc error: code = Unavailable desc = error reading from server: EOF"
2022-11-24T10:57:00.938+0900 [ERROR] vertex "helm_release.metallb" error: Plugin did not respond
2022-11-24T10:57:00.938+0900 [TRACE] vertex "helm_release.metallb": visit complete, with errors
2022-11-24T10:57:00.938+0900 [TRACE] vertex "helm_release.metallb (expand)": dynamic subgraph encountered errors: Plugin did not respond
2022-11-24T10:57:00.938+0900 [ERROR] vertex "helm_release.metallb (expand)" error: Plugin did not respond
2022-11-24T10:57:00.938+0900 [TRACE] vertex "helm_release.metallb (expand)": visit complete, with errors
2022-11-24T10:57:00.938+0900 [TRACE] dag/walk: upstream of "data.helm_template.metallb_minikube (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "helm_release.metallb_minikube (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "data.helm_template.agones (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "helm_release.agones (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "data.helm_template.gameserver (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "output.gameserver_yaml_manifests (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "output.metallb_minikube_manifests (expand)" errored, so skipping
2022-11-24T10:57:00.939+0900 [TRACE] dag/walk: upstream of "output.agones_yaml_manifests (expand)" errored, so skipping
2022-11-24T10:57:00.940+0900 [TRACE] dag/walk: upstream of "helm_release.gameserver (expand)" errored, so skipping
2022-11-24T10:57:00.940+0900 [TRACE] dag/walk: upstream of "provider[\"registry.terraform.io/hashicorp/helm\"] (close)" errored, so skipping
2022-11-24T10:57:00.940+0900 [TRACE] dag/walk: upstream of "root" errored, so skipping
2022-11-24T10:57:00.940+0900 [WARN]  terraform: stop complete
2022-11-24T10:57:00.941+0900 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/helm/2.7.1/linux_amd64/terraform-provider-helm_v2.7.1_x5 pid=20044
2022-11-24T10:57:00.941+0900 [DEBUG] provider: plugin exited
oniku2929oniku2929

helmのproviderをdebugしたい
https://developer.hashicorp.com/terraform/plugin/debugging

Terraform CLI Development Overrides
https://developer.hashicorp.com/terraform/plugin/debugging#terraform-cli-development-overrides
を使うと,provider呼び出し時に使用されるバイナリを指定したものに上書きできる
->自分でビルドしたProviderのバイナリを指定してterraform CLIから実行させる事ができる

https://developer.hashicorp.com/terraform/cli/config/config-file#development-overrides-for-provider-developers

Note: Development overrides work only in Terraform v0.14 and later. Using a dev_overrides block in your CLI configuration will cause Terraform v0.13 to reject the configuration as invalid.
oniku2929oniku2929

ログだけみてもどこでとまっているかよくわからないので
dlvアタッチしてデバッグする
https://developer.hashicorp.com/terraform/plugin/debugging#debugger-based-debugging

1.ソースを引っ張ってくる

git clone https://github.com/hashicorp/terraform-provider-helm.git
cd ./terraform-provider-helm

2.Makeファイルいじって、最適化,インライン化OFFでビルドする
https://github.com/hashicorp/terraform-provider-helm/blob/a994218e0844af8097b44f539ae61e964f1b14af/GNUmakefile#L27

build: fmtcheck
        go build -v -gcflags 'all=-N -l' .

make buildする

3.バイナリができる、デバッグの為に3つプロンプトを用意する「providerをデバッグ実行する用」「terraform CLI実行する用」「dlv操作用」

「providerをデバッグ実行する用」のプロンプトで以下作業を実施

https://developer.hashicorp.com/terraform/plugin/debugging#delve-cli
にあるように以下のコマンドを使用して、dlv経由でProviderのバイナリをdebugモード起動する(ポートは9999番に固定してある)

dlv exec --accept-multiclient --listen=localhost:9999  --continue --headless ./terraform-provider-helm -- -debug

起動すると、以下のような出力が得られる

API server listening at: 127.0.0.1:9999
{"@level":"debug","@message":"plugin address","@timestamp":"2022-11-29T09:46:16.739246+09:00","address":"/tmp/plugin654492491","network":"unix"}Provider started. To attach Terraform CLI, set the TF_REATTACH_PROVIDERS environment variable with the following:

        TF_REATTACH_PROVIDERS='{"registry.terraform.io/hashicorp/helm":{"Protocol":"grpc","ProtocolVersion":5,"Pid":347,"Test":true,"Addr":{"Network":"unix","String":"/tmp/plugin654492491"}}}'

「terraform CLI実行する用」のプロンプトで
以下を実行

export TF_REATTACH_PROVIDERS='{"registry.terraform.io/hashicorp/helm":{"Protocol":"grpc","ProtocolVersion":5,"Pid":347,"Test":true,"Addr":{"Network":"unix","String":"/tmp/plugin654492491"}}}'

「dlv操作用」のプロンプトにて以下コマンドで「providerをデバッグ実行する用」のプロンプトで起動したプロセスにつなぎにいく

dlv connect 127.0.0.1:9999

dlvのコマンド使って任意の調べたい場所にブレーク等貼っておく
ブレークはり終わったら、続行しておく

(dlv) b helm.resourceDiff
Breakpoint 1 set at 0x34e7892 for github.com/hashicorp/terraform-provider-helm/helm.resourceDiff() /home/****/terraform-provider-helm/helm/resource_release.go:751 
(dlv) continue 

この状態で「terraform CLI実行する用」のプロンプトで
たとえばterraform planを実行すると
以下のような感じで任意の箇所でとまっていることが確認できる

> github.com/hashicorp/terraform-provider-helm/helm.resourceDiff() /****/terraform-provider-helm/helm/resource_release.go:751 (hits goroutine(107):1 total:1) (PC: 0x34e7892)
   746:
   747:         d.SetId("")
   748:         return nil
   749: }
   750:
=> 751: func resourceDiff(ctx context.Context, d *schema.ResourceDiff, meta interface{}) error {
   752:         logID := fmt.Sprintf("[resourceDiff: %s]", d.Get("name").(string))
   753:         debug("%s Start", logID)
   754:         m := meta.(*Meta)
   755:         name := d.Get("name").(string)
   756:         namespace := d.Get("namespace").(string) 

あとはnext,step,watch,bあたりのgdbと同じようなコマンドでどこでスタックしているか調べる

oniku2929oniku2929

どんどん中にもぐっていく
helm-providerの中というよりはhelm自体の中でとまってる

> helm.sh/helm/v3/pkg/action.(*ChartPathOptions).LocateChart() /***/terraform-provider-helm/vendor/helm.sh/helm/v3/pkg/action/install.go:719 (PC: 0x3462933)
   714:
   715:         if c.Verify {
   716:                 dl.Verify = downloader.VerifyAlways
   717:         }
   718:         if c.RepoURL != "" {
=> 719:                 chartURL, err := repo.FindChartInAuthAndTLSAndPassRepoURL(c.RepoURL, c.Username, c.Password, name, version,
   720:                         c.CertFile, c.KeyFile, c.CaFile, c.InsecureSkipTLSverify, c.PassCredentialsAll, getter.All(settings))
   721:                 if err != nil {
   722:                         return "", err
   723:                 }
   724:                 name = chartURL

さらに深い階層にもぐる

> helm.sh/helm/v3/pkg/repo.FindChartInAuthAndTLSAndPassRepoURL() /***/terraform-provider-helm/vendor/helm.sh/helm/v3/pkg/repo/chartrepo.go:252 (PC: 0x3429d06)
   247:         }
   248:         r, err := NewChartRepository(&c, getters)
   249:         if err != nil {
   250:                 return "", err
   251:         }
=> 252:         idx, err := r.DownloadIndexFile()
   253:         if err != nil {
   254:                 return "", errors.Wrapf(err, "looks like %q is not a valid 
chart repository or cannot be reached", repoURL)
   255:         }
   256:
   257:         // Read the index file for the repository to get chart information 
and return chart URL

chartrepo.goのDownloadIndexFileの中でスタックしてるっぽい

(dlv)
> helm.sh/helm/v3/pkg/repo.(*ChartRepository).DownloadIndexFile() /***/terraform-provider-helm/vendor/helm.sh/helm/v3/pkg/repo/chartrepo.go:128 (PC: 0x3427e6d)
   123:         parsedURL.RawPath = path.Join(parsedURL.RawPath, "index.yaml")     
   124:         parsedURL.Path = path.Join(parsedURL.Path, "index.yaml")
   125:
   126:         indexURL := parsedURL.String()
   127:         // TODO add user-agent
=> 128:         resp, err := r.Client.Get(indexURL,
   129:                 getter.WithURL(r.Config.URL),

ここ
https://github.com/hashicorp/terraform-provider-helm/blob/a994218e0844af8097b44f539ae61e964f1b14af/vendor/helm.sh/helm/v3/pkg/repo/chartrepo.go#L128-L134

oniku2929oniku2929

helmの中でとまるということは普通にhelm使いしてるだけで起こる可能性があるはず、
helmのissueになにかないかなとおもって
https://github.com/helm/helm/issues

stuckとかwslでしらべてみると

https://github.com/helm/helm/issues/9819#issuecomment-887646972

After digging a lot of pages over the internet I have decided to do some Wiresharing of WSL traffic. As result, I have found that seems like my ISP has a problem with the DNS nameserver. When I call helm repo add jetstack https://charts.jetstack.io I see two DNS queries requests - A and AAAA. You could google what's a difference, but in a short AAAA query is used to get IPv6 addresses. In my case AAAA response contains a lot of A records. This produces some confusion on OS or Application level (I'm not sure where exactly) and the helm repo index request is stuck in case if the system gets AAAA response before A response. So I came up with a solution to set nameserver to Google or Cloudflare directly in /etc/resolv.conf. Those nameservers return only IPv6 addresses for AAAA queries and IPv4 for A.
Here is the gist of how to change nameservers in WSL2 (or you could google how to do it) https://gist.github.com/coltenkrauter/608cfe02319ce60facd76373249b8ca6. Or if you don't want to make any changes in WSL you could change the DNS server in your network setup (router or host machine). It also should work.

❯ cat /etc/resolv.conf

Cloudflare

nameserver 1.1.1.1

Google

nameserver 8.8.8.8
nameserver 8.8.4.4
Also, WSL has some problems when works with IPv6 microsoft/WSL#4518

https://github.com/helm/helm/issues/6027#issuecomment-1213235161

can confirm that, for me, the issue seemed to be related to the built-in dns server that WSL uses (mentioned in the issue linked by @lchdev). i swapped it in /etc/resolv.conf from 172.26.192.1 (in my case) to my local router, and now I can helm dep up consistently. not sure what value the WSL dns server plays, but everything currently seems to be fine.

こ、これか!
wslの/etc/resolv.confをいじってみる

oniku2929oniku2929

I see two DNS queries requests - A and AAAA. You could google what's a difference, but in a short AAAA query is used to get IPv6 addresses. In my case AAAA response contains a lot of A records. This produces some confusion on OS or Application level (I'm not sure where exactly) and the helm repo index request is stuck in case if the system gets AAAA response before A response.

と記載があるので、せっかくなのでresolve.confが自動設定状態とそうじゃない自分でGoogle Public DNS等を追加した場合でnslookupの値を見てみる

oniku2929oniku2929

自分で設定した場合(generateResolvConf = false)

### Cloudflare
nameserver 1.1.1.1
### Google
nameserver 8.8.8.8
nameserver 8.8.4.4

nslookupで返ってくるレコード

Server:         1.1.1.1
Address:        1.1.1.1#53

Non-authoritative answer:
Name:   metallb.github.io
Address: 185.199.108.153
Name:   metallb.github.io
Address: 185.199.109.153
Name:   metallb.github.io
Address: 185.199.110.153
Name:   metallb.github.io
Address: 185.199.111.153
Name:   metallb.github.io
Address: 2606:50c0:8002::153
Name:   metallb.github.io
Address: 2606:50c0:8000::153
Name:   metallb.github.io
Address: 2606:50c0:8003::153
Name:   metallb.github.io
Address: 2606:50c0:8001::153
oniku2929oniku2929

自動設定の場合(generateResolvConf = true もしくは初期状態)

nslookup metallb.github.io
Server:         172.29.0.1
Address:        172.29.0.1#53

Non-authoritative answer:
Name:   metallb.github.io
Address: 185.199.110.153
Name:   metallb.github.io
Address: 185.199.108.153
Name:   metallb.github.io
Address: 185.199.109.153
Name:   metallb.github.io
Address: 185.199.111.153
Name:   dns1.p05.nsone.net
Address: 2620:4d:4000:6259:7:5:0:1
Name:   dns2.p05.nsone.net
Address: 2a00:edc0:6259:7:5::2
Name:   dns3.p05.nsone.net
Address: 2620:4d:4000:6259:7:5:0:3
Name:   dns4.p05.nsone.net
Address: 2a00:edc0:6259:7:5::4
Name:   ns-393.awsdns-49.com
Address: 2600:9000:5301:8900::1
Name:   ns-692.awsdns-22.net
Address: 2600:9000:5302:b400::1
Name:   metallb.github.io
Address: 2606:50c0:8000::153
Name:   metallb.github.io
Address: 2606:50c0:8002::153
Name:   metallb.github.io
Address: 2606:50c0:8001::153
Name:   metallb.github.io
Address: 2606:50c0:8003::153
Name:   dns1.p05.nsone.net
Address: 2620:4d:4000:6259:7:5:0:1
Name:   dns2.p05.nsone.net
Address: 2a00:edc0:6259:7:5::2
Name:   dns3.p05.nsone.net
Address: 2620:4d:4000:6259:7:5:0:3
Name:   dns4.p05.nsone.net
Address: 2a00:edc0:6259:7:5::4
Name:   ns-393.awsdns-49.com
Address: 2600:9000:5301:8900::1
Name:   ns-692.awsdns-22.net
Address: 2600:9000:5302:b400::1
Name:   ns-1339.awsdns-39.org
Address: 2600:9000:5305:3b00::1
Name:   ns-1622.awsdns-10.co.uk
Address: 2600:9000:5306:5600::1
Name:   dns1.p05.nsone.net
Address: 198.51.44.5
Name:   dns2.p05.nsone.net
Address: 198.51.45.5
Name:   dns3.p05.nsone.net
Address: 198.51.44.69
Name:   dns4.p05.nsone.net
Address: 198.51.45.69
Name:   ns-393.awsdns-49.com
Address: 205.251.193.137
Name:   ns-692.awsdns-22.net
Address: 205.251.194.180
Name:   ns-1339.awsdns-39.org
Address: 205.251.197.59
Name:   ns-1622.awsdns-10.co.uk
Address: 205.251.198.86
oniku2929oniku2929

自動設定の場合に似たようなAレコードとAAAAレコードのほかに、以下のレコードがある

dns1.p05.nsone.net とか似たようなレコード
netlifyというサービスのDNSレコード
https://docs.netlify.com/domains-https/netlify-dns/

ns-692.awsdns-22.net とか似たようなレコード
AWS Route53のレコード
https://docs.aws.amazon.com/ja_jp/Route53/latest/DeveloperGuide/ResourceRecordTypes.html

SOAのレコードがかえってきてる?

ちなみにWindows側でコマンドプロンプトでnslookupしてもそれっぽいレコードはかえってきてないっぽい

> metallb.github.io
サーバー:  UnKnown
Address:  ***********

権限のない回答:
名前:    metallb.github.io
Addresses:  2606:50c0:8002::153
          2606:50c0:8001::153
          2606:50c0:8000::153
          2606:50c0:8003::153
          185.199.111.153
          185.199.108.153
          185.199.110.153
          185.199.109.153
oniku2929oniku2929

自動設定の場合(generateResolvConf = true もしくは初期状態)
だと通常のhelmコマンドでもスタックした

helm repo add metallb https://metallb.github.io/metallb
helm repo add agones https://agones.dev/chart/stable

さきにコマンド確認しとけばよかったなぁ
😢

oniku2929oniku2929

https://github.com/microsoft/WSL/issues/8365
こちらのほうがよりホット

https://github.com/microsoft/WSL/issues/8365#issuecomment-1295250969

A couple of comments for some of the issues being described here.
There's a known issue where the necessary Firewall Rules to all the DNS request to be proxied are incorrect, and thus block DNS requests from the WSL container. We have put a fix for this in this next WSL release: https://github.com/microsoft/WSL/releases/tag/0.70.5
Secondly, there's a known Firewall configuration which will always block proxied DNS requests from the WSL container: this is the "BlockAllInbound" setting on network profiles. You can see if this is set by opening "Windows Security" - clicking "Firewall & network protection" -- then clicking on one of the 3 Network Profiles which is being applied to your connected network adapters (Domain, Public, or Private). That page lists a setting, "Blocks all incoming connections, including those in the list of allowed apps." ---> if this is checked, then DNS requests will not be proxied from the WSL container.

https://github.com/microsoft/WSL/releases/tag/0.70.5

Addresses an issue where DNS requests to resolve names from a WSL2 container would fail when using the default configured NAT DNS Server

このスクラップは2022/12/01にクローズされました