
Investigating why Prism Central deployment fails on a physical single node

amiamuamiamu

I found out which log tracks the Prism Central deployment progress. When I looked at it, Genesis-related errors were being logged.

nutanix@NTNX-f42b92df-A-CVM:192.168.0.34:~$ cat genesis.out.20240805-103516Z | grep WARNING | grep 2024-08-05
↓ This kept appearing endlessly. What is svm_ips?
2024-08-05 10:57:37,259Z WARNING 36209136 genesis_client.py:81 Failed to reach genesis on any of the svm_ips: [u'192.168.0.23']

When I looked it up, svm_ips turned out to be an option of an install command introduced here:
How to Install Community Edition Episode 1
Cluster Creation Script:

cluster --cluster_function_list="one_node_cluster" --svm_ips= (CVM IP) create

I gave it a try, wondering whether things would work once svm_ips had a value.
nutanix@NTNX-2e8b8fe1-A-CVM:192.168.0.21:~$ cluster --cluster_function_list="one_node_cluster" --svm_ips=192.168.0.21 create
2024-08-06 12:21:52,844Z INFO MainThread cluster:2943 Executing action create on SVMs 192.168.0.21
2024-08-06 12:21:52,851Z INFO MainThread service_utils.py:1615 The model type SanDisk SSD PLUS for SVM boot disk /dev/sda is not in disk inventory, defaulting to DAS-SATA
2024-08-06 12:21:53,035Z INFO MainThread nvme_disk.py:836 Disk /dev/nvme0n1 is not a latency optimized device
2024-08-06 12:21:53,216Z ERROR MainThread cluster:923 Cannot create one node backup cluster as not enough boot ssds are present to facilitate backup on this cluster Available no of boot ssds is 1
2024-08-06 12:21:53,216Z ERROR MainThread cluster:3106 Operation failed
→ It failed with an error: the one-node cluster function requires more boot SSDs than the single one available.

The command given in the Community Edition guide:

cluster -s (CVM IP) --redundancy_factor=1 create

https://x.com/smzksts/status/1820799682454720805

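According to this post, the command in the guide is the correct one.
Recklessly, I created the cluster with the command below instead. It succeeded, so I went on to deploy Prism Central.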

cluster -s 192.168.0.21 --redundancy_factor=1 --svm_ips=192.168.0.21 create

In the end nothing changed: the following WARNING kept appearing and the deployment made no progress at all.

ls -l /home/nutanix/data/logs/genesis.out*
tail -f /home/nutanix/data/logs/genesis.out.20240806-121810Z

2024-08-06 13:11:33,727Z WARNING 86343984 genesis_client.py:81 Failed to reach genesis on any of the svm_ips: [u'192.168.0.23']
2024-08-06 13:11:34,242Z INFO 88711792 ergon_utils.py:394 Task Application Deployment (622c028c-c54d-4b48-5111-3586d3e37340) has status : kRunning
2024-08-06 13:11:35,244Z INFO 88711792 ergon_utils.py:394 Task Application Deployment (622c028c-c54d-4b48-5111-3586d3e37340) has status : kRunning
2024-08-06 13:11:36,246Z INFO 88711792 ergon_utils.py:394 Task Application Deployment (622c028c-c54d-4b48-5111-3586d3e37340) has status : kRunning
2024-08-06 13:11:36,734Z WARNING 86343984 genesis_client.py:81 Failed to reach genesis on any of the svm_ips: [u'192.168.0.23']
2024-08-06 13:11:37,248Z INFO 88711792 ergon_utils.py:394 Task Application Deployment (622c028c-c54d-4b48-5111-3586d3e37340) has status : kRunning
2024-08-06 13:11:38,250Z INFO 88711792 ergon_utils.py:394 Task Application Deployment (622c028c-c54d-4b48-5111-3586d3e37340) has status : kRunning
2024-08-06 13:11:39,253Z INFO 88711792 ergon_utils.py:394 Task Application Deployment (622c028c-c54d-4b48-5111-3586d3e37340) has status : kRunning
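
The address in the WARNING (192.168.0.23) is not the CVM address (192.168.0.21). The ERROR output further down reports "192.168.0.23 IP is not reachable" together with "The Prism Central VM is not reachable", which suggests it is the Prism Central VM address. The following is a minimal Python sketch for pulling the unreachable addresses out of such lines so they can be compared with the addresses you expect. The function name `unreachable_ips` and the `SAMPLE` list are hypothetical; the sample lines are copied from the output above.

```python
import re
from collections import Counter

# Matches the genesis_client.py warning shown in the log above.
WARNING_RE = re.compile(r"Failed to reach genesis on any of the svm_ips: \[(.*)\]")
IPV4_RE = re.compile(r"\d+\.\d+\.\d+\.\d+")


def unreachable_ips(lines):
    """Count how often each IP appears in 'Failed to reach genesis' warnings."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            # The list is printed as a Python 2 repr, e.g. [u'192.168.0.23'],
            # so only the dotted-quad addresses are extracted from it.
            counts.update(IPV4_RE.findall(match.group(1)))
    return counts


# Lines copied from the tail output above.
SAMPLE = [
    "2024-08-06 13:11:33,727Z WARNING 86343984 genesis_client.py:81 Failed to reach genesis on any of the svm_ips: [u'192.168.0.23']",
    "2024-08-06 13:11:34,242Z INFO 88711792 ergon_utils.py:394 Task Application Deployment (622c028c-c54d-4b48-5111-3586d3e37340) has status : kRunning",
    "2024-08-06 13:11:36,734Z WARNING 86343984 genesis_client.py:81 Failed to reach genesis on any of the svm_ips: [u'192.168.0.23']",
]

for ip, count in unreachable_ips(SAMPLE).items():
    print(ip, count)
```

On a real log, the same function can be fed an open file object instead of `SAMPLE`, since it only iterates over lines.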

Below is the result of grepping the genesis.out log for ERROR.

nutanix@NTNX-9733548d-A-CVM:192.168.0.21:~/data/logs$ cat genesis.out.20240805-103516Z | grep ERROR
2024-08-05 10:35:20,664Z ERROR 81233488 rpc.py:303 Json Rpc request for unknown Rpc object NodeManager
2024-08-05 10:35:21,666Z ERROR 81233488 rpc.py:303 Json Rpc request for unknown Rpc object NodeManager
2024-08-05 10:35:22,667Z ERROR 81233488 rpc.py:303 Json Rpc request for unknown Rpc object NodeManager
2024-08-05 10:35:23,668Z ERROR 81233488 rpc.py:303 Json Rpc request for unknown Rpc object NodeManager
2024-08-05 10:35:24,670Z ERROR 81233488 rpc.py:303 Json Rpc request for unknown Rpc object NodeManager
2024-08-05 10:35:25,671Z ERROR 81233488 rpc.py:303 Json Rpc request for unknown Rpc object NodeManager
2024-08-05 10:35:26,673Z ERROR 81233488 rpc.py:303 Json Rpc request for unknown Rpc object NodeManager
2024-08-05 10:35:27,674Z ERROR 81233488 rpc.py:303 Json Rpc request for unknown Rpc object NodeManager
2024-08-05 10:35:28,676Z ERROR 81233488 rpc.py:303 Json Rpc request for unknown Rpc object NodeManager
2024-08-05 10:35:29,677Z ERROR 81233488 rpc.py:303 Json Rpc request for unknown Rpc object NodeManager
2024-08-05 10:35:30,688Z ERROR 11629392 genesis_utils.py:2731 Unable to fetch cluster_functions from cached config_proto
2024-08-05 10:35:30,688Z ERROR 11629392 nvm_utils.py:58 Unable to check whether it is PCVM or not.
2024-08-05 10:35:30,690Z ERROR 11629392 genesis_utils.py:2731 Unable to fetch cluster_functions from cached config_proto
2024-08-05 10:35:30,690Z ERROR 11629392 nvm_utils.py:58 Unable to check whether it is PCVM or not.
2024-08-05 10:35:33,714Z ERROR 11629392 node_manager.py:7046 Zookeeper mapping is unconfigured
2024-08-05 10:35:34,302Z ERROR 11629392 ipv4config.py:1661 Unable to get IPMI network settings
2024-08-05 10:35:34,973Z ERROR 11629392 node_manager.py:7046 Zookeeper mapping is unconfigured
2024-08-05 10:35:35,578Z ERROR 11629392 node_manager.py:4732 cluster_function_list is None but we expected exactly 1function
2024-08-05 10:35:36,749Z ERROR 11629392 node_manager.py:7046 Zookeeper mapping is unconfigured
2024-08-05 10:35:42,632Z ERROR 11629392 node_manager.py:2219 Could not get release version
2024-08-05 10:35:45,048Z ERROR 38607952 node_manager.py:7046 Zookeeper mapping is unconfigured
2024-08-05 10:35:45,049Z ERROR 38512208 node_manager.py:7046 Zookeeper mapping is unconfigured
2024-08-05 10:37:35,761Z ERROR 38607952 node_manager.py:7046 Zookeeper mapping is unconfigured
2024-08-05 10:37:56,298Z ERROR 11629392 genesis_utils.py:304 Failed to get CVM id
2024-08-05 10:37:56,298Z ERROR 11629392 genesis_utils.py:2828 Valid zookeeper session not provided
2024-08-05 10:37:56,299Z ERROR 11629392 genesis_utils.py:2511 Invalid zk session
2024-08-05 10:38:16,384Z ERROR 11629392 genesis_utils.py:2951 Could not reach zookeeper
2024-08-05 10:38:16,385Z ERROR 11629392 genesis_utils.py:304 Failed to get CVM id
2024-08-05 10:39:01,756Z ERROR 38608912 genesis_utils.py:4337 Could not get config proto
2024-08-05 10:39:03,651Z ERROR 38608912 genesis_utils.py:4337 Could not get config proto
2024-08-05 10:39:06,175Z ERROR 38608912 genesis_utils.py:4337 Could not get config proto
2024-08-05 10:39:21,402Z ERROR 11629392 genesis_utils.py:4337 Could not get config proto
2024-08-05 10:39:26,996Z ERROR 38607632 configuration.py:149 Could not parse Zeus configuration Error parsing message
2024-08-05 10:39:27,898Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:28,165Z ERROR 38608912 genesis_utils.py:4337 Could not get config proto
2024-08-05 10:39:28,629Z ERROR 38608912 genesis_utils.py:4337 Could not get config proto
2024-08-05 10:39:29,067Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:29,148Z ERROR 38608912 genesis_utils.py:4337 Could not get config proto
2024-08-05 10:39:30,229Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:31,376Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:32,549Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:33,698Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:34,917Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:36,045Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:37,166Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:38,284Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:39,461Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:40,637Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:41,828Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:43,031Z ERROR 38609232 genesis_utils.py:2708 Unable to fetch cluster_functions from cached proto
2024-08-05 10:39:43,543Z ERROR 11629392 node_manager.py:3245 Failed to configure Hades. Check Hades status.
2024-08-05 10:39:54,878Z ERROR 38609552 node_manager.py:3585 Could not get the value of znode at: /appliance/logical/pki_ca_certs, reason: ok
2024-08-05 10:39:56,496Z ERROR 36207216 genesis_utils.py:2469 Failed to read zknode /appliance/logical/cluster_disabled_services with error 'no node'
2024-08-05 10:39:56,690Z ERROR 36210256 ergon_utils.py:452 Failed to call TaskList with exception Failed to send RPC request
2024-08-05 10:39:56,690Z ERROR 36210256 ergon_utils.py:37 task_uuid can not be None
2024-08-05 10:39:56,997Z ERROR 36206896 genesis_utils.py:2469 Failed to read zknode /appliance/logical/cluster_disabled_services with error 'no node'
2024-08-05 10:40:01,021Z ERROR 36210096 ahv_network.py:44 <AHVNetwork> 192.168.0.20: Could not find the OVS module at /root/acropolis_modules/ovs.py
2024-08-05 10:40:01,022Z ERROR 36210096 network_configuration_keeper.py:683 Could not get host network object for underlying host. Can't fetch the network details for svm: 192.168.0.21
2024-08-05 10:40:01,022Z ERROR 36210096 network_configuration_keeper.py:867 Could not fetch network details for svm: 192.168.0.21. Will retry after 1800 secs.
2024-08-05 10:40:01,022Z ERROR 36210096 network_configuration_keeper.py:890 Timed out attempts to commit network proto to zknode: /appliance/logical/genesis/network_configuration/f80ec8d8-0b1f-4f7c-9ab3-9ae5a82c1b11 for node: 192.168.0.21. Will retry after 1800 secs
2024-08-05 10:40:01,022Z ERROR 36206896 node_manager.py:3796 List of services that are  not present in SERVICES_ORDER [] 
2024-08-05 10:40:02,450Z ERROR 30067472 ha_service.py:373 No valid stargate IP addresses to forward to
2024-08-05 10:40:07,352Z ERROR 30068112 ha_service.py:373 No valid stargate IP addresses to forward to
2024-08-05 10:40:13,610Z ERROR 36206896 ikat_proxy_service.py:326 ikat_proxy lock file does not exist
2024-08-05 10:40:16,998Z ERROR 36206736 upgrade_helper.py:1543 Failed to fetch upgrade progress proto with error code: 6
2024-08-05 10:40:17,000Z ERROR 36206736 ergon_utils.py:491 Ergon service not reachable
2024-08-05 10:40:32,680Z ERROR 30067472 ha_service.py:373 No valid stargate IP addresses to forward to
2024-08-05 10:41:02,787Z ERROR 30067472 ha_service.py:373 No valid stargate IP addresses to forward to
2024-08-05 10:41:05,612Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:06,471Z ERROR 30067472 ha_service.py:373 No valid stargate IP addresses to forward to
2024-08-05 10:41:06,735Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:07,820Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:08,924Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:10,044Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:11,158Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:14,978Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:17,515Z ERROR 30067632 rdma_helper.py:209 Error in reading /etc/nutanix/nic_config.json: [Errno 2] No such file or directory: '/etc/nutanix/nic_config.json'
2024-08-05 10:41:30,887Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:32,223Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:33,461Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:34,782Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:36,275Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:36,477Z ERROR 30067472 notify.py:329 notification=StargateStatus available=True ip_address=192.168.0.21 service_vm_id=2
2024-08-05 10:41:36,493Z ERROR 30067472 notify.py:321 Cluster health send notification failed with exception - Failed to send RPC request
2024-08-05 10:41:41,525Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:43,077Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:45,245Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:41:47,456Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:42:00,759Z ERROR 38609552 node_manager.py:10793 Exception caught while formatting vm_bios_uuid Not Specified. Exception  list index out of range
2024-08-05 10:42:00,802Z ERROR 36182000 download_utils.py:1259 Unable to read url https://download.nutanix.com/lcm/2.0/.site_type.txt with cmd '['curl', '-s', '-S', '-f', '-L', u'https://download.nutanix.com/lcm/2.0/.site_type.txt']' - curl: (6) Could not resolve host: download.nutanix.com; Unknown error
2024-08-05 10:42:09,662Z ERROR 36182000 lcm_cpdb.py:42 DB Error: ('Db error', InsightsInterfaceError(u'Invalid entity type: lcm_entity_v2 specified (kInvalidRequest) (kQueryInvalidEntityType)',))
2024-08-05 10:42:09,664Z ERROR 36182000 lcm_cpdb.py:42 DB Error: ('Db error', InsightsInterfaceError(u'Invalid entity type: lcm_available_version_v2 specified (kInvalidRequest) (kQueryInvalidEntityType)',))
2024-08-05 10:42:09,664Z ERROR 36182000 cpdb_utils.py:2461 An error occurred while reading lcm db: lcm_entity(False), lcm_available_version(False)
2024-08-05 10:42:09,698Z ERROR 38609232 client.py:503 Cannot read factory_config.json on CO host 192.168.0.20 error cat: /root/factory_config.json: No such file or directory
2024-08-05 10:42:09,698Z ERROR 38609232 client.py:729 Could not extract factory config for the compute only node. Verify the node is imaged properly as compute only node
2024-08-05 10:42:09,698Z ERROR 38609232 genesis_utils.py:5486 Error in retrieving node information for host_ip 192.168.0.20
2024-08-05 10:42:09,773Z ERROR 36182000 lcm_cpdb.py:42 DB Error: ('Db error', InsightsInterfaceError(u'Invalid entity type: lcm_entity_v2 specified (kInvalidRequest) (kQueryInvalidEntityType)',))
2024-08-05 10:42:09,782Z ERROR 36182000 lcm_cpdb.py:42 DB Error: ('Db error', InsightsInterfaceError(u'Invalid entity type: lcm_available_version_v2 specified (kInvalidRequest) (kQueryInvalidEntityType)',))
2024-08-05 10:42:09,783Z ERROR 36182000 cpdb_utils.py:2461 An error occurred while reading lcm db: lcm_entity(False), lcm_available_version(False)
2024-08-05 10:42:09,932Z ERROR 36182000 lcm_cpdb.py:42 DB Error: ('Db error', InsightsInterfaceError(u'Invalid entity type: lcm_entity_v2 specified (kInvalidRequest) (kQueryInvalidEntityType)',))
2024-08-05 10:42:09,938Z ERROR 36182000 lcm_cpdb.py:42 DB Error: ('Db error', InsightsInterfaceError(u'Invalid entity type: lcm_available_version_v2 specified (kInvalidRequest) (kQueryInvalidEntityType)',))
2024-08-05 10:42:09,939Z ERROR 36182000 cpdb_utils.py:2461 An error occurred while reading lcm db: lcm_entity(False), lcm_available_version(False)
2024-08-05 10:42:10,359Z ERROR 36182000 lcm_cpdb.py:42 DB Error: ('Db error', InsightsInterfaceError(u'Invalid entity type: lcm_entity_v2 specified (kInvalidRequest) (kQueryInvalidEntityType)',))
2024-08-05 10:42:10,363Z ERROR 36182000 lcm_cpdb.py:42 DB Error: ('Db error', InsightsInterfaceError(u'Invalid entity type: lcm_available_version_v2 specified (kInvalidRequest) (kQueryInvalidEntityType)',))
2024-08-05 10:42:10,363Z ERROR 36182000 cpdb_utils.py:2461 An error occurred while reading lcm db: lcm_entity(False), lcm_available_version(False)
2024-08-05 10:42:16,150Z ERROR 38704976 node_manager.py:3796 List of services that are  not present in SERVICES_ORDER [] 
2024-08-05 10:44:56,696Z ERROR 36210256 ergon_utils.py:37 task_uuid can not be None
2024-08-05 10:45:41,166Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:46:34,539Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:49:41,204Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:49:56,701Z ERROR 36210256 ergon_utils.py:37 task_uuid can not be None
2024-08-05 10:53:39,253Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 10:57:18,407Z ERROR 36209136 ergon_utils.py:37 task_uuid can not be None
2024-08-05 11:01:39,214Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 11:05:39,270Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 11:09:39,236Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 11:13:39,213Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 11:17:39,209Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version
2024-08-05 12:02:32,935Z ERROR 36209136 deployment_utils.py:825 192.168.0.23 IP is not reachable
2024-08-05 12:02:32,935Z ERROR 36209136 deployment.py:302 Encountered Exception in deployment: The Prism Central VM is not reachable
2024-08-05 12:02:32,941Z ERROR 36209136 state_machine.py:900 Current state: VM Deployment handler returned error: kFailed :: Encountered Exception in deployment: The Prism Central VM is not reachable
2024-08-05 12:02:32,947Z ERROR 36209136 state_machine.py:858 Aborting sm 'App Deployment SM' due to error 'kFailed' in current state VM Deployment
2024-08-05 12:02:33,941Z ERROR 30069712 deployment.py:1237 Encountered Exception in deployment: The Prism Central VM is not reachable
2024-08-05 12:02:33,942Z ERROR 30069712 deployment.py:1238 Error in deployment:Traceback (most recent call last):
2024-08-05 12:04:54,642Z ERROR 36210256 ergon_utils.py:37 task_uuid can not be None
2024-08-05 12:09:54,648Z ERROR 36210256 ergon_utils.py:37 task_uuid can not be None
2024-08-05 12:14:54,653Z ERROR 36210256 ergon_utils.py:37 task_uuid can not be None
2024-08-05 12:19:54,658Z ERROR 36210256 ergon_utils.py:37 task_uuid can not be None
2024-08-05 12:24:54,664Z ERROR 36210256 ergon_utils.py:37 task_uuid can not be None
2024-08-05 12:29:54,669Z ERROR 36210256 ergon_utils.py:37 task_uuid can not be None
2024-08-05 12:38:09,431Z ERROR 36209136 ergon_utils.py:37 task_uuid can not be None
nutanix@NTNX-9733548d-A-CVM:192.168.0.21:~/data/logs$ 
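
Most of the ERROR output above is the same few messages repeated, which makes the lines that matter (the ones about 192.168.0.23 and the Prism Central VM) easy to miss. As a rough aid, the sketch below groups ERROR lines by source file and message and counts them. It assumes the line layout seen above (date, time, level, thread id, `file:line`, message); `summarize_errors` and `SAMPLE` are hypothetical names, and the sample lines are copied from the output.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time> ERROR <thread id> <file>:<line> <message>"
LINE_RE = re.compile(r"^\S+ \S+ ERROR \d+ (\S+?):(\d+) (.*)$")


def summarize_errors(lines):
    """Return [((source_file, message), count), ...], most frequent first."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip())
        if match:
            source, _lineno, message = match.groups()
            counts[(source, message)] += 1
    return counts.most_common()


# Lines copied from the grep output above.
SAMPLE = [
    "2024-08-05 10:35:20,664Z ERROR 81233488 rpc.py:303 Json Rpc request for unknown Rpc object NodeManager",
    "2024-08-05 10:35:21,666Z ERROR 81233488 rpc.py:303 Json Rpc request for unknown Rpc object NodeManager",
    "2024-08-05 10:41:05,612Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version",
    "2024-08-05 10:41:06,735Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version",
    "2024-08-05 10:41:07,820Z ERROR 38608592 configuration.py:239 Failed to commit Zeus configuration: bad version",
    "2024-08-05 12:02:32,935Z ERROR 36209136 deployment_utils.py:825 192.168.0.23 IP is not reachable",
]

for (source, message), count in summarize_errors(SAMPLE):
    print(count, source, message)
```

Messages that appear only once or twice, such as the `deployment_utils.py` line, end up at the bottom of the list and are often the more informative ones.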
amiamuamiamu

■ Checking Prism Central deployment progress
ls -l /home/nutanix/data/logs/genesis.out*
tail -f /home/nutanix/data/logs/genesis.out.20240806-121810Z

■ Checking Prism Central download progress
ls -l /home/nutanix/data/logs/cluster_config.out*
tail -f /home/nutanix/data/logs/cluster_config.out.20240806-123648Z
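
The timestamped file names change on every restart, so the name has to be looked up each time before running `tail -f`. A small Python sketch for picking the newest file is shown below. It assumes the suffix is always a timestamp in the `YYYYMMDD-HHMMSSZ` form seen in these notes, so sorting by name is the same as sorting by time; `latest_log` is a hypothetical helper, not part of any Nutanix tooling.

```python
from pathlib import Path


def latest_log(log_dir, prefix):
    """Return the newest '<prefix>.<timestamp>' file in log_dir, or None.

    Assumes the suffix is a YYYYMMDD-HHMMSSZ timestamp, which sorts
    chronologically when compared as plain text.
    """
    candidates = [p for p in Path(log_dir).glob(prefix + ".*") if p.is_file()]
    return max(candidates, key=lambda p: p.name, default=None)
```

For example, `latest_log("/home/nutanix/data/logs", "genesis.out")` would return the path to pass to `tail -f` for deployment progress, and the `cluster_config.out` prefix would do the same for download progress.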

This scrap was closed on 2024/10/16.