Hi,
We are experiencing a weird issue and could use anyone's experience to help us out. We do have an SR open with VMware, but it is going nowhere.
If we reboot our vCenter server (Windows Server), one of our SQL VM clusters loses network connectivity, goes into a split-brain situation, and then the cluster fails. Each SQL node has 3 vmxnet3 vNICs, and all 3 get disconnected as soon as vCenter is rebooted. We are able to restore the clustering almost right away, but it still causes business issues.
Also, the only log that shows the loss of network connectivity is the Microsoft clustering log. The event logs, VM logs, and ESX logs don't capture anything. Since the issue occurs while vCenter is being rebooted, we lose some visibility while it's down.
We are able to reproduce the issue at will, but we are kind of lost, since vCenter has pretty much nothing to do with network connectivity. That SQL cluster also has nothing to do with our vCenter server, as the vCenter DB runs somewhere else.
We do have multiple vCenter servers, running multiple SQL VM clusters, and the issue doesn't happen there.
We should have a maintenance window over the weekend to do some testing. VMware suggested updating VMware Tools and removing vShield.
We will try to validate whether the "issue" also happens to the other VMs, but since they are not clustered VMs, I wonder if we would be able to pick up the network issue, as it seems to be fairly quick, possibly just milliseconds.
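For the non-clustered VMs, a small probe script might help catch a drop that short. Below is a minimal Python sketch, not a tested tool: it repeatedly attempts a TCP connection to a VM and records timestamped outage windows that can be lined up against the cluster log. The function names, the 50 ms interval, and the 200 ms timeout are all assumptions made for illustration, and a blip shorter than the probe interval could still be missed.

```python
import socket
import time


def probe_once(host, port, timeout=0.2):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_gaps(samples):
    """Turn a list of (timestamp, ok) samples into outage windows.

    Each window is (first_failed_timestamp, first_recovered_timestamp).
    If the target is still unreachable at the last sample, the end is None.
    """
    gaps = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts              # outage begins
        elif ok and start is not None:
            gaps.append((start, ts))  # outage ends
            start = None
    if start is not None:
        gaps.append((start, None))
    return gaps


def run_probe(host, port, duration=60.0, interval=0.05):
    """Probe host:port every `interval` seconds for `duration` seconds."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        # Wall-clock timestamps so results can be compared with other logs.
        samples.append((time.time(), probe_once(host, port)))
        time.sleep(interval)
    return samples
```

As a usage idea, something like `find_gaps(run_probe("192.0.2.10", 445, duration=600))` could be started from a machine outside the affected hosts just before the vCenter reboot. The address and port here are placeholders, so substitute a VM and a TCP port that is known to be listening.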
Infra quick specs:
Running VMware ESXi, 5.5.0, 2302651
The ESXi hosts are running on HP BL460c Gen8 blades in a C7000 chassis with the latest firmware.
Using distributed vswitch version 5.5.
Can anyone suggest things to look for or test?
Thank you!