The main symptom of this problem would be that the vCenter Server Heartbeat console or Neverfail Management Client console would show that the services had failed over, but if you were to try to ping it, it wouldn’t respond.
Logically, there there are some hypotheses you could come up with:
1) Network packet filter isn’t revealed on the active server, so we can’t connect to it.
2) Something wrong w/ the service.
3) The console is wrong and on the backend, nothing failed over.
These would be all wrong.
What we found was that it was an issue with ARP caching on the switches. Because the VM or host abruptly fell off the network, the switches hadn’t expired the ARP entries and that they were stale. You would think that it would be fixed in a minute after the ARP entries expired, but I guess the chain could take a little longer.
Probably the best way to troubleshoot this would be to get on a host on the same network segment and try a ping. If that fails, you could run “arp -a” and check to see if you indeed have the right mac address of the host you want to connect to. If not, you could probably log into the switch to delete the entry or you can create a task to run the command during switchover:
“C:\Program Files\VMware\VMware vCenter Server Heartbeat\R2\bin>nfpktfltr.exe arp”
You should then see the switchover happen without the long delay.
Recently, I set up my own DNS server. I hadn’t run a public DNS server in years. Since the tvpads recently had some DNS issues, I thought maybe I could help eliminate some support calls by running my own DNS server, pointing to the right servers. Boy was I wrong! For some reason, even though some others on comcast would point to my server as a DNS server, they would still get answers that were not the answers given from my server! It was so bizarre! I had never seen it before. If they ran nslookup and used “server <DNS Server IP>” and typed in the name they wanted to resolve, it’s almost as if the server statement prior was ignored and they were getting the IP that the ISP wanted to give them.
Anyways, that’s not the problem I’m writing about here. Surfing some websites became slow for some reason and I thought I would investigate. The first thing I went to see was what connections I had to the outside world. I went to the router and looked at the traffic. Here’s what I saw:
Obviously, that’s DNS traffic. Well, go to the DNS server and what do I see? This:
10-Oct-2013 15:34:14.228 queries: client 22.214.171.124#58070: query: irlwinning.com IN ANY +E
10-Oct-2013 15:34:14.670 queries: client 126.96.36.199#26073: query: irlwinning.com IN ANY +E
Many different lines of the same exact query. What is it? I have no idea. I’ve decided just to shut down DNS queries for now, but if anyone knows anything about this, I’d be happy to hear from you.