if vCenter Server Heartbeat or Neverfail Heartbeat failover appears to be not working or taking a long time …

The main symptom of this problem would be that the vCenter Server Heartbeat console or Neverfail Management Client console would show that the services had failed over, but if you were to try to ping it, it wouldn’t respond.
Logically, there there are some hypotheses you could come up with:

1) Network packet filter isn’t revealed on the active server, so we can’t connect to it.
2) Something wrong w/ the service.
3) The console is wrong and on the backend, nothing failed over.

These would be all wrong.

What we found was that it was an issue with ARP caching on the switches. Because the VM or host abruptly fell off the network, the switches hadn’t expired the ARP entries and that they were stale. You would think that it would be fixed in a minute after the ARP entries expired, but I guess the chain could take a little longer.

Probably the best way to troubleshoot this would be to get on a host on the same network segment and try a ping. If that fails, you could run “arp -a” and check to see if you indeed have the right mac address of the host you want to connect to. If not, you could probably log into the switch to delete the entry or you can create a task to run the command during switchover:

“C:\Program Files\VMware\VMware vCenter Server Heartbeat\R2\bin>nfpktfltr.exe arp”

You should then see the switchover happen without the long delay.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.