Using vim-cmd to remedy a BSOD

If you haven’t used vim-cmd before, my friend Steve Jin has a great tutorial: www.doublecloud.org/2013/11/vmware-esxi-vim-cmd-command-a-quick-tutorial/

This is a real-world situation I got myself into when I tried connecting to my client VM and found a BSOD that looked like this:

The usbuhci.sys line in the blue screen made it pretty obvious that the USB stick plugged into the VM was causing the crash. Since I tunnel into my client VM via SSH and VNC, the easiest way for me to shut down the VM and fix the problem is through vim-cmd. This only works if SSH is enabled on your ESXi host, or if you connect to the host with the VMware CLI or vMA (or whatever they’re calling it these days). I have the former.

The first thing I do after logging into the ESXi host as root is run:

vim-cmd vmsvc/getallvms

I need to know which VM to manage. I get this:

Vmid  Name      File                                  Guest OS          Version  Annotation
1     windows7  [BIG_DISK] windows7/windows7.vmx      windows7_64Guest  vmx-07
3     thimble   [BIG_DISK] thimble/thimble.vmx        ubuntu64Guest     vmx-08
4     chunli    [Datastore 2] chunli2/chunli2.vmx     ubuntu64Guest     vmx-08
5     zangief   [Datastore 2] zangief2/zangief2.vmx   ubuntu64Guest     vmx-11
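With only a handful of VMs, spotting the right Vmid by eye is easy. With more, a small filter can pull it out by name. This is a minimal sketch: `vmid_for_name` is a hypothetical helper of my own, not part of vim-cmd. It assumes the column layout shown above and a VM name with no spaces in it.

```shell
#!/bin/sh
# Print the Vmid of the VM whose Name column exactly matches $1.
# Reads "vim-cmd vmsvc/getallvms" output on stdin and skips the header row.
# Assumption: the VM name contains no spaces (it is compared as field 2).
vmid_for_name() {
    awk -v name="$1" 'NR > 1 && $2 == name { print $1 }'
}
```

On the host, `vim-cmd vmsvc/getallvms | vmid_for_name windows7` should print 1 for the listing above.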

From this, I know the VM I want is Vmid 1, so I power it off by running:

vim-cmd vmsvc/power.off 1

Thinking the USB issue might be a fluke, I try to power the VM back on to see if it will boot.

vim-cmd vmsvc/power.on 1

It starts booting, but when the VM’s screen resolution changes, my VNC viewer freezes. Because the viewer freezes at that point anyway, I couldn’t tell whether the VM had hit the BSOD again.

So I decided to look at the vmware.log file. This is what I saw there:

2017-10-18T22:27:05.519Z| svga| I125: SVGA disabling SVGA
2017-10-18T22:27:05.545Z| svga| W115: WinBSOD: (20) 'Technical information: '
2017-10-18T22:27:05.545Z| svga| W115:
2017-10-18T22:27:05.546Z| svga| W115: WinBSOD: (22) '*** STOP: 0x000000D1 (0xFFFFF88000BF2000,0x0000000000000002,0x0000000000000001,0'
2017-10-18T22:27:05.546Z| svga| W115:
2017-10-18T22:27:05.546Z| svga| W115: WinBSOD: (23) 'xFFFFF88004206E49) '
2017-10-18T22:27:05.546Z| svga| W115:
2017-10-18T22:27:05.557Z| svga| W115: WinBSOD: (26) '*** usbuhci.sys - Address FFFFF88004206E49 base at FFFFF88004200000, DateStamp'
2017-10-18T22:27:05.557Z| svga| W115:
2017-10-18T22:27:05.557Z| svga| W115: WinBSOD: (27) ' 57b37a29 '
2017-10-18T22:27:05.557Z| svga| W115:
2017-10-18T22:27:05.557Z| svga| W115: WinBSOD: (30) 'Collecting data for crash dump ... '
2017-10-18T22:27:05.557Z| svga| W115:
2017-10-18T22:27:05.573Z| svga| W115: WinBSOD: (31) 'Initializing disk for crash dump ... '
2017-10-18T22:27:05.573Z| svga| W115:
2017-10-18T22:27:07.547Z| mks| W115: Guest operating system crash detected.
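The BSOD text is spread over several WinBSOD: lines, interleaved with empty W115: lines. A grep and sed pipeline can pull out just the message text. This is a sketch that assumes the log line format shown above, and `bsod_text` is a hypothetical name of my own. On ESXi the log sits next to the .vmx file, so for this VM it would be under /vmfs/volumes/BIG_DISK/windows7/.

```shell
#!/bin/sh
# Print the text of every "WinBSOD:" line in the vmware.log given as $1,
# with the timestamp prefix, the "(NN)" counter and the single quotes removed.
# Assumption: lines look like "...| W115: WinBSOD: (26) '*** usbuhci.sys ...'".
bsod_text() {
    grep 'WinBSOD:' "$1" |
        sed -e "s/.*WinBSOD: ([0-9]*) '//" -e "s/'[[:space:]]*\$//"
}
```

For example, `bsod_text /vmfs/volumes/BIG_DISK/windows7/vmware.log`. The STOP line is split across two log entries, so its parameters still come out on two consecutive lines.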

Okay, so my hunch was correct, and it’s time to remove the USB device from the VM. I power off the VM again, open up the .vmx file, and remove every USB entry.

These are the lines I removed. Don’t worry about breaking anything; the hypervisor will put them back if you need them later. Still, back up your .vmx file before editing it, just in case.

usb.pciSlotNumber = "34"
usb.present = "TRUE"
usb:1.speed = "2"
usb:1.present = "TRUE"
usb:1.deviceType = "hub"
usb:1.port = "1"
usb:1.parent = "-1"
usb.autoConnect.device0 = "path:1/1 autoclean:1"
usb:0.present = "TRUE"
usb:0.deviceType = "mouse"
usb:0.port = "0"
usb:0.parent = "-1"
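If you would rather not hand-edit the file, a small helper can keep the backup and drop every line whose key starts with usb in one step. This is a sketch under my own assumptions: `strip_usb_from_vmx` is a hypothetical function, and treating "any key starting with usb" as a USB setting is inferred from the lines above, not from VMware documentation. Run it only while the VM is powered off.

```shell
#!/bin/sh
# Back up the .vmx file given as $1 to "$1.bak", then rewrite "$1" without
# any line starting with "usb" (usb.present, usb:0.*, usb.autoConnect.*, ...).
strip_usb_from_vmx() {
    cp "$1" "$1.bak" || return 1
    grep -v '^usb' "$1.bak" > "$1"
    # grep -v exits 1 when it selects no lines at all; only 2 means an error.
    [ $? -le 1 ]
}
```

For this VM that would be `strip_usb_from_vmx /vmfs/volumes/BIG_DISK/windows7/windows7.vmx`, followed by the reload command.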

After you’ve saved your changes, reload the VM’s configuration so that ESXi rereads the .vmx file and drops the USB device. You can do this by running:

vim-cmd vmsvc/reload 1

Now you’re ready to power on the VM.

vim-cmd vmsvc/power.on 1

The VM powers up and I’m back in business. I still had to sort out the USB stick later; it turned out I just needed to reconnect it and reformat it. I haven’t seen the issue come up again.


If vCenter Server Heartbeat or Neverfail Heartbeat failover appears not to work or takes a long time …

The main symptom of this problem is that the vCenter Server Heartbeat console or Neverfail Management Client console shows that the services have failed over, but the server doesn’t respond when you try to ping it.

Logically, there are a few hypotheses you could come up with:

1) The active server hasn’t been revealed by the network packet filter, so we can’t connect to it.
2) Something is wrong with the service.
3) The console is wrong, and nothing actually failed over on the back end.

All of these turn out to be wrong.

What we found was an ARP caching issue on the switches. Because the VM or host abruptly dropped off the network, the switches were still holding stale ARP entries for it. You would expect this to clear up within a minute or so once those entries expire, but across a chain of switches it can apparently take a little longer.

The best way to troubleshoot this is probably to get onto a host on the same network segment and try a ping. If that fails, run “arp -a” and check whether the MAC address listed for the server’s IP is the one you expect. If it isn’t, you can log into the switch and delete the stale entry, or create a task that runs this command during switchover:

"C:\Program Files\VMware\VMware vCenter Server Heartbeat\R2\bin\nfpktfltr.exe" arp

You should then see the switchover happen without the long delay.
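The MAC check described above can be scripted from a Linux or Mac machine on the same segment. This sketch is my own illustration, not part of Heartbeat: `arp_mac_for_ip` is a hypothetical helper, and it only handles the two arp -a layouts named in its comments. Compare what it prints with the MAC address of the server that should now be active. If they differ, the entry is stale.

```shell
#!/bin/sh
# Print the MAC address that an "arp -a" listing (read from stdin) holds for
# the IP in $1, normalized to lowercase with colons. Handles the Windows
# layout ("10.0.0.5   00-50-56-aa-bb-cc   dynamic") and the Linux/BSD layout
# ("? (10.0.0.5) at 00:50:56:aa:bb:cc [ether] on eth0").
arp_mac_for_ip() {
    awk -v ip="$1" '
    {
        hit = 0
        for (i = 1; i <= NF; i++)
            if ($i == ip || $i == "(" ip ")") hit = 1
        if (!hit) next
        # A MAC is 17 characters of hex digits joined by ":" or "-".
        for (i = 1; i <= NF; i++)
            if (length($i) == 17 && $i ~ /^[0-9A-Fa-f:-]+$/) {
                mac = tolower($i)
                gsub(/-/, ":", mac)
                print mac
                exit
            }
    }'
}
```

For example, `arp -a | arp_mac_for_ip 10.0.0.5`, where 10.0.0.5 is a placeholder for your vCenter server’s address.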

OPS1 – VMware Management app for the iPhone – Fantastic!

I’ve been using this app for quite some time, but haven’t found the time to write about it.

If you use an iPhone or iPad and manage a vSphere environment, you’ll want this app. You can get it here: OPS1 – VMware and Amazon AWS Cloud Management for …

It’s made by a company called Spragos based out of Santa Clara, CA. You can find their website here: http://www.spragos.com.

It’s pretty awesome that I can manage my vSphere hosts and VMs without having to power on the laptop. Since I’m on a Mac, I don’t enjoy bringing up the vSphere thick client, and even the web client takes quite some time to load. Most of the time, I just need to power a VM on or off or shut down a host anyway. This app lets me do these things without powering on my laptop, and even when I’m already on the laptop, I don’t need to start up Fusion just to run the client. I’m loving it.

Here are some screenshots. You can configure one or more connections, either to vCenter or directly to an ESX host. The app also caches credentials. Since my home lab isn’t a super secure environment, I don’t care much about security there; I hate having to type my password, or even my user name, over and over just to log in.

After logging in, here’s my home screen. From here, I usually head over to Virtual Machines or Hosts, depending on what I want to do.

If I’m interested in what’s going on overall, I navigate to Status. There I can see at a high level that everything is running fine on my host.

It’s not always this way, though; the app also pulls in events and alarms.

If you go into Virtual Machines, you get a nice list of your VMs:

From there, you can drill into a VM’s properties to see what’s going on, make changes, power it on or off, and so on.

If a VM suddenly becomes unresponsive, the CPU stats might give you a clue as to what’s going on. In my case, there were just a couple of spikes.

I think you get my point: it’s a great app! Download it free and try it yourself. Honestly, the value I’ve gotten from the free version alone makes the measly $10 upgrade to the Enterprise version well worth it. It’s probably saved me hours once you add up the couple of minutes it takes each time to start up the Mac, start up Fusion, start up the vSphere client, and log into the ESX or vCenter server.

Here are a few other screenshots, just for eye candy’s sake.

HOW TO move ESX hosts between clusters without maintenance mode

I’ve done this a number of times and never had a problem, but this may not be a good idea.

The steps are:

1. Right-click the ESX host and disconnect it from vCenter.
2. Right-click the host you just disconnected and remove it from vCenter. This may hang VC temporarily (for up to a minute or so), or the operation may time out. Log back into VC; once you are able to, the host should be gone from the cluster.
3. Re-add the host to VC under the new cluster.