Random Loss of Connectivity on ESXi5 NICs.

August 20, 2013 Jordansphere VMware

We lost network connectivity to all of the VMs on a particular ESXi 5.0 host. The vMotion, management and storage networks were all unaffected. No other host was affected, and the NICs were twinned for resilience. When clicking on the NICs via the dVS tab, none of the CDP information was showing for either NIC ("Cisco Discovery Protocol is not available on this physical adapter"). Both ends were showing as up with the correct duplex/speed. Looking on the physical (Cisco) switch, the MAC addresses of the VMs on the ESXi host were in the ARP table and even appeared on the secondary NIC after pulling the cable on the primary link. The VMs could ping each other internally but couldn't ping anything externally or on the same cluster. How strange!

After troubleshooting the switch, I moved on to the ESXi host.

It appears that there is a known issue with Broadcom 5719/5720 NICs becoming unresponsive. The official VMware KB article is 2035701.
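To check whether a host actually has one of these adapters, something like the following can be run over SSH. This is a rough sketch: `list_suspect_nics` is a made-up helper name, and it assumes the affected Broadcom cards show up under the `tg3` driver or with 5719/5720 in their description in the `esxcli network nic list` output.

```shell
# Hypothetical helper: list physical NICs whose driver or description
# suggests a Broadcom 5719/5720 (assumed here to use the tg3 driver).
list_suspect_nics() {
    # esxcli network nic list prints one row per vmnic, including driver and description.
    esxcli network nic list | grep -iE 'tg3|5719|5720'
}
```

Calling `list_suspect_nics` with no output would suggest no matching adapter is present. `esxcli network nic get -n vmnic0` (substitute your own vmnic name) should show the driver details for a single NIC.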

There is a driver update for the ESXi hosts which can be patched via PowerCLI/Update Manager.

However, there is a quick workaround if you are running 1Gb NICs throughout. SSH onto the host. The problem seems to be down to a feature called "NetQueue". This performance enhancement does not benefit 1Gb NICs, so you can turn it off:

~ # esxcli system settings kernel set -s netNetqueueEnabled -v FALSE
~ # reboot

You can also do this via the GUI. Check out the KB.

Note: I made sure all the VMs were vMotioned off and the host was in maintenance mode before doing this.
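If you want to script that precaution, here is a minimal sketch. The function names are hypothetical, and it assumes `esxcli system maintenanceMode get` prints either "Enabled" or "Disabled" on your build. It refuses to touch the NetQueue setting unless the host is already in maintenance mode, and includes a helper to confirm the setting after the reboot.

```shell
# Sketch only: function names are hypothetical; the esxcli calls are real commands.

# Refuse to change the setting unless the host is already in maintenance mode.
# Assumes `esxcli system maintenanceMode get` prints "Enabled" or "Disabled".
disable_netqueue_if_safe() {
    state=$(esxcli system maintenanceMode get)
    if [ "$state" != "Enabled" ]; then
        echo "Host is not in maintenance mode (state: $state); aborting." >&2
        return 1
    fi
    # Same command as in the workaround above.
    esxcli system settings kernel set -s netNetqueueEnabled -v FALSE || return 1
    echo "NetQueue disabled; reboot the host to apply."
}

# After the reboot, print the setting so the change can be confirmed.
show_netqueue_setting() {
    esxcli system settings kernel list -o netNetqueueEnabled
}
```

Usage would be along the lines of `disable_netqueue_if_safe && reboot`, then `show_netqueue_setting` once the host is back up to check that the configured value reads FALSE.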

 
