I have a three-node ESXi 5.0.0 (build 504890) cluster with ESXi installed on SD cards. The scratch partition is located on a VMFS datastore accessed via iSCSI. I was in the middle of upgrading to 5.5 U2 via Update Manager. Two of the nodes completed successfully, but the third failed with the message "Cannot run upgrade script on host".
First, I checked the VUM logs (C:\ProgramData\VMware\VMware Update Manager\Logs\).
In vmware-vum-server-log4cpp I found:
'VciTaskBase.VciClusterJobDispatcherTask{523}' 3848 INFO] [vciClusterJobSchedulerTask, 613] Remediation failed due to non mmode failure
This led me to KB2007163. However, /var/log/vua.log contained no mention of a bootbank error, and there was no state.xxxxxxx/ folder in the /bootbank directory.
I then checked the vmkernel log (/var/log/vmkernel.log) and found:
2015-08-11T14:48:54.368Z cpu0:69365)WARNING: VFAT: 293: VFAT volume mpx.vmhba32:C0:T0:L0:8 (UUID 3c3693e8-f77a642a-1910-5c6bdcb26d3a) is full. (585696 sectors, 0 free sectors)
2015-08-11T14:48:54.399Z cpu0:69365)WARNING: VFAT: 293: VFAT volume mpx.vmhba32:C0:T0:L0:8 (UUID 3c3693e8-f77a642a-1910-5c6bdcb26d3a) is full. (585696 sectors, 0 free sectors)
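On a busy host these warnings can be buried among thousands of other vmkernel.log lines. The function below is a minimal sketch that reduces them to a de-duplicated list of device and UUID pairs. It assumes the warning text matches the two lines above; the name full_vfat_volumes is my own, not an ESXi command, and it uses only grep, sed and sort, which the ESXi busybox shell provides.

```shell
# full_vfat_volumes [LOGFILE]
# Prints "<device> <uuid>" once for every VFAT volume that vmkernel reported
# as full. Assumes warnings of the form:
#   WARNING: VFAT: ...: VFAT volume <device> (UUID <uuid>) is full. (...)
full_vfat_volumes() {
    log="${1:-/var/log/vmkernel.log}"   # default to the live vmkernel log
    grep 'WARNING: VFAT:.*is full' "$log" \
        | sed -n 's/.*VFAT volume \([^ ]*\) (UUID \([^)]*\)) is full.*/\1 \2/p' \
        | sort -u                       # the same warning repeats many times
}
```

Run with no argument on the host, it reads /var/log/vmkernel.log; the UUID it prints can then be matched against the entries under /vmfs/volumes.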
These warnings showed that a VFAT volume was full. I checked the usage on the ESXi host:
# df -h
Filesystem    Size   Used Available Use% Mounted on
VMFS-5      749.8G 492.3G    257.5G  66% /vmfs/volumes/LUN1
VMFS-5     1023.8G 700.1G    323.7G  68% /vmfs/volumes/LLUN2
VMFS-3        1.9T   1.7T    218.8G  89% /vmfs/volumes/LUN3
vfat        249.7M 162.1M     87.7M  65% /vmfs/volumes/Hypervisor1
vfat        249.7M 143.0M    106.7M  57% /vmfs/volumes/Hypervisor2
vfat        285.9M 285.9M     16.0K 100% /vmfs/volumes/Hypervisor3
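With a handful of datastores the full volume is easy to spot by eye, but a filter helps when checking several hosts. This is a minimal sketch, assuming the six-column df -h layout shown above (Use% in the fifth column, mount point last). The name flag_full_volumes and its default threshold of 90% are my own choices, not something ESXi provides.

```shell
# flag_full_volumes [THRESHOLD]
# Reads `df -h` style output on stdin and prints "<Use%> <mount point>" for
# every filesystem whose usage is at or above THRESHOLD percent (default 90).
flag_full_volumes() {
    awk -v limit="${1:-90}" '
        NR == 1 { next }                          # skip the header row
        {
            pct = $5
            sub(/%$/, "", pct)                    # "100%" becomes "100"
            if (pct + 0 >= limit + 0) {
                mnt = $6
                for (i = 7; i <= NF; i++)         # rejoin mount points
                    mnt = mnt " " $i              # that contain spaces
                print $5, mnt
            }
        }'
}
```

On the host, df -h | flag_full_volumes 95 would list only /vmfs/volumes/Hypervisor3 for the output above.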
# ls -l | grep store
lrwxrwxrwx 1 root root 49 Aug 11 15:14 store -> /vmfs/volumes/3c3693e8-f77a642a-1910-5c6bdcb26d3a
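/vmfs/volumes/Hypervisor3 is a symbolic link to /vmfs/volumes/3c3693e8-f77a642a-1910-5c6bdcb26d3a, so the volume reported as full in vmkernel.log is this host's Hypervisor3 partition, the one at 100% above.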
# cd /vmfs/volumes/3c3693e8-f77a642a-1910-5c6bdcb26d3a/var/core
# ls -l
-rwx------ 1 root root 33.9M Sep 10 2012 hostd-worker-zdump.000
-rwx------ 1 root root 34.6M Oct 26 2014 hostd-worker-zdump.001
-rwx------ 1 root root 31.3M Nov 3 2014 hostd-worker-zdump.002
# rm hostd-worker-zdump.00*
This freed up enough space for the upgrade to take place.
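A wildcard rm does the job, but it can be safer to see exactly what will be deleted first. The function below is a hypothetical helper (clean_core_dumps is my name for it) that lists the *-zdump.* files in a directory and only removes them when given --delete. It assumes the dumps follow the hostd-worker-zdump.NNN naming shown above.

```shell
# clean_core_dumps DIR [--delete]
# Lists core dump files matching *-zdump.* in DIR. With --delete, removes
# them instead and reports each file removed.
clean_core_dumps() {
    dir="$1"
    for f in "$dir"/*-zdump.*; do
        [ -e "$f" ] || continue      # glob matched nothing, so skip it
        if [ "$2" = "--delete" ]; then
            rm -f "$f" && echo "removed $f"
        else
            echo "$f"
        fi
    done
}
```

For this host, that would be clean_core_dumps /vmfs/volumes/3c3693e8-f77a642a-1910-5c6bdcb26d3a/var/core to review the files, then the same command with --delete appended.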
It appears the upgrade process was unable to create a journal because there was no space left on the host's VFAT partition. Journals are also created for operations such as vMotion, so it is important that these partitions never reach 100%.