VMware Update Manager: Cannot Run Upgrade Script on Host

I have a three-node ESXi 5.0.0 (Build 504890) cluster running ESXi from SD cards. The scratch partition is located on a VMFS datastore accessed via iSCSI. I was in the middle of upgrading to 5.5 (U2) via Update Manager. Two of the nodes completed successfully, but the third failed with the message "Cannot run upgrade script on host".

First I checked the VUM logs (C:\ProgramData\VMware\VMware Update Manager\Logs\).

In vmware-vum-server-log4cpp:

'VciTaskBase.VciClusterJobDispatcherTask{523}' 3848 INFO]  [vciClusterJobSchedulerTask, 613] Remediation failed due to non mmode failure

This brought me to KB2007163. However, when I looked in /var/log/vua.log there was no mention of any bootbank error, nor was there a state.xxxxxxx/ folder in the /bootbank directory.

I then checked the vmkernel log files (/var/log/vmkernel.log). 

2015-08-11T14:48:54.368Z cpu0:69365)WARNING: VFAT: 293: VFAT volume mpx.vmhba32:C0:T0:L0:8 (UUID 3c3693e8-f77a642a-1910-5c6bdcb26d3a) is full.  (585696 sectors, 0 free sectors)
2015-08-11T14:48:54.399Z cpu0:69365)WARNING: VFAT: 293: VFAT volume mpx.vmhba32:C0:T0:L0:8 (UUID 3c3693e8-f77a642a-1910-5c6bdcb26d3a) is full.  (585696 sectors, 0 free sectors)
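To pull the affected volume out of a noisy log, something like the following could be used. It is only a sketch: the function name full_vfat_volumes is made up for illustration, the message format is assumed to match the two lines above, and the size calculation assumes 512-byte sectors, which the log itself does not state.

```shell
# List each VFAT volume reported full, with its approximate size in MiB.
# Assumption: 512-byte sectors (the log only gives a sector count).
full_vfat_volumes() {
    # $1: path to a vmkernel log file
    grep 'VFAT volume .* is full' "$1" |
        sed -n 's/.*(UUID \([^)]*\)) is full\. *(\([0-9]*\) sectors.*/\1 \2/p' |
        sort -u |
        while read -r uuid sectors; do
            # Integer arithmetic, so the result is rounded down.
            echo "$uuid $((sectors * 512 / 1024 / 1024)) MiB"
        done
}
```

It would be called as full_vfat_volumes /var/log/vmkernel.log. Under the 512-byte assumption, 585696 sectors works out to roughly 285 MiB, which is a useful figure for matching the UUID to a volume in the df output.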

 

As you can see, one of the VFAT partitions appeared to be full. On the ESXi host I checked the usage:

# df -h

Filesystem    Size   Used Available Use% Mounted on
VMFS-5      749.8G 492.3G    257.5G  66% /vmfs/volumes/LUN1
VMFS-5     1023.8G 700.1G    323.7G  68% /vmfs/volumes/LLUN2
VMFS-3        1.9T   1.7T    218.8G  89% /vmfs/volumes/LUN3
vfat        249.7M 162.1M     87.7M  65% /vmfs/volumes/Hypervisor1
vfat        249.7M 143.0M    106.7M  57% /vmfs/volumes/Hypervisor2
vfat        285.9M 285.9M     16.0K 100% /vmfs/volumes/Hypervisor3
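The same check can be scripted so that a nearly full partition is noticed before an upgrade rather than during one. This is a minimal sketch, not a VMware tool: full_filesystems is a hypothetical helper, the default threshold of 90 is arbitrary, and it assumes the column layout shown above (Use% in column 5, mount point in column 6, no spaces in the mount point).

```shell
# Print mount points whose Use% is at or above a threshold (default 90).
# Reads `df -h`-style output on stdin and skips the header line.
full_filesystems() {
    threshold="${1:-90}"
    awk -v limit="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

Usage would be df -h | full_filesystems, or df -h | full_filesystems 80 for a stricter limit.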


# ls -l | grep store

lrwxrwxrwx    1 root     root                 49 Aug 11 15:14 store -> /vmfs/volumes/3c3693e8-f77a642a-1910-5c6bdcb26d3a

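/vmfs/volumes/3c3693e8-f77a642a-1910-5c6bdcb26d3a and /vmfs/volumes/Hypervisor3 are the same volume (the volume name is a symbolic link to the UUID directory), so /store lives on the partition that df reports as 100% full. The UUID also matches the one in the vmkernel warnings.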
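Rather than comparing UUIDs by eye, the two paths can be resolved and compared. A small sketch follows, assuming readlink -f is available in the shell in use; same_target is a made-up helper name.

```shell
# Succeed if both paths resolve to the same location once all
# symbolic links have been followed.
same_target() {
    [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}
```

For example: same_target /store /vmfs/volumes/Hypervisor3 && echo "same volume".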


# cd /vmfs/volumes/3c3693e8-f77a642a-1910-5c6bdcb26d3a/var/core
# ls -l

-rwx------    1 root     root        33.9M Sep 10  2012 hostd-worker-zdump.000
-rwx------    1 root     root        34.6M Oct 26  2014 hostd-worker-zdump.001
-rwx------    1 root     root        31.3M Nov  3  2014 hostd-worker-zdump.002

It appears the hostd service on this ESXi host had crashed a few times, which created these dump files. As they were months or years old, we could just remove them:

# rm hostd-worker-zdump.00*
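Because rm with a wildcard is unforgiving, it can be worth listing exactly what the pattern matches first. A cautious sketch, with dump_files as a hypothetical helper and the *-zdump.* pattern assumed from the file names above:

```shell
# Print the dump files in a directory that a cleanup would remove.
# Prints nothing if no file matches the pattern.
dump_files() {
    # $1: directory to scan, e.g. the var/core directory on the store volume
    for f in "$1"/*-zdump.*; do
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
}
```

Running dump_files . in var/core shows the candidates; once they look right, the same list can be removed by hand with rm.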

This freed up enough space for the upgrade to take place.

It appears the upgrade process was unable to create a journal because there was no space left on this partition. Journals are also created for operations such as vMotion, so it is important that these partitions never reach 100%.