ESX IP STORAGE TROUBLESHOOTING BEST PRACTICE

VMware has released a new white paper about ESXi IP storage troubleshooting.

In this paper, we:
• Describe how you can analyze packet traces to identify functional and performance issues in an
ESX IP storage environment.
• Compare packet capture alternatives, and explain why we recommend an inline optical network
tap connected to a packet capture system.
• Present the challenges of 10G packet capture, and describe key features of commercial 10G
capture solutions.
• Describe the design of an inexpensive, self-assembled 10G packet capture solution optimized for
troubleshooting that you can build relatively easily. We also describe our experience with multiple
prototypes of this design, which we have used in our ESX IP storage testbeds for NFS and iSCSI
performance for many years.
• Present examples of analyzing packet traces to solve ESX performance issues for NFSv4.1, software
iSCSI over IPv6, and hardware iSCSI.

ESX-IP-storage-troubleshooting.pdf
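
While the paper recommends out-of-band capture via an optical tap for accurate timing, a capture taken directly on the host is often enough for a first functional triage. A minimal sketch using the tcpdump-uw tool that ships with ESXi (vmk1 and NFS over TCP port 2049 are assumptions; substitute your storage vmkernel interface and protocol port, e.g. 3260 for iSCSI):

~ # tcpdump-uw -i vmk1 -w /tmp/nfs-trace.pcap port 2049

Keep in mind that on-host capture consumes CPU on the very host you may be troubleshooting, which is one of the reasons the paper favors an inline tap.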

ESXi: show details about ramdisk usage

The command below lets you check the free space on the host for each of the ramdisk mount points. It also shows the inode usage per ramdisk.

esxcli system visorfs ramdisk list – List the RAM disks used by the host.
~ # esxcli system visorfs ramdisk list
Ramdisk Name  System  Include in Coredumps   Reserved      Maximum       Used  Peak Used  Free  Reserved Free  Maximum Inodes  Allocated Inodes  Used Inodes  Mount Point
------------  ------  --------------------  ---------  -----------  ---------  ---------  ----  -------------  --------------  ----------------  -----------  ---------------------------
root            true                  true  32768 KiB    32768 KiB   3916 KiB   4076 KiB  88 %           88 %            8192              8192         5257  /
etc             true                  true  28672 KiB    28672 KiB    504 KiB    712 KiB  98 %           98 %            4096              1024          527  /etc
tmp            false                 false   2048 KiB   196608 KiB  29620 KiB  65016 KiB  84 %            0 %            8192              8192         3770  /tmp
hostdstats     false                 false      0 KiB  1078272 KiB  94324 KiB  99624 KiB  91 %            0 %            8192                32            5  /var/lib/vmware/hostd/stats

The command below gives more details about a single ramdisk.

vsish -e get /system/visorfs/ramdisks/[RAMDISK-NAME]/stats
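
If you are unsure of the ramdisk names, you can enumerate them first; vsish nodes can be listed like a directory tree. A quick sketch (vsish is an internal, unsupported interface; the names below are from the host shown above):

~ # vsish -e ls /system/visorfs/ramdisks/
root/
etc/
tmp/
hostdstats/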

Example for the root ramdisk:

~ # vsish -e get /system/visorfs/ramdisks/root/stats
VisorFS ramdisk {
   Min:32 MB
   Max:32 MB
   Number of pages used:979
   Max number of pages used:1019
   Mem group ID:157
   Root inode:0
   Dump on coredump:1
   System:1
   Mount point inode:-6
   Root path:/
   First inode of ramdisk:0
   Max number of inodes:8192
   Number of allocated/initialized inodes:8192
   Number of used inodes:5263
   Max number of used inodes:8192
}

Example for the tmp ramdisk:

~ # vsish -e get /system/visorfs/ramdisks/tmp/stats
VisorFS ramdisk {
   Min:2 MB
   Max:192 MB
   Number of pages used:7405
   Max number of pages used:16254
   Mem group ID:1014
   Root inode:12288
   Dump on coredump:0
   System:0
   Mount point inode:8
   Root path:/tmp
   First inode of ramdisk:12288
   Max number of inodes:8192
   Number of allocated/initialized inodes:8192
   Number of used inodes:3770
   Max number of used inodes:8192
}

This will help you troubleshoot out-of-disk-space or out-of-inodes issues on an ESXi host.
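
For a quick health check you can flag ramdisks that are close to running out of inodes directly from the esxcli output. A minimal sketch, assuming the column layout shown above, where $(NF-1) is the Used Inodes column and $(NF-3) the Maximum Inodes column; the 90 % threshold is arbitrary:

~ # esxcli system visorfs ramdisk list | awk 'NR>2 && $(NF-1) >= 0.9*$(NF-3) {print $1": "$(NF-1)" of "$(NF-3)" inodes used"}'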

In one of my next posts I will go into detail on how to troubleshoot inode issues on an ESXi host.

NetApp NFS APD issues – reduction of MaxQueueDepth

If you face APDs (All Paths Down events) in your environment, you can follow the KB articles below to possibly improve the situation.

http://kb.vmware.com/kb/2016122
https://kb.netapp.com/support/index?page=content&id=1014696

When using NFS datastores presented by some NetApp NFS filer models on an ESXi/ESX host, you may experience these symptoms:
* The NFS datastores appear to be unavailable (grayed out) in vCenter Server, or when accessed through the vSphere Client
* The NFS shares reappear after a few minutes
* Virtual machines located on the NFS datastore are in a hung/paused state when the NFS datastore is unavailable
* This issue is most often seen after a host upgrade to ESXi 5.x or the addition of an ESXi 5.x host to the environment, but it can also occur in vSphere 6 environments.

You see entries similar to the following in /var/log/vmkernel.log:

NFSLock: 515: Stop accessing fd 0xc21eba0 4
NFS: 283: Lost connection to the server 192.168.100.1 mount point /vol/datastore01, mounted as bf7ce3db-42c081a2-0000-000000000000 ("datastore01")
NFSLock: 477: Start accessing fd 0xc21eba0 again
NFS: 292: Restored connection to the server 192.168.100.1 mount point /vol/datastore01, mounted as bf7ce3db-42c081a2-0000-000000000000 ("datastore01")
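As the KB articles above explain, the suggested workaround is to reduce the NFS.MaxQueueDepth advanced setting on every host that mounts the affected datastores. A minimal sketch based on KB 2016122 (64 is the starting value suggested there; check the KBs for the value appropriate to your filer):

~ # esxcli system settings advanced list -o /NFS/MaxQueueDepth
~ # esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 64

Per the KB, a reboot of the host is required for the new queue depth to take effect.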

Additionally, VMware released new patches for ESXi 5.5 and 6.0 that contain improvements to the NFS implementation, which should make ESXi more resilient to APDs.

You can find a great overview on the following sites: ESXi 5.5 Patches and ESXi 6 Patches.

Besides running the latest version of ESXi, it is highly recommended to apply NetApp's NFS vSphere recommendations.