
Sep 01 2017

VMworld 2017 Breakout Session – Troubleshooting vSphere 6 Made Easy

The session “Troubleshooting vSphere 6 Made Easy: Expert Talk – INF9205R” was presented by Ragavendra Kumar, Technical Support Supervisor at VMware, and Abhilash Kunhappan, Staff Technical Support Engineer at VMware. The following areas were covered:

The session covered both methodology and actual tips and tricks. It started by describing an overall methodology you can (and should) use when facing a problem. This is not a detailed process and does not include steps such as opening a support case, collecting log files and so on.
  1. Identifying the symptoms – What is the problem all about, e.g. a task is failing
  2. Defining the problem – What is causing the problem, e.g. is it software or hardware related
  3. Testing the solution – Once you know what is wrong, how should you test the fix: in a prod, dev or stage environment? Are other software solutions involved, e.g. SRM, that could be affected by a particular patch/solution?

The troubleshooting options available to customers were also covered. And yes, the error messages mentioned are actually much better than they were a few years ago :)
The log files will tell you what the problem is; the trick is to understand how to read and correlate them, and helpful tools for this are vRLI and/or vMA.

Heads up: If you have VMFS corruption, always reach out to VMware support and do not follow anything you might find in KB articles.

Before digging into the different sections, I want to make you aware of the localcli command, unless you are already familiar with it. If esxcli hangs when you use it for configuration or troubleshooting, it typically means that the ESXi management service hostd has a problem. localcli is your friend here since it bypasses hostd.
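If you script against esxcli, the "hangs, so try localcli" idea can be sketched in Python roughly like this. It is a minimal sketch, assuming a Python 3 interpreter is available where you run it; the 30-second timeout and the helper name run_cli are my own illustrative choices, not anything from the session. Also note that VMware's documentation positions localcli mainly for use under the guidance of VMware support, so treat the fallback with care.

```python
import subprocess


def run_cli(args, primary=("esxcli",), fallback=("localcli",), timeout=30):
    """Run `primary args...`; if it does not return within `timeout` seconds,
    retry the same arguments with `fallback`.

    Hypothetical helper for illustration. Returns (which, stdout) where
    `which` is "primary" or "fallback". If the fallback also times out,
    subprocess.TimeoutExpired propagates to the caller.
    """
    try:
        result = subprocess.run(list(primary) + list(args),
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                universal_newlines=True, timeout=timeout)
        return "primary", result.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the hung child at this point.
        # A hanging esxcli usually points at hostd, so use the CLI that bypasses it.
        result = subprocess.run(list(fallback) + list(args),
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                universal_newlines=True, timeout=timeout)
        return "fallback", result.stdout
```

Usage on a host would look like run_cli(["network", "ip", "interface", "list"]), which runs esxcli network ip interface list first and only falls back to localcli if esxcli stops responding.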

vCenter Server

The most common issues are:

  • vCenter Server Upgrade
  • Challenges with SSL Certificates – Totally agree 🙂
  • Linked Mode Configuration Issues – One or more vCenter Servers are missing from the inventory
  • Both internal & external DB issues
  • Crashes

An important first step is to identify whether vCenter Server and the Platform Services Controller (PSC) run on the same VM. For a Windows-based vCenter Server you can run the following command to find out:

reg query "HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\vCenter Server" /v INSTALL_TYPE

The output will be one of:

  • Embedded – vCenter Server and PSC are installed on the same server
  • Infrastructure – a PSC-only node, i.e. vCenter Server and PSC are not installed on the same server
  • Management – a vCenter Server node that uses an external PSC

Another command you can use is the one below, which gives you the vCenter Server build number:

reg query "HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\vCenter Server" /v BuildNumber
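If you collect these values from several Windows vCenter Servers, a small parser for the reg query output helps. This is a minimal sketch, assuming the usual reg query layout where each value line reads "name  type  data" separated by whitespace; the function names are hypothetical.

```python
def parse_reg_value(output, value_name):
    """Return the data of `value_name` from `reg query ... /v value_name`
    output, or None if the value is not present."""
    for line in output.splitlines():
        # Value lines look like: "    INSTALL_TYPE    REG_SZ    embedded".
        # maxsplit=2 keeps data that contains spaces in one piece.
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == value_name and parts[1].startswith("REG_"):
            return parts[2].strip()
    return None


def is_embedded(output):
    """True if INSTALL_TYPE says vCenter Server and PSC share one server."""
    install_type = parse_reg_value(output, "INSTALL_TYPE")
    return install_type is not None and install_type.lower() == "embedded"
```

The key header line ("HKEY_LOCAL_MACHINE\...") is skipped automatically because its second field does not start with REG_.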

Some useful tools available are listed below.

  • certificate-manager – Manage PSC and vCenter Server certificates
  • certool – Manage certificates
  • cmsso-util – repoint, reconfigure vCenter Server, unregister and move services across PSCs.
  • dir-cli – to manage solution users, certificates, passwords
  • service-control --list – List the vCenter Server services (and there are a lot)
  • vdcadmintool – Test LDAP connectivity and reset passwords
  • vdcrepadmin – Show, add and remove replication between PSCs
  • ssolscli listServices – List registered services with vCenter Server 5.x
  • lstool.py – List registered services with vCenter Server 6.x

Log File Locations

ESXi Server

The relationship between vpxd (the vCenter Server service), vpxa (the vCenter agent on the ESXi host) and hostd (the ESXi host management service) was described.

The most common issues are:

  • ESXi crash
  • ESXi host disconnected or not responding in vCenter Server
  • ESXi upgrade to version 6
  • HA agent not getting configured
  • VM issues, e.g. power operations and snapshot consolidation

Available ESXi tools

  • esxcli – Config & monitoring. Can also use localcli
  • esxtop – Performance monitor & troubleshooting
  • vim-cmd – ESXi & VM config. Useful to verify whether hostd is working fine.
  • vm-support – Generate log bundle
  • vmdumper – VM core dump management
  • vmkfstools – VMFS and VM disk management. Hidden option -t 10 (meaning you increase the verbosity by 10 times)
  • vmkping – Check network connectivity for vmkernel interfaces
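A common vmkping use is verifying jumbo frames end to end: with the don't-fragment flag (-d) set, the payload size (-s) has to leave room for the 20-byte IPv4 header and the 8-byte ICMP header. A minimal sketch that builds the command line, assuming IPv4 without IP options; the helper names are hypothetical and the IP address in the usage note is a documentation placeholder.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def vmkping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def vmkping_command(target_ip, mtu, vmk=None):
    """Build a vmkping command line testing `mtu` with don't-fragment set,
    optionally sourcing the ping from a specific VMkernel interface (-I)."""
    cmd = ["vmkping", "-d", "-s", str(vmkping_payload(mtu))]
    if vmk:
        cmd += ["-I", vmk]
    cmd.append(target_ip)
    return " ".join(cmd)
```

For a 9000-byte MTU this yields vmkping -d -s 8972 -I vmk1 192.0.2.10; if that fails while the 1500-byte variant (payload 1472) works, something in the path is not configured for jumbo frames.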

Log files:

One log file I tend to forget about is vobd.log, so don’t forget that one 🙂

Networking

As always, if anything fails blame the network 🙂

Again, let’s start with the most common issues:

  • Network performance problems
  • No connection to the ESXi host, or packet drops
  • Ports closed in the firewall between ESXi and vCenter Server, causing communication issues or ESXi host disconnects
  • VM does not have network connectivity
  • VM loses network connectivity after a power cycle or vMotion operation – Solved by disabling and re-enabling the vNIC
  • vMotion failing – If it fails between 1% and 10% (or at 11%), it is typically caused by a network problem.
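For the "ports closed in the firewall" case, a quick TCP connect test from the vCenter Server side can rule the firewall in or out. This is a minimal sketch; the port list (TCP 443 and TCP 902) is an assumption based on the commonly documented vCenter Server to ESXi ports, so check VMware's port documentation for your version. Note that the host heartbeat travels over UDP 902 from the host to vCenter Server, which a TCP connect test cannot verify.

```python
import socket

# Assumed vCenter Server -> ESXi TCP ports; adjust for your environment.
DEFAULT_PORTS = (443, 902)


def check_tcp_ports(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Return {port: True/False} depending on whether a TCP connection
    to host:port can be established within `timeout` seconds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, unreachable hosts and timeouts
            # (socket.timeout is a subclass of OSError).
            results[port] = False
    return results
```

A result like {443: True, 902: False} would point towards a firewall rule blocking TCP 902 between the two components.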

Troubleshooting commands:

  • esxcli network – Configure & monitor
  • esxcfg-nics or esxcfg-vmknic – List and configure physical NICs and VMkernel NICs
  • esxtop, press n – Network performance monitoring & troubleshooting
  • ethtool – Identify device driver setting and see network statistics
  • pktcap-uw – Capture packets on both uplink interfaces and VMkernel interfaces
    • pktcap-uw -h | more
    • pktcap-uw --vmk vmk0 -o pktcap.pcap
  • vsish
    • vsish -e get /net/portsets/vSwitch<x>/ports/port-ID/vmxnet
    • vsish -e get /net/pNics/vmnic<x>/stats
  • tcpdump-uw – Capture packets on VMkernel interfaces
    • tcpdump-uw -i vmk0 -s 1514 -w traffic.pcap
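Once you have copied a capture off the host, you would normally open it in Wireshark, but a quick sanity check (did the capture actually record anything?) is easy to script. A minimal sketch, assuming the file was written in classic libpcap format rather than pcapng; the function name is hypothetical and it only walks the record headers.

```python
import struct

# Classic libpcap magic numbers -> struct byte-order prefix.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def pcap_summary(data):
    """Return (packet_count, captured_bytes) for classic pcap file contents."""
    if len(data) < 24 or data[:4] not in PCAP_MAGICS:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    endian = PCAP_MAGICS[data[:4]]
    offset, count, total = 24, 0, 0  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_frac, incl_len (bytes saved), orig_len
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16 + incl_len  # jump over the header and the packet bytes
        count += 1
        total += incl_len
    return count, total
```

Usage: with open("traffic.pcap", "rb") as f: print(pcap_summary(f.read())). Reading the whole file into memory keeps the sketch short; for very large captures you would read record by record instead.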

Storage

This section started by explaining the well-known DAVG/cmd, KAVG/cmd and GAVG/cmd latency counters, which you can inspect via esxtop.
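The three counters relate roughly as GAVG/cmd ≈ DAVG/cmd + KAVG/cmd: the latency the guest sees is the device (array) latency plus the time spent in the VMkernel. A minimal sketch that uses this to point at the likely source of latency; the warning thresholds (25 ms for DAVG, 2 ms for KAVG) are assumed community rules of thumb, not official VMware limits, and the function name is hypothetical.

```python
# Assumed rule-of-thumb thresholds in milliseconds (not official VMware limits).
DAVG_WARN_MS = 25.0
KAVG_WARN_MS = 2.0


def assess_latency(davg_ms, kavg_ms):
    """Return (gavg_ms, hints) where gavg_ms approximates the guest-observed
    latency as DAVG + KAVG and hints names the side that looks slow."""
    gavg_ms = davg_ms + kavg_ms
    hints = []
    if davg_ms > DAVG_WARN_MS:
        hints.append("device/array side (DAVG high)")
    if kavg_ms > KAVG_WARN_MS:
        hints.append("VMkernel side, e.g. queuing (KAVG high)")
    return gavg_ms, hints or ["within assumed thresholds"]
```

High DAVG usually sends you towards the fabric or array, while high KAVG with normal DAVG suggests looking at queuing inside the host.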

The most common issues are:

  • All Paths Down (APD)
  • Missing LUN
  • VMFS Datastore inaccessible or not visible
  • SCSI Reservation Conflicts – Should largely be gone thanks to e.g. ATS
  • Storage or datastore performance issues

Troubleshooting commands include:

  • esxcli storage – Configure & monitor
  • esxtop, press d – Disk adapter performance monitoring & troubleshooting
    • enable the read and write latency fields via the f option
  • vmkchdev – Map HBA devices to PCI slots
  • vmkload_mod – View, load & unload HBA drivers
  • partedUtil – List, create, recreate partition table on disks
    • partedUtil getptbl /vmfs/devices/disks/naa.XYZ
  • voma – Check VMFS metadata (vSphere On-Disk Metadata Analyzer)
    • voma -m vmfs -d /vmfs/devices/disks/naa.XYZ -s /tmp/analyze.txt
  • vmkfstools – VMFS and VM disk management
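The partedUtil getptbl output is compact but easy to misread when you compare several LUNs. A minimal sketch that turns it into sizes, assuming the typical GPT layout of the output (label on the first line, geometry ending in the total sector count on the second, then one "number start end type-GUID type-name attribute" line per partition) and 512-byte sectors; both assumptions and the function name are mine, not from the session.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def parse_getptbl(output):
    """Parse `partedUtil getptbl` output for a GPT-labelled disk."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    label = lines[0]
    if label != "gpt":
        raise ValueError("only GPT partition tables are handled in this sketch")
    # Geometry line: cylinders heads sectors total_sectors
    total_sectors = int(lines[1].split()[-1])
    partitions = []
    for line in lines[2:]:
        fields = line.split()
        number, start, end = int(fields[0]), int(fields[1]), int(fields[2])
        partitions.append({
            "number": number,
            "start": start,
            "end": end,
            "type_guid": fields[3],
            "type_name": fields[4],
            # Start and end sectors are inclusive.
            "size_bytes": (end - start + 1) * SECTOR_BYTES,
        })
    return {"label": label, "total_sectors": total_sectors, "partitions": partitions}
```

This makes it quick to spot, for example, a VMFS partition that does not span the disk after a LUN has been grown.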

2 comments

  1. Eric Sloof

    The last two images were taken from a presentation I’ve created for the Dutch VMUG back in 2010. The complete slide-deck can be found here: VMware vSphere Advanced Troubleshooting – The Slide Deck – https://www.ntpro.nl/blog/archives/1656-VMware-vSphere-Advanced-Troubleshooting-The-Slide-Deck.html

  2. magander3

    OK, that’s awesome, thanks for the heads-up. You should reach out to the presenters and have them give you credit for your work.

    //Magnus
