
Oct 09 2015

Acropolis Virtual Machine High Availability – Part I


This is the first post in a series about the Nutanix Acropolis Virtual Machine High Availability (VMHA) feature. This feature only comes into play when you run the Acropolis Hypervisor (AHV) on your Nutanix nodes (physical servers).

Before I continue, I would like to explain some terminology:

  • Acropolis Cluster with AHV – Refers to the physical and logical elements required to run virtual machines, such as:
    • Physical servers, called Nutanix nodes when not referring to a specific hypervisor running on the server.
    • The storage layer, called the Distributed Storage Fabric (DSF), formerly known as the Nutanix Distributed File System (NDFS).
    • The hypervisor, which can be the Acropolis Hypervisor, ESXi or Hyper-V.
  • Nutanix AHV cluster – This term is used when referring to the logical grouping of AHV hosts.
  • Acropolis Hypervisor (AHV) host – Refers to an individual server/Nutanix node running the Acropolis Hypervisor software.
  • Nutanix Node – A physical server, referred to as a Nutanix node when no specific hypervisor is being discussed. In this case the same as an AHV host.
  • Nutanix Block – Refers to the 2U chassis where the Nutanix nodes are placed. A Nutanix block can hold one, two, three or a maximum of four Nutanix nodes depending on the node model.

VMHA ensures that the VMs running on a failed AHV host are restarted on another Acropolis Hypervisor (AHV) host in the Nutanix AHV cluster. The feature also comes into play when there is a network partition, which will be discussed in a later blog post. VMHA is managed by the Acropolis master in the Nutanix AHV cluster and uses a master/slave architecture.
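
If you want to check how VMHA is currently configured, you can do so from any Controller VM using aCLI. The acli ha.get command also comes up in the comments below; the output lines shown here are just illustrative, as the exact fields vary between releases:

  • acli ha.get
    • Example (illustrative) output lines:
      num_host_failures_to_tolerate: 1
      reservation_type: "kAcropolisHAReserveSegments"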

[Image: four-node Acropolis cluster]

Thanks to Josh Odgers for sharing this picture.

VMHA can be enabled via the Prism UI or the Acropolis CLI (aCLI), where aCLI offers the most advanced configuration options. VMHA is enabled automatically as soon as you create the AHV host cluster, and there are two different modes:

  • Default – This does not require any configuration and is the mode activated automatically when the Nutanix AHV cluster is created. When an AHV host becomes unavailable, the VMs previously running on the failed AHV host are restarted on the remaining hosts if possible, based on the Nutanix AHV cluster's available resources.
  • Guarantee – This non-default configuration reserves capacity to guarantee that all failed VMs can be restarted on other AHV hosts in the Nutanix AHV cluster during a host failure. Use the Enable HA check box to turn on VMHA Guarantee mode and start reserving capacity.

When using the Guarantee configuration, the following resource protection is applied:

  • One AHV host failure if the DSF is configured for Nutanix Fault Tolerance 1 (Redundancy Factor 2).
  • Up to two AHV host failures if the DSF is configured for Nutanix Fault Tolerance 2 (Redundancy Factor 3).

If needed, you can change the number of reserved AHV hosts using the following aCLI syntax:

  • acli ha.update num_host_failures_to_tolerate=X
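
For example, to reserve capacity for one AHV host failure and then verify the change (a sketch; the exact ha.get output formatting varies between releases):

  • acli ha.update num_host_failures_to_tolerate=1
  • acli ha.get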

There are two different reservation types, and the one that fits your Acropolis Cluster with AHV best will be automatically selected for you. The reservation types are:

  • kAcropolisHAReserveHosts – One or more AHV hosts are reserved for VMHA.
    • This reservation type is automatically selected when the Nutanix AHV cluster consists of homogeneous AHV hosts, meaning hosts with the same amount of RAM.
  • kAcropolisHAReserveSegments – Resources are reserved across multiple AHV hosts.
    • This reservation type is automatically selected when the Nutanix AHV cluster consists of heterogeneous AHV hosts, meaning hosts with different amounts of RAM installed.
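
If you ever need to override the automatically selected reservation type (see the considerations at the end of this post before doing so), this can also be done via ha.update. Treat the reservation_type parameter below as an assumption to verify against the documentation for your AOS version:

  • acli ha.update reservation_type=kAcropolisHAReserveSegments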

All VMs running in the Nutanix AHV cluster are protected by VMHA, but there is an option to turn off VMHA on a per-VM basis by setting a negative value (-1) using aCLI, e.g.:

  • acli vm.update VM-name ha_priority=-1
  • acli vm.create VM-name ha_priority=-1
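
As a worked example, the following creates a VM that is excluded from VMHA from the start. The VM name testvm and the sizing parameters are placeholders for illustration only:

  • acli vm.create testvm memory=2G num_vcpus=2 ha_priority=-1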

If the priority is changed, it can be viewed by running the following command:

  • acli vm.get VM-name
  • The ha_priority value is presented near the end of the output and looks similar to this:
    • ha_priority: -1
      Note: If the configuration is not changed, the VM keeps its default value of 0, which is not presented in the “acli vm.get VM-name” output.

To respect data locality, which is important for VM performance, VMHA makes sure that VMs previously running on a failed AHV host are migrated back to that host when it comes back online.

Take the following into account if/when you make any VMHA changes:

  • Use the non-default VMHA mode Guarantee when you need to ensure that all VMs can be restarted in case of an AHV host failure. Otherwise, use the default VMHA mode.
  • When using the VMHA mode Guarantee, always keep the default VMHA reservation type kAcropolisHAReserveSegments for heterogeneous Nutanix AHV clusters.
    Important: Never change the VMHA reservation type for heterogeneous Acropolis Clusters with AHV.
  • Use the default VMHA reservation type kAcropolisHAReserveHosts for homogeneous Nutanix AHV clusters.

There might be circumstances when you want to change the default reservation type, kAcropolisHAReserveHosts, for homogeneous clusters. In that case, keep the following in mind:

  • Advantage of segments – Use segments if performance is critical, e.g. to maximize local disk access and CPU performance.
  • Advantage of hosts – The VM consolidation ratio is higher because fewer host resources are reserved compared to the segments reservation type.

4 comments


  1. Sudhansu

    Is this only applicable to the Acropolis Hypervisor, or to all hypervisors? For example, if ESXi is in use, shouldn't DRS take care of HA and bring up the VMs on the right host?

    I want to understand whether Nutanix bypasses the native hypervisor HA and replaces it with VMHA.

  2. magander3

    Hi,
    this is only when you run the Acropolis Hypervisor on your Nutanix nodes. vSphere HA will work as usual when using ESXi. vSphere DRS and vSphere HA are two different things: DRS tries to deliver the expected resources to VMs while the system is up and running, while vSphere HA provides VM availability and starts VMs on ESXi hosts in the vSphere cluster when they fail because of an ESXi host failure, network partition or storage isolation.

    //Magnus

  3. danniel

    It says “Failed to update VM high availability on the cluster…” when I try to enable HA. Can you please tell me what might be the problem and what I should look out for?

  4. magander3

    Hi Daniel,
    can you post the output from the command acli ha.get here?
    Are you running any VMs in the cluster when trying to enable VMHA?

    //Magnus
