
Dec 23 2014

Nutanix Metro Availability configuration and failover

A few days ago my colleague David Broome tested Metro Availability, one of the new features in Nutanix Operating System (NOS) 4.1, which will be released next year. Yes, I know that time frame is a little vague, but stay tuned: there should be more information at the beginning of next year :)

NOS Metro Availability requires two Nutanix clusters, one at each site. The data replication is synchronous, so we are not talking about the disaster recovery (DR) functionality that has been a feature in NOS for a long time.

This blog post assumes that the Nutanix clusters are already in place, since it only covers the Metro Availability configuration and failover steps.

I’ll use the concept of primary and remote sites in this blog post even though both sites are up and running during normal operation just like in a traditional stretched cluster. The reason for primary and remote sites is that Nutanix Metro Availability is configured per Nutanix container and you need to define the roles per container. This means Site A can be primary and Site B remote for Nutanix Container 1 and Site A can be remote and Site B primary for Nutanix Container 2.
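To make the per-container roles concrete, here is a minimal sketch in Python. It is only an illustration of the idea and not a Nutanix API: the `MetroContainer` class and the `containers_served_by` helper are names I made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetroContainer:
    """Hypothetical model: each container carries its own primary/remote roles."""
    name: str
    primary_site: str
    remote_site: str


def containers_served_by(site, containers):
    """Return the names of the containers for which `site` is the primary."""
    return [c.name for c in containers if c.primary_site == site]


# Both sites are active at the same time, each as primary for a different container.
containers = [
    MetroContainer("Container 1", primary_site="Site A", remote_site="Site B"),
    MetroContainer("Container 2", primary_site="Site B", remote_site="Site A"),
]

for site in ("Site A", "Site B"):
    print(site, "is primary for", containers_served_by(site, containers))
```

The point of the sketch is that the role is a property of the container, not of the site, which is why neither site is a passive standby.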

Configuration

Follow the steps below in Nutanix PRISM (UI) to set up Metro Availability:

  • Create one Nutanix container at the primary site and one Nutanix container at the remote site. It is important to use the same Nutanix container name on both sites.
  • Create the Remote Site mappings from the primary site via Data Protection -> Remote Site
  • Select the physical cluster and provide the name and the IP address for the remote site. Yes, the first step is to create a DR configuration, but hold on...
    The advanced settings require you to specify the source and destination containers. You can also specify the maximum bandwidth and choose whether to use compression on the wire.
  • Create the Metro Availability Protection Domain via Data Protection -> Add Protection domain -> Metro Availability
  • Type a name
  • Select the container you created the remote site mapping with
  • Select the Remote Sites mapping you created
  • Set VM Availability. This defines what happens if the network connectivity between the primary and remote sites is lost. You have two options:
    • Stop the Metro Availability functionality automatically. The available timeout options range from 10 to 30 seconds.
    • Stop the Metro Availability functionality manually.
  • The last step is to review your settings before creating the Metro Availability protection domain. A warning message is shown stating that all data on the destination container will be overwritten.
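If you want to sanity-check your choices before clicking through the wizard, the settings above can be written down as a small checklist. The sketch below is plain Python and does not talk to PRISM or any Nutanix API. The `MetroSettings` class, its field names and the `validate` function are hypothetical, and the "same container name on both sites" check reflects my reading of the first step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetroSettings:
    """Hypothetical record of the values entered in the wizard."""
    name: str
    primary_container: str
    remote_container: str
    # None means "stop Metro Availability manually";
    # a number means "stop automatically after this many seconds".
    auto_break_seconds: Optional[int] = None


def validate(settings):
    """Return a list of problems; an empty list means the settings look sane."""
    problems = []
    if not settings.name.strip():
        problems.append("protection domain name is empty")
    if settings.primary_container != settings.remote_container:
        problems.append("container names differ between the sites")
    timeout = settings.auto_break_seconds
    if timeout is not None and not 10 <= timeout <= 30:
        problems.append("automatic timeout must be between 10 and 30 seconds")
    return problems


good = MetroSettings("CLS-Metro2", "CLS2-Metro2", "CLS2-Metro2", auto_break_seconds=20)
bad = MetroSettings("CLS-Metro2", "CLS2-Metro2", "other-container", auto_break_seconds=60)
print(validate(good))
print(validate(bad))
```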

Failover

Four Metro Availability Protection Domains have been created for the purpose of this blog post.

Follow the steps below to fail over the Metro Availability Protection Domain CLS-Metro2 (protecting the Nutanix container and vSphere NFS datastore CLS2-Metro2) from Nutanix cluster gso-cluster2 to gso-cluster3.

  • Select the Metro Availability Protection Domain you want to fail over and select Disable.
  • Answer yes to the popup warning regarding disabling the Metro Availability on CLS-Metro2.
  • Now that the Nutanix container is no longer active for the vSphere cluster, we need to activate it, that is, enable read/write (R/W) access to the Nutanix container.
  • Go to the Nutanix cluster where you want to activate R/W -> select the disabled Metro Availability Protection Domain -> click Promote
  • Answer yes to the Promote Metro Availability popup question.
  • Metro Availability Protection Domain is active -> click Re-enable
  • Answer yes to the popup about re-enabling Metro Availability for the Protection Domain, which will overwrite data on the remote site.
  • The VMs running on the NFS Datastore we migrated from one site to another will experience a VMware HA event.
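The failover steps above boil down to a fixed sequence: Disable, Promote, Re-enable. Here is a small Python sketch of that sequence as a state machine. It is a toy model for reasoning about the order of the steps and not Nutanix code: the `State` names, the `TRANSITIONS` table and the `ProtectionDomain` class are all my own assumptions, and the real product may track more states than these three.

```python
from enum import Enum


class State(Enum):
    ENABLED = "enabled"      # Metro Availability is running
    DISABLED = "disabled"    # Metro Availability has been disabled
    PROMOTED = "promoted"    # the other cluster has been promoted


# Only the order used in this blog post is allowed.
TRANSITIONS = {
    (State.ENABLED, "disable"): State.DISABLED,
    (State.DISABLED, "promote"): State.PROMOTED,
    (State.PROMOTED, "re-enable"): State.ENABLED,
}


class ProtectionDomain:
    """Hypothetical model of one Metro Availability Protection Domain."""

    def __init__(self, name, primary, remote):
        self.name = name
        self.primary = primary
        self.remote = remote
        self.state = State.ENABLED

    def apply(self, action):
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} {self.name} while {self.state.value}")
        if action == "promote":
            # Promote is the step where the two clusters swap roles.
            self.primary, self.remote = self.remote, self.primary
        self.state = TRANSITIONS[key]


pd = ProtectionDomain("CLS-Metro2", primary="gso-cluster2", remote="gso-cluster3")
for action in ("disable", "promote", "re-enable"):
    pd.apply(action)
    print(f"{action}: state={pd.state.value}, primary={pd.primary}")
```

Trying to promote before disabling raises a `ValueError` in this model, which mirrors the order of the steps in the UI walkthrough.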

Summary

Nutanix Metro Availability provides an easy way to increase the availability of your workload and at the same time protect the workload between sites.

7 comments


  1. vdoogle

    Reblogged this on Jon Kohler's Blog.

  2. Manfred

    A lot of questions come up when I think about this functionality:
    How are the NFS mounts handled during failover? Do I have to configure affinity rules in the cluster? Is the Metro Availability container also mounted on the secondary site during normal operation?

  3. magander3

    Hi, Good questions.
    Manual interaction or a scripted solution is required to activate the Nutanix container (vSphere NFS datastore) on the remote site when the primary site fails. You might end up in scenarios where a VM lives on one site and accesses its data over the wire, meaning affinity rules might be necessary. Configuration recommendations will be released when the feature is GA early next year.

    Thanks

  4. Manfred

    Thank you for the fast response! More questions are coming 🙂
    So datastore heartbeating must be disabled on the cluster, right?
    Will Promote mount the NFS datastore on the nodes when I select it?
    When I perform a negotiated failover (during normal operation), the NFS datastore will be unmounted on the primary site, right? Does Nutanix therefore rely on the HA events triggered by the ESX cluster, or are they actively triggered by the CVMs?

  5. magander3

    I don’t see any problem with using datastore heartbeating in the Metro Availability scenario. The Promote option changes the ownership of the container and makes it possible for the site to manage read/write operations for that container. Metro Availability relies on vSphere HA to manage VM failover, so the VM will be restarted on the new active site.

    We will release a lot more details when the feature is GA so stay tuned.

    //Magnus

  6. syed

    Hi,

    Our company recently got Nutanix and VM.

    I run ping db-server-name from my application server.

    When I checked the Event Viewer on my application server, there was a communication link error while connecting to my database server.

    When I compare the time of the error with the ping status, there was a "Request timed out" in the ping output just before the communication error occurred.

    The company that implemented the solution said they checked the VM and Nutanix and found no errors. But recently they said that the Cisco switch runs an old OS and that this might be the reason.

    I have connected one physical server to the same switch, and it does not have a single "Request timed out".

    What do you think might be the cause?

    regards

  7. magander3

    Hi,
    Have you tried connecting the VM to the same network (changing the IP if needed) as the hypervisor management interface and the CVMs?
    If that is not working, you can focus on troubleshooting the VM.

    Then open a support case with Nutanix. They are really awesome.

    thanks

    //Magnus
